[ { "id": "--GJkm7nt0", "title": "Enhancing Visual Representations for Efficient Object Recognition during Online Distillation", "track": "main", "status": "Reject", "tldr": "", "abstract": "We propose ENVISE, an online distillation framework that ENhances VISual representations for Efficient object recognition. We are motivated by the observation that in many real-world scenarios, the probability of occurrence of all classes is not the same and only a subset of classes occur frequently. Exploiting this fact, we aim to reduce the computations of our framework by employing a binary student network (BSN) to learn the frequently occurring classes using the pseudo-labels generated by the teacher network (TN) on an unlabeled image stream. To maintain overall accuracy, the BSN must also accurately determine when a rare (or unknown) class is present in the image stream so that the TN can be used in such cases. To achieve this, we propose an attention triplet loss which ensures that the BSN emphasizes the same semantically meaningful regions of the image as the TN. When the prior class probabilities in the image stream vary, we demonstrate that the BSN adapts to the TN faster than the real-valued student network. We also introduce Gain in Efficiency (GiE), a new metric which estimates the relative reduction in FLOPS based on the number of times the BSN and TN are used to process the image stream. We benchmark CIFAR-100 and tiny-imagenet datasets by creating meaningful inlier (frequent) and outlier (rare) class pairs that mimic real-world scenarios. We show that ENVISE outperforms state-of-the-art (SOTA) outlier detection methods in terms of GiE, and also achieves greater separation between inlier and outlier classes in the feature space.", "keywords": "", "primary_area": "", "supplementary_material": "", "author": "Shashanka Venkataramanan;Bruce W McIntosh;Abhijit Mahalanobis", "authorids": "~Shashanka_Venkataramanan1;~Bruce_W_McIntosh1;~Abhijit_Mahalanobis1", "gender": "M;M;M", "homepage": ";https://ece.engineering.arizona.edu/faculty-staff/faculty/abhijit-mahalanobis;https://shashankvkt.github.io/", "dblp": ";;218/8893", "google_scholar": ";L9Y5FbwAAAAJ;CbfH47IAAAAJ", "orcid": ";;", "linkedin": "https://www.linkedin.com/feed/;;shashank-venkataramanan-1b2b9993/", "or_profile": "~Bruce_W_McIntosh1;~Abhijit_Mahalanobis1;~Shashanka_Venkataramanan2", "aff": "University of Central Florida;University of Central Florida;INRIA", "aff_domain": "ucf.edu;ucf.edu;inria.fr", "position": "PhD student;Associate Professor;PhD student", "bibtex": "@misc{\nvenkataramanan2021enhancing,\ntitle={Enhancing Visual Representations for Efficient Object Recognition during Online Distillation},\nauthor={Shashanka Venkataramanan and Bruce W McIntosh and Abhijit Mahalanobis},\nyear={2021},\nurl={https://openreview.net/forum?id=--GJkm7nt0}\n}", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer4;AnonReviewer2;AnonReviewer3", "site": "https://openreview.net/forum?id=--GJkm7nt0", "pdf_size": 0, "rating": "4;4;5;5", "confidence": "4;5;3;4", "wc_review": "424;205;357;377", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "815;472;791;506", "reply_reviewers": "0;0;0;0", "reply_authors": "2;1;1;1", "rating_avg": [ 4.5, 0.5 ], "confidence_avg": [ 4.0, 0.7071067811865476 ], "wc_review_avg": [ 340.75, 82.06209539123407 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 646.0, 157.68798305514596 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.25, 0.4330127018922193 ], "replies_avg": [ 
10, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": -0.7071067811865475, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:4xE7gqWrZVgJ:scholar.google.com/&scioq=Enhancing+Visual+Representations+for+Efficient+Object+Recognition+during+Online+Distillation&hl=en&as_sdt=0,33", "gs_version_total": 0, "aff_unique_index": "0;0;1", "aff_unique_norm": "University of Central Florida;INRIA", "aff_unique_dep": ";", "aff_unique_url": "https://www.ucf.edu;https://www.inria.fr", "aff_unique_abbr": "UCF;INRIA", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0;1", "aff_country_unique": "United States;France" }, { "title": "Meta-Learning of Structured Task Distributions in Humans and Machines", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/3064", "id": "--gvHfE3Xf5", "poster": "", "openreview": "https://openreview.net/forum?id=--gvHfE3Xf5", "slides": "https://iclr.cc/virtual/2021/poster/3064", "video": "https://iclr.cc/virtual/2021/poster/3064", "author_site": "Sreejan Kumar, Ishita Dasgupta, Jonathan Cohen, Nathaniel Daw, Thomas L Griffiths", "tldr": "", "abstract": "In recent years, meta-learning, in which a model is trained on a family of tasks (i.e. a task distribution), has emerged as an approach to training neural networks to perform tasks that were previously assumed to require structured representations, making strides toward closing the gap between humans and machines. However, we argue that evaluating meta-learning remains a challenge, and can miss whether meta-learning actually uses the structure embedded within the tasks. These meta-learners might therefore still be significantly different from humans learners. To demonstrate this difference, we first define a new meta-reinforcement learning task in which a structured task distribution is generated using a compositional grammar. We then introduce a novel approach to constructing a \"null task distribution\" with the same statistical complexity as this structured task distribution but without the explicit rule-based structure used to generate the structured task. We train a standard meta-learning agent, a recurrent network trained with model-free reinforcement learning, and compare it with human performance across the two task distributions. We find a double dissociation in which humans do better in the structured task distribution whereas agents do better in the null task distribution -- despite comparable statistical complexity. This work highlights that multiple strategies can achieve reasonable meta-test performance, and that careful construction of control task distributions is a valuable way to understand which strategies meta-learners acquire, and how they might differ from humans. 
", "keywords": "meta-learning;human cognition;reinforcement learning;compositionality", "primary_area": "", "supplementary_material": "/attachment/56b18093aa812df8d3743cb71d84ffa2cf535b30.zip", "author": "Sreejan Kumar;Ishita Dasgupta;Jonathan Cohen;Nathaniel Daw;Thomas Griffiths", "authorids": "~Sreejan_Kumar1;~Ishita_Dasgupta1;~Jonathan_Cohen1;~Nathaniel_Daw1;~Thomas_Griffiths1", "gender": ";;M;M;", "homepage": "http://www.sreejankumar.com;;https://jdc.princeton.edu;https://www.princeton.edu/~ndaw/;http://cocosci.princeton.edu/tom/", "dblp": "276/0083;169/6218;31/5509-3;38/929;34/4472", "google_scholar": "Hft2m4wAAAAJ;;https://scholar.google.com.tw/citations?user=NCkkQAMAAAAJ;BxlScrEAAAAJ;https://scholar.google.com/citations?hl=en", "orcid": "0000-0003-1769-5147;;0000-0003-2316-0763;0000-0001-5029-1430;", "linkedin": "sreejan-kumar-14060b76/;idasgupta6/;;;", "or_profile": "~Sreejan_Kumar1;~Ishita_Dasgupta1;~Jonathan_Cohen1;~Nathaniel_Daw1;~Thomas_L._Griffiths1", "aff": "Princeton University;Google DeepMind;Princeton University;Princeton University;Princeton University", "aff_domain": "princeton.edu;deepmind.com;princeton.edu;princeton.edu;princeton.edu", "position": "PhD student;Researcher;Full Professor;Full Professor;Professor", "bibtex": "@inproceedings{\nkumar2021metalearning,\ntitle={Meta-Learning of Structured Task Distributions in Humans and Machines},\nauthor={Sreejan Kumar and Ishita Dasgupta and Jonathan Cohen and Nathaniel Daw and Thomas Griffiths},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=--gvHfE3Xf5}\n}", "github": "[![github](/images/github_icon.svg) sreejank/Compositional_MetaRL](https://github.com/sreejank/Compositional_MetaRL)", "project": "", "reviewers": "AnonReviewer2;AnonReviewer1;AnonReviewer3;AnonReviewer4", "pdf_size": 0, "rating": "6;6;7;7", "confidence": "3;4;5;3", "wc_review": "506;1122;1375;993", "wc_reply_reviewers": "0;0;525;29", "wc_reply_authors": "1189;2256;3595;687", "reply_reviewers": "0;0;2;1", "reply_authors": "2;5;7;1", "rating_avg": [ 6.5, 0.5 ], "confidence_avg": [ 3.75, 0.82915619758885 ], "wc_review_avg": [ 999.0, 316.06565773585714 ], "wc_reply_reviewers_avg": [ 138.5, 223.45972791534496 ], "wc_reply_authors_avg": [ 1931.75, 1114.9684692851183 ], "reply_reviewers_avg": [ 0.75, 0.82915619758885 ], "reply_authors_avg": [ 3.75, 2.384848003542364 ], "replies_avg": [ 23, 0 ], "authors#_avg": [ 5, 0 ], "corr_rating_confidence": 0.30151134457776363, "gs_citation": 23, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=10148595521419901644&as_sdt=400005&sciodt=0,14&hl=en", "gs_version_total": 4, "pdf": "https://openreview.net/pdf?id=--gvHfE3Xf5", "email": "princeton.edu;deepmind.com;princeton.edu;princeton.edu;princeton.edu", "author_num": 5, "aff_unique_index": "0;1;0;0;0", "aff_unique_norm": "Princeton University;Google", "aff_unique_dep": ";Google DeepMind", "aff_unique_url": "https://www.princeton.edu;https://deepmind.com", "aff_unique_abbr": "Princeton;DeepMind", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;1;0;0;0", "aff_country_unique": "United States;United Kingdom" }, { "id": "--rcOeCKRh", "title": "CROSS-SUPERVISED OBJECT DETECTION", "track": "main", "status": "Reject", "tldr": "", "abstract": "After learning a new object category from image-level annotations (with no object bounding boxes), humans are remarkably good at precisely localizing those objects. 
However, building good object localizers (i.e., detectors) currently requires expensive instance-level annotations. While some work has been done on learning detectors from weakly labeled samples (with only class labels), these detectors do poorly at localization. In this work, we show how to build better object detectors from weakly labeled images of new categories by leveraging knowledge learned from fully labeled base categories. We call this learning paradigm cross-supervised object detection. While earlier works investigated this paradigm, they did not apply it to realistic complex images (e.g., COCO), and their performance was poor. We propose a unified framework that combines a detection head trained from instance-level annotations and a recognition head learned from image-level annotations, together with a spatial correlation module that bridges the gap between detection and recognition. These contributions enable us to better detect novel objects with image-level annotations in complex multi-object scenes such as the COCO dataset.", "keywords": "Object detection;weakly supervised;transfer learning", "primary_area": "", "supplementary_material": "", "author": "Zitian Chen;Zhiqiang Shen;Jiahui Yu;Erik Learned-Miller", "authorids": "~Zitian_Chen1;~Zhiqiang_Shen1;~Jiahui_Yu1;~Erik_Learned-Miller2", "gender": "M;;M;", "homepage": "http://chenzt.net/;;http://jiahuiyu.com/;", "dblp": "218/6728;;185/1060;", "google_scholar": "n6rhKWQAAAAJ;;-CLCMk4AAAAJ;", "orcid": ";;;", "linkedin": ";;jiahuiyuu/;", "or_profile": "~Zitian_Chen1;~Zhiqiang_Shen1;~Jiahui_Yu1;~Erik_Learned-Miller2", "aff": "University of Massachusetts, Amherst;;Google Brain;", "aff_domain": "umass.edu;;google.com;", "position": "PhD student;;Research Scientist;", "bibtex": "@misc{\nchen2021crosssupervised,\ntitle={{\\{}CROSS{\\}}-{\\{}SUPERVISED{\\}} {\\{}OBJECT{\\}} {\\{}DETECTION{\\}}},\nauthor={Zitian Chen and Zhiqiang Shen and Jiahui Yu and Erik Learned-Miller},\nyear={2021},\nurl={https://openreview.net/forum?id=--rcOeCKRh}\n}", "github": "", "project": "", "reviewers": "AnonReviewer3;AnonReviewer4;AnonReviewer2;AnonReviewer1", "site": "https://openreview.net/forum?id=--rcOeCKRh", "pdf_size": 0, "rating": "4;6;6;6", "confidence": "5;4;3;5", "wc_review": "642;900;501;295", "wc_reply_reviewers": "392;0;0;0", "wc_reply_authors": "2102;1043;282;558", "reply_reviewers": "1;0;0;0", "reply_authors": "3;2;1;1", "rating_avg": [ 5.5, 0.8660254037844386 ], "confidence_avg": [ 4.25, 0.82915619758885 ], "wc_review_avg": [ 584.5, 220.01647665572685 ], "wc_reply_reviewers_avg": [ 98.0, 169.74097914175 ], "wc_reply_authors_avg": [ 996.25, 694.0973905007855 ], "reply_reviewers_avg": [ 0.25, 0.4330127018922193 ], "reply_authors_avg": [ 1.75, 0.82915619758885 ], "replies_avg": [ 13, 0 ], "authors#_avg": [ 4, 0 ], "corr_rating_confidence": -0.5222329678670935, "gs_citation": 7, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=11695958391554514297&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 3, "aff_unique_index": "0;1", "aff_unique_norm": "University of Massachusetts Amherst;Google", "aff_unique_dep": ";Google Brain", "aff_unique_url": "https://www.umass.edu;https://brain.google.com", "aff_unique_abbr": "UMass Amherst;Google Brain", "aff_campus_unique_index": "0;1", "aff_campus_unique": "Amherst;Mountain View", "aff_country_unique_index": "0;0", "aff_country_unique": "United States" }, { "title": "Learning Invariant Representations for Reinforcement Learning without Reconstruction", "status": "Oral", "track": "main", 
"site": "https://iclr.cc/virtual/2021/poster/2863", "id": "-2FCwDKRREu", "poster": "", "openreview": "https://openreview.net/forum?id=-2FCwDKRREu", "slides": "https://iclr.cc/virtual/2021/poster/2863", "video": "https://iclr.cc/virtual/2021/poster/2863", "author_site": "Amy Zhang, Rowan T McAllister, Roberto Calandra, Yarin Gal, Sergey Levine", "tldr": "", "abstract": "We study how representation learning can accelerate reinforcement learning from rich observations, such as images, without relying either on domain knowledge or pixel-reconstruction. Our goal is to learn representations that provide for effective downstream control and invariance to task-irrelevant details. Bisimulation metrics quantify behavioral similarity between states in continuous MDPs, which we propose using to learn robust latent representations which encode only the task-relevant information from observations. Our method trains encoders such that distances in latent space equal bisimulation distances in state space. We demonstrate the effectiveness of our method at disregarding task-irrelevant information using modified visual MuJoCo tasks, where the background is replaced with moving distractors and natural videos, while achieving SOTA performance. We also test a first-person highway driving task where our method learns invariance to clouds, weather, and time of day. Finally, we provide generalization results drawn from properties of bisimulation metrics, and links to causal inference.", "keywords": "rich observations;bisimulation metrics;representation learning;state abstractions", "primary_area": "", "supplementary_material": "", "author": "Amy Zhang;Rowan Thomas McAllister;Roberto Calandra;Yarin Gal;Sergey Levine", "authorids": "~Amy_Zhang1;~Rowan_Thomas_McAllister1;~Roberto_Calandra1;~Yarin_Gal1;~Sergey_Levine1", "gender": "M;M;;M;F", "homepage": "https://rowanmcallister.github.io/;https://www.robertocalandra.com;http://www.cs.ox.ac.uk/people/yarin.gal/website//;https://people.eecs.berkeley.edu/~svlevine/;", "dblp": "123/6416;118/8239;67/9076;80/7594;43/2754", "google_scholar": "https://scholar.google.co.uk/citations?user=6uIhh6MAAAAJ;FdE3LOEAAAAJ;https://scholar.google.co.uk/citations?user=SIayDoQAAAAJ;8R35rCwAAAAJ;", "orcid": "0000-0002-9519-2345;0000-0001-9430-8433;;;", "linkedin": "rowantmcallister;rcalandra;;;", "or_profile": "~Rowan_Thomas_McAllister1;~Roberto_Calandra1;~Yarin_Gal1;~Sergey_Levine1;~Amy_Zhang2", "aff": "Toyota Research Institute;Meta Facebook;University of Oxford;Google;Montreal Institute for Learning Algorithms, University of Montreal, University of Montreal", "aff_domain": "tri.global;fb.com;ox.ac.uk;google.com;mila.umontreal.ca", "position": "Machine Learning Scientist;Research Scientist;Associate Professor;Research Scientist;Researcher", "bibtex": "@inproceedings{\nzhang2021learning,\ntitle={Learning Invariant Representations for Reinforcement Learning without Reconstruction},\nauthor={Amy Zhang and Rowan Thomas McAllister and Roberto Calandra and Yarin Gal and Sergey Levine},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=-2FCwDKRREu}\n}", "github": "", "project": "", "reviewers": "AnonReviewer3;AnonReviewer1;AnonReviewer2", "pdf_size": 0, "rating": "7;7;9", "confidence": "3;4;5", "wc_review": "544;287;1441", "wc_reply_reviewers": "0;0;0", "wc_reply_authors": "319;200;687", "reply_reviewers": "0;0;0", "reply_authors": "1;1;1", "rating_avg": [ 7.666666666666667, 0.9428090415820634 ], "confidence_avg": [ 4.0, 
0.816496580927726 ], "wc_review_avg": [ 757.3333333333334, 494.67991895994953 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 402.0, 207.29849653740055 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 7, 0 ], "authors#_avg": [ 5, 0 ], "corr_rating_confidence": 0.8660254037844385, "gs_citation": 569, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=12190335456477106043&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 6, "pdf": "https://openreview.net/pdf?id=-2FCwDKRREu", "email": "tri.global;fb.com;ox.ac.uk;google.com;mila.umontreal.ca", "author_num": 5, "aff_unique_index": "0;1;2;3;4", "aff_unique_norm": "Toyota Research Institute;Meta;University of Oxford;Google;University of Montreal", "aff_unique_dep": ";Meta Platforms, Inc.;;Google;Montreal Institute for Learning Algorithms", "aff_unique_url": "https://www.tri.global;https://meta.com;https://www.ox.ac.uk;https://www.google.com;https://www.umontreal.ca", "aff_unique_abbr": "TRI;Meta;Oxford;Google;UM", "aff_campus_unique_index": "1;2", "aff_campus_unique": ";Mountain View;Montreal", "aff_country_unique_index": "0;0;1;0;2", "aff_country_unique": "United States;United Kingdom;Canada" }, { "id": "-5VpoDCExrU", "title": "Log representation as an interface for log processing applications", "track": "main", "status": "Reject", "tldr": "", "abstract": "Log files are files that record events, messages, or transactions. Logs are rich containers of data because they can store a sequence of structured textual and numerical data. Many sequential forms of data including natural languages and temporal signals can be represented as logs.\n\nWe propose to represent logs at a few levels of abstraction including field level, log level, and log sequence level. The representation for each level can be computed from the previous level. These representations are in vector format and serve as interfaces to downstream applications. We use a version of transformer networks to encode numerical information as well as textual information that is suitable for log embedding. 
We show how a number of log processing applications can be readily solved with our representation.", "keywords": "Vector embedding;Logs;Search;Causal Analysis;Anomaly Detection", "primary_area": "", "supplementary_material": "", "author": "Mohammad Amin Sadeghi;Shameem Parambath;Ji Lucas;Youssef Meguebli;Maguette Toure;Fawaz Al Qahtani;Ting Yu;Sanjay Chawla", "authorids": "~Mohammad_Amin_Sadeghi3;spparambath@hbku.edu.qa;jlucas@hbku.edu.qa;youssef.meguebli@nokia.com;maguette.toure@nokia.com;fawalqahtani@qf.org.qa;tyu@hbku.edu.qa;~Sanjay_Chawla2", "gender": "M;;;;;;;M", "homepage": ";;;;;;;https://www.hbku.edu.qa/en/staff/sanjay-chawla", "dblp": ";;;;;;;22/5463.html", "google_scholar": "Viogmi8AAAAJ;;;;;;;fdUJcwYAAAAJ", "orcid": ";;;;;;;0000-0002-8102-2572", "linkedin": ";;;;;;;", "or_profile": "~Mohammad_Amin_Sadeghi3;spparambath@hbku.edu.qa;jlucas@hbku.edu.qa;youssef.meguebli@nokia.com;maguette.toure@nokia.com;fawalqahtani@qf.org.qa;tyu@hbku.edu.qa;~Sanjay_Chawla2", "aff": "Qatar Computing Research Institute;;;;;;;Qatar Computing Research Institute", "aff_domain": "qcri.com;;;;;;;hbku.edu.qa", "position": "Associate Professor;;;;;;;Full Professor", "bibtex": "@misc{\nsadeghi2021log,\ntitle={Log representation as an interface for log processing applications},\nauthor={Mohammad Amin Sadeghi and Shameem Parambath and Ji Lucas and Youssef Meguebli and Maguette Toure and Fawaz Al Qahtani and Ting Yu and Sanjay Chawla},\nyear={2021},\nurl={https://openreview.net/forum?id=-5VpoDCExrU}\n}", "github": "", "project": "", "reviewers": "AnonReviewer4;AnonReviewer3;AnonReviewer1;AnonReviewer2", "site": "https://openreview.net/forum?id=-5VpoDCExrU", "pdf_size": 0, "rating": "3;4;5;7", "confidence": "4;3;3;4", "wc_review": "339;358;383;414", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "674;673;793;172", "reply_reviewers": "0;0;0;0", "reply_authors": "1;1;1;1", "rating_avg": [ 4.75, 1.479019945774904 ], "confidence_avg": [ 3.5, 0.5 ], "wc_review_avg": [ 373.5, 28.111385593741193 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 578.0, 239.42744203620435 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 9, 0 ], "authors#_avg": [ 8, 0 ], "corr_rating_confidence": 0.16903085094570333, "gs_citation": 1, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=5048391555804880062&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 0, "aff_unique_index": "0;0", "aff_unique_norm": "Qatar Computing Research Institute", "aff_unique_dep": "", "aff_unique_url": "https://www.qcri.org", "aff_unique_abbr": "QCRI", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0", "aff_country_unique": "Qatar" }, { "id": "-5W5OBfFlwX", "title": "Regret Bounds and Reinforcement Learning Exploration of EXP-based Algorithms", "track": "main", "status": "Reject", "tldr": "", "abstract": "EXP-based algorithms are often used for exploration in multi-armed bandit. We revisit the EXP3.P algorithm and establish both the lower and upper bounds of regret in the Gaussian multi-armed bandit setting, as well as a more general distribution option. The analyses do not require bounded rewards compared to classical regret assumptions. We also extend EXP4 from multi-armed bandit to reinforcement learning to incentivize exploration by multiple agents. 
The resulting algorithm has been tested on hard-to-explore games and it shows an improvement in exploration compared to the state of the art.", "keywords": "", "primary_area": "", "supplementary_material": "", "author": "Mengfan Xu;Diego Klabjan", "authorids": "~Mengfan_Xu1;~Diego_Klabjan1", "gender": "F;M", "homepage": "https://mengfanxu1997.github.io/;http://dynresmanagement.com/index.html", "dblp": "205/7008;17/105", "google_scholar": "MR47V4cAAAAJ;TaQZ_VUAAAAJ", "orcid": ";0000-0003-4213-9281", "linkedin": "mengfan-xu-4ba804250/;diegoklabjan", "or_profile": "~Mengfan_Xu1;~Diego_Klabjan1", "aff": "Northwestern University, Northwestern University;Northwestern University", "aff_domain": "u.northwestern.edu;u.northwestern.edu", "position": "PhD student;Full Professor", "bibtex": "@misc{\nxu2021regret,\ntitle={Regret Bounds and Reinforcement Learning Exploration of {\\{}EXP{\\}}-based Algorithms},\nauthor={Mengfan Xu and Diego Klabjan},\nyear={2021},\nurl={https://openreview.net/forum?id=-5W5OBfFlwX}\n}", "github": "", "project": "", "reviewers": "AnonReviewer4;AnonReviewer2;AnonReviewer1", "site": "https://openreview.net/forum?id=-5W5OBfFlwX", "pdf_size": 0, "rating": "4;4;4", "confidence": "3;4;3", "wc_review": "289;345;499", "wc_reply_reviewers": "0;0;0", "wc_reply_authors": "456;416;475", "reply_reviewers": "0;0;0", "reply_authors": "1;1;1", "rating_avg": [ 4.0, 0.0 ], "confidence_avg": [ 3.3333333333333335, 0.4714045207910317 ], "wc_review_avg": [ 377.6666666666667, 88.78938875538876 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 449.0, 24.589970855343985 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 7, 0 ], "authors#_avg": [ 2, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 1, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=11550961165544813030&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 9, "aff_unique_index": "0;0", "aff_unique_norm": "Northwestern University", "aff_unique_dep": "", "aff_unique_url": "https://www.northwestern.edu", "aff_unique_abbr": "NU", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0", "aff_country_unique": "United States" }, { "title": "Optimizing Memory Placement using Evolutionary Graph Reinforcement Learning", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/2625", "id": "-6vS_4Kfz0", "poster": "", "openreview": "https://openreview.net/forum?id=-6vS_4Kfz0", "slides": "https://iclr.cc/virtual/2021/poster/2625", "video": "https://iclr.cc/virtual/2021/poster/2625", "author_site": "Shauharda Khadka, Estelle Aflalo, Mattias Marder, Avrech Ben-David, Santiago Miret, Shie Mannor, Tamir Hazan, Hanlin Tang, Somdeb Majumdar", "tldr": "", "abstract": "For deep neural network accelerators, memory movement is both energetically expensive and can bound computation. Therefore, optimal mapping of tensors to memory hierarchies is critical to performance. The growing complexity of neural networks calls for automated memory mapping instead of manual heuristic approaches; yet the search space of neural network computational graphs has previously been prohibitively large. We introduce Evolutionary Graph Reinforcement Learning (EGRL), a method designed for large search spaces, that combines graph neural networks, reinforcement learning, and evolutionary search. A set of fast, stateless policies guide the evolutionary search to improve its sample-efficiency. 
We train and validate our approach directly on the Intel NNP-I chip for inference. EGRL outperforms policy-gradient, evolutionary search and dynamic programming baselines on BERT, ResNet-101 and ResNet-50. We additionally achieve 28-78% speed-up compared to the native NNP-I compiler on all three workloads. ", "keywords": "Reinforcement Learning;Memory Mapping;Device Placement;Evolutionary Algorithms", "primary_area": "", "supplementary_material": "/attachment/d8a0261c9c63af4da420d3c474f447025fa052d9.zip", "author": "Shauharda Khadka;Estelle Aflalo;Mattias Marder;Avrech Ben-David;Santiago Miret;Shie Mannor;Tamir Hazan;Hanlin Tang;Somdeb Majumdar", "authorids": "~Shauharda_Khadka1;estelle.aflalo@intel.com;mattias.marder@intel.com;avrech@campus.technion.ac.il;~Santiago_Miret1;~Shie_Mannor2;~Tamir_Hazan1;~Hanlin_Tang1;~Somdeb_Majumdar1", "gender": "M;;;;M;M;;;M", "homepage": "https://sites.google.com/oregonstate.edu/skhadka;;;;https://www.intel.ai/bio/santiago-miret/;https://shie.net.technion.ac.il;https://ie.technion.ac.il/~tamir.hazan/tamir.html;;https://www.intel.ai/bio/somdeb-majumdar/", "dblp": "183/9233;;;;241/5030;20/1669;36/5041;179/3388;63/8320", "google_scholar": "s-4Eoi8AAAAJ;;;;HLQ_te4AAAAJ;https://scholar.google.com.tw/citations?user=q1HlbIUAAAAJ;fqi186AAAAAJ;;", "orcid": ";;;;0000-0002-5121-3853;;;;", "linkedin": ";;;;santiago-miret/;;;;somdebmajumdar/", "or_profile": "~Shauharda_Khadka1;estelle.aflalo@intel.com;mattias.marder@intel.com;avrech@campus.technion.ac.il;~Santiago_Miret1;~Shie_Mannor2;~Tamir_Hazan1;~Hanlin_Tang1;~Somdeb_Majumdar1", "aff": "Microsoft;;;;Intel;Technion - Israel Institute of Technology, Technion;Technion;;Intel", "aff_domain": "microsoft.com;;;;intel.com;technion.il;technion.ac.il;;intel.com", "position": "Applied Scientist;;;;Researcher;Full Professor;Assistant Professor;;AI/ML Researcher", "bibtex": "@inproceedings{\nkhadka2021optimizing,\ntitle={Optimizing Memory Placement using Evolutionary Graph Reinforcement Learning},\nauthor={Shauharda Khadka and Estelle Aflalo and Mattias Marder and Avrech Ben-David and Santiago Miret and Shie Mannor and Tamir Hazan and Hanlin Tang and Somdeb Majumdar},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=-6vS_4Kfz0}\n}", "github": "", "project": "", "reviewers": "AnonReviewer3;AnonReviewer1;AnonReviewer2;AnonReviewer4", "pdf_size": 0, "rating": "5;6;7;7", "confidence": "5;4;4;4", "wc_review": "403;736;454;211", "wc_reply_reviewers": "0;370;524;0", "wc_reply_authors": "518;1213;1476;358", "reply_reviewers": "0;1;3;0", "reply_authors": "1;3;3;1", "rating_avg": [ 6.25, 0.82915619758885 ], "confidence_avg": [ 4.25, 0.4330127018922193 ], "wc_review_avg": [ 451.0, 187.84168866361907 ], "wc_reply_reviewers_avg": [ 223.5, 230.03641016152204 ], "wc_reply_authors_avg": [ 891.25, 466.1348383247062 ], "reply_reviewers_avg": [ 1.0, 1.224744871391589 ], "reply_authors_avg": [ 2.0, 1.0 ], "replies_avg": [ 17, 0 ], "authors#_avg": [ 9, 0 ], "corr_rating_confidence": -0.8703882797784891, "gs_citation": 14, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=8935345151073102603&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 6, "pdf": "https://openreview.net/pdf?id=-6vS_4Kfz0", "email": "microsoft.com;;;;intel.com;technion.il;technion.ac.il;;intel.com", "author_num": 9, "aff_unique_index": "0;1;2;2;1", "aff_unique_norm": "Microsoft;Intel;Technion - Israel Institute of Technology", "aff_unique_dep": "Microsoft Corporation;Intel Corporation;", 
"aff_unique_url": "https://www.microsoft.com;https://www.intel.com;https://www.technion.ac.il", "aff_unique_abbr": "Microsoft;Intel;Technion", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0;1;1;0", "aff_country_unique": "United States;Israel" }, { "id": "-757TnNDwIn", "title": "Generative Adversarial Neural Architecture Search with Importance Sampling", "track": "main", "status": "Reject", "tldr": "", "abstract": "Despite the empirical success of neural architecture search (NAS) in deep learning applications, the optimality, reproducibility and cost of NAS schemes remain hard to assess. The variation in search spaces adopted has further affected a fair comparison between search strategies. In this paper, we focus on search strategies in NAS and propose Generative Adversarial NAS (GA-NAS), promoting stable and reproducible neural architecture search. GA-NAS is theoretically inspired by importance sampling for rare event simulation, and iteratively refits a generator to previously discovered top architectures, thus increasingly focusing on important parts of the search space. We propose an efficient adversarial learning approach in GA-NAS, where the generator is not trained based on a large number of observations on architecture performance, but based on the relative prediction made by a discriminator, thus significantly reducing the number of evaluations required.\nExtensive experiments show that GA-NAS beats the best published results under several cases on the public NAS benchmarks including NAS-Bench-101, NAS-Bench-201, and NAS-Bench-301. We further show that GA-NAS can handle ad-hoc search constraints and search spaces. GA-NAS can find new architectures that enhance EfficientNet and ProxylessNAS in terms of ImageNet Top-1 accuracy and/or the number of parameters by searching in their original search spaces.", "keywords": "Nueral Architecture Search;Deep Learning;Generative Adversarial Network;Graph Neural Network;Computer Vision", "primary_area": "", "supplementary_material": "/attachment/cc4e7264485f2c29cfeedeea9a4cafcbdec22133.zip", "author": "SEYED SAEED CHANGIZ REZAEI;Fred X. Han;Di Niu;Mohammad Salameh;Keith G. Mills;Shangling Jui", "authorids": "~SEYED_SAEED_CHANGIZ_REZAEI1;~Fred_X._Han1;dniu@ualberta.ca;msalameh@ualberta.ca;~Keith_G._Mills1;jui.shangling@huawei.com", "gender": "M;;;;M;", "homepage": ";;;;https://kgmills.github.io/;", "dblp": "08/5692.html;;;;299/5864;", "google_scholar": ";;;;CBOD_ngAAAAJ;", "orcid": ";;;;0000-0001-6054-1798;", "linkedin": "https://ca.linkedin.com/in/seyed-saeed-changiz-rezaei-19639661;;;;kgmills/;", "or_profile": "~SEYED_SAEED_CHANGIZ_REZAEI1;~Fred_X._Han1;dniu@ualberta.ca;msalameh@ualberta.ca;~Keith_G._Mills1;jui.shangling@huawei.com", "aff": "Huawei Technologies Ltd.;;;;Huawei Technologies Ltd.;", "aff_domain": "huawei.com;;;;huawei.com;", "position": "Senior Machine Learning Researcher;;;;Research Intern;", "bibtex": "@misc{\nrezaei2021generative,\ntitle={Generative Adversarial Neural Architecture Search with Importance Sampling},\nauthor={SEYED SAEED CHANGIZ REZAEI and Fred X. Han and Di Niu and Mohammad Salameh and Keith G. 
Mills and Shangling Jui},\nyear={2021},\nurl={https://openreview.net/forum?id=-757TnNDwIn}\n}", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer3;AnonReviewer2;AnonReviewer4", "site": "https://openreview.net/forum?id=-757TnNDwIn", "pdf_size": 0, "rating": "4;5;5;6", "confidence": "4;2;4;3", "wc_review": "455;657;517;400", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "857;872;994;424", "reply_reviewers": "0;0;0;0", "reply_authors": "2;2;2;1", "rating_avg": [ 5.0, 0.7071067811865476 ], "confidence_avg": [ 3.25, 0.82915619758885 ], "wc_review_avg": [ 507.25, 95.85503377496667 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 786.75, 216.06871013638232 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.75, 0.4330127018922193 ], "replies_avg": [ 13, 0 ], "authors#_avg": [ 6, 0 ], "corr_rating_confidence": -0.4264014327112209, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:o_d82QNT1FcJ:scholar.google.com/&scioq=Generative+Adversarial+Neural+Architecture+Search+with+Importance+Sampling&hl=en&as_sdt=0,5", "gs_version_total": 0, "aff_unique_index": "0;0", "aff_unique_norm": "Huawei", "aff_unique_dep": "Huawei Technologies", "aff_unique_url": "https://www.huawei.com", "aff_unique_abbr": "Huawei", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0", "aff_country_unique": "China" }, { "id": "-BA38x6Cf2", "title": "Can Kernel Transfer Operators Help Flow based Generative Models?", "track": "main", "status": "Reject", "tldr": "", "abstract": "Flow-based generative models refer to deep generative models with \ntractable likelihoods, and offer several attractive properties including \nefficient density estimation and sampling. Despite many advantages, \ncurrent formulations (e.g., normalizing flow) often have an expensive memory/runtime footprint, which hinders their use in a number of applications. \nIn this paper, we consider the setting where we have access to an autoencoder, which is\nsuitably effective for the dataset of interest. Under some mild conditions,\nwe show that we can calculate a mapping to a RKHS which subsequently enables deploying \nmature ideas from the kernel methods literature for flow-based generative models. Specifically, we can explicitly map the RKHS distribution (i.e., \napproximate the flow) to match or align with \na template/well-characterized distribution, via kernel transfer operators. This leads to a direct and resource efficient approximation avoiding iterative optimization. We empirically show that this simple idea yields competitive results on popular datasets such as CelebA,\nas well as promising results on a public 3D brain imaging dataset where the sample sizes are much smaller. 
", "keywords": "", "primary_area": "", "supplementary_material": "/attachment/7f9bb5abab084da46e0110e8c1acb5615acd90af.zip", "author": "Zhichun Huang;Rudrasis Chakraborty;Xingjian Zhen;Vikas Singh", "authorids": "~Zhichun_Huang1;~Rudrasis_Chakraborty1;~Xingjian_Zhen1;~Vikas_Singh1", "gender": "M;M;M;M", "homepage": ";;;http://vsingh-www.cs.wisc.edu/", "dblp": "247/6016.html;http://dblp.uni-trier.de/pers/hd/c/Chakraborty:Rudrasis;220/5337;", "google_scholar": "qaI1g_MAAAAJ;TB2Z8sgAAAAJ;Ita37_cAAAAJ;d32BmwcAAAAJ", "orcid": ";;;", "linkedin": "zhichun-huang-59563a132/;;;", "or_profile": "~Zhichun_Huang1;~Rudrasis_Chakraborty1;~Xingjian_Zhen1;~Vikas_Singh1", "aff": "Carnegie Mellon University;Lawrence Livermore National Labs;University of Wisconsin, Madison;University of Wisconsin, Madison", "aff_domain": "andrew.cmu.edu;llnl.gov;wisc.edu;wisc.edu", "position": "MS student;Researcher;PhD student;Professor", "bibtex": "@misc{\nhuang2021can,\ntitle={Can Kernel Transfer Operators Help Flow based Generative Models?},\nauthor={Zhichun Huang and Rudrasis Chakraborty and Xingjian Zhen and Vikas Singh},\nyear={2021},\nurl={https://openreview.net/forum?id=-BA38x6Cf2}\n}", "github": "", "project": "", "reviewers": "AnonReviewer4;AnonReviewer1;AnonReviewer2;AnonReviewer3", "site": "https://openreview.net/forum?id=-BA38x6Cf2", "pdf_size": 0, "rating": "2;5;5;5", "confidence": "4;4;3;4", "wc_review": "586;428;331;1331", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "807;439;538;2389", "reply_reviewers": "0;0;0;0", "reply_authors": "1;1;1;3", "rating_avg": [ 4.25, 1.299038105676658 ], "confidence_avg": [ 3.75, 0.4330127018922193 ], "wc_review_avg": [ 669.0, 392.8924789303048 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 1043.25, 788.5513220456866 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.5, 0.8660254037844386 ], "replies_avg": [ 12, 0 ], "authors#_avg": [ 4, 0 ], "corr_rating_confidence": -0.3333333333333333, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:kO-hYkkv6eUJ:scholar.google.com/&scioq=Can+Kernel+Transfer+Operators+Help+Flow+based+Generative+Models%3F&hl=en&as_sdt=0,5", "gs_version_total": 0, "aff_unique_index": "0;1;2;2", "aff_unique_norm": "Carnegie Mellon University;Lawrence Livermore National Laboratory;University of Wisconsin", "aff_unique_dep": ";;", "aff_unique_url": "https://www.cmu.edu;https://www.llnl.gov;https://www.wisc.edu", "aff_unique_abbr": "CMU;LLNL;UW", "aff_campus_unique_index": "1;1", "aff_campus_unique": ";Madison", "aff_country_unique_index": "0;0;0;0", "aff_country_unique": "United States" }, { "id": "-DRft_lKDqo", "title": "Generalized Universal Approximation for Certified Networks", "track": "main", "status": "Reject", "tldr": "", "abstract": "To certify safety and robustness of neural networks, researchers have successfully applied abstract interpretation, primarily using interval bound propagation. To understand the power of interval bounds, we present the abstract universal approximation (AUA) theorem, a generalization of the recent result by Baader et al. (2020) for ReLU networks to a large class of neural networks. The AUA theorem states that for any continuous function $f$, there exists a neural network that (1) approximates $f$ (universal approximation) and (2) whose interval bounds are an arbitrarily close approximation of the set semantics of $f$. 
The network may be constructed using any activation function from a rich class of functions---sigmoid, tanh, ReLU, ELU, etc.---making our result quite general. The key implication of the AUA theorem is that there always exist certifiably robust neural networks, which can be constructed using a wide range of activation functions.", "keywords": "adversarial deep learning;neural network verification;interval analysis", "primary_area": "", "supplementary_material": "", "author": "Zi Wang;Aws Albarghouthi;Somesh Jha", "authorids": "~Zi_Wang3;~Aws_Albarghouthi1;~Somesh_Jha1", "gender": "M;M;M", "homepage": "https://z1w.github.io/;http://pages.cs.wisc.edu/~aws/;", "dblp": ";90/8295;j/SomeshJha", "google_scholar": "https://scholar.google.com/citations?hl=en;https://scholar.google.com.tw/citations?user=CUbC2zYAAAAJ;BaI7l8QAAAAJ", "orcid": "0000-0002-0815-1343;;", "linkedin": "zi-wang-53221139/;;", "or_profile": "~Zi_Wang3;~Aws_Albarghouthi1;~Somesh_Jha1", "aff": "University of Wisconsin, Madison;University of Wisconsin, Madison;Department of Computer Science, University of Wisconsin, Madison", "aff_domain": "wisc.edu;wisc.edu;cs.wisc.edu", "position": "PhD student;Associate Professor;Full Professor", "bibtex": "@misc{\nwang2021generalized,\ntitle={Generalized Universal Approximation for Certified Networks},\nauthor={Zi Wang and Aws Albarghouthi and Somesh Jha},\nyear={2021},\nurl={https://openreview.net/forum?id=-DRft_lKDqo}\n}", "github": "", "project": "", "reviewers": "AnonReviewer2;AnonReviewer3;AnonReviewer1;AnonReviewer4", "site": "https://openreview.net/forum?id=-DRft_lKDqo", "pdf_size": 0, "rating": "4;4;5;5", "confidence": "5;4;3;2", "wc_review": "481;1009;213;248", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "345;789;82;30", "reply_reviewers": "0;0;0;0", "reply_authors": "1;1;1;1", "rating_avg": [ 4.5, 0.5 ], "confidence_avg": [ 3.5, 1.118033988749895 ], "wc_review_avg": [ 487.75, 318.0859750130458 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 311.5, 300.433436887441 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 10, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": -0.8944271909999159, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:Z1c6B3P46NoJ:scholar.google.com/&scioq=Generalized+Universal+Approximation+for+Certified+Networks&hl=en&as_sdt=0,33", "gs_version_total": 0, "aff_unique_index": "0;0;1", "aff_unique_norm": "University of Wisconsin;University of Wisconsin-Madison", "aff_unique_dep": ";Department of Computer Science", "aff_unique_url": "https://www.wisc.edu;https://www.wisc.edu", "aff_unique_abbr": "UW;UW-Madison", "aff_campus_unique_index": "0;0;0", "aff_campus_unique": "Madison", "aff_country_unique_index": "0;0;0", "aff_country_unique": "United States" }, { "id": "-FzLjLBTQ1g", "title": "Context-Aware Temperature for Language Modeling", "track": "main", "status": "Withdraw", "tldr": "", "abstract": "Current practices to apply temperature scaling assume either a fixed, or a manually-crafted dynamically changing schedule. However, our studies indicate that the individual optimal trajectory for each class can change with the context. To this end, we propose context-aware temperature, a generalized approach to provide an individual optimal temperature trajectory over the context for each vocabulary, while allowing the temperature to be learned along with the remaining model parameters during training. 
Experiment results confirm that the proposed method significantly improves state-of-the-art language models, achieving a perplexity of 19.90 on Penn Treebank, 33.88 on WikiText-2, and 4.7 on WikiText-103.", "keywords": "natural language processing;language modeling;sequence modeling;temperature scaling", "primary_area": "", "supplementary_material": "", "author": "Pei-Hsin Wang;Sheng-Iou Hsieh;Shih-Chieh Chang;Yu-Ting Chen;Da-Cheng Juan;Jia-Yu Pan;Wei Wei", "authorids": "~Pei-Hsin_Wang1;~Sheng-Iou_Hsieh1;~Shih-Chieh_Chang1;~Yu-Ting_Chen2;~Da-Cheng_Juan1;~Jia-Yu_Pan1;~Wei_Wei15", "gender": "F;M;;M;;;M", "homepage": ";;;;;;http://www.weiwei.one", "dblp": ";;;;47/1564;14/4107;", "google_scholar": ";;;klyyP0YAAAAJ;https://scholar.google.com/citations?hl=en;https://scholar.google.com/citations?hl=en;https://scholar.google.com/citations?hl=en", "orcid": ";;;;;0009-0005-6700-908X;", "linkedin": ";;;;;;", "or_profile": "~Pei-Hsin_Wang1;~Sheng-Iou_Hsieh1;~Shih-Chieh_Chang1;~Yu-Ting_Chen2;~Da-Cheng_Juan1;~Jia-Yu_Pan1;~wei_wei3", "aff": "National Tsing Hua University;;;Google;Google Research;Google;Google", "aff_domain": "nthu.edu.tw;;;google.com;google.com;google.com;google.com", "position": "MS student;;;Software Engineer;Senior Software Engineer;Software Engineer;Research Scientist", "bibtex": "", "github": "", "project": "", "reviewers": "", "site": "https://openreview.net/forum?id=-FzLjLBTQ1g", "pdf_size": 0, "rating": "", "confidence": "", "wc_review": "", "wc_reply_reviewers": "", "wc_reply_authors": "", "reply_reviewers": "", "reply_authors": "", "rating_avg": [ 0, 0 ], "confidence_avg": [ 0, 0 ], "wc_review_avg": [ 0, 0 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 0, 0 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 0, 0 ], "replies_avg": [ 1, 0 ], "authors#_avg": [ 7, 0 ], "corr_rating_confidence": 0, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:kUD5Fv-OdwcJ:scholar.google.com/&scioq=Context-Aware+Temperature+for+Language+Modeling&hl=en&as_sdt=0,5", "gs_version_total": 0, "aff_unique_index": "0;1;1;1;1", "aff_unique_norm": "National Tsing Hua University;Google", "aff_unique_dep": ";Google", "aff_unique_url": "https://www.nthu.edu.tw;https://www.google.com", "aff_unique_abbr": "NTHU;Google", "aff_campus_unique_index": "0;1;1;1;1", "aff_campus_unique": "Taiwan;Mountain View", "aff_country_unique_index": "0;1;1;1;1", "aff_country_unique": "China;United States" }, { "title": "Categorical Normalizing Flows via Continuous Transformations", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/2730", "id": "-GLNZeVDuik", "poster": "", "openreview": "https://openreview.net/forum?id=-GLNZeVDuik", "slides": "https://iclr.cc/virtual/2021/poster/2730", "video": "https://iclr.cc/virtual/2021/poster/2730", "author_site": "Phillip Lippe, Efstratios Gavves", "tldr": "", "abstract": "Despite their popularity, to date, the application of normalizing flows on categorical data stays limited. The current practice of using dequantization to map discrete data to a continuous space is inapplicable as categorical data has no intrinsic order. Instead, categorical data have complex and latent relations that must be inferred, like the synonymy between words. In this paper, we investigate Categorical Normalizing Flows, that is normalizing flows for categorical data. 
By casting the encoding of categorical data in continuous space as a variational inference problem, we jointly optimize the continuous representation and the model likelihood. Using a factorized decoder, we introduce an inductive bias to model any interactions in the normalizing flow. As a consequence, we do not only simplify the optimization compared to having a joint decoder, but also make it possible to scale up to a large number of categories that is currently impossible with discrete normalizing flows. Based on Categorical Normalizing Flows, we propose GraphCNF a permutation-invariant generative model on graphs. GraphCNF implements a three step approach modeling the nodes, edges, and adjacency matrix stepwise to increase efficiency. On molecule generation, GraphCNF outperforms both one-shot and autoregressive flow-based state-of-the-art.\n", "keywords": "Normalizing Flows;Density Estimation;Graph Generation", "primary_area": "", "supplementary_material": "/attachment/5566ef053acba9409bb4505b4d6d13b1788f8571.zip", "author": "Phillip Lippe;Efstratios Gavves", "authorids": "~Phillip_Lippe1;~Efstratios_Gavves1", "gender": "M;M", "homepage": "https://phlippe.github.io;https://www.egavves.com", "dblp": "267/9431;03/8693", "google_scholar": "69hFZp4AAAAJ;https://scholar.google.nl/citations?user=QqfCvsgAAAAJ", "orcid": "0000-0002-3639-6938;", "linkedin": "phillip-lippe/;", "or_profile": "~Phillip_Lippe1;~Efstratios_Gavves1", "aff": "University of Amsterdam;University of Amsterdam", "aff_domain": "uva.nl;uva.nl", "position": "PhD student;Associate Professor", "bibtex": "@inproceedings{\nlippe2021categorical,\ntitle={Categorical Normalizing Flows via Continuous Transformations},\nauthor={Phillip Lippe and Efstratios Gavves},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=-GLNZeVDuik}\n}", "github": "[![github](/images/github_icon.svg) phlippe/CategoricalNF](https://github.com/phlippe/CategoricalNF)", "project": "", "reviewers": "AnonReviewer1;AnonReviewer3;AnonReviewer2;AnonReviewer4", "pdf_size": 0, "rating": "6;7;7;7", "confidence": "3;3;4;4", "wc_review": "528;457;172;217", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "744;1135;226;290", "reply_reviewers": "0;0;0;0", "reply_authors": "2;2;1;1", "rating_avg": [ 6.75, 0.4330127018922193 ], "confidence_avg": [ 3.5, 0.5 ], "wc_review_avg": [ 343.5, 151.93501900483642 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 598.75, 368.41917363242646 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.5, 0.5 ], "replies_avg": [ 12, 0 ], "authors#_avg": [ 2, 0 ], "corr_rating_confidence": 0.5773502691896257, "gs_citation": 46, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=3325488278431925119&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 9, "pdf": "https://openreview.net/pdf?id=-GLNZeVDuik", "email": "uva.nl;uva.nl", "author_num": 2, "aff_unique_index": "0;0", "aff_unique_norm": "University of Amsterdam", "aff_unique_dep": "", "aff_unique_url": "https://www.uva.nl", "aff_unique_abbr": "UvA", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0", "aff_country_unique": "Netherlands" }, { "id": "-HsAI7VKsz", "title": "AggMask: Exploring locally aggregated learning of mask representations for instance segmentation", "track": "main", "status": "Withdraw", "tldr": "", "abstract": " Recently proposed one-stage instance segmentation models (\\emph{e.g.}, SOLO) learn to directly predict location-specific 
object mask with fully-convolutional networks. They perform comparably well as the traditional two-stage Mask R-CNN model, yet enjoying much simpler architecture and higher efficiency. However, an intrinsic limitation of these models is that they tend to generate similar mask predictions for a single object at nearby locations, while most of them are directly discarded by non-maximum suppression, leading to a waste of some useful predictions that can supplement the final result. In this work, we aim to explore how the model can benefit from better leveraging the neighboring predictions while maintaining the architectural simplicity and efficiency. To this end, we develop a novel learning-based aggregation framework that learns to aggregate the neighboring predictions. Meanwhile, unlike original location-based masks, the segmentation model is implicitly supervised to learn location-aware \\textit{mask representations} that encode the geometric structure of nearby objects and complements adjacent representations with context. Based on the aggregation framework, we further introduce a mask interpolation mechanism that enables sharing mask representations for nearby spatial locations, thus allowing the model to generate much fewer representations for computation and memory saving. We experimentally show that by simply augmenting the baseline model with our proposed aggregation framework, the instance segmentation performance is significantly improved. For instance, it improves a SOLO model with ResNet-101 backbone by 2.0 AP on the COCO benchmark, with only about 2\\% increase of computation. {Code and models} are available at anonymous repository: {\\url{https://github.com/advdfacd/AggMask}}.", "keywords": "", "primary_area": "", "supplementary_material": "", "author": "Tao Wang;Jun Hao Liew;Yu Li;Yunpeng Chen;Jiashi Feng", "authorids": "~Tao_Wang3;~Jun_Hao_Liew1;~Yu_Li7;~Yunpeng_Chen1;~Jiashi_Feng1", "gender": "M;;F;;", "homepage": ";;;;", "dblp": "12/5838-3;;;;", "google_scholar": "0Z_MGiYAAAAJ;https://scholar.google.com.sg/citations?user=8gm-CYYAAAAJ;4-1R-bMAAAAJ;;", "orcid": ";;;;", "linkedin": ";;;;", "or_profile": "~Tao_Wang3;~Jun_Hao_Liew1;~Yu_Li7;~Yunpeng_Chen1;~Jiashi_Feng1", "aff": "National University of Singapore;National University of Singapore;;;", "aff_domain": "nus.edu.sg;nus.edu.sg;;;", "position": "PhD student;Postdoc;;;", "bibtex": "", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer2;AnonReviewer3;AnonReviewer4", "site": "https://openreview.net/forum?id=-HsAI7VKsz", "pdf_size": 0, "rating": "4;4;6;6", "confidence": "4;5;4;4", "wc_review": "487;147;190;392", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "0;0;0;0", "reply_reviewers": "0;0;0;0", "reply_authors": "0;0;0;0", "rating_avg": [ 5.0, 1.0 ], "confidence_avg": [ 4.25, 0.4330127018922193 ], "wc_review_avg": [ 304.0, 140.42613716826366 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 0, 0 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 0, 0 ], "replies_avg": [ 5, 0 ], "authors#_avg": [ 5, 0 ], "corr_rating_confidence": -0.5773502691896257, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:W6Hoo73FYkQJ:scholar.google.com/&scioq=AggMask:+Exploring+locally+aggregated+learning+of+mask+representations+for+instance+segmentation&hl=en&as_sdt=0,33", "gs_version_total": 0, "aff_unique_index": "0;0", "aff_unique_norm": "National University of Singapore", "aff_unique_dep": "", "aff_unique_url": "https://www.nus.edu.sg", "aff_unique_abbr": 
"NUS", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0", "aff_country_unique": "Singapore" }, { "title": "Improving VAEs' Robustness to Adversarial Attack", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/2585", "id": "-Hs_otp2RB", "poster": "", "openreview": "https://openreview.net/forum?id=-Hs_otp2RB", "slides": "https://iclr.cc/virtual/2021/poster/2585", "video": "https://iclr.cc/virtual/2021/poster/2585", "author_site": "Matthew Willetts, Alexander Camuto, Tom Rainforth, S Roberts, Christopher Holmes", "tldr": "", "abstract": "Variational autoencoders (VAEs) have recently been shown to be vulnerable to adversarial attacks, wherein they are fooled into reconstructing a chosen target image. However, how to defend against such attacks remains an open problem. We make significant advances in addressing this issue by introducing methods for producing adversarially robust VAEs. Namely, we first demonstrate that methods proposed to obtain disentangled latent representations produce VAEs that are more robust to these attacks. However, this robustness comes at the cost of reducing the quality of the reconstructions. We ameliorate this by applying disentangling methods to hierarchical VAEs. The resulting models produce high--fidelity autoencoders that are also adversarially robust. We confirm their capabilities on several different datasets and with current state-of-the-art VAE adversarial attacks, and also show that they increase the robustness of downstream tasks to attack.", "keywords": "deep generative models;variational autoencoders;robustness;adversarial attack", "primary_area": "", "supplementary_material": "/attachment/a8b6b8aeff23f0b4634451eb727ea06cc5868463.zip", "author": "Matthew JF Willetts;Alexander Camuto;Tom Rainforth;S Roberts;Christopher C Holmes", "authorids": "~Matthew_JF_Willetts1;acamuto@turing.ac.uk;~Tom_Rainforth1;~S_Roberts1;cholmes@stats.ox.ac.uk", "gender": "M;;M;;", "homepage": "http://csml.stats.ox.ac.uk/people/willetts/;;http://www.robots.ox.ac.uk/~twgr;;", "dblp": ";;166/1198;;", "google_scholar": "https://scholar.google.co.uk/citations?user=cuy1270AAAAJ;;https://scholar.google.co.uk/citations?user=ieLRNKMAAAAJ;;", "orcid": ";;;;", "linkedin": "https://linkedin.com/in/mwilletts/;;;;", "or_profile": "~Matthew_JF_Willetts1;acamuto@turing.ac.uk;~Tom_Rainforth1;~S_Roberts1;cholmes@stats.ox.ac.uk", "aff": "University of Oxford;;;;", "aff_domain": "ox.ac.uk;;ox.ac.uk;;", "position": "PhD student;;Postdoc;;", "bibtex": "@inproceedings{\nwilletts2021improving,\ntitle={Improving {\\{}VAE{\\}}s' Robustness to Adversarial Attack},\nauthor={Matthew JF Willetts and Alexander Camuto and Tom Rainforth and S Roberts and Christopher C Holmes},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=-Hs_otp2RB}\n}", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer4;AnonReviewer2;AnonReviewer3", "pdf_size": 0, "rating": "6;6;7;7", "confidence": "5;3;2;3", "wc_review": "555;291;308;330", "wc_reply_reviewers": "80;0;16;0", "wc_reply_authors": "1434;588;753;711", "reply_reviewers": "2;0;1;0", "reply_authors": "4;1;1;1", "rating_avg": [ 6.5, 0.5 ], "confidence_avg": [ 3.25, 1.0897247358851685 ], "wc_review_avg": [ 371.0, 107.12842759977391 ], "wc_reply_reviewers_avg": [ 24.0, 32.984845004941285 ], "wc_reply_authors_avg": [ 871.5, 330.3713819325155 ], "reply_reviewers_avg": [ 0.75, 0.82915619758885 ], "reply_authors_avg": [ 1.75, 
1.299038105676658 ], "replies_avg": [ 16, 0 ], "authors#_avg": [ 5, 0 ], "corr_rating_confidence": -0.6882472016116854, "gs_citation": 36, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=17839469412730126252&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 6, "pdf": "https://openreview.net/pdf?id=-Hs_otp2RB", "email": "ox.ac.uk;;ox.ac.uk;;", "author_num": 5, "aff_unique_index": "0", "aff_unique_norm": "University of Oxford", "aff_unique_dep": "", "aff_unique_url": "https://www.ox.ac.uk", "aff_unique_abbr": "Oxford", "aff_country_unique_index": "0", "aff_country_unique": "United Kingdom" }, { "title": "Universal approximation power of deep residual neural networks via nonlinear control theory", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/3010", "id": "-IXhmY16R3M", "poster": "", "openreview": "https://openreview.net/forum?id=-IXhmY16R3M", "slides": "https://iclr.cc/virtual/2021/poster/3010", "video": "https://iclr.cc/virtual/2021/poster/3010", "author_site": "Paulo Tabuada, Bahman Gharesifard", "tldr": "", "abstract": "In this paper, we explain the universal approximation capabilities of deep residual neural networks through geometric nonlinear control. Inspired by recent work establishing links between residual networks and control systems, we provide a general sufficient condition for a residual network to have the power of universal approximation by asking the activation function, or one of its derivatives, to satisfy a quadratic differential equation. Many activation functions used in practice satisfy this assumption, exactly or approximately, and we show this property to be sufficient for an adequately deep neural network with $n+1$ neurons per\nlayer to approximate arbitrarily well, on a compact set and with respect to the supremum norm, any continuous function from $\\mathbb{R}^n$ to $\\mathbb{R}^n$. We further show this result to hold for very simple architectures for which the weights only need to assume two values. The first key technical contribution consists of relating the universal approximation problem to controllability of an ensemble of control systems corresponding to a residual network and to leverage classical Lie algebraic techniques to characterize controllability. 
The second technical contribution is to identify monotonicity as the bridge between controllability of finite ensembles and uniform approximability on compact sets.", "keywords": "Deep residual neural networks;universal approximation;nonlinear control theory", "primary_area": "", "supplementary_material": "", "author": "Paulo Tabuada;Bahman Gharesifard", "authorids": "~Paulo_Tabuada1;bahman.gharesifard@queensu.ca", "gender": "M;", "homepage": "http://www.ee.ucla.edu/~tabuada;", "dblp": "43/2753;", "google_scholar": "bOElZi8AAAAJ;", "orcid": ";", "linkedin": ";", "or_profile": "~Paulo_Tabuada1;bahman.gharesifard@queensu.ca", "aff": "University of California, Los Angeles;", "aff_domain": "ucla.edu;", "position": "Professor;", "bibtex": "@inproceedings{\ntabuada2021universal,\ntitle={Universal approximation power of deep residual neural networks via nonlinear control theory},\nauthor={Paulo Tabuada and Bahman Gharesifard},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=-IXhmY16R3M}\n}", "github": "", "project": "", "reviewers": "AnonReviewer2;AnonReviewer1;AnonReviewer3;AnonReviewer4", "pdf_size": 0, "rating": "6;6;6;7", "confidence": "4;4;3;3", "wc_review": "215;917;283;548", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "0;0;0;0", "reply_reviewers": "0;0;0;0", "reply_authors": "0;0;0;0", "rating_avg": [ 6.25, 0.4330127018922193 ], "confidence_avg": [ 3.5, 0.5 ], "wc_review_avg": [ 490.75, 275.7556663062429 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 0, 0 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 0, 0 ], "replies_avg": [ 5, 0 ], "authors#_avg": [ 2, 0 ], "corr_rating_confidence": -0.5773502691896257, "gs_citation": 40, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=8191608884754487543&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 3, "pdf": "https://openreview.net/pdf?id=-IXhmY16R3M", "email": "ucla.edu;", "author_num": 2, "aff_unique_index": "0", "aff_unique_norm": "University of California, Los Angeles", "aff_unique_dep": "", "aff_unique_url": "https://www.ucla.edu", "aff_unique_abbr": "UCLA", "aff_campus_unique_index": "0", "aff_campus_unique": "Los Angeles", "aff_country_unique_index": "0", "aff_country_unique": "United States" }, { "id": "-J9xYzP2HD", "title": "Chameleon: Learning Model Initializations Across Tasks With Different Schemas", "track": "main", "status": "Reject", "tldr": "", "abstract": "Parametric models, and particularly neural networks, require weight initialization as a starting point for gradient-based optimization. Recent work shows that an initial parameter set can be learned from a population of supervised learning tasks that enables a fast convergence for unseen tasks even when only a handful of instances is available (model-agnostic meta-learning). \nCurrently, methods for learning model initializations are limited to a population of tasks sharing the same schema, i.e., the same number, order, type, and semantics of predictor and target variables.\nIn this paper, we address the problem of meta-learning weight initialization across tasks with different schemas, for example, if the number of predictors varies across tasks, while they still share some variables. We propose Chameleon, a model that learns to align different predictor schemas to a common representation. 
\nIn experiments on 23 datasets of the OpenML-CC18 benchmark, we show that Chameleon can successfully learn parameter initializations across tasks with different schemas, presenting, to the best of our knowledge, the first cross-dataset few-shot classification approach for unstructured data.", "keywords": "Meta-Learning;Initialization;Few-shot classification", "primary_area": "", "supplementary_material": "/attachment/631ecf657d58d32784f0149cbb2547ccbb4ef8fe.zip", "author": "Lukas Brinkmeyer;Rafael Rego Drumond;Randolf Scholz;Josif Grabocka;Lars Schmidt-Thieme", "authorids": "~Lukas_Brinkmeyer1;radrumond@ismll.uni-hildesheim.de;scholz@ismll.uni-hildesheim.de;~Josif_Grabocka1;~Lars_Schmidt-Thieme1", "gender": ";;;M;M", "homepage": "https://www.ismll.uni-hildesheim.de/personen/brinkmeyer.html;;;https://www.utn.de/departments/department-engineering/machine-learning-lab/;https://www.ismll.uni-hildesheim.de/personen/lst_en.html", "dblp": "https://dblp.uni-trier.de/pid/249/8048.html;;;117/4936;s/LarsSchmidtThieme", "google_scholar": ";;;KRy27XcAAAAJ;https://scholar.google.de/citations?user=l3taTdYAAAAJ", "orcid": ";;;;0000-0001-5729-6023", "linkedin": ";;;;", "or_profile": "~Lukas_Brinkmeyer1;radrumond@ismll.uni-hildesheim.de;scholz@ismll.uni-hildesheim.de;~Josif_Grabocka1;~Lars_Schmidt-Thieme1", "aff": "University of Hildesheim;;;Universit\u00e4t Freiburg;University of Hildesheim", "aff_domain": "uni-hildesheim.de;;;uni-freiburg.de;uni-hildesheim.de", "position": "PhD student;;;Assistant Professor;Full Professor", "bibtex": "@misc{\nbrinkmeyer2021chameleon,\ntitle={Chameleon: Learning Model Initializations Across Tasks With Different Schemas},\nauthor={Lukas Brinkmeyer and Rafael Rego Drumond and Randolf Scholz and Josif Grabocka and Lars Schmidt-Thieme},\nyear={2021},\nurl={https://openreview.net/forum?id=-J9xYzP2HD}\n}", "github": "", "project": "", "reviewers": "AnonReviewer2;AnonReviewer5;AnonReviewer3;AnonReviewer1;AnonReviewer4", "site": "https://openreview.net/forum?id=-J9xYzP2HD", "pdf_size": 0, "rating": "3;3;4;6;6", "confidence": "5;4;4;3;3", "wc_review": "751;525;340;219;246", "wc_reply_reviewers": "0;0;0;0;0", "wc_reply_authors": "567;509;471;291;189", "reply_reviewers": "0;0;0;0;0", "reply_authors": "1;1;1;1;1", "rating_avg": [ 4.4, 1.3564659966250536 ], "confidence_avg": [ 3.8, 0.7483314773547882 ], "wc_review_avg": [ 416.2, 198.79175033184853 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 405.4, 142.17397792845216 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 11, 0 ], "authors#_avg": [ 5, 0 ], "corr_rating_confidence": -0.9063269671749656, "gs_citation": 7, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=8331232190847247148&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 5, "aff_unique_index": "0;1;0", "aff_unique_norm": "University of Hildesheim;University of Freiburg", "aff_unique_dep": ";", "aff_unique_url": "https://www.uni-hildesheim.de/;https://www.uni-freiburg.de", "aff_unique_abbr": ";Uni Freiburg", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0;0", "aff_country_unique": "Germany" }, { "title": "Disentangling 3D Prototypical Networks for Few-Shot Concept Learning", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/3303", "id": "-Lr-u0b42he", "poster": "", "openreview": "https://openreview.net/forum?id=-Lr-u0b42he", "slides": "https://iclr.cc/virtual/2021/poster/3303", "video": "https://iclr.cc/virtual/2021/poster/3303", 
"author_site": "Mihir Prabhudesai, Shamit Lal, Darshan Patil, Hsiao-Yu Tung, Adam Harley, Katerina Fragkiadaki", "tldr": "", "abstract": "We present neural architectures that disentangle RGB-D images into objects\u2019 shapes and styles and a map of the background scene, and explore their applications for few-shot 3D object detection and few-shot concept classification. Our networks incorporate architectural biases that reflect the image formation process, 3D geometry of the world scene, and shape-style interplay. They are trained end-to-end self-supervised by predicting views in static scenes, alongside a small number of 3D object boxes. Objects and scenes are represented in terms of 3D feature grids in the bottleneck of the network. We show the proposed 3D neural representations are compositional: they can generate novel 3D scene feature maps by mixing object shapes and styles, resizing and adding the resulting object 3D feature maps over background scene feature maps. We show object detectors trained on hallucinated 3D neural scenes generalize better to novel environments. We show classifiers for object categories, color, materials, and spatial relationships trained over the disentangled 3D feature sub-spaces generalize better with dramatically fewer exemplars over the current state-of-the-art, and enable a visual question answering system that uses them as its modules to generalize one-shot to novel objects in the scene.", "keywords": "Disentanglement;Few Shot Learning;3D Vision;VQA", "primary_area": "", "supplementary_material": "/attachment/55d4429cb369a74f8edaaae4ecc9378602af4409.zip", "author": "Mihir Prabhudesai;Shamit Lal;Darshan Patil;Hsiao-Yu Tung;Adam W Harley;Katerina Fragkiadaki", "authorids": "~Mihir_Prabhudesai1;~Shamit_Lal1;~Darshan_Patil1;~Hsiao-Yu_Tung1;~Adam_W_Harley1;~Katerina_Fragkiadaki1", "gender": "M;M;M;M;F;F", "homepage": "https://mihirp1998.github.io/;https://shamitlal.github.io/;http://www.darshanpatil.com/;https://adamharley.com;https://www.cs.cmu.edu/~katef/;", "dblp": "249/9214;215/9914;211/8734;159/2077;21/8780;199/1661", "google_scholar": ";https://scholar.google.co.in/citations?user=8BZMgt4AAAAJ;https://scholar.google.ca/citations?user=X3HJD0AAAAAJ;OB6vAtkAAAAJ;FWp7728AAAAJ;", "orcid": ";;;;;", "linkedin": ";shamit-lal-877357b8/;;;;", "or_profile": "~Mihir_Prabhudesai1;~Shamit_Lal1;~Darshan_Patil1;~Adam_W_Harley1;~Katerina_Fragkiadaki1;~Hsiao-Yu_Fish_Tung1", "aff": "School of Computer Science, Carnegie Mellon University;Fyusion;University of Montreal;Carnegie Mellon University;Carnegie Mellon University;Carnegie Mellon University", "aff_domain": "cs.cmu.edu;fyusion.com;umontreal.ca;cmu.edu;cmu.edu;cmu.edu", "position": "PhD student;Researcher;MS student;PhD student;Assistant Professor;PhD student", "bibtex": "@inproceedings{\nprabhudesai2021disentangling,\ntitle={Disentangling 3D Prototypical Networks for Few-Shot Concept Learning},\nauthor={Mihir Prabhudesai and Shamit Lal and Darshan Patil and Hsiao-Yu Tung and Adam W Harley and Katerina Fragkiadaki},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=-Lr-u0b42he}\n}", "github": "[![github](/images/github_icon.svg) mihirp1998/Disentangling-3D-Prototypical-Nets](https://github.com/mihirp1998/Disentangling-3D-Prototypical-Nets)", "project": "", "reviewers": "AnonReviewer3;AnonReviewer2;AnonReviewer1;AnonReviewer4", "pdf_size": 0, "rating": "5;6;6;7", "confidence": "3;4;3;4", "wc_review": "538;493;554;201", "wc_reply_reviewers": 
"0;16;0;0", "wc_reply_authors": "634;800;545;151", "reply_reviewers": "0;1;0;0", "reply_authors": "2;2;2;1", "rating_avg": [ 6.0, 0.7071067811865476 ], "confidence_avg": [ 3.5, 0.5 ], "wc_review_avg": [ 446.5, 143.49303118967137 ], "wc_reply_reviewers_avg": [ 4.0, 6.928203230275509 ], "wc_reply_authors_avg": [ 532.5, 238.51467460095617 ], "reply_reviewers_avg": [ 0.25, 0.4330127018922193 ], "reply_authors_avg": [ 1.75, 0.4330127018922193 ], "replies_avg": [ 14, 0 ], "authors#_avg": [ 6, 0 ], "corr_rating_confidence": 0.7071067811865476, "gs_citation": 22, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=3118057905544966050&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 5, "pdf": "https://openreview.net/pdf?id=-Lr-u0b42he", "email": "cs.cmu.edu;fyusion.com;umontreal.ca;cmu.edu;cmu.edu;cmu.edu", "author_num": 6, "aff_unique_index": "0;1;2;0;0;0", "aff_unique_norm": "Carnegie Mellon University;Fyusion;University of Montreal", "aff_unique_dep": "School of Computer Science;;", "aff_unique_url": "https://www.cmu.edu;https://www.fyusion.com;https://wwwumontreal.ca", "aff_unique_abbr": "CMU;;UM", "aff_campus_unique_index": "0", "aff_campus_unique": "Pittsburgh;", "aff_country_unique_index": "0;1;1;0;0;0", "aff_country_unique": "United States;Canada" }, { "title": "SaliencyMix: A Saliency Guided Data Augmentation Strategy for Better Regularization", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/2986", "id": "-M0QkvBGTTq", "poster": "", "openreview": "https://openreview.net/forum?id=-M0QkvBGTTq", "slides": "https://iclr.cc/virtual/2021/poster/2986", "video": "https://iclr.cc/virtual/2021/poster/2986", "author_site": "A F M Shahab Uddin, Mst. Sirazam Monira, Wheemyung Shin, TaeChoong Chung, Sung-Ho Bae", "tldr": "", "abstract": "Advanced data augmentation strategies have widely been studied to improve the generalization ability of deep learning models. Regional dropout is one of the popular solutions that guides the model to focus on less discriminative parts by randomly removing image regions, resulting in improved regularization. However, such information removal is undesirable. On the other hand, recent strategies suggest to randomly cut and mix patches and their labels among training images, to enjoy the advantages of regional dropout without having any pointless pixel in the augmented images. We argue that such random selection strategies of the patches may not necessarily represent sufficient information about the corresponding object and thereby mixing the labels according to that uninformative patch enables the model to learn unexpected feature representation. Therefore, we propose SaliencyMix that carefully selects a representative image patch with the help of a saliency map and mixes this indicative patch with the target image, thus leading the model to learn more appropriate feature representation. SaliencyMix achieves the best known top-1 error of $21.26\\%$ and $20.09\\%$ for ResNet-50 and ResNet-101 architectures on ImageNet classification, respectively, and also improves the model robustness against adversarial perturbations. Furthermore, models that are trained with SaliencyMix, help to improve the object detection performance. 
Source code is available at \\url{https://github.com/SaliencyMix/SaliencyMix}.", "keywords": "SaliencyMix;Saliency Guided Data Augmentation;Data Augmentation;Regularization", "primary_area": "", "supplementary_material": "/attachment/1383918cadf6ca4b34b950131384a81aaf74f6b0.zip", "author": "A F M Shahab Uddin;Mst. Sirazam Monira;Wheemyung Shin;TaeChoong Chung;Sung-Ho Bae", "authorids": "~A_F_M_Shahab_Uddin1;~Mst._Sirazam_Monira1;wheemi@khu.ac.kr;tcchung@khu.ac.kr;~Sung-Ho_Bae1", "gender": "M;F;;;M", "homepage": ";;;;https://sites.google.com/a/khu.ac.kr/mlvc/", "dblp": ";;;;76/2068", "google_scholar": "Ckkj9gQAAAAJ;;;;https://scholar.google.co.kr/citations?user=EULut5oAAAAJ", "orcid": "0000-0003-1074-0515;0000-0001-6932-5557;;;", "linkedin": ";;;;", "or_profile": "~A_F_M_Shahab_Uddin1;~Mst._Sirazam_Monira1;wheemi@khu.ac.kr;tcchung@khu.ac.kr;~Sung-Ho_Bae1", "aff": "KyungHee University;;;;Kyung Hee University", "aff_domain": "khu.ac.kr;;;;khu.ac.kr", "position": "PhD student;;;;Associate Professor", "bibtex": "@inproceedings{\nuddin2021saliencymix,\ntitle={SaliencyMix: A Saliency Guided Data Augmentation Strategy for Better Regularization},\nauthor={A F M Shahab Uddin and Mst. Sirazam Monira and Wheemyung Shin and TaeChoong Chung and Sung-Ho Bae},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=-M0QkvBGTTq}\n}", "github": "[![github](/images/github_icon.svg) SaliencyMix/SaliencyMix](https://github.com/SaliencyMix/SaliencyMix) + [![Papers with Code](/images/pwc_icon.svg) 1 community implementation](https://paperswithcode.com/paper/?openreview=-M0QkvBGTTq)", "project": "", "reviewers": "AnonReviewer3;AnonReviewer2;AnonReviewer4;AnonReviewer5;AnonReviewer1", "pdf_size": 0, "rating": "3;3;7;7;9", "confidence": "3;4;4;4;5", "wc_review": "256;300;1140;242;162", "wc_reply_reviewers": "0;0;0;0;0", "wc_reply_authors": "719;541;768;466;67", "reply_reviewers": "0;0;0;0;0", "reply_authors": "1;1;1;1;1", "rating_avg": [ 5.8, 2.4 ], "confidence_avg": [ 4.0, 0.6324555320336759 ], "wc_review_avg": [ 420.0, 362.75170571618264 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 512.2, 248.74356273077703 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 16, 0 ], "authors#_avg": [ 5, 0 ], "corr_rating_confidence": 0.7905694150420948, "gs_citation": 246, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=13015633056720744259&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 4, "pdf": "https://openreview.net/pdf?id=-M0QkvBGTTq", "email": "khu.ac.kr;;;;khu.ac.kr", "author_num": 5, "aff_unique_index": "0;0", "aff_unique_norm": "Kyung Hee University", "aff_unique_dep": "", "aff_unique_url": "http://www.khu.ac.kr", "aff_unique_abbr": "KHU", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0", "aff_country_unique": "South Korea" }, { "title": "Lipschitz Recurrent Neural Networks", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/3112", "id": "-N7PBXqOUJZ", "poster": "", "openreview": "https://openreview.net/forum?id=-N7PBXqOUJZ", "slides": "https://iclr.cc/virtual/2021/poster/3112", "video": "https://iclr.cc/virtual/2021/poster/3112", "author_site": "N. 
Benjamin Erichson, Omri Azencot, Alejandro Queiruga, Liam Hodgkinson, Michael W Mahoney", "tldr": "", "abstract": "Viewing recurrent neural networks (RNNs) as continuous-time dynamical systems, we propose a recurrent unit that describes the hidden state's evolution with two parts: a well-understood linear component plus a Lipschitz nonlinearity. This particular functional form facilitates stability analysis of the long-term behavior of the recurrent unit using tools from nonlinear systems theory. In turn, this enables architectural design decisions before experimentation. Sufficient conditions for global stability of the recurrent unit are obtained, motivating a novel scheme for constructing hidden-to-hidden matrices. Our experiments demonstrate that the Lipschitz RNN can outperform existing recurrent units on a range of benchmark tasks, including computer vision, language modeling and speech prediction tasks. Finally, through Hessian-based analysis we demonstrate that our Lipschitz recurrent unit is more robust with respect to input and parameter perturbations as compared to other continuous-time RNNs.", "keywords": "recurrent neural networks;dynamical systems;differential equations", "primary_area": "", "supplementary_material": "/attachment/998595eab54906f974e7dec769e309f1a0f86c6a.zip", "author": "N. Benjamin Erichson;Omri Azencot;Alejandro Queiruga;Liam Hodgkinson;Michael W. Mahoney", "authorids": "~N._Benjamin_Erichson1;azencot@bgu.ac.il;afq@google.com;~Liam_Hodgkinson1;~Michael_W._Mahoney1", "gender": "M;;;M;", "homepage": "https://www.benerichson.com/;;;http://www.liamhodgkinson.com;", "dblp": "173/5153;;;238/1555;", "google_scholar": "https://scholar.google.co.uk/citations?user=8ViYcioAAAAJ;;;;", "orcid": ";;;;", "linkedin": ";;;;", "or_profile": "~N._Benjamin_Erichson1;azencot@bgu.ac.il;afq@google.com;~Liam_Hodgkinson1;~Michael_W._Mahoney1", "aff": "University of California, Berkeley;;;University of California, Berkeley;", "aff_domain": "berkeley.edu;;;berkeley.edu;", "position": "Postdoc;;;Postdoc;", "bibtex": "@inproceedings{\nerichson2021lipschitz,\ntitle={Lipschitz Recurrent Neural Networks},\nauthor={N. Benjamin Erichson and Omri Azencot and Alejandro Queiruga and Liam Hodgkinson and Michael W. 
Mahoney},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=-N7PBXqOUJZ}\n}", "github": "[![github](/images/github_icon.svg) erichson/LipschitzRNN](https://github.com/erichson/LipschitzRNN)", "project": "", "reviewers": "AnonReviewer3;AnonReviewer1;AnonReviewer2;AnonReviewer4", "pdf_size": 0, "rating": "5;6;7;8", "confidence": "4;4;4;3", "wc_review": "599;155;640;150", "wc_reply_reviewers": "250;0;464;0", "wc_reply_authors": "1611;535;1632;140", "reply_reviewers": "2;0;1;0", "reply_authors": "4;1;3;1", "rating_avg": [ 6.5, 1.118033988749895 ], "confidence_avg": [ 3.75, 0.4330127018922193 ], "wc_review_avg": [ 386.0, 233.9561924805582 ], "wc_reply_reviewers_avg": [ 178.5, 193.87302545738538 ], "wc_reply_authors_avg": [ 979.5, 657.0557434495189 ], "reply_reviewers_avg": [ 0.75, 0.82915619758885 ], "reply_authors_avg": [ 2.25, 1.299038105676658 ], "replies_avg": [ 18, 0 ], "authors#_avg": [ 5, 0 ], "corr_rating_confidence": -0.7745966692414834, "gs_citation": 142, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=9494951983450732150&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 11, "pdf": "https://openreview.net/pdf?id=-N7PBXqOUJZ", "email": "berkeley.edu;;;berkeley.edu;", "author_num": 5, "aff_unique_index": "0;0", "aff_unique_norm": "University of California, Berkeley", "aff_unique_dep": "", "aff_unique_url": "https://www.berkeley.edu", "aff_unique_abbr": "UC Berkeley", "aff_campus_unique_index": "0;0", "aff_campus_unique": "Berkeley", "aff_country_unique_index": "0;0", "aff_country_unique": "United States" }, { "id": "-NEXDKk8gZ", "title": "Improved Denoising Diffusion Probabilistic Models", "track": "main", "status": "Reject", "tldr": "", "abstract": "We explore denoising diffusion probabilistic models, a class of generative models which have recently been shown to produce excellent samples in the image and audio domains. While these models produce excellent samples, it has yet to be shown that they can achieve competitive log-likelihoods. We show that, with several small modifications, diffusion models can achieve competitive log-likelihoods in the image domain while maintaining high sample quality. Additionally, our models allow for sampling with an order of magnitude fewer diffusion steps with only a modest difference in sample quality. Finally, we explore how sample quality and log-likelihood scale with the number of diffusion steps and the amount of model capacity. 
We conclude that denoising diffusion probabilistic models are a promising class of generative models with excellent scaling properties and sample quality.", "keywords": "neural networks;generative models;log-likelihood;diffusion models;denoising diffusion probabilistic models;image generation", "primary_area": "", "supplementary_material": "", "author": "Alexander Quinn Nichol;Prafulla Dhariwal", "authorids": "~Alexander_Quinn_Nichol1;~Prafulla_Dhariwal1", "gender": "M;M", "homepage": "https://github.com/unixpickle;https://prafulladhariwal.com/", "dblp": ";", "google_scholar": ";0pOgVVAAAAAJ", "orcid": ";", "linkedin": ";prafulladhariwal", "or_profile": "~Alexander_Quinn_Nichol1;~Prafulla_Dhariwal1", "aff": "OpenAI;OpenAI", "aff_domain": "openai.com;openai.com", "position": "Researcher;Researcher", "bibtex": "@misc{\nnichol2021improved,\ntitle={Improved Denoising Diffusion Probabilistic Models},\nauthor={Alexander Quinn Nichol and Prafulla Dhariwal},\nyear={2021},\nurl={https://openreview.net/forum?id=-NEXDKk8gZ}\n}", "github": "", "project": "", "reviewers": "AnonReviewer3;AnonReviewer2;AnonReviewer4;AnonReviewer5", "site": "https://openreview.net/forum?id=-NEXDKk8gZ", "pdf_size": 0, "rating": "5;5;5;5", "confidence": "2;3;4;3", "wc_review": "266;135;315;417", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "436;254;611;344", "reply_reviewers": "0;0;0;0", "reply_authors": "1;1;1;1", "rating_avg": [ 5.0, 0.0 ], "confidence_avg": [ 3.0, 0.7071067811865476 ], "wc_review_avg": [ 283.25, 101.45534732087806 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 411.25, 132.06319509992176 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 10, 0 ], "authors#_avg": [ 2, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 4241, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=2227179395488568184&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 6, "aff_unique_index": "0;0", "aff_unique_norm": "OpenAI", "aff_unique_dep": "", "aff_unique_url": "https://openai.com", "aff_unique_abbr": "OpenAI", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0", "aff_country_unique": "United States" }, { "title": "In Defense of Pseudo-Labeling: An Uncertainty-Aware Pseudo-label Selection Framework for Semi-Supervised Learning", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/3255", "id": "-ODN6SbiUU", "poster": "", "openreview": "https://openreview.net/forum?id=-ODN6SbiUU", "slides": "https://iclr.cc/virtual/2021/poster/3255", "video": "https://iclr.cc/virtual/2021/poster/3255", "author_site": "Mamshad Nayeem Rizve, Kevin Duarte, Yogesh S Rawat, Mubarak Shah", "tldr": "", "abstract": "The recent research in semi-supervised learning (SSL) is mostly dominated by consistency regularization based methods which achieve strong performance. However, they heavily rely on domain-specific data augmentations, which are not easy to generate for all data modalities. Pseudo-labeling (PL) is a general SSL approach that does not have this constraint but performs relatively poorly in its original formulation. We argue that PL underperforms due to the erroneous high confidence predictions from poorly calibrated models; these predictions generate many incorrect pseudo-labels, leading to noisy training. We propose an uncertainty-aware pseudo-label selection (UPS) framework which improves pseudo labeling accuracy by drastically reducing the amount of noise encountered in the training process. 
Furthermore, UPS generalizes the pseudo-labeling process, allowing for the creation of negative pseudo-labels; these negative pseudo-labels can be used for multi-label classification as well as negative learning to improve the single-label classification. We achieve strong performance when compared to recent SSL methods on the CIFAR-10 and CIFAR-100 datasets. Also, we demonstrate the versatility of our method on the video dataset UCF-101 and the multi-label dataset Pascal VOC.", "keywords": "Semi-Supervised Learning;Pseudo-Labeling;Uncertainty;Calibration;Deep Learning", "primary_area": "", "supplementary_material": "", "author": "Mamshad Nayeem Rizve;Kevin Duarte;Yogesh S Rawat;Mubarak Shah", "authorids": "~Mamshad_Nayeem_Rizve1;~Kevin_Duarte1;~Yogesh_S_Rawat1;~Mubarak_Shah3", "gender": "M;M;M;M", "homepage": "https://nayeemrizve.github.io/;;https://www.crcv.ucf.edu/person/rawat/;https://www.crcv.ucf.edu/person/mubarak-shah/", "dblp": "260/4900;220/4092;148/2258;s/MubarakShah", "google_scholar": "kA8ZM5oAAAAJ;PxD5DrYAAAAJ;D_JvEcwAAAAJ;https://scholar.google.com.tw/citations?user=p8gsO3gAAAAJ", "orcid": ";;;0000-0002-8216-1128", "linkedin": ";kevin-duarte-vision/;;mubarak-shah-b6aa68213/", "or_profile": "~Mamshad_Nayeem_Rizve1;~Kevin_Duarte1;~Yogesh_S_Rawat1;~Mubarak_Shah3", "aff": "University of Central Florida;University of Central Florida;University of Central Florida;University of Central Florida", "aff_domain": "ucf.edu;ucf.edu;ucf.edu;ucf.edu", "position": "MS student;PhD student;Assistant Professor;Full Professor", "bibtex": "@inproceedings{\nrizve2021in,\ntitle={In Defense of Pseudo-Labeling: An Uncertainty-Aware Pseudo-label Selection Framework for Semi-Supervised Learning},\nauthor={Mamshad Nayeem Rizve and Kevin Duarte and Yogesh S Rawat and Mubarak Shah},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=-ODN6SbiUU}\n}", "github": "[![github](/images/github_icon.svg) nayeemrizve/ups](https://github.com/nayeemrizve/ups) + [![Papers with Code](/images/pwc_icon.svg) 1 community implementation](https://paperswithcode.com/paper/?openreview=-ODN6SbiUU)", "project": "", "reviewers": "AnonReviewer2;AnonReviewer4;AnonReviewer3;AnonReviewer1", "pdf_size": 0, "rating": "5;6;6;9", "confidence": "4;4;4;5", "wc_review": "336;308;294;531", "wc_reply_reviewers": "51;250;0;0", "wc_reply_authors": "907;924;391;420", "reply_reviewers": "1;1;0;0", "reply_authors": "3;2;1;1", "rating_avg": [ 6.5, 1.5 ], "confidence_avg": [ 4.25, 0.4330127018922193 ], "wc_review_avg": [ 367.25, 95.74281957410696 ], "wc_reply_reviewers_avg": [ 75.25, 103.01789893023445 ], "wc_reply_authors_avg": [ 660.5, 255.2768105410282 ], "reply_reviewers_avg": [ 0.5, 0.5 ], "reply_authors_avg": [ 1.75, 0.82915619758885 ], "replies_avg": [ 15, 0 ], "authors#_avg": [ 4, 0 ], "corr_rating_confidence": 0.9622504486493763, "gs_citation": 743, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=18358012281479028989&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 3, "pdf": "https://openreview.net/pdf?id=-ODN6SbiUU", "email": "ucf.edu;ucf.edu;ucf.edu;ucf.edu", "author_num": 4, "aff_unique_index": "0;0;0;0", "aff_unique_norm": "University of Central Florida", "aff_unique_dep": "", "aff_unique_url": "https://www.ucf.edu", "aff_unique_abbr": "UCF", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0;0;0", "aff_country_unique": "United States" }, { "id": "-Qaj4_O3cO", "title": "DCT-SNN: Using DCT to Distribute Spatial 
Information over Time for Learning Low-Latency Spiking Neural Networks", "track": "main", "status": "Reject", "tldr": "", "abstract": "Spiking Neural Networks (SNNs) offer a promising alternative to traditional deep learning frameworks, since they provide higher \ncomputational efficiency due to event-driven information processing. SNNs distribute the analog values of pixel intensities into binary spikes over time. However, the most widely used input coding schemes, such as Poisson based rate-coding, do not leverage the additional temporal learning capability of SNNs effectively. Moreover, these SNNs suffer from high inference latency which is a major bottleneck to their deployment. To overcome this, we propose a scalable time-based encoding scheme that utilizes the Discrete Cosine Transform (DCT) to reduce the number of timesteps required for inference. DCT decomposes an image into a weighted sum of sinusoidal basis images. At each time step, a single frequency base, taken in order and modulated\nby its corresponding DCT coefficient, is input to an accumulator that generates spikes upon crossing a threshold. We use the proposed scheme to learn DCT-SNN, a low-latency deep SNN with leaky-integrate-and-fire neurons, trained using surrogate gradient descent based backpropagation. We achieve top-1 accuracy of 89.94%, 68.3% and 52.43% on CIFAR-10, CIFAR-100 and TinyImageNet, respectively using VGG architectures. Notably, DCT-SNN performs inference with 2-14X reduced latency compared to other state-of-the-art SNNs, while achieving comparable accuracy to their standard deep learning counterparts. The dimension of the transform allows us to control the number of timesteps required for inference. Additionally, we can trade-off accuracy with latency in a principled manner by dropping the highest frequency components during inference.", "keywords": "Spiking Neural Networks;Input Encoding;Low Latency;Discrete Cosine Transform;Temporal Information;Frequency Domain", "primary_area": "", "supplementary_material": "/attachment/56525a045ab8bd664bc29909366f721955a56688.zip", "author": "Isha Garg;Sayeed Shafayet Chowdhury;Kaushik Roy", "authorids": "~Isha_Garg1;~Sayeed_Shafayet_Chowdhury3;~Kaushik_Roy1", "gender": "F;M;M", "homepage": ";;https://engineering.purdue.edu/NRL/Group", "dblp": ";;r/KaushikRoy", "google_scholar": ";646ndV4AAAAJ;to4P8KgAAAAJ", "orcid": ";;", "linkedin": ";;", "or_profile": "~Isha_Garg1;~Sayeed_Shafayet_Chowdhury3;~Kaushik_Roy1", "aff": "Purdue University;Purdue University;Purdue University", "aff_domain": "purdue.edu;purdue.edu;purdue.edu", "position": "PhD student;PhD student;Full Professor", "bibtex": "@misc{\ngarg2021dctsnn,\ntitle={{\\{}DCT{\\}}-{\\{}SNN{\\}}: Using {\\{}DCT{\\}} to Distribute Spatial Information over Time for Learning Low-Latency Spiking Neural Networks},\nauthor={Isha Garg and Sayeed Shafayet Chowdhury and Kaushik Roy},\nyear={2021},\nurl={https://openreview.net/forum?id=-Qaj4_O3cO}\n}", "github": "", "project": "", "reviewers": "AnonReviewer3;AnonReviewer4;AnonReviewer2;AnonReviewer1", "site": "https://openreview.net/forum?id=-Qaj4_O3cO", "pdf_size": 0, "rating": "5;6;6;6", "confidence": "4;3;4;5", "wc_review": "430;362;768;368", "wc_reply_reviewers": "0;0;106;109", "wc_reply_authors": "1928;1727;2592;1551", "reply_reviewers": "0;0;1;2", "reply_authors": "4;3;5;3", "rating_avg": [ 5.75, 0.4330127018922193 ], "confidence_avg": [ 4.0, 0.7071067811865476 ], "wc_review_avg": [ 482.0, 167.25429740368406 ], "wc_reply_reviewers_avg": [ 53.75, 53.760464097699156 ], 
"wc_reply_authors_avg": [ 1949.5, 394.20077371816507 ], "reply_reviewers_avg": [ 0.75, 0.82915619758885 ], "reply_authors_avg": [ 3.75, 0.82915619758885 ], "replies_avg": [ 24, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 47, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=9254587044725882983&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 4, "aff_unique_index": "0;0;0", "aff_unique_norm": "Purdue University", "aff_unique_dep": "", "aff_unique_url": "https://www.purdue.edu", "aff_unique_abbr": "Purdue", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0;0", "aff_country_unique": "United States" }, { "title": "Meta-learning Symmetries by Reparameterization", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/2607", "id": "-QxT4mJdijq", "poster": "", "openreview": "https://openreview.net/forum?id=-QxT4mJdijq", "slides": "https://iclr.cc/virtual/2021/poster/2607", "video": "https://iclr.cc/virtual/2021/poster/2607", "author_site": "Allan Zhou, Tom Knowles, Chelsea Finn", "tldr": "", "abstract": "Many successful deep learning architectures are equivariant to certain transformations in order to conserve parameters and improve generalization: most famously, convolution layers are equivariant to shifts of the input. This approach only works when practitioners know the symmetries of the task and can manually construct an architecture with the corresponding equivariances. Our goal is an approach for learning equivariances from data, without needing to design custom task-specific architectures. We present a method for learning and encoding equivariances into networks by learning corresponding parameter sharing patterns from data. Our method can provably represent equivariance-inducing parameter sharing for any finite group of symmetry transformations. 
Our experiments suggest that it can automatically learn to encode equivariances to common transformations used in image processing tasks.", "keywords": "meta-learning;equivariance;convolution;symmetry", "primary_area": "", "supplementary_material": "", "author": "Allan Zhou;Tom Knowles;Chelsea Finn", "authorids": "~Allan_Zhou1;tknowles@stanford.edu;~Chelsea_Finn1", "gender": ";;F", "homepage": "http://bland.website;;https://ai.stanford.edu/~cbfinn/", "dblp": "195/6907;;131/1783", "google_scholar": ";;vfPE6hgAAAAJ", "orcid": ";;", "linkedin": ";;", "or_profile": "~Allan_Zhou1;tknowles@stanford.edu;~Chelsea_Finn1", "aff": "Meta Facebook;;Google", "aff_domain": "facebook.com;;google.com", "position": "Intern;;Research Scientist", "bibtex": "@inproceedings{\nzhou2021metalearning,\ntitle={Meta-learning Symmetries by Reparameterization},\nauthor={Allan Zhou and Tom Knowles and Chelsea Finn},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=-QxT4mJdijq}\n}", "github": "[![github](/images/github_icon.svg) AllanYangZhou/metalearning-symmetries](https://github.com/AllanYangZhou/metalearning-symmetries) + [![Papers with Code](/images/pwc_icon.svg) 1 community implementation](https://paperswithcode.com/paper/?openreview=-QxT4mJdijq)", "project": "", "reviewers": "AnonReviewer2;AnonReviewer3;AnonReviewer1;AnonReviewer4", "pdf_size": 0, "rating": "5;6;8;9", "confidence": "3;4;4;4", "wc_review": "450;241;664;315", "wc_reply_reviewers": "169;0;0;0", "wc_reply_authors": "861;55;341;203", "reply_reviewers": "2;0;0;0", "reply_authors": "3;1;1;1", "rating_avg": [ 7.0, 1.5811388300841898 ], "confidence_avg": [ 3.75, 0.4330127018922193 ], "wc_review_avg": [ 417.5, 160.83920541957423 ], "wc_reply_reviewers_avg": [ 42.25, 73.17914661978507 ], "wc_reply_authors_avg": [ 365.0, 303.7005103716489 ], "reply_reviewers_avg": [ 0.5, 0.8660254037844386 ], "reply_authors_avg": [ 1.5, 0.8660254037844386 ], "replies_avg": [ 13, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": 0.7302967433402215, "gs_citation": 109, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=9023763137137918184&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 3, "pdf": "https://openreview.net/pdf?id=-QxT4mJdijq", "email": "facebook.com;;google.com", "author_num": 3, "aff_unique_index": "0;1", "aff_unique_norm": "Meta;Google", "aff_unique_dep": "Meta Platforms, Inc.;Google", "aff_unique_url": "https://meta.com;https://www.google.com", "aff_unique_abbr": "Meta;Google", "aff_campus_unique_index": "1", "aff_campus_unique": ";Mountain View", "aff_country_unique_index": "0;0", "aff_country_unique": "United States" }, { "id": "-RQVWPX73VP", "title": "Interpretable Meta-Reinforcement Learning with Actor-Critic Method", "track": "main", "status": "Reject", "tldr": "", "abstract": "Meta-reinforcement learning (meta-RL) algorithms have successfully trained agent systems to perform well on different tasks within only few updates. However, in gradient-based meta-RL algorithms, the Q-function at adaptation step is mainly estimated by the return of few trajectories, which can lead to high variance in Q-value and biased meta-gradient estimation, and the adaptation uses a large number of batched trajectories. To address these challenges, we propose a new meta-RL algorithm that can reduce the variance and bias of the meta-gradient estimation and perform few-shot task data sampling, which makes the meta-policy more interpretable. 
We reformulate the meta-RL objective, and introduce a contextual Q-function as a meta-policy critic during the task adaptation step and learn the Q-function under a soft actor-critic (SAC) framework. The experimental results on a 2D navigation task and meta-RL benchmarks show that our approach can learn a more interpretable meta-policy for exploring unknown environments, with performance comparable to previous gradient-based algorithms.", "keywords": "meta-reinforcement learning;actor-critic;deep learning;interpretable", "primary_area": "", "supplementary_material": "", "author": "Xingyuan Liang;Xu-Ying Liu", "authorids": "~Xingyuan_Liang1;liuxy@seu.edu.cn", "gender": "M;", "homepage": ";", "dblp": ";", "google_scholar": ";", "orcid": "0000-0001-7947-904X;", "linkedin": ";", "or_profile": "~Xingyuan_Liang1;liuxy@seu.edu.cn", "aff": "Southeast University;", "aff_domain": "seu.edu.cn;", "position": "MS student;", "bibtex": "@misc{\nliang2021interpretable,\ntitle={Interpretable Meta-Reinforcement Learning with Actor-Critic Method},\nauthor={Xingyuan Liang and Xu-Ying Liu},\nyear={2021},\nurl={https://openreview.net/forum?id=-RQVWPX73VP}\n}", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer2;AnonReviewer5;AnonReviewer3;AnonReviewer4", "site": "https://openreview.net/forum?id=-RQVWPX73VP", "pdf_size": 0, "rating": "2;3;3;4;4", "confidence": "5;4;3;3;4", "wc_review": "214;655;255;465;307", "wc_reply_reviewers": "0;0;0;0;0", "wc_reply_authors": "0;0;0;0;0", "reply_reviewers": "0;0;0;0;0", "reply_authors": "0;0;0;0;0", "rating_avg": [ 3.2, 0.7483314773547882 ], "confidence_avg": [ 3.8, 0.7483314773547882 ], "wc_review_avg": [ 379.2, 162.08442244706922 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 0, 0 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 0, 0 ], "replies_avg": [ 6, 0 ], "authors#_avg": [ 2, 0 ], "corr_rating_confidence": -0.6428571428571428, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:bq5coJkZPJYJ:scholar.google.com/&scioq=Interpretable+Meta-Reinforcement+Learning+with+Actor-Critic+Method&hl=en&as_sdt=0,5", "gs_version_total": 0, "aff_unique_index": "0", "aff_unique_norm": "Southeast University", "aff_unique_dep": "", "aff_unique_url": "https://www.seu.edu.cn/", "aff_unique_abbr": "SEU", "aff_country_unique_index": "0", "aff_country_unique": "China" }, { "id": "-S7-RsPv78e", "title": "Optimizing Quantized Neural Networks in a Weak Curvature Manifold", "track": "main", "status": "Reject", "tldr": "", "abstract": "Quantized Neural Networks (QNNs) have achieved an enormous step in improving computational efficiency, making it possible to deploy large models to mobile and miniaturized devices.\nIn order to narrow the performance gap between low-precision and full-precision models, we introduce the natural gradient to train a low-precision model by viewing the parameter space as a Riemannian manifold.\nSpecifically, we propose a novel Optimized Natural Gradient Descent (ONGD) method defined by the Hyperbolic divergence, which provides a perspective to calculate the optimized natural gradient in weak curvature and updates the parameters with an amount of computation comparable to Stochastic Gradient Descent (SGD).\nWe conduct an ablation study and the results show that the 4-bit quantized ResNet-32 trained with ONGD has a better result than SGD, i.e. 
2.05\\% higher in Top-1 accuracy on CIFAR100 dataset.\nFurther comparison experiments illustrate that our method achieves state-of-the-art results in CIFAR and ImageNet datasets, where the 8-bit version of MobileNet achieves 0.25\\%/0.13\\% higher in Top-1/Top-5 accuracies than the full-precision version on ImageNet dataset.", "keywords": "Deep learning;Neural network quantization;Information geometry", "primary_area": "", "supplementary_material": "", "author": "Jun Chen;Hanwen Chen;Jiangning Zhang;Wenzhou Chen;Yong Liu;Yunliang Jiang", "authorids": "~Jun_Chen9;chenhanwen@zju.edu.cn;~Jiangning_Zhang1;wenzhouchen@zju.edu.cn;~Yong_Liu11;~Yunliang_Jiang2", "gender": "M;;M;;M;M", "homepage": ";;https://www.researchgate.net/profile/Jiangning_Zhang2;;https://person.zju.edu.cn/en/yongliu;http://www.zjhu.edu.cn/page/12.html", "dblp": ";;241/9593;;29/4867-7;", "google_scholar": "YKc2O78AAAAJ;;https://scholar.google.com.hk/citations?user=2hA4X9wAAAAJ;;https://scholar.google.com.hk/citations?user=qYcgBbEAAAAJ;", "orcid": "0000-0001-6568-8801;;;;0000-0003-4822-8939;", "linkedin": ";;;;;", "or_profile": "~Jun_Chen9;chenhanwen@zju.edu.cn;~Jiangning_Zhang1;wenzhouchen@zju.edu.cn;~Yong_Liu11;~Yunliang_Jiang2", "aff": "Zhejiang University;;Zhejiang University;;Zhejiang University;Huzhou University", "aff_domain": "zju.edu.cn;;zju.edu.cn;;zju.edu.cn;zjhu.edu.cn", "position": "PhD student;;PhD student;;Full Professor;Full Professor", "bibtex": "", "github": "", "project": "", "reviewers": "AnonReviewer3;AnonReviewer4;AnonReviewer2;AnonReviewer1", "site": "https://openreview.net/forum?id=-S7-RsPv78e", "pdf_size": 0, "rating": "3;3;5;5", "confidence": "4;5;4;3", "wc_review": "528;730;264;224", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "579;785;509;361", "reply_reviewers": "0;0;0;0", "reply_authors": "1;1;1;1", "rating_avg": [ 4.0, 1.0 ], "confidence_avg": [ 4.0, 0.7071067811865476 ], "wc_review_avg": [ 436.5, 205.80755574079393 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 558.5, 152.62617730913658 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 10, 0 ], "authors#_avg": [ 6, 0 ], "corr_rating_confidence": -0.7071067811865475, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:IrFmdHidiDoJ:scholar.google.com/&scioq=Optimizing+Quantized+Neural+Networks+in+a+Weak+Curvature+Manifold&hl=en&as_sdt=0,33", "gs_version_total": 0, "aff_unique_index": "0;0;0;1", "aff_unique_norm": "Zhejiang University;Huzhou University", "aff_unique_dep": ";", "aff_unique_url": "https://www.zju.edu.cn;http://www.hzu.edu.cn", "aff_unique_abbr": "ZJU;", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0;0;0", "aff_country_unique": "China" }, { "title": "PseudoSeg: Designing Pseudo Labels for Semantic Segmentation", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/3033", "id": "-TwO99rbVRu", "poster": "", "openreview": "https://openreview.net/forum?id=-TwO99rbVRu", "slides": "https://iclr.cc/virtual/2021/poster/3033", "video": "https://iclr.cc/virtual/2021/poster/3033", "author_site": "Yuliang Zou, Zizhao Zhang, Han Zhang, Chun-Liang Li, Xiao Bian, Jia-Bin Huang, Tomas Pfister", "tldr": "", "abstract": "Recent advances in semi-supervised learning (SSL) demonstrate that a combination of consistency regularization and pseudo-labeling can effectively improve image classification accuracy in the low-data regime. 
Compared to classification, semantic segmentation tasks require much more intensive labeling costs. Thus, these tasks greatly benefit from data-efficient training methods. However, structured outputs in segmentation render particular difficulties (e.g., designing pseudo-labeling and augmentation) to apply existing SSL strategies. To address this problem, we present a simple and novel re-design of pseudo-labeling to generate well-calibrated structured pseudo labels for training with unlabeled or weakly-labeled data. Our proposed pseudo-labeling strategy is network structure agnostic to apply in a one-stage consistency training framework. We demonstrate the effectiveness of the proposed pseudo-labeling strategy in both low-data and high-data regimes. Extensive experiments have validated that pseudo labels generated from wisely fusing diverse sources and strong data augmentation are crucial to consistency training for segmentation. The source code will be released.", "keywords": "pseudo-labeling;semi-supervised;semantic-segmentation", "primary_area": "", "supplementary_material": "", "author": "Yuliang Zou;Zizhao Zhang;Han Zhang;Chun-Liang Li;Xiao Bian;Jia-Bin Huang;Tomas Pfister", "authorids": "~Yuliang_Zou1;~Zizhao_Zhang3;~Han_Zhang1;~Chun-Liang_Li1;~Xiao_Bian3;~Jia-Bin_Huang1;~Tomas_Pfister1", "gender": "M;M;M;M;M;M;M", "homepage": "http://yuliang.vision;https://sites.google.com/corp/view/zizhaozhang;https://sites.google.com/corp/view/hanzhang;http://chunliangli.github.io;https://scholar.google.com/citations?user=ZpF26loAAAAJ&hl=en&oi=ao;https://jbhuang0604.github.io/;http://tomas.pfister.fi", "dblp": "199/2331;;;;116/5018;51/1815-1.html;14/8360", "google_scholar": "6qQlncEAAAAJ;https://scholar.google.dk/citations?hl=en;cxEoVL4AAAAJ;https://scholar.google.com.tw/citations?user=vqHIt_sAAAAJ;ZpF26loAAAAJ;pp848fYAAAAJ;ahSpJOAAAAAJ", "orcid": ";;;;;;0009-0004-4088-8718", "linkedin": ";;;;;jia-bin-huang-070a7418/;", "or_profile": "~Yuliang_Zou1;~Zizhao_Zhang3;~Han_Zhang1;~Chun-Liang_Li1;~Xiao_Bian3;~Jia-Bin_Huang1;~Tomas_Pfister1", "aff": "Virginia Tech;Google;Google;Google;Google;Virginia Tech;Google", "aff_domain": "vt.edu;google.com;google.com;google.com;google.com;vt.edu;google.com", "position": "PhD student;Researcher;Researcher;Researcher;Staff software engineer;Assistant Professor;Head of Research @ Cloud AI", "bibtex": "@inproceedings{\nzou2021pseudoseg,\ntitle={PseudoSeg: Designing Pseudo Labels for Semantic Segmentation},\nauthor={Yuliang Zou and Zizhao Zhang and Han Zhang and Chun-Liang Li and Xiao Bian and Jia-Bin Huang and Tomas Pfister},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=-TwO99rbVRu}\n}", "github": "[![github](/images/github_icon.svg) googleinterns/wss](https://github.com/googleinterns/wss) + [![Papers with Code](/images/pwc_icon.svg) 1 community implementation](https://paperswithcode.com/paper/?openreview=-TwO99rbVRu)", "project": "", "reviewers": "AnonReviewer5;AnonReviewer3;AnonReviewer2", "pdf_size": 0, "rating": "6;7;8", "confidence": "4;4;5", "wc_review": "292;255;335", "wc_reply_reviewers": "0;0;0", "wc_reply_authors": "709;816;495", "reply_reviewers": "0;0;0", "reply_authors": "1;2;1", "rating_avg": [ 7.0, 0.816496580927726 ], "confidence_avg": [ 4.333333333333333, 0.4714045207910317 ], "wc_review_avg": [ 294.0, 32.69046751985457 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 673.3333333333334, 133.45244679493726 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 
1.3333333333333333, 0.4714045207910317 ], "replies_avg": [ 11, 0 ], "authors#_avg": [ 7, 0 ], "corr_rating_confidence": 0.8660254037844385, "gs_citation": 413, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=11801417491488488735&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 4, "pdf": "https://openreview.net/pdf?id=-TwO99rbVRu", "email": "vt.edu;google.com;google.com;google.com;google.com;vt.edu;google.com", "author_num": 7, "aff_unique_index": "0;1;1;1;1;0;1", "aff_unique_norm": "Virginia Tech;Google", "aff_unique_dep": ";Google", "aff_unique_url": "https://www.vt.edu;https://www.google.com", "aff_unique_abbr": "VT;Google", "aff_campus_unique_index": "1;1;1;1;1", "aff_campus_unique": ";Mountain View", "aff_country_unique_index": "0;0;0;0;0;0;0", "aff_country_unique": "United States" }, { "id": "-UOq2-C27Yc", "title": "Modeling from Features: a Mean-field Frameworkfor Over-parameterized Deep Neural Networks", "track": "main", "status": "Desk Reject", "tldr": "", "abstract": "This paper proposes a new mean-field framework for over-parameterized deep neural networks (DNNs), which can be used to analyze neural network training. In this framework, a DNN is represented by probability measures and functions over its features (that is, the function values of the hidden units over the training data) in the continuous limit, instead of the neural network parameters as most existing studies have done. This new representation overcomes the degenerate situation where all the hidden units essentially have only one meaningful hidden unit in each middle layer, leading to a simpler representation of DNNs. Moreover, we construct a non-linear dynamics called neural feature flow, which captures the evolution of an over-parameterized DNN trained by Gradient Descent. We illustrate the framework via the Residual Network (Res-Net) architecture. It is shown that when the neural feature flow process converges, it reaches a global minimal solution under suitable conditions. 
Our analysis leads to the first global convergence proof for over-parameterized neural network training with more than $3$ layers in the mean-field regime.", "keywords": "deep neural networks;mean-field theory;global convergence", "primary_area": "", "supplementary_material": "/attachment/6109a0ccafeb6d80879039ca983768040058139a.zip", "author": "Anonymous", "authorids": "ICLR.cc/2021/Conference/Paper1345/Authors", "gender": "", "homepage": "", "dblp": "", "google_scholar": "", "orcid": "", "linkedin": "", "or_profile": "", "aff": "", "aff_domain": "", "position": "", "bibtex": "@inproceedings{\nanonymous2021modeling,\ntitle={Modeling from Features: a Mean-field Frameworkfor Over-parameterized Deep Neural Networks},\nauthor={Anonymous},\nbooktitle={Submitted to International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=-UOq2-C27Yc},\nnote={under review}\n}", "github": "", "project": "", "reviewers": "", "site": "https://openreview.net/forum?id=-UOq2-C27Yc", "pdf_size": 0, "rating": "", "confidence": "", "wc_review": "", "wc_reply_reviewers": "", "wc_reply_authors": "", "reply_reviewers": "", "reply_authors": "", "rating_avg": [ 0, 0 ], "confidence_avg": [ 0, 0 ], "wc_review_avg": [ 0, 0 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 0, 0 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 0, 0 ], "replies_avg": [ 1, 0 ], "authors#_avg": [ 1, 0 ], "corr_rating_confidence": 0, "gs_citation": 65, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=8677521020332226679&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 7 }, { "id": "-WwaX9vKKt", "title": "Ask Question with Double Hints: Visual Question Generation with Answer-awareness and Region-reference", "track": "main", "status": "Reject", "tldr": "", "abstract": "The task of visual question generation~(VQG) aims to generate human-like questions from an image and potentially other side information (e.g. answer type or the answer itself). Although promising results have been achieved, previous works on VQG either i) suffer from the one-image-to-many-questions mapping problem, which causes a failure to generate referential and meaningful questions from an image, or ii) ignore rich correlations among the visual objects in an image and potential interactions between the side information and image. To address these limitations, we first propose a novel learning paradigm to generate visual questions with answer-awareness and region-reference. In particular, we aim to ask the right visual questions with \emph{Double Hints - textual answers and visual regions of interest}, effectively mitigating the existing one-to-many mapping issue. To this end, we develop a simple methodology to self-learn the visual hints without introducing any additional human annotations. Furthermore, to capture these sophisticated relationships, we propose a new double-hints guided Graph-to-Sequence learning framework that first models them as a dynamic graph and learns the implicit topology end-to-end, and then utilizes a graph-to-sequence model to generate the questions with double hints. Our experiments on VQA2.0 and COCO-QA datasets demonstrate that our proposed model in this new setting can significantly outperform existing state-of-the-art baselines by a large margin. 
", "keywords": "Semi-supervised Learning;graph neural network;vision and language;question generation", "primary_area": "", "supplementary_material": "", "author": "Kai Shen;Lingfei Wu;Siliang Tang;Fangli Xu;Zhu Zhang;Yu Qiang;Yueting Zhuang", "authorids": "~Kai_Shen2;~Lingfei_Wu1;~Siliang_Tang1;~Fangli_Xu2;~Zhu_Zhang3;~Yu_Qiang1;~Yueting_Zhuang1", "gender": "M;;M;;M;;M", "homepage": ";https://sites.google.com/view/teddy-lfwu/;https://person.zju.edu.cn/en/siliang;https://www.linkedin.com/in/lily-xu-2018/;;http://www.citycloud.com.cn/index;https://person.zju.edu.cn/yzhuang", "dblp": ";27/9060;44/5693;89/10932.html;;;", "google_scholar": "https://scholar.google.com/citations?hl=en;https://scholar.google.com/citations?hl=en;8e7H3PcAAAAJ;TFxZdJ0AAAAJ;https://scholar.google.com.hk/citations?user=cjWy38wAAAAJ;;1RD7UJAAAAAJ", "orcid": ";;0000-0002-7356-9711;;;;", "linkedin": ";;siliang-tang-4734272a/;;;;", "or_profile": "~Kai_Shen2;~Lingfei_Wu1;~Siliang_Tang1;~Fangli_Xu2;~Zhu_Zhang3;~Yu_Qiang1;~Yueting_Zhuang1", "aff": "Zhejiang University;International Business Machines;Zhejiang University;Squirrel AI Learning;Zhejiang University;;Zhejiang University", "aff_domain": "zju.edu.cn;ibm.com;zju.edu.cn;yixue.us;zju.edu.cn;;zju.edu.cn", "position": "PhD student;Research Staff Member;Associate Professor;Machine Learning Engineer;MS student;;Full Professor", "bibtex": "@misc{\nkai2021ask,\ntitle={Ask Question with Double Hints: Visual Question Generation with Answer-awareness and Region-reference},\nauthor={Kai Shen and Lingfei Wu and Siliang Tang and Fangli Xu and Zhu Zhang and Yu Qiang and Yueting Zhuang},\nyear={2021},\nurl={https://openreview.net/forum?id=-WwaX9vKKt}\n}", "github": "", "project": "", "reviewers": "AnonReviewer3;AnonReviewer2;AnonReviewer1;AnonReviewer4", "site": "https://openreview.net/forum?id=-WwaX9vKKt", "pdf_size": 0, "rating": "5;6;6;6", "confidence": "5;5;5;3", "wc_review": "482;582;330;366", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "793;1489;646;662", "reply_reviewers": "0;0;0;0", "reply_authors": "2;2;1;1", "rating_avg": [ 5.75, 0.4330127018922193 ], "confidence_avg": [ 4.5, 0.8660254037844386 ], "wc_review_avg": [ 440.0, 99.37806599043876 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 897.5, 346.23149770059916 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.5, 0.5 ], "replies_avg": [ 12, 0 ], "authors#_avg": [ 7, 0 ], "corr_rating_confidence": -0.3333333333333333, "gs_citation": 1, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=14743061815823094112&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 0, "aff_unique_index": "0;1;0;2;0;0", "aff_unique_norm": "Zhejiang University;International Business Machines Corporation;Squirrel Ai Learning", "aff_unique_dep": ";;", "aff_unique_url": "https://www.zju.edu.cn;https://www.ibm.com;https://www.squirrelai.com/", "aff_unique_abbr": "ZJU;IBM;", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;1;0;0;0;0", "aff_country_unique": "China;United States" }, { "id": "-YCAwPdyPKw", "title": "A Bayesian-Symbolic Approach to Learning and Reasoning for Intuitive Physics", "track": "main", "status": "Reject", "tldr": "", "abstract": "Humans are capable of reasoning about physical phenomena by inferring laws of physics from a very limited set of observations. The inferred laws can potentially depend on unobserved properties, such as mass, texture, charge, etc. 
This sample-efficient physical reasoning is considered a core domain of human common-sense knowledge and hints at the existence of a physics engine in the head. In this paper, we propose a Bayesian symbolic framework for learning sample-efficient models of physical reasoning and prediction, which are of special interest in the field of intuitive physics. In our framework, the environment is represented by a top-down generative model with a collection of entities with some known and unknown properties as latent variables to capture uncertainty. The physics engine depends on physical laws which are modeled as interpretable symbolic expressions and are assumed to be functions of the latent properties of the entities interacting under simple Newtonian physics. As such, learning the laws is then reduced to symbolic regression, and Bayesian inference methods are used to obtain the distribution of unobserved properties. These inference and regression steps are performed in an iterative manner following the expectation\u2013maximization algorithm to infer the unknown properties and use them to learn the laws from a very small set of observations. We demonstrate on three physics learning tasks that, compared to the existing methods of learning physics, our proposed framework is more data-efficient and accurate, and makes joint reasoning and learning possible.", "keywords": "physics learning;symbolic regression;intuitive physics", "primary_area": "", "supplementary_material": "/attachment/3e115e498039888cf095f6fc305125319d23186d.zip", "author": "Kai Xu;Akash Srivastava;Dan Gutfreund;Felix Sosa;Tomer Ullman;Joshua B. Tenenbaum;Charles Sutton", "authorids": "~Kai_Xu4;~Akash_Srivastava1;~Dan_Gutfreund1;fsosa@fas.harvard.edu;~Tomer_Ullman1;~Joshua_B._Tenenbaum1;~Charles_Sutton1", "gender": "M;M;;;;;M", "homepage": "https://xuk.ai;http://akashgit.github.io;https://researcher.watson.ibm.com/researcher/view.php?person=us-dgutfre;;;;http://homepages.inf.ed.ac.uk/csutton/", "dblp": ";24/9528;g/DanGutfreund;;;t/JoshuaBTenenbaum;59/5879", "google_scholar": "https://scholar.google.ca/citations?user=kf3C60wAAAAJ;https://scholar.google.co.uk/citations?user=2h6SZeEAAAAJ;fRJbyD8AAAAJ;;;;https://scholar.google.co.uk/citations?user=hYtGXD0AAAAJ", "orcid": ";;;;;;0000-0002-0041-3820", "linkedin": ";https://uk.linkedin.com/in/akash-srivastava-aa97361b;;;;;charles-sutton-772aa126", "or_profile": "~Kai_Xu4;~Akash_Srivastava1;~Dan_Gutfreund1;fsosa@fas.harvard.edu;~Tomer_Ullman1;~Joshua_B._Tenenbaum1;~Charles_Sutton1", "aff": "University of Edinburgh;MIT-IBM Watson AI Research Lab;MIT-IBM Watson AI Lab;;;Massachusetts Institute of Technology;University of Edinburgh", "aff_domain": "ed.ac.uk;ibm.com;mit.edu;;;mit.edu;ed.ac.uk", "position": "PhD student;Research Scientist;Principal Researcher;;;Professor;Professor", "bibtex": "@misc{\nxu2021a,\ntitle={A Bayesian-Symbolic Approach to Learning and Reasoning for Intuitive Physics},\nauthor={Kai Xu and Akash Srivastava and Dan Gutfreund and Felix Sosa and Tomer Ullman and Joshua B. 
Tenenbaum and Charles Sutton},\nyear={2021},\nurl={https://openreview.net/forum?id=-YCAwPdyPKw}\n}", "github": "", "project": "", "reviewers": "AnonReviewer4;AnonReviewer3;AnonReviewer2;AnonReviewer1", "site": "https://openreview.net/forum?id=-YCAwPdyPKw", "pdf_size": 0, "rating": "5;6;6;6", "confidence": "4;4;4;3", "wc_review": "598;458;385;339", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "742;815;608;462", "reply_reviewers": "0;0;0;0", "reply_authors": "1;1;1;1", "rating_avg": [ 5.75, 0.4330127018922193 ], "confidence_avg": [ 3.75, 0.4330127018922193 ], "wc_review_avg": [ 445.0, 97.99744894638839 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 656.75, 134.73562075412724 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 11, 0 ], "authors#_avg": [ 7, 0 ], "corr_rating_confidence": -0.3333333333333333, "gs_citation": 1, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=1604588134589236147&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 0, "aff_unique_index": "0;1;1;1;0", "aff_unique_norm": "University of Edinburgh;Massachusetts Institute of Technology", "aff_unique_dep": ";MIT-IBM Watson AI Research Lab", "aff_unique_url": "https://www.ed.ac.uk;https://www.mitibmwatsonailab.org", "aff_unique_abbr": "Edinburgh;MIT-IBM AI Lab", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;1;1;1;0", "aff_country_unique": "United Kingdom;United States" }, { "title": "A Discriminative Gaussian Mixture Model with Sparsity", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/3362", "id": "-_Zp7r2-cGK", "poster": "", "openreview": "https://openreview.net/forum?id=-_Zp7r2-cGK", "slides": "https://iclr.cc/virtual/2021/poster/3362", "video": "https://iclr.cc/virtual/2021/poster/3362", "author_site": "Hideaki Hayashi, Seiichi Uchida", "tldr": "", "abstract": "In probabilistic classification, a discriminative model based on the softmax function has a potential limitation in that it assumes unimodality for each class in the feature space. The mixture model can address this issue, although it leads to an increase in the number of parameters. We propose a sparse classifier based on a discriminative GMM, referred to as a sparse discriminative Gaussian mixture (SDGM). In the SDGM, a GMM-based discriminative model is trained via sparse Bayesian learning. Using this sparse learning framework, we can simultaneously remove redundant Gaussian components and reduce the number of parameters used in the remaining components during learning; this learning method reduces the model complexity, thereby improving the generalization capability. Furthermore, the SDGM can be embedded into neural networks (NNs), such as convolutional NNs, and can be trained in an end-to-end manner. 
Experimental results demonstrated that the proposed method outperformed the existing softmax-based discriminative models.", "keywords": "classification;sparse Bayesian learning;Gaussian mixture model", "primary_area": "", "supplementary_material": "/attachment/0e59ec0bba47da92928b305863b46e1171f70187.zip", "author": "Hideaki Hayashi;Seiichi Uchida", "authorids": "~Hideaki_Hayashi1;~Seiichi_Uchida1", "gender": "M;M", "homepage": "https://sites.google.com/view/hideakihayashi/home;", "dblp": "40/11365;07/2381", "google_scholar": "https://scholar.google.co.jp/citations?user=XwYPKOYAAAAJ;https://scholar.google.co.jp/citations?user=QMpdhysAAAAJ", "orcid": "0000-0002-4800-1761;0000-0001-8592-7566", "linkedin": ";", "or_profile": "~Hideaki_Hayashi1;~Seiichi_Uchida1", "aff": "Kyushu University;Kyushu University", "aff_domain": "kyushu-u.ac.jp;kyushu-u.ac.jp", "position": "Assistant Professor;Full Professor", "bibtex": "@inproceedings{\nhayashi2021a,\ntitle={A Discriminative Gaussian Mixture Model with Sparsity},\nauthor={Hideaki Hayashi and Seiichi Uchida},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=-_Zp7r2-cGK}\n}", "github": "", "project": "", "reviewers": "AnonReviewer3;AnonReviewer2;AnonReviewer4;AnonReviewer1", "pdf_size": 0, "rating": "5;6;7;8", "confidence": "3;4;4;5", "wc_review": "75;405;225;175", "wc_reply_reviewers": "0;108;43;5", "wc_reply_authors": "141;1220;931;332", "reply_reviewers": "0;1;1;1", "reply_authors": "1;3;2;2", "rating_avg": [ 6.5, 1.118033988749895 ], "confidence_avg": [ 4.0, 0.7071067811865476 ], "wc_review_avg": [ 220.0, 119.68709203585824 ], "wc_reply_reviewers_avg": [ 39.0, 43.16827538829875 ], "wc_reply_authors_avg": [ 656.0, 437.01315769665337 ], "reply_reviewers_avg": [ 0.75, 0.4330127018922193 ], "reply_authors_avg": [ 2.0, 0.7071067811865476 ], "replies_avg": [ 19, 0 ], "authors#_avg": [ 2, 0 ], "corr_rating_confidence": 0.9486832980505139, "gs_citation": 10, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=4560865937658175043&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 9, "pdf": "https://openreview.net/pdf?id=-_Zp7r2-cGK", "email": "kyushu-u.ac.jp;kyushu-u.ac.jp", "author_num": 2, "aff_unique_index": "0;0", "aff_unique_norm": "Kyushu University", "aff_unique_dep": "", "aff_unique_url": "https://www.kyushu-u.ac.jp", "aff_unique_abbr": "Kyushu U", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0", "aff_country_unique": "Japan" }, { "id": "-aThAo4b1zn", "title": "A Theory of Self-Supervised Framework for Few-Shot Learning", "track": "main", "status": "Reject", "tldr": "", "abstract": "Recently, self-supervised learning (SSL) algorithms have been applied to Few-shot learning(FSL). FSL aims at distilling transferable knowledge on existing classes with large-scale labeled data to cope with novel classes for which only a few labeled data are available. Due to the limited number of novel classes, the initial embedding network becomes an essential component and can largely affect the performance in practice. But almost no one analyzes why a pre-trained embedding network with self-supervised training can provide representation for downstream FSL tasks in theory. In this paper, we first summarized the supervised FSL methods and explained why SSL is suitable for FSL. 
Then we further analyzed the main difference between supervised training and self-supervised training on FSL and obtained the bound for the gap between self-supervised loss and supervised loss. Finally, we proposed potential ways to improve the test accuracy under the setting of self-supervised FSL. ", "keywords": "", "primary_area": "", "supplementary_material": "", "author": "Zhong Cao;Jiang Lu;Jian Liang;Changshui Zhang", "authorids": "~Zhong_Cao1;~Jiang_Lu1;~Jian_Liang3;~Changshui_Zhang2", "gender": "M;M;M;M", "homepage": ";;;http://bigeye.au.tsinghua.edu.cn/english/Introduction.html", "dblp": ";https://dblp.org/pers/hd/l/Lu:Jiang;19/2208;z/ChangshuiZhang", "google_scholar": ";;mrunnpoAAAAJ;GL9M37YAAAAJ", "orcid": "0000-0001-8766-9652;;;", "linkedin": ";;;", "or_profile": "~Zhong_Cao1;~Jiang_Lu1;~Jian_Liang3;~Changshui_Zhang2", "aff": "Tsinghua University;Tsinghua University;Alibaba Group;Tsinghua University", "aff_domain": "tsinghua.edu.cn;tsinghua.edu.cn;alibaba-inc.com;mail.tsinghua.edu.cn", "position": "PhD student;PhD student;Senior Algorithm Engineer;Full Professor", "bibtex": "@misc{\ncao2021a,\ntitle={A Theory of Self-Supervised Framework for Few-Shot Learning},\nauthor={Zhong Cao and Jiang Lu and Jian Liang and Changshui Zhang},\nyear={2021},\nurl={https://openreview.net/forum?id=-aThAo4b1zn}\n}", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer2;AnonReviewer5;AnonReviewer4;AnonReviewer3", "site": "https://openreview.net/forum?id=-aThAo4b1zn", "pdf_size": 0, "rating": "2;2;3;4;4", "confidence": "3;4;3;3;4", "wc_review": "224;378;452;439;283", "wc_reply_reviewers": "0;0;0;0;0", "wc_reply_authors": "0;0;0;0;0", "reply_reviewers": "0;0;0;0;0", "reply_authors": "0;0;0;0;0", "rating_avg": [ 3.0, 0.8944271909999159 ], "confidence_avg": [ 3.4, 0.4898979485566356 ], "wc_review_avg": [ 355.2, 88.7003945876229 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 0, 0 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 0, 0 ], "replies_avg": [ 8, 0 ], "authors#_avg": [ 4, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:spHpgqphJnwJ:scholar.google.com/&scioq=A+Theory+of+Self-Supervised+Framework+for+Few-Shot+Learning&hl=en&as_sdt=0,5", "gs_version_total": 0, "aff_unique_index": "0;0;1;0", "aff_unique_norm": "Tsinghua University;Alibaba Group", "aff_unique_dep": ";", "aff_unique_url": "https://www.tsinghua.edu.cn;https://www.alibaba.com", "aff_unique_abbr": "THU;Alibaba", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0;0;0", "aff_country_unique": "China" }, { "title": "Self-supervised Learning from a Multi-view Perspective", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/2613", "id": "-bdp_8Itjwp", "poster": "", "openreview": "https://openreview.net/forum?id=-bdp_8Itjwp", "slides": "https://iclr.cc/virtual/2021/poster/2613", "video": "https://iclr.cc/virtual/2021/poster/2613", "author_site": "Yao-Hung Hubert Tsai, Yue Wu, Ruslan Salakhutdinov, Louis-Philippe Morency", "tldr": "", "abstract": "As a subset of unsupervised representation learning, self-supervised representation learning adopts self-defined signals as supervision and uses the learned representation for downstream tasks, such as object detection and image captioning. 
Many proposed approaches for self-supervised learning follow naturally a multi-view perspective, where the input (e.g., original images) and the self-supervised signals (e.g., augmented images) can be seen as two redundant views of the data. Building from this multi-view perspective, this paper provides an information-theoretical framework to better understand the properties that encourage successful self-supervised learning. Specifically, we demonstrate that self-supervised learned representations can extract task-relevant information and discard task-irrelevant information. Our theoretical framework paves the way to a larger space of self-supervised learning objective design. In particular, we propose a composite objective that bridges the gap between prior contrastive and predictive learning objectives, and introduce an additional objective term to discard task-irrelevant information. To verify our analysis, we conduct controlled experiments to evaluate the impact of the composite objectives. We also explore our framework's empirical generalization beyond the multi-view perspective, where the cross-view redundancy may not be clearly observed.", "keywords": "Self-supervised Learning;Unsupervised Learning;Multi-view Representation Learning", "primary_area": "", "supplementary_material": "/attachment/a04b2cce9c70e5f35626f469c6e3e0b7072335dc.zip", "author": "Yao-Hung Hubert Tsai;Yue Wu;Ruslan Salakhutdinov;Louis-Philippe Morency", "authorids": "~Yao-Hung_Hubert_Tsai1;~Yue_Wu17;~Ruslan_Salakhutdinov1;~Louis-Philippe_Morency1", "gender": "M;M;M;M", "homepage": ";https://www.yuewu.ml;https://www.cs.cmu.edu/~morency/;https://www.cs.cmu.edu/~rsalakhu/", "dblp": "154/3702;41/5979;31/739;", "google_scholar": ";LcrSIhgAAAAJ;https://scholar.google.com.tw/citations?user=APgaFK0AAAAJ;", "orcid": ";;0000-0001-6376-7696;", "linkedin": ";;morency?challengeId=AQELGK_OvMa0vwAAAY72L-VV4X9hW8juuY80VHVeeSGHZ1PJHeeEa5LTFoeTmDGU0t1OL07MXJTYC9EAi6qgPDd2z9ztnbdFYA&submissionId=09a0ff34-04ac-c717-bef7-8c9c8811b463&challengeSource=AgFhxWkU3q7v4wAAAY72L-1xRE0eG-BnZUNE9e3eAG95pgOCZ9u1nxEg-1dK2Dw&challegeType=AgHMzV0lqKgEFwAAAY72L-11X6DHMd3V_A3Iur8XZeyYF2-oBzoufs8&memberId=AgH4yz7pZ_riCgAAAY72L-146jmR2pdr3dmhy2icxBtEQzQ&recognizeDevice=AgFDCNyrhKiFSAAAAY72L-16m7z2EH2t0ueWmMKjyk1_ZJAkfFVe;", "or_profile": "~Yao-Hung_Hubert_Tsai1;~Yue_Wu17;~Louis-Philippe_Morency1;~Russ_Salakhutdinov1", "aff": "Carnegie Mellon University;Apple;Carnegie Mellon University;School of Computer Science, Carnegie Mellon University", "aff_domain": "cmu.edu;apple.com;cmu.edu;cs.cmu.edu", "position": "PhD student;Intern;Associate Professor;Full Professor", "bibtex": "@inproceedings{\ntsai2021selfsupervised,\ntitle={Self-supervised Learning from a Multi-view Perspective},\nauthor={Yao-Hung Hubert Tsai and Yue Wu and Ruslan Salakhutdinov and Louis-Philippe Morency},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=-bdp_8Itjwp}\n}", "github": "[![github](/images/github_icon.svg) yaohungt/Demystifying_Self_Supervised_Learning](https://github.com/yaohungt/Demystifying_Self_Supervised_Learning)", "project": "", "reviewers": "AnonReviewer4;AnonReviewer1;AnonReviewer2;AnonReviewer3", "pdf_size": 0, "rating": "6;6;6;7", "confidence": "3;4;4;5", "wc_review": "337;187;152;382", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "471;522;561;977", "reply_reviewers": "0;0;0;0", "reply_authors": "1;1;1;2", "rating_avg": [ 6.25, 0.4330127018922193 ], "confidence_avg": [ 4.0, 
0.7071067811865476 ], "wc_review_avg": [ 264.5, 97.11462299777516 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 632.75, 201.2987518590217 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.25, 0.4330127018922193 ], "replies_avg": [ 11, 0 ], "authors#_avg": [ 4, 0 ], "corr_rating_confidence": 0.816496580927726, "gs_citation": 235, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=12546454131517763029&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 4, "pdf": "https://openreview.net/pdf?id=-bdp_8Itjwp", "email": "cmu.edu;apple.com;cmu.edu;cs.cmu.edu", "author_num": 4, "aff_unique_index": "0;1;0;0", "aff_unique_norm": "Carnegie Mellon University;Apple", "aff_unique_dep": ";Apple Inc.", "aff_unique_url": "https://www.cmu.edu;https://www.apple.com", "aff_unique_abbr": "CMU;Apple", "aff_campus_unique_index": "1", "aff_campus_unique": ";Pittsburgh", "aff_country_unique_index": "0;0;0;0", "aff_country_unique": "United States" }, { "title": "Calibration tests beyond classification", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/2682", "id": "-bxf89v3Nx", "poster": "", "openreview": "https://openreview.net/forum?id=-bxf89v3Nx", "slides": "https://iclr.cc/virtual/2021/poster/2682", "video": "https://iclr.cc/virtual/2021/poster/2682", "author_site": "David Widmann, Fredrik Lindsten, Dave Zachariah", "tldr": "", "abstract": "Most supervised machine learning tasks are subject to irreducible prediction errors. Probabilistic predictive models address this limitation by providing probability distributions that represent a belief over plausible targets, rather than point estimates. Such models can be a valuable tool in decision-making under uncertainty, provided that the model output is meaningful and interpretable. Calibrated models guarantee that the probabilistic predictions are neither over- nor under-confident. In the machine learning literature, different measures and statistical tests have been proposed and studied for evaluating the calibration of classification models. For regression problems, however, research has been focused on a weaker condition of calibration based on predicted quantiles for real-valued targets. In this paper, we propose the first framework that unifies calibration evaluation and tests for general probabilistic predictive models. It applies to any such model, including classification and regression models of arbitrary dimension. Furthermore, the framework generalizes existing measures and provides a more intuitive reformulation of a recently proposed framework for calibration in multi-class classification. 
In particular, we reformulate and generalize the kernel calibration error, its estimators, and hypothesis tests using scalar-valued kernels, and evaluate the calibration of real-valued regression\nproblems.", "keywords": "calibration;uncertainty quantification;framework;integral probability metric;maximum mean discrepancy", "primary_area": "", "supplementary_material": "", "author": "David Widmann;Fredrik Lindsten;Dave Zachariah", "authorids": "~David_Widmann1;fredrik.lindsten@liu.se;dave.zachariah@it.uu.se", "gender": ";;", "homepage": ";;", "dblp": ";;", "google_scholar": ";;", "orcid": ";;", "linkedin": ";;", "or_profile": ";;", "aff": ";;", "aff_domain": ";;", "position": ";;", "bibtex": "@inproceedings{\nwidmann2021calibration,\ntitle={Calibration tests beyond classification},\nauthor={David Widmann and Fredrik Lindsten and Dave Zachariah},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=-bxf89v3Nx}\n}", "github": "[![github](/images/github_icon.svg) devmotion/calibration_iclr2021](https://github.com/devmotion/calibration_iclr2021)", "project": "", "reviewers": "AnonReviewer4;AnonReviewer2;AnonReviewer1", "pdf_size": 0, "rating": "5;7;9", "confidence": "3;4;4", "wc_review": "499;333;249", "wc_reply_reviewers": "0;0;0", "wc_reply_authors": "886;299;171", "reply_reviewers": "0;0;0", "reply_authors": "2;1;1", "rating_avg": [ 7.0, 1.632993161855452 ], "confidence_avg": [ 3.6666666666666665, 0.4714045207910317 ], "wc_review_avg": [ 360.3333333333333, 103.87599444636966 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 452.0, 311.30156868648555 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.3333333333333333, 0.4714045207910317 ], "replies_avg": [ 11, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": 0.8660254037844385, "gs_citation": 29, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=7019919403601581708&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 6, "pdf": "https://openreview.net/pdf?id=-bxf89v3Nx", "email": ";;", "author_num": 3 }, { "id": "-csYGiUuGlt", "title": "Convergent Adaptive Gradient Methods in Decentralized Optimization", "track": "main", "status": "Reject", "tldr": "", "abstract": "Adaptive gradient methods including Adam, AdaGrad, and their variants have been very successful for training deep learning models, such as neural networks, in the past few years. Meanwhile, given the need for distributed training procedures, distributed optimization algorithms are at the center of attention. With the growth of computing power and the need for using machine learning models on mobile devices, the communication cost of distributed training algorithms needs careful consideration. In that regard, more and more attention is shifted from the traditional parameter server training paradigm to the decentralized one, which usually requires lower communication costs. In this paper, we rigorously incorporate adaptive gradient methods into decentralized training procedures and introduce novel convergent decentralized adaptive gradient methods. Specifically, we propose a general algorithmic framework that can convert existing adaptive gradient methods to their decentralized counterparts. In addition, we thoroughly analyze the convergence behavior of the proposed algorithmic framework and show that if a given adaptive gradient method converges, under some specific conditions, then its decentralized counterpart is also convergent. 
", "keywords": "Adam;decentralized optimization;adaptive gradient methods", "primary_area": "", "supplementary_material": "", "author": "Xiangyi Chen;Belhal Karimi;Weijie Zhao;Ping Li", "authorids": "~Xiangyi_Chen1;~Belhal_Karimi1;weijiezhao@baidu.com;~Ping_Li3", "gender": "M;M;;M", "homepage": ";http://belhalk.github.io;;http://www.stat.rutgers.edu/home/pingli/", "dblp": "02/445;;;62/5860-1", "google_scholar": "M0ki5ZgAAAAJ;https://scholar.google.fr/citations?user=Xh_OIWkAAAAJ;;", "orcid": ";;;", "linkedin": ";belhal-karimi-2baa71a5/;;", "or_profile": "~Xiangyi_Chen1;~Belhal_Karimi1;weijiezhao@baidu.com;~Ping_Li3", "aff": "University of Minnesota, Minneapolis;Baidu Research;;Rutgers University", "aff_domain": "umn.edu;baidu.com;;", "position": "PhD student;Postdoc;;Associate Professor", "bibtex": "@misc{\nchen2021convergent,\ntitle={Convergent Adaptive Gradient Methods in Decentralized Optimization},\nauthor={Xiangyi Chen and Belhal Karimi and Weijie Zhao and Ping Li},\nyear={2021},\nurl={https://openreview.net/forum?id=-csYGiUuGlt}\n}", "github": "", "project": "", "reviewers": "AnonReviewer2;AnonReviewer5;AnonReviewer3;AnonReviewer1;AnonReviewer4", "site": "https://openreview.net/forum?id=-csYGiUuGlt", "pdf_size": 0, "rating": "3;3;4;7;8", "confidence": "3;5;5;1;4", "wc_review": "137;185;458;216;410", "wc_reply_reviewers": "0;0;0;0;0", "wc_reply_authors": "187;659;992;167;561", "reply_reviewers": "0;0;0;0;0", "reply_authors": "1;2;3;1;2", "rating_avg": [ 5.0, 2.0976176963403033 ], "confidence_avg": [ 3.6, 1.4966629547095764 ], "wc_review_avg": [ 281.2, 128.17706503115136 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 513.2, 309.5328092464513 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.8, 0.7483314773547883 ], "replies_avg": [ 16, 0 ], "authors#_avg": [ 4, 0 ], "corr_rating_confidence": -0.4459412925079223, "gs_citation": 1, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=8744485105157016795&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 0, "aff_unique_index": "0;1;2", "aff_unique_norm": "University of Minnesota;Baidu;Rutgers University", "aff_unique_dep": ";Baidu Research;", "aff_unique_url": "https://www.minnesota.edu;https://research.baidu.com;https://www.rutgers.edu", "aff_unique_abbr": "UMN;Baidu;Rutgers", "aff_campus_unique_index": "0", "aff_campus_unique": "Minneapolis;", "aff_country_unique_index": "0;1;0", "aff_country_unique": "United States;China" }, { "id": "-gabSeMKO4H", "title": "Translation Memory Guided Neural Machine Translation", "track": "main", "status": "Reject", "tldr": "", "abstract": "Many studies have proven that Translation Memory (TM) can help improve the translation quality of neural machine translation (NMT). Existing ways either employ extra encoder to encode information from TM or concatenate source sentence and TM sentences as encoder's input. These previous methods don't model the semantic relationship between the source sentence and TM sentences. Meanwhile, the training corpus related to TM is limited, and the sentence level retrieval approach further limits its scale. \nIn this paper, we propose a novel method to combine the strengths of both TM and NMT. We treat the matched sentence pair of TM as the additional signal and apply one encoder enhanced by the pre-trained language model (PLM) to encode the TM information and source sentence together. Additionally, we extend the sentence level retrieval method to the n-gram retrieval method that we don't need to calculate the similarity score. 
Further, we explore new methods to manipulate the information flow from TM to the NMT decoder. We validate our proposed methods on a mixed test set of multiple domains. Experiment results demonstrate that the proposed methods can significantly improve the translation quality and show strong adaptation for an unknown or new domain.", "keywords": "neural machine translation;translation memory;pre-train language model", "primary_area": "", "supplementary_material": "", "author": "Shaohui Kuang;Heng Yu;Weihua Luo;Qiang Wang", "authorids": "~Shaohui_Kuang1;~Heng_Yu1;weihua.luowh@alibaba-inc.com;~Qiang_Wang8", "gender": "M;M;;M", "homepage": ";http://nlp.ict.ac.cn/~hengyu/;;https://wangqiangneu.github.io/", "dblp": "190/7978;;;", "google_scholar": "https://scholar.google.com/citations?hl=zh-CN;aYAMnucAAAAJ;;gDCxDEsAAAAJ", "orcid": ";0000-0003-4258-1029;;", "linkedin": ";heng-yu-5a668963/;;", "or_profile": "~Shaohui_Kuang1;~Heng_Yu1;weihua.luowh@alibaba-inc.com;~Qiang_Wang8", "aff": "ByteDance Inc.;;;Alibaba Group", "aff_domain": "bytedance.com;;;alibaba-inc.com", "position": "Engineer;;;Algorithm Engineer", "bibtex": "@misc{\nkuang2021translation,\ntitle={Translation Memory Guided Neural Machine Translation},\nauthor={Shaohui Kuang and Heng Yu and Weihua Luo and Qiang Wang},\nyear={2021},\nurl={https://openreview.net/forum?id=-gabSeMKO4H}\n}", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer4;AnonReviewer3;AnonReviewer2", "site": "https://openreview.net/forum?id=-gabSeMKO4H", "pdf_size": 0, "rating": "2;4;4;4", "confidence": "5;5;4;4", "wc_review": "811;493;821;315", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "0;0;0;0", "reply_reviewers": "0;0;0;0", "reply_authors": "0;0;0;0", "rating_avg": [ 3.5, 0.8660254037844386 ], "confidence_avg": [ 4.5, 0.5 ], "wc_review_avg": [ 610.0, 215.42748199800323 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 0, 0 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 0, 0 ], "replies_avg": [ 5, 0 ], "authors#_avg": [ 4, 0 ], "corr_rating_confidence": -0.5773502691896257, "gs_citation": 1, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=18141233474822363128&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 0, "aff_unique_index": "0;1", "aff_unique_norm": "ByteDance;Alibaba Group", "aff_unique_dep": ";", "aff_unique_url": "https://www.bytedance.com;https://www.alibaba.com", "aff_unique_abbr": "ByteDance;Alibaba", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0", "aff_country_unique": "China" }, { "title": "Learning advanced mathematical computations from examples", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/3219", "id": "-gfhS00XfKj", "poster": "", "openreview": "https://openreview.net/forum?id=-gfhS00XfKj", "slides": "https://iclr.cc/virtual/2021/poster/3219", "video": "https://iclr.cc/virtual/2021/poster/3219", "author_site": "Fran\u00e7ois Charton, Amaury Hayat, Guillaume Lample", "tldr": "", "abstract": "Using transformers over large generated datasets, we train models to learn mathematical properties of differential systems, such as local stability, behavior at infinity and controllability. We achieve near perfect prediction of qualitative characteristics, and good approximations of numerical features of the system. 
This demonstrates that neural networks can learn to perform complex computations, grounded in advanced theory, from examples, without built-in mathematical knowledge.", "keywords": "differential equations;computation;transformers;deep learning", "primary_area": "", "supplementary_material": "/attachment/b277aa4884ed4891fabcd87995d8c8edf5db106e.zip", "author": "Francois Charton;Amaury Hayat;Guillaume Lample", "authorids": "~Francois_Charton1;amaury.hayat@enpc.fr;~Guillaume_Lample1", "gender": "M;;M", "homepage": ";;", "dblp": "255/5318;;", "google_scholar": ";;H7sVDmIAAAAJ", "orcid": ";;", "linkedin": "fran%C3%A7ois-charton-214187120/;;", "or_profile": "~Francois_Charton1;amaury.hayat@enpc.fr;~Guillaume_Lample1", "aff": "Meta Facebook;;Meta Facebook", "aff_domain": "fb.com;;fb.com", "position": "Research Engineer;;Researcher", "bibtex": "@inproceedings{\ncharton2021learning,\ntitle={Learning advanced mathematical computations from examples},\nauthor={Francois Charton and Amaury Hayat and Guillaume Lample},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=-gfhS00XfKj}\n}", "github": "[![github](/images/github_icon.svg) facebookresearch/MathsFromExamples](https://github.com/facebookresearch/MathsFromExamples)", "project": "", "reviewers": "AnonReviewer3;AnonReviewer1;AnonReviewer2;AnonReviewer4", "pdf_size": 0, "rating": "3;6;7;8", "confidence": "4;3;4;4", "wc_review": "254;236;704;623", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "1707;1597;2995;2439", "reply_reviewers": "0;0;0;0", "reply_authors": "3;3;6;5", "rating_avg": [ 6.0, 1.8708286933869707 ], "confidence_avg": [ 3.75, 0.4330127018922193 ], "wc_review_avg": [ 454.25, 211.29644459857815 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 2184.5, 568.9558418717572 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 4.25, 1.299038105676658 ], "replies_avg": [ 22, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 40, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=8069536277199398832&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 10, "pdf": "https://openreview.net/pdf?id=-gfhS00XfKj", "email": "fb.com;;fb.com", "author_num": 3, "aff_unique_index": "0;0", "aff_unique_norm": "Meta", "aff_unique_dep": "Meta Platforms, Inc.", "aff_unique_url": "https://meta.com", "aff_unique_abbr": "Meta", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0", "aff_country_unique": "United States" }, { "id": "-kfLEqppEm_", "title": "Convex Regularization in Monte-Carlo Tree Search", "track": "main", "status": "Reject", "tldr": "", "abstract": "Monte-Carlo planning and Reinforcement Learning (RL) are essential to sequential decision making. The recent AlphaGo and AlphaZero algorithms have shown how to successfully combine these two paradigms to solve large scale sequential decision problems. These methodologies exploit a variant of the well-known UCT algorithm to trade off the exploitation of good actions and the exploration of unvisited states, but their empirical success comes at the cost of poor sample-efficiency and high computation time. In this paper, we overcome these limitations by studying the benefit of convex regularization in Monte-Carlo Tree Search (MCTS) to drive exploration efficiently and to improve policy updates, as already observed in RL. 
First, we introduce a unifying theory on the use of generic convex regularizers in MCTS, deriving the first regret analysis of regularized MCTS and showing that it guarantees an exponential convergence rate. Second, we exploit our theoretical framework to introduce novel regularized backup operators for MCTS, based on the relative entropy of the policy update and on the Tsallis entropy of the policy. We provide an intuitive demonstration of the effect of each regularizer empirically verifying the consequence of our theoretical results on a toy problem. Finally, we show how our framework can easily be incorporated in AlphaGo and AlphaZero, and we empirically show the superiority of convex regularization w.r.t. representative baselines, on well-known RL problems across several Atari games.", "keywords": "Monte-Carlo Tree Search;Entropy regularization;Reinforcement Learning", "primary_area": "", "supplementary_material": "/attachment/1e17346b0e1d37a28fa0460283e9873416b9c2cd.zip", "author": "Tuan Quang Dam;Carlo D'Eramo;Jan Peters;Joni Pajarinen", "authorids": "~Tuan_Quang_Dam1;~Carlo_D'Eramo2;~Jan_Peters3;~Joni_Pajarinen2", "gender": "M;M;M;", "homepage": "https://tuanquangdam.com/;https://carloderamo.wixsite.com/home;https://www.jan-peters.net;", "dblp": "252/5881.html;182/8953;p/JanPeters1;23/8355", "google_scholar": "https://scholar.google.com/citations?hl=en;https://scholar.google.it/citations?user=1Rt_86gAAAAJ;https://scholar.google.de/citations?user=-kIVAcAAAAAJ;https://scholar.google.fi/citations?user=-2fJStwAAAAJ", "orcid": ";0000-0003-2712-118X;0000-0002-5266-8091;0000-0003-4469-8191", "linkedin": ";carlo-d-eramo-6438a289/;janrpeters/;", "or_profile": "~Tuan_Quang_Dam1;~Carlo_D'Eramo2;~Jan_Peters3;~Joni_Pajarinen2", "aff": ";TU Darmstadt;Max Planck Institute for Intelligent Systems;Technische Universit\u00e4t Darmstadt", "aff_domain": ";tu-darmstadt.de;tue.mpg.de;tu-darmstadt.de", "position": ";Postdoc;Researcher;Researcher", "bibtex": "@misc{\ndam2021convex,\ntitle={Convex Regularization in Monte-Carlo Tree Search},\nauthor={Tuan Quang Dam and Carlo D'Eramo and Jan Peters and Joni Pajarinen},\nyear={2021},\nurl={https://openreview.net/forum?id=-kfLEqppEm_}\n}", "github": "", "project": "", "reviewers": "AnonReviewer2;AnonReviewer3;AnonReviewer4;AnonReviewer1", "site": "https://openreview.net/forum?id=-kfLEqppEm_", "pdf_size": 0, "rating": "4;5;5;8", "confidence": "1;3;4;4", "wc_review": "498;378;288;360", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "566;727;349;238", "reply_reviewers": "0;0;0;0", "reply_authors": "1;1;1;1", "rating_avg": [ 5.5, 1.5 ], "confidence_avg": [ 3.0, 1.224744871391589 ], "wc_review_avg": [ 381.0, 75.47847375245475 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 470.0, 189.55869803308948 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 11, 0 ], "authors#_avg": [ 4, 0 ], "corr_rating_confidence": 0.6804138174397716, "gs_citation": 11, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=466083474712545091&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 11, "aff_unique_index": "0;1;0", "aff_unique_norm": "Technische Universit\u00e4t Darmstadt;Max Planck Institute for Intelligent Systems", "aff_unique_dep": ";Intelligent Systems", "aff_unique_url": "https://www.tu-darmstadt.de;https://www.mpi-is.mpg.de", "aff_unique_abbr": "TU Darmstadt;MPI-IS", "aff_campus_unique_index": "0", "aff_campus_unique": "Darmstadt;", "aff_country_unique_index": "0;0;0", "aff_country_unique": "Germany" }, { 
"id": "-kigPjfTIGd", "title": "SSW-GAN: Scalable Stage-wise Training of Video GANs", "track": "main", "status": "Reject", "tldr": "", "abstract": "Current state-of-the-art generative models for videos have high computational requirements that impede high resolution generations beyond a few frames. In this work we propose a stage-wise strategy to train Generative Adversarial Networks (GANs) for videos. We decompose the generative process to first produce a downsampled video that is then spatially upscaled and temporally interpolated by subsequent stages. Upsampling stages are applied locally on temporal chunks of previous outputs to manage the computational complexity. Stages are defined as Generative Adversarial Networks, which are trained sequentially and independently. We validate our approach on Kinetics-600 and BDD100K, for which we train a three stage model capable of generating 128x128 videos with 100 frames.", "keywords": "video generation;GANs;scalable methods", "primary_area": "", "supplementary_material": "/attachment/2445a28c935bb0cfe8ce6d2cb88aaafea748da61.zip", "author": "Lluis Castrejon;Nicolas Ballas;Aaron Courville", "authorids": "~Lluis_Castrejon1;~Nicolas_Ballas1;~Aaron_Courville3", "gender": ";;", "homepage": ";;", "dblp": "183/6532;120/9066;56/1688", "google_scholar": "https://scholar.google.ca/citations?user=XWhajuQAAAAJ;euUV4iUAAAAJ;https://scholar.google.ca/citations?user=km6CP8cAAAAJ", "orcid": ";;", "linkedin": ";;", "or_profile": "~Lluis_Castrejon1;~Nicolas_Ballas1;~Aaron_Courville3", "aff": "Meta Facebook;Meta;Universit\u00e9 de Montr\u00e9al", "aff_domain": "facebook.com;meta.com; ", "position": "PhD student;Researcher;Assistant Professor", "bibtex": "@misc{\ncastrejon2021sswgan,\ntitle={{\\{}SSW{\\}}-{\\{}GAN{\\}}: Scalable Stage-wise Training of Video {\\{}GAN{\\}}s},\nauthor={Lluis Castrejon and Nicolas Ballas and Aaron Courville},\nyear={2021},\nurl={https://openreview.net/forum?id=-kigPjfTIGd}\n}", "github": "", "project": "", "reviewers": "AnonReviewer2;AnonReviewer3;AnonReviewer4;AnonReviewer1;AnonReviewer5", "site": "https://openreview.net/forum?id=-kigPjfTIGd", "pdf_size": 0, "rating": "3;3;6;6;7", "confidence": "4;3;5;3;5", "wc_review": "133;309;409;356;982", "wc_reply_reviewers": "0;0;0;659;0", "wc_reply_authors": "306;435;821;764;735", "reply_reviewers": "0;0;0;2;0", "reply_authors": "1;1;2;2;2", "rating_avg": [ 5.0, 1.6733200530681511 ], "confidence_avg": [ 4.0, 0.8944271909999159 ], "wc_review_avg": [ 437.8, 287.4601885479101 ], "wc_reply_reviewers_avg": [ 131.8, 263.6 ], "wc_reply_authors_avg": [ 612.2, 203.41032422175627 ], "reply_reviewers_avg": [ 0.4, 0.8 ], "reply_authors_avg": [ 1.6, 0.4898979485566356 ], "replies_avg": [ 18, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": 0.5345224838248488, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:7rRPp2HltpUJ:scholar.google.com/&scioq=SSW-GAN:+Scalable+Stage-wise+Training+of+Video+GANs&hl=en&as_sdt=0,33", "gs_version_total": 0, "aff_unique_index": "0;0;1", "aff_unique_norm": "Meta;Universit\u00e9 de Montr\u00e9al", "aff_unique_dep": "Meta Platforms, Inc.;", "aff_unique_url": "https://meta.com;https://www.umontreal.ca", "aff_unique_abbr": "Meta;UdeM", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0;1", "aff_country_unique": "United States;Canada" }, { "title": "Isometric Propagation Network for Generalized Zero-shot Learning", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/2709", 
"id": "-mWcQVLPSPy", "poster": "", "openreview": "https://openreview.net/forum?id=-mWcQVLPSPy", "slides": "https://iclr.cc/virtual/2021/poster/2709", "video": "https://iclr.cc/virtual/2021/poster/2709", "author_site": "Lu Liu, Tianyi Zhou, Guodong Long, Jing Jiang, Xuanyi Dong, Chengqi Zhang", "tldr": "", "abstract": "Zero-shot learning (ZSL) aims to classify images of an unseen class only based on a few attributes describing that class but no access to any training sample. A popular strategy is to learn a mapping between the semantic space of class attributes and the visual space of images based on the seen classes and their data. Thus, an unseen class image can be ideally mapped to its corresponding class attributes. The key challenge is how to align the representations in the two spaces. For most ZSL settings, the attributes for each seen/unseen class are only represented by a vector while the seen-class data provide much more information. Thus, the imbalanced supervision from the semantic and the visual space can make the learned mapping easily overfitting to the seen classes. To resolve this problem, we propose Isometric Propagation Network (IPN), which learns to strengthen the relation between classes within each space and align the class dependency in the two spaces. Specifically, IPN learns to propagate the class representations on an auto-generated graph within each space. In contrast to only aligning the resulted static representation, we regularize the two dynamic propagation procedures to be isometric in terms of the two graphs' edge weights per step by minimizing a consistency loss between them. IPN achieves state-of-the-art performance on three popular ZSL benchmarks. To evaluate the generalization capability of IPN, we further build two larger benchmarks with more diverse unseen classes and demonstrate the advantages of IPN on them.", "keywords": "Zero-shot learning;isometric;prototype propagation;alignment of semantic and visual space", "primary_area": "", "supplementary_material": "", "author": "Lu Liu;Tianyi Zhou;Guodong Long;Jing Jiang;Xuanyi Dong;Chengqi Zhang", "authorids": "~Lu_Liu7;~Tianyi_Zhou1;~Guodong_Long2;~Jing_Jiang6;~Xuanyi_Dong1;~Chengqi_Zhang1", "gender": "M;M;F;M;M;F", "homepage": "https://tianyizhou.github.io/;https://www.uts.edu.au/staff/guodong.long;https://www.uts.edu.au/staff/jing.jiang;https://xuanyidong.com/;https://research.polyu.edu.hk/en/persons/chengqi-zhang;https://liulu112601.github.io/", "dblp": "88/8205-1;34/10089;68/1974-2;198/1522;71/964;", "google_scholar": "OKvgizMAAAAJ;https://scholar.google.com.au/citations?user=Pl8m7hMAAAAJ;https://scholar.google.com.au/citations?hl=en;7zp9arUAAAAJ;https://scholar.google.com.au/citations?user=B6lBmqEAAAAJ;epMGJ28AAAAJ", "orcid": "0000-0001-5348-0632;0000-0003-3740-9515;;0000-0001-9272-1590;0000-0001-5715-7154;", "linkedin": "tianyizhou;;;;chengqi-zhang-55aa8910/;lu-liu-2b5b93187/", "or_profile": "~Tianyi_Zhou1;~Guodong_Long2;~Jing_Jiang6;~Xuanyi_Dong1;~Chengqi_Zhang1;~Lu_Liu4", "aff": "University of Washington, Seattle;University of Technology Sydney;University of Technology Sydney;University of Technology Sydney;University of Technology Sydney;University of Technology Sydney", "aff_domain": "uw.edu;uts.edu.au;uts.edu.au;uts.edu.au;uts.edu.au;uts.edu.au", "position": "PhD student;Associate Professor;Lecturer;PhD student;Full Professor;PhD student", "bibtex": "@inproceedings{\nliu2021isometric,\ntitle={Isometric Propagation Network for Generalized Zero-shot Learning},\nauthor={Lu Liu and Tianyi Zhou and 
Guodong Long and Jing Jiang and Xuanyi Dong and Chengqi Zhang},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=-mWcQVLPSPy}\n}", "github": "", "project": "", "reviewers": "AnonReviewer3;AnonReviewer1;AnonReviewer4;AnonReviewer2", "pdf_size": 0, "rating": "4;6;7;7", "confidence": "4;3;4;5", "wc_review": "119;748;393;462", "wc_reply_reviewers": "0;96;0;0", "wc_reply_authors": "252;569;315;329", "reply_reviewers": "0;1;0;0", "reply_authors": "1;2;1;1", "rating_avg": [ 6.0, 1.224744871391589 ], "confidence_avg": [ 4.0, 0.7071067811865476 ], "wc_review_avg": [ 430.5, 223.73924555160187 ], "wc_reply_reviewers_avg": [ 24.0, 41.569219381653056 ], "wc_reply_authors_avg": [ 366.25, 120.59721182514959 ], "reply_reviewers_avg": [ 0.25, 0.4330127018922193 ], "reply_authors_avg": [ 1.25, 0.4330127018922193 ], "replies_avg": [ 11, 0 ], "authors#_avg": [ 6, 0 ], "corr_rating_confidence": 0.28867513459481287, "gs_citation": 49, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=4176098347129897651&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 4, "pdf": "https://openreview.net/pdf?id=-mWcQVLPSPy", "email": "uw.edu;uts.edu.au;uts.edu.au;uts.edu.au;uts.edu.au;uts.edu.au", "author_num": 6, "aff_unique_index": "0;1;1;1;1;1", "aff_unique_norm": "University of Washington;University of Technology Sydney", "aff_unique_dep": ";", "aff_unique_url": "https://www.washington.edu;https://www.uts.edu.au", "aff_unique_abbr": "UW;UTS", "aff_campus_unique_index": "0", "aff_campus_unique": "Seattle;", "aff_country_unique_index": "0;1;1;1;1;1", "aff_country_unique": "United States;Australia" }, { "id": "-oeKiM9lD9h", "title": "Rethinking Convolution: Towards an Optimal Efficiency", "track": "main", "status": "Reject", "tldr": "", "abstract": "In this paper, we present our recent research on the computational efficiency of convolution. The convolution operation is the most critical component in the recent surge of deep learning research. Conventional 2D convolution takes $O(C^{2}K^{2}HW)$ to calculate, where $C$ is the channel size, $K$ is the kernel size, while $H$ and $W$ are the output height and width. Such computation has become increasingly costly considering that these parameters have increased over the past few years to meet the needs of demanding applications. Among various implementations of convolution, separable convolution has been proven to be more efficient in reducing the computational demand. For example, depth separable convolution reduces the complexity to $O(CHW\cdot(C+K^{2}))$ while spatial separable convolution reduces the complexity to $O(C^{2}KHW)$. However, these are considered ad hoc designs which cannot ensure that they achieve optimal separation in general. In this research, we propose a novel operator called \emph{optimal separable convolution} which can be calculated at $O(C^{\frac{3}{2}}KHW)$ by optimally designing the internal number of groups and kernel sizes for general separable convolutions. When there is no restriction on the number of separated convolutions, an even lower complexity at $O(CHW\cdot\log(CK^{2}))$ can be achieved. 
Experimental results demonstrate that the proposed optimal separable convolution is able to achieve improved accuracy-FLOPs and accuracy-#Params trade-offs over both conventional and depth/spatial separable convolutions.", "keywords": "", "primary_area": "", "supplementary_material": "/attachment/c78dde97684e99e9811273eb43fb1950754abf37.zip", "author": "Tao Wei;Yonghong Tian;Chang Wen Chen", "authorids": "~Tao_Wei1;~Yonghong_Tian1;~Chang_Wen_Chen1", "gender": "M;M;M", "homepage": ";http://www.pkuml.org;https://chenlab.comp.polyu.edu.hk/", "dblp": ";86/5857;29/4638", "google_scholar": ";https://scholar.google.com/citations?hl=en;w2HXPUUAAAAJ", "orcid": ";0000-0002-2978-5935;0000-0002-6720-234X", "linkedin": ";;chang-wen-chen-7b72095/", "or_profile": "~Tao_Wei1;~Yonghong_Tian1;~Chang_Wen_Chen1", "aff": ";Peking University;State University of New York, Buffalo", "aff_domain": ";pku.edu.cn;buffalo.edu", "position": ";Full Professor;Emeritus", "bibtex": "@misc{\nwei2021rethinking,\ntitle={Rethinking Convolution: Towards an Optimal Efficiency},\nauthor={Tao Wei and Yonghong Tian and Chang Wen Chen},\nyear={2021},\nurl={https://openreview.net/forum?id=-oeKiM9lD9h}\n}", "github": "", "project": "", "reviewers": "AnonReviewer3;AnonReviewer2;AnonReviewer4;AnonReviewer1", "site": "https://openreview.net/forum?id=-oeKiM9lD9h", "pdf_size": 0, "rating": "5;6;6;6", "confidence": "4;3;3;4", "wc_review": "420;364;241;492", "wc_reply_reviewers": "271;0;0;0", "wc_reply_authors": "2234;685;173;177", "reply_reviewers": "1;0;0;0", "reply_authors": "4;1;1;1", "rating_avg": [ 5.75, 0.4330127018922193 ], "confidence_avg": [ 3.5, 0.5 ], "wc_review_avg": [ 379.25, 91.81332964226927 ], "wc_reply_reviewers_avg": [ 67.75, 117.34644221279143 ], "wc_reply_authors_avg": [ 817.25, 844.0451335681049 ], "reply_reviewers_avg": [ 0.25, 0.4330127018922193 ], "reply_authors_avg": [ 1.75, 1.299038105676658 ], "replies_avg": [ 15, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": -0.5773502691896257, "gs_citation": 6, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=456622035050179067&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 0, "aff_unique_index": "0;1", "aff_unique_norm": "Peking University;State University of New York at Buffalo", "aff_unique_dep": ";", "aff_unique_url": "http://www.pku.edu.cn;https://www.buffalo.edu", "aff_unique_abbr": "Peking U;SUNY Buffalo", "aff_campus_unique_index": "1", "aff_campus_unique": ";Buffalo", "aff_country_unique_index": "0;1", "aff_country_unique": "China;United States" }, { "id": "-p6rexF3qdQ", "title": "Learn Robust Features via Orthogonal Multi-Path", "track": "main", "status": "Reject", "tldr": "", "abstract": "\tIt is now widely known that clean images with invisible adversarial perturbations can fool deep neural networks.\n\tTo defend against adversarial attacks, we design a block containing multiple paths to learn robust features, where the parameters of these paths are required to be orthogonal to each other. \n\tThe so-called Orthogonal Multi-Path (OMP) block can be placed in any layer of a neural network. \n\tVia forward learning and backward correction, one OMP block makes the neural networks learn features that are appropriate for all the paths and hence are expected to be robust. With careful design and thorough experiments on, e.g., the positions of imposing the orthogonality constraint and the trade-off between variety and accuracy, \n\tthe robustness of the neural networks is significantly improved. 
\n\tFor example, under white-box PGD attack with $l_\\infty$ bound ${8}/{255}$ (this is a fierce attack that can make the accuracy of many vanilla neural networks drop to nearly $10\\%$ on CIFAR10), VGG16 with the proposed OMP block could keep over $50\\%$ accuracy. For black-box attacks, neural networks equipped with an OMP block have accuracy over $80\\%$. The performance under both white-box and black-box attacks is much better than the existing state-of-the-art adversarial defenders. ", "keywords": "adversarial robustness;orthogonal multi-path", "primary_area": "", "supplementary_material": "", "author": "Kun Fang;Xiaolin Huang;Yingwen Wu;Tao Li;Jie Yang", "authorids": "~Kun_Fang1;~Xiaolin_Huang1;~Yingwen_Wu1;~Tao_Li12;jieyang@sjtu.edu.cn", "gender": "M;M;F;M;", "homepage": "https://fanghenshaometeor.github.io/;http://www.pami.sjtu.edu.cn/en/xiaolin;https://github.com/snowien;https://nblt.github.io/;", "dblp": "51/5923-4;61/2227;236/4329;;", "google_scholar": "yC2s2JIAAAAJ;DR-gBcEAAAAJ;https://scholar.google.com.hk/citations?user=PcJzfBEAAAAJ;https://scholar.google.com/citations?hl=zh-CN;", "orcid": "0000-0001-6351-201X;;;;", "linkedin": ";;;;", "or_profile": "~Kun_Fang1;~Xiaolin_Huang1;~Yingwen_Wu1;~Tao_Li12;jieyang@sjtu.edu.cn", "aff": "Shanghai Jiaotong University;Shanghai Jiaotong University;Shanghai Jiaotong University;Shanghai Jiaotong University;", "aff_domain": "sjtu.edu.cn;sjtu.edu.cn;sjtu.edu;sjtu.edu;", "position": "MS student;Associate Professor;PhD student;PhD student;", "bibtex": "@misc{\nfang2021learn,\ntitle={Learn Robust Features via Orthogonal Multi-Path},\nauthor={Kun Fang and Xiaolin Huang and Yingwen Wu and Tao Li and Jie Yang},\nyear={2021},\nurl={https://openreview.net/forum?id=-p6rexF3qdQ}\n}", "github": "", "project": "", "reviewers": "AnonReviewer4;AnonReviewer3;AnonReviewer1;AnonReviewer2", "site": "https://openreview.net/forum?id=-p6rexF3qdQ", "pdf_size": 0, "rating": "4;5;5;5", "confidence": "5;3;3;3", "wc_review": "197;262;179;340", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "627;363;197;479", "reply_reviewers": "0;0;0;0", "reply_authors": "1;1;1;1", "rating_avg": [ 4.75, 0.4330127018922193 ], "confidence_avg": [ 3.5, 0.8660254037844386 ], "wc_review_avg": [ 244.5, 63.19216723613774 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 416.5, 157.52698181581465 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 9, 0 ], "authors#_avg": [ 5, 0 ], "corr_rating_confidence": -1.0, "gs_citation": 1, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=16784146821839291751&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 0, "aff_unique_index": "0;0;0;0", "aff_unique_norm": "Shanghai Jiao Tong University", "aff_unique_dep": "", "aff_unique_url": "https://www.sjtu.edu.cn", "aff_unique_abbr": "SJTU", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0;0;0", "aff_country_unique": "China" }, { "id": "-qB7ZgRNRq", "title": "Towards Data Distillation for End-to-end Spoken Conversational Question Answering", "track": "main", "status": "Reject", "tldr": "", "abstract": "In spoken question answering, QA systems are designed to answer questions from contiguous text spans within the related speech transcripts. However, the most natural way that human seek or test their knowledge is via human conversations. 
Therefore, we propose a new Spoken Conversational Question Answering task (SCQA), aiming at enabling QA systems to model complex dialogue flows given the speech utterances and text corpora. In this task, our main objective is to build a QA system to deal with conversational questions in both spoken and text forms, and to explore the plausibility of providing more cues in spoken documents with systems in information gathering. To this end, instead of adopting automatically generated speech transcripts with highly noisy data, we propose a novel unified data distillation approach, DDNet, which directly fuses audio-text features to reduce the misalignment between automatic speech recognition hypotheses and the reference transcriptions. In addition, to evaluate the capacity of QA systems in a dialogue-style interaction, we assemble a Spoken Conversational Question Answering (Spoken-CoQA) dataset with more than 120k question-answer pairs. Experiments demonstrate that our proposed method achieves superior performance in spoken conversational question answering.\n", "keywords": "spoken question answering;natural language processing;speech and language processing;knowledge distillation", "primary_area": "", "supplementary_material": "", "author": "Chenyu You;Nuo Chen;Fenglin Liu;Dongchao Yang;Zhiyang Xu;Yuexian Zou", "authorids": "~Chenyu_You1;~Nuo_Chen1;~Fenglin_Liu1;~Dongchao_Yang1;~Zhiyang_Xu1;~Yuexian_Zou1", "gender": "M;M;M;M;M;", "homepage": "https://chenyuyou.me/;https://jerrynchen.github.io/;;http://dongchaoyang.top;;", "dblp": "191/9432;135/5622-1;;;267/2280;", "google_scholar": "hy_wB7cAAAAJ;https://scholar.google.com/citations?hl=zh-CN;AcbVE3UAAAAJ;WNiojyAAAAAJ;11zbVUAAAAAJ;", "orcid": "0000-0001-8365-7822;;;;;", "linkedin": "chenyu-you-b07475a4/;;;;;", "or_profile": "~Chenyu_You1;~Nuo_Chen1;~Fenglin_Liu1;~Dongchao_Yang1;~Zhiyang_Xu1;~Yuexian_Zou1", "aff": "Yale University;Peking University;;Peking University;Virginia Polytechnic Institute and State University;", "aff_domain": "yale.edu;pku.edu.cn;;pku.edu.cn;vt.edu;", "position": "PhD student;MS student;;MS student;PhD student;", "bibtex": "@misc{\nyou2021towards,\ntitle={Towards Data Distillation for End-to-end Spoken Conversational Question Answering},\nauthor={Chenyu You and Nuo Chen and Fenglin Liu and Dongchao Yang and Zhiyang Xu and Yuexian Zou},\nyear={2021},\nurl={https://openreview.net/forum?id=-qB7ZgRNRq}\n}", "github": "", "project": "", "reviewers": "AnonReviewer3;AnonReviewer2;AnonReviewer4;AnonReviewer1", "site": "https://openreview.net/forum?id=-qB7ZgRNRq", "pdf_size": 0, "rating": "4;5;5;6", "confidence": "4;3;4;3", "wc_review": "635;225;351;506", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "236;383;317;299", "reply_reviewers": "0;0;0;0", "reply_authors": "1;1;1;1", "rating_avg": [ 5.0, 0.7071067811865476 ], "confidence_avg": [ 3.5, 0.5 ], "wc_review_avg": [ 429.25, 154.97156997333414 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 308.75, 52.36590016413353 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 9, 0 ], "authors#_avg": [ 6, 0 ], "corr_rating_confidence": -0.7071067811865476, "gs_citation": 43, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=13558533081945903316&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 4, "aff_unique_index": "0;1;1;2", "aff_unique_norm": "Yale University;Peking University;Virginia Tech", "aff_unique_dep": ";;", "aff_unique_url": "https://www.yale.edu;http://www.pku.edu.cn;https://www.vt.edu", 
"aff_unique_abbr": "Yale;Peking U;VT", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;1;1;0", "aff_country_unique": "United States;China" }, { "title": "Analyzing the Expressive Power of Graph Neural Networks in a Spectral Perspective", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/3158", "id": "-qh0M9XWxnv", "poster": "", "openreview": "https://openreview.net/forum?id=-qh0M9XWxnv", "slides": "https://iclr.cc/virtual/2021/poster/3158", "video": "https://iclr.cc/virtual/2021/poster/3158", "author_site": "Muhammet Balcilar, Guillaume Renton, Pierre H\u00e9roux, Benoit Ga\u00fcz\u00e8re, S\u00e9bastien Adam, Paul Honeine", "tldr": "", "abstract": "In the recent literature of Graph Neural Networks (GNN), the expressive power of models has been studied through their capability to distinguish if two given graphs are isomorphic or not. Since the graph isomorphism problem is NP-intermediate, and Weisfeiler-Lehman (WL) test can give sufficient but not enough evidence in polynomial time, the theoretical power of GNNs is usually evaluated by the equivalence of WL-test order, followed by an empirical analysis of the models on some reference inductive and transductive datasets. However, such analysis does not account the signal processing pipeline, whose capability is generally evaluated in the spectral domain. In this paper, we argue that a spectral analysis of GNNs behavior can provide a complementary point of view to go one step further in the understanding of GNNs. By bridging the gap between the spectral and spatial design of graph convolutions, we theoretically demonstrate some equivalence of the graph convolution process regardless it is designed in the spatial or the spectral domain. Using this connection, we managed to re-formulate most of the state-of-the-art graph neural networks into one common framework. This general framework allows to lead a spectral analysis of the most popular GNNs, explaining their performance and showing their limits according to spectral point of view. Our theoretical spectral analysis is confirmed by experiments on various graph databases. 
Furthermore, we demonstrate the necessity of high and/or band-pass filters on a graph dataset, while the majority of GNN is limited to only low-pass and inevitably it fails.", "keywords": "Graph Neural Networks;Spectral Graph Filter;Spectral Analysis", "primary_area": "", "supplementary_material": "", "author": "Muhammet Balcilar;Guillaume Renton;Pierre H\u00e9roux;Benoit Ga\u00fcz\u00e8re;S\u00e9bastien Adam;Paul Honeine", "authorids": "~Muhammet_Balcilar1;guillaume.renton@gmail.com;pierre.heroux@univ-rouen.fr;benoit.gauzere@insa-rouen.fr;~S\u00e9bastien_Adam1;~Paul_Honeine1", "gender": "M;;;;M;M", "homepage": "https://balcilar.weebly.com/;;;;http://pagesperso.litislab.fr/sebadam/;http://honeine.fr", "dblp": "130/0818;;;;03/6714.html;53/7011", "google_scholar": "https://scholar.google.fr/citations?hl=fr;;;;https://scholar.google.fr/citations?user=vNZC5qcAAAAJ;yxk7n1kAAAAJ", "orcid": "0000-0003-1428-4297;;;;;0000-0002-3042-183X", "linkedin": ";;;;;paulhoneine", "or_profile": "~Muhammet_Balcilar1;guillaume.renton@gmail.com;pierre.heroux@univ-rouen.fr;benoit.gauzere@insa-rouen.fr;~S\u00e9bastien_Adam1;~Paul_Honeine1", "aff": "Universit\u00e9 de Rouen;;;;Universit\u00e9 de Rouen;LITIS, Universit\u00e9 de Rouen Normandie, France", "aff_domain": "univ-rouen.fr;;;;univ-rouen.fr;univ-rouen.fr", "position": "Postdoc;;;;Full Professor;Full Professor", "bibtex": "@inproceedings{\nbalcilar2021analyzing,\ntitle={Analyzing the Expressive Power of Graph Neural Networks in a Spectral Perspective},\nauthor={Muhammet Balcilar and Guillaume Renton and Pierre H{\\'e}roux and Benoit Ga{\\\"u}z{\\`e}re and S{\\'e}bastien Adam and Paul Honeine},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=-qh0M9XWxnv}\n}", "github": "[![github](/images/github_icon.svg) balcilar/gnn-spectral-expressive-power](https://github.com/balcilar/gnn-spectral-expressive-power)", "project": "", "reviewers": "AnonReviewer4;AnonReviewer1;AnonReviewer2;AnonReviewer3", "pdf_size": 0, "rating": "6;6;8;8", "confidence": "2;4;4;4", "wc_review": "410;526;357;417", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "811;1559;753;817", "reply_reviewers": "0;0;0;0", "reply_authors": "1;2;1;1", "rating_avg": [ 7.0, 1.0 ], "confidence_avg": [ 3.5, 0.8660254037844386 ], "wc_review_avg": [ 427.5, 61.41864537744218 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 985.0, 332.34018715767735 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.25, 0.4330127018922193 ], "replies_avg": [ 12, 0 ], "authors#_avg": [ 6, 0 ], "corr_rating_confidence": 0.5773502691896257, "gs_citation": 235, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=12539425234528098281&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 14, "pdf": "https://openreview.net/pdf?id=-qh0M9XWxnv", "email": "univ-rouen.fr;;;;univ-rouen.fr;univ-rouen.fr", "author_num": 6, "aff_unique_index": "0;0;1", "aff_unique_norm": "Universit\u00e9 de Rouen;Universit\u00e9 de Rouen Normandie", "aff_unique_dep": ";LITIS", "aff_unique_url": "https://www.univ-rouen.fr;https://www.univ-rouen.fr", "aff_unique_abbr": "UniRouen;", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0;0", "aff_country_unique": "France" }, { "id": "-u4j4dHeWQi", "title": "Explore with Dynamic Map: Graph Structured Reinforcement Learning", "track": "main", "status": "Reject", "tldr": "", "abstract": "In reinforcement learning, a map with states and transitions built based on historical 
trajectories is often helpful in exploration and exploitation. Even so, learning and planning on such a map within a sparse environment remains a challenge. As a step towards this goal, we propose Graph Structured Reinforcement Learning (GSRL), which utilizes historical trajectories to slowly adjust exploration directions and learn related experiences while rapidly updating the value function estimation. GSRL constructs a dynamic graph on top of state transitions in the replay buffer based on historical trajectories, and develops an attention strategy on the map to select an appropriate goal direction, which decomposes the task of reaching a distant goal state into a sequence of easier tasks. We also leverage graph structure to sample related trajectories for efficient value learning. Results demonstrate that GSRL can outperform the state-of-the-art algorithms in terms of sample efficiency on benchmarks with sparse reward functions. ", "keywords": "Deep Reinforcement Learning;Graph Structured Reinforcement Learning;Exploration", "primary_area": "", "supplementary_material": "/attachment/4543d55ce62b253e8dd6f4e7e13092bf766c5f99.zip", "author": "Jiarui Jin;Sijin Zhou;Weinan Zhang;Rasool Fakoor;David Wipf;Tong He;Yong Yu;Zheng Zhang;Alex Smola", "authorids": "~Jiarui_Jin1;zhousijin@sjtu.edu.cn;~Weinan_Zhang1;~Rasool_Fakoor1;~David_Wipf1;~Tong_He5;~Yong_Yu1;~Zheng_Zhang1;~Alex_Smola1", "gender": "M;;M;M;M;M;;M;M", "homepage": "https://jinjiarui.github.io/;;http://wnzhang.net;http://rasoolfa.github.io;http://www.davidwipf.com/;https://hetong007.github.io/;https://apex.sjtu.edu.cn/members/yyu;https://shanghai.nyu.edu/academics/faculty/directory/zheng-zhang;http://alex.smola.org", "dblp": "241/9563;;28/10261-1;123/2447;81/6421;02/1554-2;43/5685.html;;s/AlexanderJSmola", "google_scholar": "unCPHQEAAAAJ;;Qzss0GEAAAAJ;nVsOPtQAAAAJ;YJx1WSgAAAAJ;hV5D8GYAAAAJ;;https://scholar.google.com.hk/citations?user=k0KiE4wAAAAJ;Tb0ZrYwAAAAJ", "orcid": "0000-0001-6458-1586;;0000-0002-0127-2425;;;;0000-0003-4457-2820;;", "linkedin": "jiarui-jerry-jin-ba4a84176/;;;rasool-fakoor-695b5845/;;;;;smola", "or_profile": "~Jiarui_Jin1;zhousijin@sjtu.edu.cn;~Weinan_Zhang1;~Rasool_Fakoor1;~David_Wipf1;~Tong_He5;~Yong_Yu1;~Zheng_Zhang1;~Alex_Smola1", "aff": "Shanghai Jiaotong University;;Shanghai Jiaotong University;Amazon Web Services;Amazon AI Research Lab;Amazon;Shanghai Jiaotong University;Amazon;Amazon", "aff_domain": "sjtu.edu.cn;;sjtu.edu.cn;amazon.com;amazon.com;amazon.com;sjtu.edu.cn;amazon.com;amazon.com", "position": "PhD student;;Associate Professor;Researcher;Principal Research Scientist;Researcher;Full Professor;Senior Principal Scientist;Distinguished Scientist", "bibtex": "@misc{\njin2021explore,\ntitle={Explore with Dynamic Map: Graph Structured Reinforcement Learning},\nauthor={Jiarui Jin and Sijin Zhou and Weinan Zhang and Rasool Fakoor and David Wipf and Tong He and Yong Yu and Zheng Zhang and Alex Smola},\nyear={2021},\nurl={https://openreview.net/forum?id=-u4j4dHeWQi}\n}", "github": "", "project": "", "reviewers": "AnonReviewer3;AnonReviewer2;AnonReviewer1;AnonReviewer4", "site": "https://openreview.net/forum?id=-u4j4dHeWQi", "pdf_size": 0, "rating": "4;5;6;6", "confidence": "4;3;3;4", "wc_review": "640;358;336;740", "wc_reply_reviewers": "0;0;0;722", "wc_reply_authors": "528;759;414;1304", "reply_reviewers": "0;0;0;1", "reply_authors": "1;1;1;2", "rating_avg": [ 5.25, 0.82915619758885 ], "confidence_avg": [ 3.5, 0.5 ], "wc_review_avg": [ 518.5, 175.27906321064134 ], "wc_reply_reviewers_avg": [ 180.5, 
312.63517076618234 ], "wc_reply_authors_avg": [ 751.25, 342.48020015761495 ], "reply_reviewers_avg": [ 0.25, 0.4330127018922193 ], "reply_authors_avg": [ 1.25, 0.4330127018922193 ], "replies_avg": [ 12, 0 ], "authors#_avg": [ 9, 0 ], "corr_rating_confidence": -0.30151134457776363, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:afbSV3Yd7CIJ:scholar.google.com/&scioq=Explore+with+Dynamic+Map:+Graph+Structured+Reinforcement+Learning&hl=en&as_sdt=0,5", "gs_version_total": 0, "aff_unique_index": "0;0;1;1;1;0;1;1", "aff_unique_norm": "Shanghai Jiao Tong University;Amazon", "aff_unique_dep": ";Amazon Web Services", "aff_unique_url": "https://www.sjtu.edu.cn;https://aws.amazon.com", "aff_unique_abbr": "SJTU;AWS", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0;1;1;1;0;1;1", "aff_country_unique": "China;United States" }, { "id": "-yo2vfTt_Cg", "title": "Adaptive norms for deep learning with regularized Newton methods", "track": "main", "status": "Reject", "tldr": "", "abstract": "We investigate the use of regularized Newton methods with adaptive norms for optimizing neural networks. This approach can be seen as a second-order counterpart of adaptive gradient methods, which we here show to be interpretable as first-order trust region methods with ellipsoidal constraints. In particular, we prove that the preconditioning matrix used in RMSProp and Adam satisfies the necessary conditions for provable convergence of second-order trust region methods with standard worst-case complexities on general non-convex objectives. Furthermore, we run experiments across different neural architectures and datasets to find that the ellipsoidal constraints constantly outperform their spherical counterpart both in terms of number of backpropagations and asymptotic loss value. 
Finally, we find comparable performance to state-of-the-art first-order methods in terms of backpropagations, but further advances in hardware are needed to render Newton methods competitive in terms of computational time.", "keywords": "Stochastic Optimization;Non-convex Optimization;Deep Learning;Adaptive methods;Newton methods;Second-order optimization", "primary_area": "", "supplementary_material": "/attachment/fc45c084dba8f1e6dc083a491f1687e0f37c863d.zip", "author": "Jonas K Kohler;Leonard Adolphs;Aurelien Lucchi", "authorids": "~Jonas_K_Kohler1;~Leonard_Adolphs1;~Aurelien_Lucchi1", "gender": "M;;M", "homepage": ";https://leox1v.com/;http://people.inf.ethz.ch/alucchi/", "dblp": ";220/4213;14/5780", "google_scholar": "a1rCLUMAAAAJ;HCwVW3sAAAAJ;https://scholar.google.ch/citations?user=V1ONSgIAAAAJ", "orcid": ";;", "linkedin": ";;", "or_profile": "~Jonas_K_Kohler1;~Leonard_Adolphs1;~Aurelien_Lucchi1", "aff": "Swiss Federal Institute of Technology;Swiss Federal Institute of Technology;Swiss Federal Institute of Technology", "aff_domain": "ethz.ch;ethz.ch;ethz.ch", "position": "PhD student;PhD student;Researcher", "bibtex": "@misc{\nkohler2021adaptive,\ntitle={Adaptive norms for deep learning with regularized Newton methods},\nauthor={Jonas K Kohler and Leonard Adolphs and Aurelien Lucchi},\nyear={2021},\nurl={https://openreview.net/forum?id=-yo2vfTt_Cg}\n}", "github": "", "project": "", "reviewers": "AnonReviewer4;AnonReviewer1;AnonReviewer3;AnonReviewer2", "site": "https://openreview.net/forum?id=-yo2vfTt_Cg", "pdf_size": 0, "rating": "4;4;5;6", "confidence": "4;4;3;2", "wc_review": "334;277;193;230", "wc_reply_reviewers": "0;149;0;0", "wc_reply_authors": "154;171;230;149", "reply_reviewers": "0;1;0;0", "reply_authors": "1;1;1;1", "rating_avg": [ 4.75, 0.82915619758885 ], "confidence_avg": [ 3.25, 0.82915619758885 ], "wc_review_avg": [ 258.5, 52.78494103435183 ], "wc_reply_reviewers_avg": [ 37.25, 64.51889258194068 ], "wc_reply_authors_avg": [ 176.0, 32.225766088644036 ], "reply_reviewers_avg": [ 0.25, 0.4330127018922193 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 10, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": -1.0, "gs_citation": 3, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=610275819751650124&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 3, "aff_unique_index": "0;0;0", "aff_unique_norm": "Swiss Federal Institute of Technology", "aff_unique_dep": "", "aff_unique_url": "https://www.ethz.ch", "aff_unique_abbr": "ETH Zurich", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0;0", "aff_country_unique": "Switzerland" }, { "title": "MoPro: Webly Supervised Learning with Momentum Prototypes", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/2567", "id": "0-EYBhgw80y", "poster": "", "openreview": "https://openreview.net/forum?id=0-EYBhgw80y", "slides": "https://iclr.cc/virtual/2021/poster/2567", "video": "https://iclr.cc/virtual/2021/poster/2567", "author_site": "Junnan Li, Caiming Xiong, Steven Hoi", "tldr": "", "abstract": "We propose a webly-supervised representation learning method that does not suffer from the annotation unscalability of supervised learning, nor the computation unscalability of self-supervised learning. 
Most existing works on webly-supervised representation learning adopt a vanilla supervised learning method without accounting for the prevalent noise in the training data, whereas most prior methods in learning with label noise are less effective for real-world large-scale noisy data. We propose momentum prototypes (MoPro), a simple contrastive learning method that achieves online label noise correction, out-of-distribution sample removal, and representation learning. MoPro achieves state-of-the-art performance on WebVision, a weakly-labeled noisy dataset. MoPro also shows superior performance when the pretrained model is transferred to down-stream image classification and detection tasks. It outperforms the ImageNet supervised pretrained model by +10.5 on 1-shot classification on VOC, and outperforms the best self-supervised pretrained model by +17.3 when finetuned on 1% of ImageNet labeled samples. Furthermore, MoPro is more robust to distribution shifts. Code and pretrained models are available at https://github.com/salesforce/MoPro.", "keywords": "webly-supervised learning;weakly-supervised learning;contrastive learning;representation learning", "primary_area": "", "supplementary_material": "/attachment/47c26b80d704bccaee748680d14da5bb379bf04a.zip", "author": "Junnan Li;Caiming Xiong;Steven Hoi", "authorids": "~Junnan_Li2;~Caiming_Xiong1;~Steven_Hoi2", "gender": "M;M;M", "homepage": "http://cmxiong.com/;http://stevenhoi.com;https://sites.google.com/site/junnanlics/", "dblp": "80/7282;;193/6773-1.html", "google_scholar": "vaSdahkAAAAJ;JoLjflYAAAAJ;MuUhwi0AAAAJ", "orcid": ";;", "linkedin": "caiming-xiong-150a1417;;", "or_profile": "~Caiming_Xiong1;~Steven_Hoi2;~Junnan_li1", "aff": "Salesforce Research;Singapore Management University;Salesforce Research", "aff_domain": "salesforce.com;smu.edu.sg;salesforce.com", "position": "Research Scientist;Associate Professor;Research Scientist", "bibtex": "@inproceedings{\nli2021mopro,\ntitle={MoPro: Webly Supervised Learning with Momentum Prototypes},\nauthor={Junnan Li and Caiming Xiong and Steven Hoi},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=0-EYBhgw80y}\n}", "github": "[![github](/images/github_icon.svg) salesforce/MoPro](https://github.com/salesforce/MoPro) + [![Papers with Code](/images/pwc_icon.svg) 1 community implementation](https://paperswithcode.com/paper/?openreview=0-EYBhgw80y)", "project": "", "reviewers": "AnonReviewer3;AnonReviewer4;AnonReviewer1;AnonReviewer2", "pdf_size": 0, "rating": "6;6;7;7", "confidence": "4;4;4;4", "wc_review": "305;288;363;266", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "452;240;247;233", "reply_reviewers": "0;0;0;0", "reply_authors": "1;1;1;1", "rating_avg": [ 6.5, 0.5 ], "confidence_avg": [ 4.0, 0.0 ], "wc_review_avg": [ 305.5, 35.961785272703025 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 293.0, 91.93204011659917 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 9, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 127, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=3510417880461380553&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 4, "pdf": "https://openreview.net/pdf?id=0-EYBhgw80y", "email": "salesforce.com;smu.edu.sg;salesforce.com", "author_num": 3, "aff_unique_index": "0;1;0", "aff_unique_norm": "Salesforce;Singapore Management University", "aff_unique_dep": "Salesforce Research;", "aff_unique_url": 
"https://research.salesforce.com;https://www.smu.edu.sg", "aff_unique_abbr": "Salesforce;SMU", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;1;0", "aff_country_unique": "United States;Singapore" }, { "title": "Human-Level Performance in No-Press Diplomacy via Equilibrium Search", "status": "Oral", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/2843", "id": "0-uUGPbIjD", "poster": "", "openreview": "https://openreview.net/forum?id=0-uUGPbIjD", "slides": "https://iclr.cc/virtual/2021/poster/2843", "video": "https://iclr.cc/virtual/2021/poster/2843", "author_site": "Jonathan Gray, Adam Lerer, Anton Bakhtin, Noam Brown", "tldr": "", "abstract": "Prior AI breakthroughs in complex games have focused on either the purely adversarial or purely cooperative settings. In contrast, Diplomacy is a game of shifting alliances that involves both cooperation and competition. For this reason, Diplomacy has proven to be a formidable research challenge. In this paper we describe an agent for the no-press variant of Diplomacy that combines supervised learning on human data with one-step lookahead search via regret minimization. Regret minimization techniques have been behind previous AI successes in adversarial games, most notably poker, but have not previously been shown to be successful in large-scale games involving cooperation. We show that our agent greatly exceeds the performance of past no-press Diplomacy bots, is unexploitable by expert humans, and ranks in the top 2% of human players when playing anonymous games on a popular Diplomacy website.", "keywords": "multi-agent systems;regret minimization;no-regret learning;game theory;reinforcement learning", "primary_area": "", "supplementary_material": "/attachment/3274b9f417e258f3486082e095b50b3399a6d629.zip", "author": "Jonathan Gray;Adam Lerer;Anton Bakhtin;Noam Brown", "authorids": "~Jonathan_Gray2;~Adam_Lerer1;~Anton_Bakhtin1;~Noam_Brown2", "gender": ";M;;", "homepage": ";;;http://www.cs.cmu.edu/~noamb", "dblp": ";;;https://dblp.uni-trier.de/pers/hd/b/Brown:Noam", "google_scholar": "abPVGwYAAAAJ;;50O3v1MAAAAJ;RLDbLcUAAAAJ", "orcid": ";;;", "linkedin": ";;;", "or_profile": "~Jonathan_Gray2;~Adam_Lerer1;~Anton_Bakhtin1;~Noam_Brown2", "aff": "Facebook AI Research;;Meta Facebook;Meta Facebook", "aff_domain": "fb.com;;facebook.com;facebook.com", "position": "Researcher;;Researcher;Research Scientist", "bibtex": "@inproceedings{\ngray2021humanlevel,\ntitle={Human-Level Performance in No-Press Diplomacy via Equilibrium Search},\nauthor={Jonathan Gray and Adam Lerer and Anton Bakhtin and Noam Brown},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=0-uUGPbIjD}\n}", "github": "", "project": "", "reviewers": "AnonReviewer3;AnonReviewer5;AnonReviewer4;AnonReviewer2", "pdf_size": 0, "rating": "7;7;8;8", "confidence": "4;5;4;4", "wc_review": "469;813;1822;308", "wc_reply_reviewers": "0;128;215;0", "wc_reply_authors": "779;324;911;68", "reply_reviewers": "0;1;1;0", "reply_authors": "1;1;2;1", "rating_avg": [ 7.5, 0.5 ], "confidence_avg": [ 4.25, 0.4330127018922193 ], "wc_review_avg": [ 853.0, 588.4390367744139 ], "wc_reply_reviewers_avg": [ 85.75, 91.09987650924671 ], "wc_reply_authors_avg": [ 520.5, 340.10329313313036 ], "reply_reviewers_avg": [ 0.5, 0.5 ], "reply_authors_avg": [ 1.25, 0.4330127018922193 ], "replies_avg": [ 12, 0 ], "authors#_avg": [ 4, 0 ], "corr_rating_confidence": -0.5773502691896257, "gs_citation": 60, 
"gs_cited_by_link": "https://scholar.google.com/scholar?cites=17504167850181788730&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 3, "pdf": "https://openreview.net/pdf?id=0-uUGPbIjD", "email": "fb.com;;facebook.com;facebook.com", "author_num": 4, "aff_unique_index": "0;0;0", "aff_unique_norm": "Meta", "aff_unique_dep": "Facebook AI Research", "aff_unique_url": "https://research.facebook.com", "aff_unique_abbr": "FAIR", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0;0", "aff_country_unique": "United States" }, { "title": "Witches' Brew: Industrial Scale Data Poisoning via Gradient Matching", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/2561", "id": "01olnfLIbD", "poster": "", "openreview": "https://openreview.net/forum?id=01olnfLIbD", "slides": "https://iclr.cc/virtual/2021/poster/2561", "video": "https://iclr.cc/virtual/2021/poster/2561", "author_site": "Jonas Geiping, Liam H Fowl, Ronny Huang, Wojciech Czaja, Gavin Taylor, Michael Moeller, Tom Goldstein", "tldr": "", "abstract": "Data Poisoning attacks modify training data to maliciously control a model trained on such data.\nIn this work, we focus on targeted poisoning attacks which cause a reclassification of an unmodified test image and as such breach model integrity. We consider a\nparticularly malicious poisoning attack that is both ``from scratch\" and ``clean label\", meaning we analyze an attack that successfully works against new, randomly initialized models, and is nearly imperceptible to humans, all while perturbing only a small fraction of the training data. \nPrevious poisoning attacks against deep neural networks in this setting have been limited in scope and success, working only in simplified settings or being prohibitively expensive for large datasets.\nThe central mechanism of the new attack is matching the gradient direction of malicious examples. We analyze why this works, supplement with practical considerations. and show its threat to real-world practitioners, finding that it is the first poisoning method to cause targeted misclassification in modern deep networks trained from scratch on a full-sized, poisoned ImageNet dataset.\nFinally we demonstrate the limitations of existing defensive strategies against such an attack, concluding that data poisoning is a credible threat, even for large-scale deep learning systems.", "keywords": "Data Poisoning;ImageNet;Large-scale;Gradient Alignment;Security;Backdoor Attacks;from-scratch;clean-label", "primary_area": "", "supplementary_material": "/attachment/dda63631727de698b89e21cd475183d5b808b192.zip", "author": "Jonas Geiping;Liam H Fowl;W. 
Ronny Huang;Wojciech Czaja;Gavin Taylor;Michael Moeller;Tom Goldstein", "authorids": "~Jonas_Geiping1;~Liam_H_Fowl1;~W._Ronny_Huang1;~Wojciech_Czaja1;~Gavin_Taylor1;~Michael_Moeller1;~Tom_Goldstein1", "gender": "M;;;;M;M;M", "homepage": "https://jonasgeiping.github.io/;;;;https://www.usna.edu/Users/cs/taylor/;http://vsa.informatik.uni-siegen.de;https://www.cs.umd.edu/~tomg/", "dblp": "190/7229;241/6940;;;;08/5840-1;25/8184", "google_scholar": "https://scholar.google.de/citations?user=206vNCEAAAAJ;IXv3ToAAAAAJ;;;https://scholar.google.no/citations?user=hDqVCIoAAAAJ;https://scholar.google.de/citations?user=sxzdAGUAAAAJ;KmSuVtgAAAAJ", "orcid": ";;;;;;", "linkedin": ";;;;;;", "or_profile": "~Jonas_Geiping1;~Liam_H_Fowl1;~W._Ronny_Huang1;~Wojciech_Czaja1;~Gavin_Taylor1;~Michael_Moeller1;~Tom_Goldstein1", "aff": "University of Siegen;University of Maryland, College Park;;;US Naval Academy;University of Siegen;University of Maryland, College Park", "aff_domain": "uni-siegen.de;umd.edu;;;usna.edu;uni-siegen.de;umd.edu", "position": "PhD student;PhD student;;;Full Professor;Full Professor;Associate Professor", "bibtex": "@inproceedings{\ngeiping2021witches,\ntitle={Witches' Brew: Industrial Scale Data Poisoning via Gradient Matching},\nauthor={Jonas Geiping and Liam H Fowl and W. Ronny Huang and Wojciech Czaja and Gavin Taylor and Michael Moeller and Tom Goldstein},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=01olnfLIbD}\n}", "github": "[![github](/images/github_icon.svg) JonasGeiping/poisoning-gradient-matching](https://github.com/JonasGeiping/poisoning-gradient-matching) + [![Papers with Code](/images/pwc_icon.svg) 1 community implementation](https://paperswithcode.com/paper/?openreview=01olnfLIbD)", "project": "", "reviewers": "AnonReviewer4;AnonReviewer2;AnonReviewer3;AnonReviewer1", "pdf_size": 0, "rating": "5;6;7;7", "confidence": "5;4;4;3", "wc_review": "1758;243;169;643", "wc_reply_reviewers": "294;20;0;0", "wc_reply_authors": "1932;231;21;684", "reply_reviewers": "2;1;0;0", "reply_authors": "3;1;1;1", "rating_avg": [ 6.25, 0.82915619758885 ], "confidence_avg": [ 4.0, 0.7071067811865476 ], "wc_review_avg": [ 703.25, 635.0946287759014 ], "wc_reply_reviewers_avg": [ 78.5, 124.68660713966035 ], "wc_reply_authors_avg": [ 717.0, 741.2701936541089 ], "reply_reviewers_avg": [ 0.75, 0.82915619758885 ], "reply_authors_avg": [ 1.5, 0.8660254037844386 ], "replies_avg": [ 16, 0 ], "authors#_avg": [ 7, 0 ], "corr_rating_confidence": -0.8528028654224417, "gs_citation": 269, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=12446963321584021008&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 7, "pdf": "https://openreview.net/pdf?id=01olnfLIbD", "email": "uni-siegen.de;umd.edu;;;usna.edu;uni-siegen.de;umd.edu", "author_num": 7, "aff_unique_index": "0;1;2;0;1", "aff_unique_norm": "University of Siegen;University of Maryland;United States Naval Academy", "aff_unique_dep": ";;", "aff_unique_url": "https://www.uni-siegen.de;https://www/umd.edu;https://www.usna.edu", "aff_unique_abbr": "Uni Siegen;UMD;USNA", "aff_campus_unique_index": "1;1", "aff_campus_unique": ";College Park", "aff_country_unique_index": "0;1;1;0;1", "aff_country_unique": "Germany;United States" }, { "title": "Set Prediction without Imposing Structure as Conditional Density Estimation", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/2835", "id": "04ArenGOz3", "poster": "", "openreview": 
"https://openreview.net/forum?id=04ArenGOz3", "slides": "https://iclr.cc/virtual/2021/poster/2835", "video": "https://iclr.cc/virtual/2021/poster/2835", "author_site": "David Zhang, Gertjan J Burghouts, Cees G Snoek", "tldr": "", "abstract": "Set prediction is about learning to predict a collection of unordered variables with unknown interrelations. Training such models with set losses imposes the structure of a metric space over sets. We focus on stochastic and underdefined cases, where an incorrectly chosen loss function leads to implausible predictions. Example tasks include conditional point-cloud reconstruction and predicting future states of molecules. In this paper we propose an alternative to training via set losses, by viewing learning as conditional density estimation. Our learning framework fits deep energy-based models and approximates the intractable likelihood with gradient-guided sampling. Furthermore, we propose a stochastically augmented prediction algorithm that enables multiple predictions, reflecting the possible variations in the target set. We empirically demonstrate on a variety of datasets the capability to learn multi-modal densities and produce different plausible predictions. Our approach is competitive with previous set prediction models on standard benchmarks. More importantly, it extends the family of addressable tasks beyond those that have unambiguous predictions.", "keywords": "set prediction;energy based models", "primary_area": "", "supplementary_material": "", "author": "David W Zhang;Gertjan J. Burghouts;Cees G. M. Snoek", "authorids": "~David_W_Zhang1;gertjan.burghouts@tno.nl;~Cees_G._M._Snoek1", "gender": "M;;", "homepage": "https://davzha.netlify.app/;;", "dblp": "119/0960;;", "google_scholar": "https://scholar.google.nl/citations?user=MG3oLzUAAAAJ;;", "orcid": "0000-0002-2137-1738;;", "linkedin": "david-zhang-1b86b314a;;", "or_profile": "~David_W_Zhang1;gertjan.burghouts@tno.nl;~Cees_G._M._Snoek1", "aff": "University of Amsterdam;;", "aff_domain": "uva.nl;;", "position": "PhD student;;", "bibtex": "@inproceedings{\nzhang2021set,\ntitle={Set Prediction without Imposing Structure as Conditional Density Estimation},\nauthor={David W Zhang and Gertjan J. Burghouts and Cees G. M. 
Snoek},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=04ArenGOz3}\n}", "github": "[![github](/images/github_icon.svg) davzha/DESP](https://github.com/davzha/DESP)", "project": "", "reviewers": "AnonReviewer1;AnonReviewer3;AnonReviewer2;AnonReviewer4", "pdf_size": 0, "rating": "6;6;7;7", "confidence": "2;3;3;3", "wc_review": "596;371;436;402", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "782;773;599;764", "reply_reviewers": "0;0;0;0", "reply_authors": "1;1;1;1", "rating_avg": [ 6.5, 0.5 ], "confidence_avg": [ 2.75, 0.4330127018922193 ], "wc_review_avg": [ 451.25, 86.67576074082073 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 729.5, 75.61249896677135 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 9, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": 0.5773502691896257, "gs_citation": 15, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=3129155688534171639&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 3, "pdf": "https://openreview.net/pdf?id=04ArenGOz3", "email": "uva.nl;;", "author_num": 3, "aff_unique_index": "0", "aff_unique_norm": "University of Amsterdam", "aff_unique_dep": "", "aff_unique_url": "https://www.uva.nl", "aff_unique_abbr": "UvA", "aff_country_unique_index": "0", "aff_country_unique": "Netherlands" }, { "title": "Learning a Latent Simplex in Input Sparsity Time", "status": "Spotlight", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/2553", "id": "04LZCAxMSco", "poster": "", "openreview": "https://openreview.net/forum?id=04LZCAxMSco", "slides": "https://iclr.cc/virtual/2021/poster/2553", "video": "https://iclr.cc/virtual/2021/poster/2553", "author_site": "Ainesh Bakshi, Chiranjib Bhattacharyya, Ravi Kannan, David Woodruff, Samson Zhou", "tldr": "", "abstract": "We consider the problem of learning a latent $k$-vertex simplex $K\\in\\mathbb{R}^d$, given $\\mathbf{A}\\in\\mathbb{R}^{d\\times n}$, which can be viewed as $n$ data points that are formed by randomly perturbing some latent points in $K$, possibly beyond $K$. A large class of latent variable models, such as adversarial clustering, mixed membership stochastic block models, and topic models can be cast in this view of learning a latent simplex. Bhattacharyya and Kannan (SODA 2020) give an algorithm for learning such a $k$-vertex latent simplex in time roughly $O(k\\cdot\\text{nnz}(\\mathbf{A}))$, where $\\text{nnz}(\\mathbf{A})$ is the number of non-zeros in $\\mathbf{A}$. We show that the dependence on $k$ in the running time is unnecessary given a natural assumption about the mass of the top $k$ singular values of $\\mathbf{A}$, which holds in many of these applications. Further, we show this assumption is necessary, as otherwise an algorithm for learning a latent simplex would imply a better low rank approximation algorithm than what is known. \n\nWe obtain a spectral low-rank approximation to $\\mathbf{A}$ in input-sparsity time and show that the column space thus obtained has small $\\sin\\Theta$ (angular) distance to the right top-$k$ singular space of $\\mathbf{A}$. Our algorithm then selects $k$ points in the low-rank subspace with the largest inner product (in absolute value) with $k$ carefully chosen random vectors. 
By working in the low-rank subspace, we avoid reading the entire matrix in each iteration and thus circumvent the $\\Theta(k\\cdot\\text{nnz}(\\mathbf{A}))$ running time.", "keywords": "Latent Simplex;numerical linear algebra;low-rank approximation", "primary_area": "", "supplementary_material": "/attachment/6f286ec06f9c0f16bbeb666a8709cfdaac17266c.zip", "author": "Ainesh Bakshi;Chiranjib Bhattacharyya;Ravi Kannan;David Woodruff;Samson Zhou", "authorids": "~Ainesh_Bakshi1;~Chiranjib_Bhattacharyya1;~Ravi_Kannan1;~David_Woodruff1;~Samson_Zhou1", "gender": "M;M;;M;", "homepage": "http://aineshbakshi.com/;http://www.csa.iisc.ac.in/~chiru/;;http://www.cs.cmu.edu/~dwoodruf/;https://samsonzhou.github.io/", "dblp": "132/1905;b/CBhattacharyya;k/RaviKannan;w/DPWoodruff;179/2683", "google_scholar": ";;;https://scholar.google.com.tw/citations?user=0G2t-6sAAAAJ;NpjsgocAAAAJ", "orcid": ";;;;", "linkedin": ";;;;", "or_profile": "~Ainesh_Bakshi1;~Chiranjib_Bhattacharyya1;~Ravi_Kannan1;~David_Woodruff1;~Samson_Zhou1", "aff": "School of Computer Science, Carnegie Mellon University;Indian Institute of Science, Indian institute of science, Bangalore;;Carnegie Mellon University;School of Computer Science, Carnegie Mellon University", "aff_domain": "cs.cmu.edu;iisc.ac.in;;cmu.edu;cs.cmu.edu", "position": "PhD student;Full Professor;;Associate Professor;Postdoc", "bibtex": "@inproceedings{\nbakshi2021learning,\ntitle={Learning a Latent Simplex in Input Sparsity Time},\nauthor={Ainesh Bakshi and Chiranjib Bhattacharyya and Ravi Kannan and David Woodruff and Samson Zhou},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=04LZCAxMSco}\n}", "github": "", "project": "", "reviewers": "AnonReviewer2;AnonReviewer3;AnonReviewer4", "pdf_size": 0, "rating": "7;8;9", "confidence": "4;3;4", "wc_review": "268;188;240", "wc_reply_reviewers": "0;0;0", "wc_reply_authors": "125;121;132", "reply_reviewers": "0;0;0", "reply_authors": "1;1;1", "rating_avg": [ 8.0, 0.816496580927726 ], "confidence_avg": [ 3.6666666666666665, 0.4714045207910317 ], "wc_review_avg": [ 232.0, 33.14614105241614 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 126.0, 4.546060565661952 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 7, 0 ], "authors#_avg": [ 5, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 11, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=5133476229134810687&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 6, "pdf": "https://openreview.net/pdf?id=04LZCAxMSco", "email": "cs.cmu.edu;iisc.ac.in;;cmu.edu;cs.cmu.edu", "author_num": 5, "aff_unique_index": "0;1;0;0", "aff_unique_norm": "Carnegie Mellon University;Indian Institute of Science", "aff_unique_dep": "School of Computer Science;", "aff_unique_url": "https://www.cmu.edu;https://www.iisc.ac.in", "aff_unique_abbr": "CMU;IISc", "aff_campus_unique_index": "0;1;0", "aff_campus_unique": "Pittsburgh;Bangalore;", "aff_country_unique_index": "0;1;0;0", "aff_country_unique": "United States;India" }, { "title": "A Universal Representation Transformer Layer for Few-Shot Image Classification", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/2568", "id": "04cII6MumYV", "poster": "", "openreview": "https://openreview.net/forum?id=04cII6MumYV", "slides": "https://iclr.cc/virtual/2021/poster/2568", "video": "https://iclr.cc/virtual/2021/poster/2568", "author_site": "Lu Liu, William Hamilton, Guodong Long, Jing Jiang, Hugo 
Larochelle", "tldr": "", "abstract": "Few-shot classification aims to recognize unseen classes when presented with only a small number of samples. We consider the problem of multi-domain few-shot image classification, where unseen classes and examples come from diverse data sources. This problem has seen growing interest and has inspired the development of benchmarks such as Meta-Dataset. A key challenge in this multi-domain setting is to effectively integrate the feature representations from the diverse set of training domains. Here, we propose a Universal Representation Transformer (URT) layer, that meta-learns to leverage universal features for few-shot classification by dynamically re-weighting and composing the most appropriate domain-specific representations. In experiments, we show that URT sets a new state-of-the-art result on Meta-Dataset. Specifically, it achieves top-performance on the highest number of data sources compared to competing methods. We analyze variants of URT and present a visualization of the attention score heatmaps that sheds light on how the model performs cross-domain generalization.", "keywords": "", "primary_area": "", "supplementary_material": "/attachment/31866908eb2dbabb750f64bb3c3a5d6fb11ec434.zip", "author": "Lu Liu;William L. Hamilton;Guodong Long;Jing Jiang;Hugo Larochelle", "authorids": "~Lu_Liu7;~William_L._Hamilton1;~Guodong_Long2;~Jing_Jiang6;~Hugo_Larochelle1", "gender": ";M;F;M;F", "homepage": ";https://www.uts.edu.au/staff/guodong.long;https://www.uts.edu.au/staff/jing.jiang;https://mila.quebec/en/directory/hugo-larochelle;https://liulu112601.github.io/", "dblp": "137/3314;34/10089;68/1974-2;86/3862.html;", "google_scholar": ";https://scholar.google.com.au/citations?user=Pl8m7hMAAAAJ;https://scholar.google.com.au/citations?hl=en;https://scholar.google.ca/citations?user=U89FHq4AAAAJ;epMGJ28AAAAJ", "orcid": ";0000-0003-3740-9515;;;", "linkedin": ";;;;lu-liu-2b5b93187/", "or_profile": "~William_L._Hamilton1;~Guodong_Long2;~Jing_Jiang6;~Hugo_Larochelle1;~Lu_Liu4", "aff": "McGill University;University of Technology Sydney;University of Technology Sydney;Universit\u00e9 de Sherbrooke;University of Technology Sydney", "aff_domain": "mcgill.ca;uts.edu.au;uts.edu.au;usherbrooke.ca;uts.edu.au", "position": "Assistant Professor;Associate Professor;Lecturer;Adjunct Professor;PhD student", "bibtex": "@inproceedings{\nliu2021a,\ntitle={A Universal Representation Transformer Layer for Few-Shot Image Classification},\nauthor={Lu Liu and William L. 
Hamilton and Guodong Long and Jing Jiang and Hugo Larochelle},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=04cII6MumYV}\n}", "github": "[![github](/images/github_icon.svg) liulu112601/URT](https://github.com/liulu112601/URT)", "project": "", "reviewers": "AnonReviewer2;AnonReviewer5;AnonReviewer3;AnonReviewer1;AnonReviewer4", "pdf_size": 0, "rating": "6;6;7;7;8", "confidence": "4;5;5;5;5", "wc_review": "395;388;399;594;460", "wc_reply_reviewers": "0;0;0;0;0", "wc_reply_authors": "672;399;188;512;60", "reply_reviewers": "0;0;0;0;0", "reply_authors": "1;1;1;1;1", "rating_avg": [ 6.8, 0.7483314773547882 ], "confidence_avg": [ 4.8, 0.39999999999999997 ], "wc_review_avg": [ 447.2, 77.80334183054093 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 366.2, 219.70926243561055 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 11, 0 ], "authors#_avg": [ 5, 0 ], "corr_rating_confidence": 0.5345224838248487, "gs_citation": 169, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=6018140255832554871&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 6, "pdf": "https://openreview.net/pdf?id=04cII6MumYV", "email": "mcgill.ca;uts.edu.au;uts.edu.au;usherbrooke.ca;uts.edu.au", "author_num": 5, "aff_unique_index": "0;1;1;2;1", "aff_unique_norm": "McGill University;University of Technology Sydney;Universit\u00e9 de Sherbrooke", "aff_unique_dep": ";;", "aff_unique_url": "https://www.mcgill.ca;https://www.uts.edu.au;https://www.usherbrooke.ca", "aff_unique_abbr": "McGill;UTS;UdeS", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;1;1;0;1", "aff_country_unique": "Canada;Australia" }, { "title": "Self-supervised Representation Learning with Relative Predictive Coding", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/2792", "id": "068E_JSq9O", "poster": "", "openreview": "https://openreview.net/forum?id=068E_JSq9O", "slides": "https://iclr.cc/virtual/2021/poster/2792", "video": "https://iclr.cc/virtual/2021/poster/2792", "author_site": "Yao-Hung Hubert Tsai, Martin Ma, Muqiao Yang, Han Zhao, Louis-Philippe Morency, Ruslan Salakhutdinov", "tldr": "", "abstract": "This paper introduces Relative Predictive Coding (RPC), a new contrastive representation learning objective that maintains a good balance among training stability, minibatch size sensitivity, and downstream task performance. The key to the success of RPC is two-fold. First, RPC introduces the relative parameters to regularize the objective for boundedness and low variance. Second, RPC contains no logarithm and exponential score functions, which are the main cause of training instability in prior contrastive objectives. We empirically verify the effectiveness of RPC on benchmark vision and speech self-supervised learning tasks. Lastly, we relate RPC with mutual information (MI) estimation, showing RPC can be used to estimate MI with low variance.", "keywords": "self-supervised learning;contrastive learning;dependency based method", "primary_area": "", "supplementary_material": "/attachment/afca024ca1a3df20233289c76ab82d4f76913c37.zip", "author": "Yao-Hung Hubert Tsai;Martin Q. 
Ma;Muqiao Yang;Han Zhao;Louis-Philippe Morency;Ruslan Salakhutdinov", "authorids": "~Yao-Hung_Hubert_Tsai1;~Martin_Q._Ma1;~Muqiao_Yang1;~Han_Zhao1;~Louis-Philippe_Morency1;~Ruslan_Salakhutdinov1", "gender": "M;M;M;M;M;M", "homepage": ";https://muqiaoy.github.io;https://hanzhaoml.github.io/;https://www.cs.cmu.edu/~morency/;http://www.cs.cmu.edu/~qianlim/;https://www.cs.cmu.edu/~rsalakhu/", "dblp": "154/3702;239/6073;03/3520-2;31/739;251/5669.html;", "google_scholar": ";https://scholar.google.com/citations?hl=en;x942ipYAAAAJ;https://scholar.google.com.tw/citations?user=APgaFK0AAAAJ;TFCtuaQAAAAJ;", "orcid": ";0000-0001-6273-0138;0000-0002-8579-1600;0000-0001-6376-7696;;", "linkedin": ";muqiaoy/;;morency?challengeId=AQELGK_OvMa0vwAAAY72L-VV4X9hW8juuY80VHVeeSGHZ1PJHeeEa5LTFoeTmDGU0t1OL07MXJTYC9EAi6qgPDd2z9ztnbdFYA&submissionId=09a0ff34-04ac-c717-bef7-8c9c8811b463&challengeSource=AgFhxWkU3q7v4wAAAY72L-1xRE0eG-BnZUNE9e3eAG95pgOCZ9u1nxEg-1dK2Dw&challegeType=AgHMzV0lqKgEFwAAAY72L-11X6DHMd3V_A3Iur8XZeyYF2-oBzoufs8&memberId=AgH4yz7pZ_riCgAAAY72L-146jmR2pdr3dmhy2icxBtEQzQ&recognizeDevice=AgFDCNyrhKiFSAAAAY72L-16m7z2EH2t0ueWmMKjyk1_ZJAkfFVe;;", "or_profile": "~Yao-Hung_Hubert_Tsai1;~Muqiao_Yang1;~Han_Zhao1;~Louis-Philippe_Morency1;~Martin_Ma2;~Russ_Salakhutdinov1", "aff": "Carnegie Mellon University;Carnegie Mellon University;University of Illinois, Urbana Champaign;Carnegie Mellon University;Carnegie Mellon University;School of Computer Science, Carnegie Mellon University", "aff_domain": "cmu.edu;andrew.cmu.edu;illinois.edu;cmu.edu;cs.cmu.edu;cs.cmu.edu", "position": "PhD student;PhD student;Assistant Professor;Associate Professor;PhD student;Full Professor", "bibtex": "@inproceedings{\ntsai2021selfsupervised,\ntitle={Self-supervised Representation Learning with Relative Predictive Coding},\nauthor={Yao-Hung Hubert Tsai and Martin Q. 
Ma and Muqiao Yang and Han Zhao and Louis-Philippe Morency and Ruslan Salakhutdinov},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=068E_JSq9O}\n}", "github": "[![github](/images/github_icon.svg) martinmamql/relative_predictive_coding](https://github.com/martinmamql/relative_predictive_coding)", "project": "", "reviewers": "AnonReviewer1;AnonReviewer4;AnonReviewer3;AnonReviewer2", "pdf_size": 0, "rating": "6;6;7;8", "confidence": "4;3;3;4", "wc_review": "364;150;197;149", "wc_reply_reviewers": "92;0;0;0", "wc_reply_authors": "1004;332;525;31", "reply_reviewers": "1;0;0;0", "reply_authors": "2;1;1;1", "rating_avg": [ 6.75, 0.82915619758885 ], "confidence_avg": [ 3.5, 0.5 ], "wc_review_avg": [ 215.0, 88.1844657521947 ], "wc_reply_reviewers_avg": [ 23.0, 39.83716857408418 ], "wc_reply_authors_avg": [ 473.0, 353.52156935610026 ], "reply_reviewers_avg": [ 0.25, 0.4330127018922193 ], "reply_authors_avg": [ 1.25, 0.4330127018922193 ], "replies_avg": [ 12, 0 ], "authors#_avg": [ 6, 0 ], "corr_rating_confidence": 0.30151134457776363, "gs_citation": 44, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=17809486725301186145&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 4, "pdf": "https://openreview.net/pdf?id=068E_JSq9O", "email": "cmu.edu;andrew.cmu.edu;illinois.edu;cmu.edu;cs.cmu.edu;cs.cmu.edu", "author_num": 6, "aff_unique_index": "0;0;1;0;0;0", "aff_unique_norm": "Carnegie Mellon University;University of Illinois Urbana-Champaign", "aff_unique_dep": ";", "aff_unique_url": "https://www.cmu.edu;https://illinois.edu", "aff_unique_abbr": "CMU;UIUC", "aff_campus_unique_index": "1;2", "aff_campus_unique": ";Urbana-Champaign;Pittsburgh", "aff_country_unique_index": "0;0;0;0;0;0", "aff_country_unique": "United States" }, { "id": "083vV3utxpC", "title": "Deep Partial Updating", "track": "main", "status": "Reject", "tldr": "", "abstract": "Emerging edge intelligence applications require the server to continuously retrain and update deep neural networks deployed on remote edge nodes to leverage newly collected data samples. Unfortunately, it may be impossible in practice to continuously send fully updated weights to these edge nodes due to the highly constrained communication resource. In this paper, we propose the weight-wise deep partial updating paradigm, which smartly selects only a subset of weights to update at each server-to-edge communication round, while achieving a similar performance compared to full updating. Our method is established through analytically upper-bounding the loss difference between partial updating and full updating, and only updates the weights which make the largest contributions to the upper bound. 
Extensive experimental results demonstrate the efficacy of our partial updating methodology which achieves a high inference accuracy while updating a rather small number of weights.", "keywords": "Partial updating;communication constraints;server-to-edge;deep neural networks", "primary_area": "", "supplementary_material": "/attachment/2456854f07f27cde653697b188dc657275957540.zip", "author": "Zhongnan Qu;Cong Liu;Junfeng Guo;Lothar Thiele", "authorids": "~Zhongnan_Qu1;~Cong_Liu2;~Junfeng_Guo2;~Lothar_Thiele1", "gender": "M;;M;", "homepage": ";https://intra.ece.ucr.edu/~cong/;https://junfenggo.github.io/;", "dblp": ";https://dblp.uni-trier.de/pers/l/Liu_0005:Cong.html;;", "google_scholar": "https://scholar.google.com/citations?hl=zh-CN;vpc4bggAAAAJ;TqblqYcAAAAJ;", "orcid": ";;;", "linkedin": "zhongnan-qu-a79749115/;;;", "or_profile": "~Zhongnan_Qu1;~Cong_Liu2;~Junfeng_Guo2;~Lothar_Thiele1", "aff": "Meta ;University of Texas, Dallas;University of Texas, Dallas;", "aff_domain": "fb.com;utdallas.edu;utdallas.edu;", "position": "Intern;Associate Professor;PhD student;", "bibtex": "@misc{\nqu2021deep,\ntitle={Deep Partial Updating},\nauthor={Zhongnan Qu and Cong Liu and Junfeng Guo and Lothar Thiele},\nyear={2021},\nurl={https://openreview.net/forum?id=083vV3utxpC}\n}", "github": "", "project": "", "reviewers": "AnonReviewer2;AnonReviewer3;AnonReviewer4;AnonReviewer1", "site": "https://openreview.net/forum?id=083vV3utxpC", "pdf_size": 0, "rating": "5;6;6;6", "confidence": "3;3;3;3", "wc_review": "196;379;1072;582", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "271;248;777;576", "reply_reviewers": "0;0;0;0", "reply_authors": "1;1;1;1", "rating_avg": [ 5.75, 0.4330127018922193 ], "confidence_avg": [ 3.0, 0.0 ], "wc_review_avg": [ 557.25, 327.05303469009425 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 468.0, 220.42799277768694 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 11, 0 ], "authors#_avg": [ 4, 0 ], "corr_rating_confidence": 0.0, "gs_citation": -1, "gs_cited_by_link": "", "gs_version_total": -1, "aff_unique_index": "0;1;1", "aff_unique_norm": "Meta;University of Texas at Dallas", "aff_unique_dep": "Meta Platforms, Inc.;", "aff_unique_url": "https://meta.com;https://www.utdallas.edu", "aff_unique_abbr": "Meta;UT Dallas", "aff_campus_unique_index": "1;1", "aff_campus_unique": ";Dallas", "aff_country_unique_index": "0;0;0", "aff_country_unique": "United States" }, { "title": "Rethinking Positional Encoding in Language Pre-training", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/2982", "id": "09-528y2Fgf", "poster": "", "openreview": "https://openreview.net/forum?id=09-528y2Fgf", "slides": "https://iclr.cc/virtual/2021/poster/2982", "video": "https://iclr.cc/virtual/2021/poster/2982", "author_site": "Guolin Ke, Di He, Tie-Yan Liu", "tldr": "", "abstract": "In this work, we investigate the positional encoding methods used in language pre-training (e.g., BERT) and identify several problems in the existing formulations. First, we show that in the absolute positional encoding, the addition operation applied on positional embeddings and word embeddings brings mixed correlations between the two heterogeneous information resources. It may bring unnecessary randomness in the attention and further limit the expressiveness of the model. 
Second, we question whether treating the position of the symbol \\texttt{[CLS]} the same as other words is a reasonable design, considering its special role (the representation of the entire sentence) in the downstream tasks. Motivated from above analysis, we propose a new positional encoding method called \\textbf{T}ransformer with \\textbf{U}ntied \\textbf{P}ositional \\textbf{E}ncoding (TUPE). In the self-attention module, TUPE computes the word contextual correlation and positional correlation separately with different parameterizations and then adds them together. This design removes the mixed and noisy correlations over heterogeneous embeddings and offers more expressiveness by using different projection matrices. Furthermore, TUPE unties the \\texttt{[CLS]} symbol from other positions, making it easier to capture information from all positions. Extensive experiments and ablation studies on GLUE benchmark demonstrate the effectiveness of the proposed method. Codes and models are released at \\url{https://github.com/guolinke/TUPE}.", "keywords": "Natural Language Processing;Pre-training", "primary_area": "", "supplementary_material": "", "author": "Guolin Ke;Di He;Tie-Yan Liu", "authorids": "~Guolin_Ke3;~Di_He1;~Tie-Yan_Liu1", "gender": "M;M;M", "homepage": "https://dihe-pku.github.io/;http://member.acm.org/~tieyanliu;https://guolinke.github.io", "dblp": "74/184;l/TieYanLiu;190/7810", "google_scholar": "https://scholar.google.co.jp/citations?user=orVoz4IAAAAJ;Nh832fgAAAAJ;M2qJgtoAAAAJ", "orcid": ";0000-0002-0476-8020;", "linkedin": ";;", "or_profile": "~Di_He1;~Tie-Yan_Liu1;~guolin_ke1", "aff": "Microsoft;Microsoft;Microsoft", "aff_domain": "microsoft.com;microsoft.com;microsoft.com", "position": "Senior Researcher;Distinguished Scientist;Senior Researcher", "bibtex": "@inproceedings{\nke2021rethinking,\ntitle={Rethinking Positional Encoding in Language Pre-training},\nauthor={Guolin Ke and Di He and Tie-Yan Liu},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=09-528y2Fgf}\n}", "github": "[![github](/images/github_icon.svg) guolinke/TUPE](https://github.com/guolinke/TUPE) + [![Papers with Code](/images/pwc_icon.svg) 2 community implementations](https://paperswithcode.com/paper/?openreview=09-528y2Fgf)", "project": "", "reviewers": "AnonReviewer2;AnonReviewer1;AnonReviewer3;AnonReviewer4", "pdf_size": 0, "rating": "6;7;7;7", "confidence": "4;4;4;4", "wc_review": "701;271;294;296", "wc_reply_reviewers": "65;0;0;0", "wc_reply_authors": "596;126;51;157", "reply_reviewers": "1;0;0;0", "reply_authors": "2;1;1;1", "rating_avg": [ 6.75, 0.4330127018922193 ], "confidence_avg": [ 4.0, 0.0 ], "wc_review_avg": [ 390.5, 179.53620804729056 ], "wc_reply_reviewers_avg": [ 16.25, 28.145825622994256 ], "wc_reply_authors_avg": [ 232.5, 213.37584211901776 ], "reply_reviewers_avg": [ 0.25, 0.4330127018922193 ], "reply_authors_avg": [ 1.25, 0.4330127018922193 ], "replies_avg": [ 12, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 348, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=13553136852407909165&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 3, "pdf": "https://openreview.net/pdf?id=09-528y2Fgf", "email": "microsoft.com;microsoft.com;microsoft.com", "author_num": 3, "aff_unique_index": "0;0;0", "aff_unique_norm": "Microsoft", "aff_unique_dep": "Microsoft Corporation", "aff_unique_url": "https://www.microsoft.com", "aff_unique_abbr": "Microsoft", "aff_campus_unique_index": "", 
"aff_campus_unique": "", "aff_country_unique_index": "0;0;0", "aff_country_unique": "United States" }, { "id": "0BaWDGvCa5p", "title": "A Provably Convergent and Practical Algorithm for Min-Max Optimization with Applications to GANs", "track": "main", "status": "Reject", "tldr": "", "abstract": "We present a first-order algorithm for nonconvex-nonconcave min-max optimization problems such as those that arise in training GANs. Our algorithm provably converges in $\\mathrm{poly}(d,L, b)$ steps for any loss function $f:\\mathbb{R}^d \\times \\mathbb{R}^d \\rightarrow \\mathbb{R}$ which is $b$-bounded with ${L}$-Lipschitz gradient. To achieve convergence, we 1) give a novel approximation to the global strategy of the max-player based on first-order algorithms such as gradient ascent, and 2) empower the min-player to look ahead and simulate the max-player\u2019s response for arbitrarily many steps, but restrict the min-player to move according to updates sampled from a stochastic gradient oracle. Our algorithm, when used to train GANs on synthetic and real-world datasets, does not cycle, results in GANs that seem to avoid mode collapse, and achieves a training time per iteration and memory requirement similar to gradient descent-ascent.\n", "keywords": "min-max optimization;GANs", "primary_area": "", "supplementary_material": "/attachment/2872edea10cfe70b999214699b027faa0f776c93.zip", "author": "Oren Mangoubi;Sushant Sachdeva;Nisheeth K Vishnoi", "authorids": "~Oren_Mangoubi1;~Sushant_Sachdeva1;~Nisheeth_K_Vishnoi1", "gender": "M;M;M", "homepage": ";https://www.cs.toronto.edu/~sachdeva/;http://cs.yale.edu/homes/vishnoi/Home.html", "dblp": "158/6707;25/9221;02/2229", "google_scholar": ";sb1-HK8AAAAJ;", "orcid": ";0000-0002-5393-9324;", "linkedin": ";sachdevasushant;", "or_profile": "~Oren_Mangoubi1;~Sushant_Sachdeva1;~Nisheeth_K_Vishnoi1", "aff": "Worcester Polytechnic Institute;University of Toronto;Google", "aff_domain": "wpi.edu;toronto.edu;google.com", "position": "Assistant Professor;Associate Professor;Visiting researcher", "bibtex": "@misc{\nmangoubi2021a,\ntitle={A Provably Convergent and Practical Algorithm for Min-Max Optimization with Applications to {\\{}GAN{\\}}s},\nauthor={Oren Mangoubi and Sushant Sachdeva and Nisheeth K Vishnoi},\nyear={2021},\nurl={https://openreview.net/forum?id=0BaWDGvCa5p}\n}", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer4;AnonReviewer2", "site": "https://openreview.net/forum?id=0BaWDGvCa5p", "pdf_size": 0, "rating": "4;6;6", "confidence": "4;3;2", "wc_review": "561;289;168", "wc_reply_reviewers": "198;0;0", "wc_reply_authors": "1100;683;551", "reply_reviewers": "2;0;0", "reply_authors": "4;2;2", "rating_avg": [ 5.333333333333333, 0.9428090415820634 ], "confidence_avg": [ 3.0, 0.816496580927726 ], "wc_review_avg": [ 339.3333333333333, 164.34178477253502 ], "wc_reply_reviewers_avg": [ 66.0, 93.33809511662427 ], "wc_reply_authors_avg": [ 778.0, 233.9786315029644 ], "reply_reviewers_avg": [ 0.6666666666666666, 0.9428090415820634 ], "reply_authors_avg": [ 2.6666666666666665, 0.9428090415820634 ], "replies_avg": [ 14, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": -0.8660254037844385, "gs_citation": 3, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=2751721560093196159&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 0, "aff_unique_index": "0;1;2", "aff_unique_norm": "Worcester Polytechnic Institute;University of Toronto;Google", "aff_unique_dep": ";;Google", "aff_unique_url": 
"https://www.wpi.edu;https://www.utoronto.ca;https://www.google.com", "aff_unique_abbr": "WPI;U of T;Google", "aff_campus_unique_index": "1", "aff_campus_unique": ";Mountain View", "aff_country_unique_index": "0;1;0", "aff_country_unique": "United States;Canada" }, { "id": "0DALDI-xyW4", "title": "A new accelerated gradient method inspired by continuous-time perspective", "track": "main", "status": "Reject", "tldr": "", "abstract": "Nesterov's accelerated method are widely used in problems with machine learning background including deep learning. To give more insight about the acceleration phenomenon, an ordinary differential equation was obtained from Nesterov's accelerated method by taking step sizes approaching zero, and the relationship between Nesterov's method and the differential equation is still of research interest. In this work, we give the precise order of the iterations of Nesterov's accelerated method converging to the solution of derived differential equation as step sizes go to zero. We then present a new accelerated method with higher order. The new method is more stable than ordinary method for large step size and converges faster. We further apply the new method to matrix completion problem and show its better performance through numerical experiments.", "keywords": "accelerated gradient method;matrix completion;first-order methods;differential equation", "primary_area": "", "supplementary_material": "/attachment/e8b17693d1deed0bca9902438e551504fe8ca536.zip", "author": "Yasong Feng;Weiguo Gao", "authorids": "~Yasong_Feng1;wggao@fudan.edu.cn", "gender": ";", "homepage": ";", "dblp": "250/2394;", "google_scholar": ";", "orcid": ";", "linkedin": ";", "or_profile": "~Yasong_Feng1;wggao@fudan.edu.cn", "aff": "Fudan University;", "aff_domain": "fdu.edu;", "position": "PhD student;", "bibtex": "@misc{\nfeng2021a,\ntitle={A new accelerated gradient method inspired by continuous-time perspective},\nauthor={Yasong Feng and Weiguo Gao},\nyear={2021},\nurl={https://openreview.net/forum?id=0DALDI-xyW4}\n}", "github": "", "project": "", "reviewers": "AnonReviewer2;AnonReviewer4;AnonReviewer3;AnonReviewer1", "site": "https://openreview.net/forum?id=0DALDI-xyW4", "pdf_size": 0, "rating": "4;4;4;4", "confidence": "4;3;4;3", "wc_review": "234;557;236;240", "wc_reply_reviewers": "120;83;97;0", "wc_reply_authors": "388;435;660;597", "reply_reviewers": "1;1;1;0", "reply_authors": "1;1;2;1", "rating_avg": [ 4.0, 0.0 ], "confidence_avg": [ 3.5, 0.5 ], "wc_review_avg": [ 316.75, 138.72522301297627 ], "wc_reply_reviewers_avg": [ 75.0, 45.271403777660794 ], "wc_reply_authors_avg": [ 520.0, 112.00223212061445 ], "reply_reviewers_avg": [ 0.75, 0.4330127018922193 ], "reply_authors_avg": [ 1.25, 0.4330127018922193 ], "replies_avg": [ 14, 0 ], "authors#_avg": [ 2, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:5p6vUpGaBkcJ:scholar.google.com/&scioq=A+new+accelerated+gradient+method+inspired+by+continuous-time+perspective&hl=en&as_sdt=0,5", "gs_version_total": 0, "aff_unique_index": "0", "aff_unique_norm": "Fudan University", "aff_unique_dep": "", "aff_unique_url": "https://www.fudan.edu.cn", "aff_unique_abbr": "Fudan", "aff_country_unique_index": "0", "aff_country_unique": "China" }, { "id": "0EJjoRbFEcX", "title": "Understanding Classifiers with Generative Models", "track": "main", "status": "Reject", "tldr": "", "abstract": "Although deep neural networks are effective on supervised learning tasks, they have been shown to be 
brittle. They are prone to overfitting on their training distribution and are easily fooled by small adversarial perturbations. In this paper, we leverage generative models to identify and characterize instances where classifiers fail to generalize.\nWe propose a generative model of the features extracted by a classifier, and show using rigorous hypothesis testing that errors tend to occur when features are assigned low-probability by our model. From this observation, we develop a detection criteria that we test against different sources of classification mistakes: mistakes made on the test set due to poor model generalization, adversarial samples and out-of-distribution samples. Our approach is agnostic to class labels from the training set which makes it applicable to models trained in a semi-supervised way.", "keywords": "OOD detection;adversarial samples detection;deep learning;classification", "primary_area": "", "supplementary_material": "", "author": "La\u00ebtitia Shao;Yang Song;Stefano Ermon", "authorids": "~La\u00ebtitia_Shao1;~Yang_Song1;~Stefano_Ermon1", "gender": "F;M;M", "homepage": ";https://yang-song.net;http://cs.stanford.edu/~ermon/", "dblp": ";;47/8135", "google_scholar": ";o_J2CroAAAAJ;", "orcid": ";;", "linkedin": "laetitiashao/;;", "or_profile": "~La\u00ebtitia_Shao1;~Yang_Song1;~Stefano_Ermon1", "aff": "Google;Stanford University;Stanford University", "aff_domain": "google.com;stanford.edu;stanford.edu", "position": "AI Resident;PhD student;Assistant Professor", "bibtex": "@misc{\nshao2021understanding,\ntitle={Understanding Classifiers with Generative Models},\nauthor={La{\\\"e}titia Shao and Yang Song and Stefano Ermon},\nyear={2021},\nurl={https://openreview.net/forum?id=0EJjoRbFEcX}\n}", "github": "", "project": "", "reviewers": "AnonReviewer2;AnonReviewer3;AnonReviewer1;AnonReviewer4", "site": "https://openreview.net/forum?id=0EJjoRbFEcX", "pdf_size": 0, "rating": "4;5;5;6", "confidence": "4;4;4;4", "wc_review": "478;201;476;244", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "297;160;328;0", "reply_reviewers": "0;0;0;0", "reply_authors": "1;1;1;0", "rating_avg": [ 5.0, 0.7071067811865476 ], "confidence_avg": [ 4.0, 0.0 ], "wc_review_avg": [ 349.75, 128.15688627615762 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 196.25, 129.74662808720694 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 0.75, 0.4330127018922193 ], "replies_avg": [ 8, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 1, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=15458095681072856359&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 0, "aff_unique_index": "0;1;1", "aff_unique_norm": "Google;Stanford University", "aff_unique_dep": "Google;", "aff_unique_url": "https://www.google.com;https://www.stanford.edu", "aff_unique_abbr": "Google;Stanford", "aff_campus_unique_index": "0;1;1", "aff_campus_unique": "Mountain View;Stanford", "aff_country_unique_index": "0;0;0", "aff_country_unique": "United States" }, { "id": "0F_OC_oROWb", "title": "RSO: A Gradient Free Sampling Based Approach For Training Deep Neural Networks", "track": "main", "status": "Reject", "tldr": "", "abstract": "We propose RSO (random search optimization), a gradient free, sampling based approach for training deep neural networks. To this end, RSO adds a perturbation to a weight in a deep neural network and tests if it reduces the loss on a mini-batch. If this reduces the loss, the weight is updated, otherwise the existing weight is retained. 
Surprisingly, we find that repeating this process a few times for each weight is sufficient to train a deep neural network. The number of weight updates for RSO is an order of magnitude lesser when compared to backpropagation with SGD. RSO can make aggressive weight updates in each step as there is no concept of learning rate. The weight update step for individual layers is also not coupled with the magnitude of the loss. RSO is evaluated on classification tasks on MNIST and CIFAR-10 datasets with deep neural networks of 6 to 10 layers where it achieves an accuracy of 99.1% and 81.8% respectively. We also find that after updating the weights just 5 times, the algorithm obtains a classification accuracy of 98% on MNIST.", "keywords": "", "primary_area": "", "supplementary_material": "", "author": "Rohun Tripathi;Bharat Singh", "authorids": "~Rohun_Tripathi1;~Bharat_Singh2", "gender": ";", "homepage": ";http://bharatsingh.net", "dblp": ";28/7685", "google_scholar": ";ig0q5c4AAAAJ", "orcid": ";", "linkedin": ";", "or_profile": "~Rohun_Tripathi1;~Bharat_Singh2", "aff": ";Amazon", "aff_domain": ";amazon.com", "position": ";Researcher", "bibtex": "@misc{\ntripathi2021rso,\ntitle={{\\{}RSO{\\}}: A Gradient Free Sampling Based Approach For Training Deep Neural Networks},\nauthor={Rohun Tripathi and Bharat Singh},\nyear={2021},\nurl={https://openreview.net/forum?id=0F_OC_oROWb}\n}", "github": "", "project": "", "reviewers": "AnonReviewer2;AnonReviewer3;AnonReviewer1;AnonReviewer4", "site": "https://openreview.net/forum?id=0F_OC_oROWb", "pdf_size": 0, "rating": "3;6;7;8", "confidence": "5;2;4;5", "wc_review": "534;248;504;509", "wc_reply_reviewers": "0;0;0;57", "wc_reply_authors": "305;285;374;647", "reply_reviewers": "0;0;0;1", "reply_authors": "1;1;1;2", "rating_avg": [ 6.0, 1.8708286933869707 ], "confidence_avg": [ 4.0, 1.224744871391589 ], "wc_review_avg": [ 448.75, 116.45895199597153 ], "wc_reply_reviewers_avg": [ 14.25, 24.681724007856502 ], "wc_reply_authors_avg": [ 402.75, 144.83158322686387 ], "reply_reviewers_avg": [ 0.25, 0.4330127018922193 ], "reply_authors_avg": [ 1.25, 0.4330127018922193 ], "replies_avg": [ 12, 0 ], "authors#_avg": [ 2, 0 ], "corr_rating_confidence": -0.10910894511799618, "gs_citation": 10, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=6520507311091221741&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 3, "aff_unique_index": "0", "aff_unique_norm": "Amazon", "aff_unique_dep": "Amazon.com, Inc.", "aff_unique_url": "https://www.amazon.com", "aff_unique_abbr": "Amazon", "aff_country_unique_index": "0", "aff_country_unique": "United States" }, { "id": "0Hj3tFCSjUd", "title": "Energy-based View of Retrosynthesis", "track": "main", "status": "Reject", "tldr": "", "abstract": "Retrosynthesis\u2014the process of identifying a set of reactants to synthesize a target molecule\u2014is of vital importance to material design and drug discovery. Existing machine learning approaches based on language models and graph neural networks have achieved encouraging results. However, the inner connections of these models are rarely discussed, and rigorous evaluations of these models are largely in need. In this paper, we propose a framework that unifies sequence- and graph-based methods as energy-based models (EBMs) with different energy functions. This unified point of view establishes connections between different models and identifies the differences between them, thereby promoting the understanding of model design. 
We also provide a comprehensive assessment of performance to the community. Moreover, we present a novel \u201cdual\u201d variant within the framework that performs consistent training over Bayesian forward- and backward-prediction by constraining the agreement between the two directions. This model improves the state of the art for template-free approaches where the reaction type is unknown and known.\n", "keywords": "Applications;Retrosynthesis;Energy-based Model", "primary_area": "", "supplementary_material": "", "author": "Ruoxi Sun;Hanjun Dai;Li Li;Steven Kearnes;Bo Dai", "authorids": "~Ruoxi_Sun2;~Hanjun_Dai1;~Li_Li8;~Steven_Kearnes1;~Bo_Dai1", "gender": "F;M;M;M;", "homepage": ";https://hanjun-dai.github.io;;;https://bo-dai.github.io/", "dblp": "72/7683;144/7311;;;64/2903", "google_scholar": "ut1-7LAAAAAJ;obpl7GQAAAAJ;MsImb-AAAAAJ;;TIKl_foAAAAJ", "orcid": ";;;;0009-0002-8070-574X", "linkedin": ";hanjun-dai;;;", "or_profile": "~Ruoxi_Sun2;~Hanjun_Dai1;~Li_Li8;~Steven_Kearnes1;~Bo_Dai1", "aff": "Google;Google Research;Google;;Google Brain", "aff_domain": "google.com;google.com;google.com;;google.com", "position": "Google;Researcher;Software Engineer;;Research Scientist", "bibtex": "@misc{\nsun2021energybased,\ntitle={Energy-based View of Retrosynthesis},\nauthor={Ruoxi Sun and Hanjun Dai and Li Li and Steven Kearnes and Bo Dai},\nyear={2021},\nurl={https://openreview.net/forum?id=0Hj3tFCSjUd}\n}", "github": "", "project": "", "reviewers": "AnonReviewer3;AnonReviewer1;AnonReviewer2;AnonReviewer4", "site": "https://openreview.net/forum?id=0Hj3tFCSjUd", "pdf_size": 0, "rating": "5;5;5;8", "confidence": "4;5;4;4", "wc_review": "361;544;582;199", "wc_reply_reviewers": "0;205;0;0", "wc_reply_authors": "765;437;773;101", "reply_reviewers": "0;1;0;0", "reply_authors": "1;1;1;1", "rating_avg": [ 5.75, 1.299038105676658 ], "confidence_avg": [ 4.25, 0.4330127018922193 ], "wc_review_avg": [ 421.5, 153.24245495292746 ], "wc_reply_reviewers_avg": [ 51.25, 88.76760388790495 ], "wc_reply_authors_avg": [ 519.0, 276.8031791724943 ], "reply_reviewers_avg": [ 0.25, 0.4330127018922193 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 15, 0 ], "authors#_avg": [ 5, 0 ], "corr_rating_confidence": -0.3333333333333333, "gs_citation": 29, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=13512415224084211507&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 4, "aff_unique_index": "0;0;0;0", "aff_unique_norm": "Google", "aff_unique_dep": "Google", "aff_unique_url": "https://www.google.com", "aff_unique_abbr": "Google", "aff_campus_unique_index": "0;0;0;0", "aff_campus_unique": "Mountain View", "aff_country_unique_index": "0;0;0;0", "aff_country_unique": "United States" }, { "title": "On the Universality of the Double Descent Peak in Ridgeless Regression", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/2611", "id": "0IO5VdnSAaH", "poster": "", "openreview": "https://openreview.net/forum?id=0IO5VdnSAaH", "slides": "https://iclr.cc/virtual/2021/poster/2611", "video": "https://iclr.cc/virtual/2021/poster/2611", "tldr": "", "abstract": "We prove a non-asymptotic distribution-independent lower bound for the expected mean squared generalization error caused by label noise in ridgeless linear regression. Our lower bound generalizes a similar known result to the overparameterized (interpolating) regime. 
In contrast to most previous works, our analysis applies to a broad class of input distributions with almost surely full-rank feature matrices, which allows us to cover various types of deterministic or random feature maps. Our lower bound is asymptotically sharp and implies that in the presence of label noise, ridgeless linear regression does not perform well around the interpolation threshold for any of these feature maps. We analyze the imposed assumptions in detail and provide a theory for analytic (random) feature maps. Using this theory, we can show that our assumptions are satisfied for input distributions with a (Lebesgue) density and feature maps given by random deep neural networks with analytic activation functions like sigmoid, tanh, softplus or GELU. As further examples, we show that feature maps from random Fourier features and polynomial kernels also satisfy our assumptions. We complement our theory with further experimental and analytic results.", "keywords": "Double Descent;Interpolation Peak;Linear Regression;Random Features;Random Weights Neural Networks", "primary_area": "", "supplementary_material": "/attachment/d744d5f1846b482e075f9bc52c43c2e729fc5760.zip", "author": "David Holzm\u00fcller", "authorids": "~David_Holzm\u00fcller1", "gender": "M", "homepage": "https://www.isa.uni-stuttgart.de/en/institute/team/Holzmueller/", "dblp": "207/7947", "google_scholar": "https://scholar.google.de/citations?user=pIT7A7QAAAAJ", "orcid": "0000-0002-9443-0049", "linkedin": "david-holzm%C3%BCller-164a9b256/", "or_profile": "~David_Holzm\u00fcller1", "aff": "University of Stuttgart", "aff_domain": "uni-stuttgart.de", "position": "PhD student", "bibtex": "@inproceedings{\nholzm{\\\"u}ller2021on,\ntitle={On the Universality of the Double Descent Peak in Ridgeless Regression},\nauthor={David Holzm{\\\"u}ller},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=0IO5VdnSAaH}\n}", "github": "[![github](/images/github_icon.svg) dholzmueller/universal_double_descent](https://github.com/dholzmueller/universal_double_descent)", "project": "", "reviewers": "AnonReviewer2;AnonReviewer4;AnonReviewer3;AnonReviewer1", "pdf_size": 0, "rating": "6;6;7;7", "confidence": "3;2;3;3", "wc_review": "357;224;159;137", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "544;291;35;19", "reply_reviewers": "0;0;0;0", "reply_authors": "1;1;1;1", "rating_avg": [ 6.5, 0.5 ], "confidence_avg": [ 2.75, 0.4330127018922193 ], "wc_review_avg": [ 219.25, 85.72156963098611 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 222.25, 214.83874766903665 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 10, 0 ], "authors#_avg": [ 1, 0 ], "corr_rating_confidence": 0.5773502691896257, "gs_citation": 18, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=6446983561543714244&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 5, "pdf": "https://openreview.net/pdf?id=0IO5VdnSAaH", "email": "uni-stuttgart.de", "author_num": 1, "aff_unique_index": "0", "aff_unique_norm": "University of Stuttgart", "aff_unique_dep": "", "aff_unique_url": "https://www.uni-stuttgart.de", "aff_unique_abbr": "USTuttgart", "aff_country_unique_index": "0", "aff_country_unique": "Germany" }, { "title": "ALFWorld: Aligning Text and Embodied Environments for Interactive Learning", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/2973", "id": "0IOX0YcCdTn", "poster": "", "openreview": 
"https://openreview.net/forum?id=0IOX0YcCdTn", "slides": "https://iclr.cc/virtual/2021/poster/2973", "video": "https://iclr.cc/virtual/2021/poster/2973", "author_site": "Mohit Shridhar, Eric Yuan, Marc-Alexandre Cote, Yonatan Bisk, Adam Trischler, Matthew Hausknecht", "tldr": "", "abstract": "Given a simple request like Put a washed apple in the kitchen fridge, humans can reason in purely abstract terms by imagining action sequences and scoring their likelihood of success, prototypicality, and efficiency, all without moving a muscle. Once we see the kitchen in question, we can update our abstract plans to fit the scene. Embodied agents require the same abilities, but existing work does not yet provide the infrastructure necessary for both reasoning abstractly and executing concretely. We address this limitation by introducing ALFWorld, a simulator that enables agents to learn abstract, text-based policies in TextWorld (C\u00f4t\u00e9 et al., 2018) and then execute goals from the ALFRED benchmark (Shridhar et al., 2020) in a rich visual environment. ALFWorld enables the creation of a new BUTLER agent whose abstract knowledge, learned in TextWorld, corresponds directly to concrete, visually grounded actions. In turn, as we demonstrate empirically, this fosters better agent generalization than training only in the visually grounded environment. BUTLER\u2019s simple, modular design factors the problem to allow researchers to focus on models for improving every piece of the pipeline (language understanding, planning, navigation, and visual scene understanding).", "keywords": "Textworld;Text-based Games;Embodied Agents;Language Grounding;Generalization;Imitation Learning;ALFRED", "primary_area": "", "supplementary_material": "/attachment/398e5a7448d9310f0c8a324cff107ff0f41b29f6.zip", "author": "Mohit Shridhar;Xingdi Yuan;Marc-Alexandre Cote;Yonatan Bisk;Adam Trischler;Matthew Hausknecht", "authorids": "~Mohit_Shridhar1;~Xingdi_Yuan2;~Marc-Alexandre_Cote1;~Yonatan_Bisk1;~Adam_Trischler1;~Matthew_Hausknecht1", "gender": "M;M;M;M;M;M", "homepage": "http://mohitshridhar.com/;https://www.microsoft.com/en-us/research/people/macote;http://www.YonatanBisk.com;https://www.microsoft.com/en-us/research/people/adtrisch/;https://mhauskn.github.io/;https://xingdi-eric-yuan.github.io/", "dblp": "203/8577.html;118/9636;38/9282;177/9137;26/7488;40/10147", "google_scholar": "CrfsfFSiS0kC;https://scholar.google.ca/citations?user=L83CE5gAAAAJ;bWoGh8UAAAAJ;https://scholar.google.ca/citations?user=EvUM6UUAAAAJ;lutJce0AAAAJ;hYfE-B8AAAAJ", "orcid": "0000-0001-7382-763X;;0000-0002-2111-9081;;;", "linkedin": ";;yonatanbisk/;;;", "or_profile": "~Mohit_Shridhar1;~Marc-Alexandre_Cote1;~Yonatan_Bisk1;~Adam_Trischler1;~Matthew_Hausknecht1;~Eric_Yuan1", "aff": "NVIDIA;Microsoft;Carnegie Mellon University;;Microsoft Research;Microsoft Research", "aff_domain": "nvidia.com;microsoft.com;cmu.edu;;microsoft.com;microsoft.com", "position": "NVIDIA;Principal Researcher;Assistant Professor;;Researcher;Senior Research Engineer", "bibtex": "@inproceedings{\nshridhar2021alfworld,\ntitle={{\\{}ALFW{\\}}orld: Aligning Text and Embodied Environments for Interactive Learning},\nauthor={Mohit Shridhar and Xingdi Yuan and Marc-Alexandre Cote and Yonatan Bisk and Adam Trischler and Matthew Hausknecht},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=0IOX0YcCdTn}\n}", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer3;AnonReviewer2", "pdf_size": 0, 
"rating": "4;6;7", "confidence": "5;4;3", "wc_review": "668;757;159", "wc_reply_reviewers": "0;0;0", "wc_reply_authors": "1006;500;197", "reply_reviewers": "0;0;0", "reply_authors": "3;1;1", "rating_avg": [ 5.666666666666667, 1.247219128924647 ], "confidence_avg": [ 4.0, 0.816496580927726 ], "wc_review_avg": [ 528.0, 263.44006275938114 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 567.6666666666666, 333.7207748336258 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.6666666666666667, 0.9428090415820634 ], "replies_avg": [ 12, 0 ], "authors#_avg": [ 6, 0 ], "corr_rating_confidence": -0.9819805060619659, "gs_citation": 466, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=11544973336902610716&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 4, "pdf": "https://openreview.net/pdf?id=0IOX0YcCdTn", "email": "nvidia.com;microsoft.com;cmu.edu;;microsoft.com;microsoft.com", "author_num": 6, "aff_unique_index": "0;1;2;1;1", "aff_unique_norm": "NVIDIA;Microsoft;Carnegie Mellon University", "aff_unique_dep": "NVIDIA Corporation;Microsoft Corporation;", "aff_unique_url": "https://www.nvidia.com;https://www.microsoft.com;https://www.cmu.edu", "aff_unique_abbr": "NVIDIA;Microsoft;CMU", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0;0;0;0", "aff_country_unique": "United States" }, { "id": "0Jr4rjA6glk", "title": "FAST DIFFERENTIALLY PRIVATE-SGD VIA JL PROJECTIONS", "track": "main", "status": "Withdraw", "tldr": "", "abstract": "Differentially Private-SGD (DP-SGD) of Abadi et al. (2016) and its variations are the only known\nalgorithms for private training of large scale neural networks. This algorithm requires computation\nof per-sample gradients norms which is extremely slow and memory intensive in practice. In this\npaper, we present a new framework to design differentially private optimizers called DP-SGD-JL and\nDP-Adam-JL. Our approach uses Johnson\u2013Lindenstrauss (JL) projections to quickly approximate\nthe per-sample gradient norms without exactly computing them, thus making the training time and\nmemory requirements of our optimizers closer to that of their non-DP versions.\nOur algorithms achieve state-of-the-art privacy-vs-accuracy tradeoffs on MNIST and CIFAR10\ndatasets while being significantly faster. Unlike previous attempts to make DP-SGD faster which\nwork only on fully-connected or convolutional layers, our algorithms work for any network in a\nblack-box manner which is the main contribution of this paper. To illustrate this, on IMDb\ndataset, we train a Recurrent Neural Network (RNN) to achieve good privacy-vs-accuracy tradeoff,\nwhereas existing DP optimizers are either inefficient or inapplicable. On RNNs, our algorithms are\norders of magnitude faster than DP-SGD for large batch sizes.\nThe privacy analysis of our algorithms is more involved than DP-SGD, we use the recently proposed\nf-DP framework of Dong et al. (2019). 
In summary, we design new differentially private training\nalgorithms which are fast, achieve state-of-the-art privacy-vs-accuracy tradeoffs and generalize to all\nnetwork architectures.", "keywords": "Deep Learning;Differential Privacy;Optimization Algorithms", "primary_area": "", "supplementary_material": "", "author": "Zhiqi Bu;Sivakanth Gopi;Janardhan Kulkarni;Yin Tat Lee;Uthaipon Tantipongpipat", "authorids": "~Zhiqi_Bu1;sigopi@microsoft.com;~Janardhan_Kulkarni2;~Yin_Tat_Lee1;~Uthaipon_Tantipongpipat1", "gender": "M;;M;;M", "homepage": "https://sites.google.com/view/zhiqi-bu;;;;https://www.uthaipon.com", "dblp": "245/2573;;54/1978;;215/5265", "google_scholar": "MEvTLxIAAAAJ;;_fxnybwAAAAJ;;nzO_5FMAAAAJ", "orcid": ";;;;", "linkedin": ";;;;uthaipon/", "or_profile": "~Zhiqi_Bu1;sigopi@microsoft.com;~Janardhan_Kulkarni2;~Yin_Tat_Lee1;~Uthaipon_Tantipongpipat1", "aff": "University of Pennsylvania;;Microsoft Research, Redmond;;Twitter", "aff_domain": "upenn.edu;;microsoft.com;;twitter.com", "position": "PhD student;;Researcher;;Researcher", "bibtex": "", "github": "", "project": "", "reviewers": "AnonReviewer3;AnonReviewer4;AnonReviewer1", "site": "https://openreview.net/forum?id=0Jr4rjA6glk", "pdf_size": 0, "rating": "4;7;7", "confidence": "3;2;3", "wc_review": "291;192;128", "wc_reply_reviewers": "0;0;0", "wc_reply_authors": "0;0;0", "reply_reviewers": "0;0;0", "reply_authors": "0;0;0", "rating_avg": [ 6.0, 1.4142135623730951 ], "confidence_avg": [ 2.6666666666666665, 0.4714045207910317 ], "wc_review_avg": [ 203.66666666666666, 67.05387551978053 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 0, 0 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 0, 0 ], "replies_avg": [ 4, 0 ], "authors#_avg": [ 5, 0 ], "corr_rating_confidence": -0.5, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:osopMgkB5G0J:scholar.google.com/&scioq=FAST+DIFFERENTIALLY+PRIVATE-SGD+VIA+JL+PROJECTIONS&hl=en&as_sdt=0,33", "gs_version_total": 0, "aff_unique_index": "0;1;2", "aff_unique_norm": "University of Pennsylvania;Microsoft;Twitter, Inc.", "aff_unique_dep": ";Microsoft Research;", "aff_unique_url": "https://www.upenn.edu;https://www.microsoft.com/en-us/research;https://twitter.com", "aff_unique_abbr": "UPenn;MSR;Twitter", "aff_campus_unique_index": "1", "aff_campus_unique": ";Redmond", "aff_country_unique_index": "0;0;0", "aff_country_unique": "United States" }, { "id": "0LlujmaN0R_", "title": "Truthful Self-Play", "track": "main", "status": "Reject", "tldr": "", "abstract": "We present a general framework for evolutionary learning to emergent unbiased state representation without any supervision. Evolutionary frameworks such as self-play converge to bad local optima in case of multi-agent reinforcement learning in non-cooperative partially observable environments with communication due to information asymmetry. Our proposed framework is a simple modification of self-play inspired by mechanism design, also known as {\\em reverse game theory}, to elicit truthful signals and make the agents cooperative. The key idea is to add imaginary rewards using the peer prediction method, i.e., a mechanism for evaluating the validity of information exchanged between agents in a decentralized environment. 
Numerical experiments with predator prey, traffic junction and StarCraft tasks demonstrate the state-of-the-art performance of our framework.\n", "keywords": "Comm-POSG;Imaginary Rewards", "primary_area": "", "supplementary_material": "/attachment/0edd289e2c0df998337b3a7e65bbc3b7708422c8.zip", "author": "Shohei Ohsawa", "authorids": "~Shohei_Ohsawa1", "gender": "M", "homepage": "", "dblp": "32/9489", "google_scholar": "", "orcid": "", "linkedin": "", "or_profile": "~Shohei_Ohsawa1", "aff": "Daisy, inc.", "aff_domain": "daisy.id", "position": "Founder & CEO", "bibtex": "@misc{\nohsawa2021truthful,\ntitle={Truthful Self-Play},\nauthor={Shohei Ohsawa},\nyear={2021},\nurl={https://openreview.net/forum?id=0LlujmaN0R_}\n}", "github": "", "project": "", "reviewers": "AnonReviewer2;AnonReviewer4;AnonReviewer1;AnonReviewer3", "site": "https://openreview.net/forum?id=0LlujmaN0R_", "pdf_size": 0, "rating": "4;5;5;6", "confidence": "4;3;4;2", "wc_review": "943;1174;502;223", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "0;0;0;0", "reply_reviewers": "0;0;0;0", "reply_authors": "0;0;0;0", "rating_avg": [ 5.0, 0.7071067811865476 ], "confidence_avg": [ 3.25, 0.82915619758885 ], "wc_review_avg": [ 710.5, 370.8156550093321 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 0, 0 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 0, 0 ], "replies_avg": [ 5, 0 ], "authors#_avg": [ 1, 0 ], "corr_rating_confidence": -0.8528028654224418, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:KKykEshhD9oJ:scholar.google.com/&scioq=Truthful+Self-Play&hl=en&as_sdt=0,33", "gs_version_total": 6, "aff_unique_index": "0", "aff_unique_norm": "Daisy Inc.", "aff_unique_dep": "", "aff_unique_url": "", "aff_unique_abbr": "" }, { "id": "0MjC3uMthAb", "title": "Learning Flexible Classifiers with Shot-CONditional Episodic (SCONE) Training", "track": "main", "status": "Reject", "tldr": "", "abstract": "Early few-shot classification work advocates for episodic training, i.e. training over learning episodes each posing a few-shot classification task. However, the role of this training regime remains poorly understood, and its usefulness is still debated. Standard classification training methods (``pre-training'') followed by episodic fine-tuning have recently achieved strong results. This work aims to understand the role of this episodic fine-tuning phase through an exploration of the effect of the ``shot'' setting (number of examples per class) that is used during fine-tuning. We discover that fine-tuning on episodes of a particular shot can specialize the pre-trained model to solving episodes of that shot at the expense of performance on other shots, in agreement with a trade-off recently observed in the context of end-to-end episodic training. To amend this, we propose a shot-conditional form of episodic fine-tuning, inspired from recent work that trains a single model on a distribution of losses. Our investigation shows that this improves overall performance, without suffering disproportionately on any shot. We also examine the usefulness of this approach on the large-scale Meta-Dataset benchmark where test episodes exhibit varying shots and imbalanced classes.
We find that our flexible model improves performance in that challenging environment.", "keywords": "few-shot classification;few-shot learning;episodic training;meta-learning", "primary_area": "", "supplementary_material": "", "author": "Eleni Triantafillou;Vincent Dumoulin;Hugo Larochelle;Richard Zemel", "authorids": "~Eleni_Triantafillou1;~Vincent_Dumoulin1;~Hugo_Larochelle1;~Richard_Zemel1", "gender": "F;M;M;M", "homepage": "http://www.cs.toronto.edu/~eleni/;;https://mila.quebec/en/directory/hugo-larochelle;http://www.cs.columbia.edu/~zemel", "dblp": "183/8430;133/8606;86/3862.html;16/6366", "google_scholar": "Y5x2ZgQAAAAJ;https://scholar.google.ca/citations?user=mZfgLA4AAAAJ;https://scholar.google.ca/citations?user=U89FHq4AAAAJ;https://scholar.google.ca/citations?user=iBeDoRAAAAAJ", "orcid": ";;;", "linkedin": ";;;", "or_profile": "~Eleni_Triantafillou1;~Vincent_Dumoulin1;~Hugo_Larochelle1;~Richard_Zemel1", "aff": "University of Toronto;Google;Universit\u00e9 de Sherbrooke;Department of Computer Science, University of Toronto", "aff_domain": "toronto.edu;google.com;usherbrooke.ca;cs.toronto.edu", "position": "PhD student;Research Scientist;Adjunct Professor;Full Professor", "bibtex": "@misc{\ntriantafillou2021learning,\ntitle={Learning Flexible Classifiers with Shot-{\\{}CON{\\}}ditional Episodic ({\\{}SCONE{\\}}) Training},\nauthor={Eleni Triantafillou and Vincent Dumoulin and Hugo Larochelle and Richard Zemel},\nyear={2021},\nurl={https://openreview.net/forum?id=0MjC3uMthAb}\n}", "github": "", "project": "", "reviewers": "AnonReviewer4;AnonReviewer3;AnonReviewer2;AnonReviewer1", "site": "https://openreview.net/forum?id=0MjC3uMthAb", "pdf_size": 0, "rating": "4;5;6;6", "confidence": "4;1;3;5", "wc_review": "594;232;403;751", "wc_reply_reviewers": "620;0;50;0", "wc_reply_authors": "1389;283;837;794", "reply_reviewers": "2;0;1;0", "reply_authors": "2;1;1;1", "rating_avg": [ 5.25, 0.82915619758885 ], "confidence_avg": [ 3.25, 1.479019945774904 ], "wc_review_avg": [ 495.0, 195.55689709135805 ], "wc_reply_reviewers_avg": [ 167.5, 262.0472285676763 ], "wc_reply_authors_avg": [ 825.75, 391.45968821834003 ], "reply_reviewers_avg": [ 0.75, 0.82915619758885 ], "reply_authors_avg": [ 1.25, 0.4330127018922193 ], "replies_avg": [ 14, 0 ], "authors#_avg": [ 4, 0 ], "corr_rating_confidence": 0.15289415743128767, "gs_citation": 2, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=17973637788742773610&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 2, "aff_unique_index": "0;1;2;0", "aff_unique_norm": "University of Toronto;Google;Universit\u00e9 de Sherbrooke", "aff_unique_dep": ";Google;", "aff_unique_url": "https://www.utoronto.ca;https://www.google.com;https://www.usherbrooke.ca", "aff_unique_abbr": "U of T;Google;UdeS", "aff_campus_unique_index": "1;2", "aff_campus_unique": ";Mountain View;Toronto", "aff_country_unique_index": "0;1;0;0", "aff_country_unique": "Canada;United States" }, { "title": "Implicit Convex Regularizers of CNN Architectures: Convex Optimization of Two- and Three-Layer Networks in Polynomial Time", "status": "Spotlight", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/3039", "id": "0N8jUH4JMv6", "poster": "", "openreview": "https://openreview.net/forum?id=0N8jUH4JMv6", "slides": "https://iclr.cc/virtual/2021/poster/3039", "video": "https://iclr.cc/virtual/2021/poster/3039", "author_site": "Tolga Ergen, Mert Pilanci", "tldr": "", "abstract": "We study training of Convolutional Neural Networks (CNNs) with ReLU activations and introduce exact convex 
optimization formulations with a polynomial complexity with respect to the number of data samples, the number of neurons, and data dimension. More specifically, we develop a convex analytic framework utilizing semi-infinite duality to obtain equivalent convex optimization problems for several two- and three-layer CNN architectures. We first prove that two-layer CNNs can be globally optimized via an $\\ell_2$ norm regularized convex program. We then show that multi-layer circular CNN training problems with a single ReLU layer are equivalent to an $\\ell_1$ regularized convex program that encourages sparsity in the spectral domain. We also extend these results to three-layer CNNs with two ReLU layers. Furthermore, we present extensions of our approach to different pooling methods, which elucidates the implicit architectural bias as convex regularizers.", "keywords": "Convex optimization;non-convex optimization;group sparsity;$\\ell_1$ norm;convex duality;polynomial time;deep learning", "primary_area": "", "supplementary_material": "/attachment/23c27d5b1b70ef482cf6bbe10447fa784d606670.zip", "author": "Tolga Ergen;Mert Pilanci", "authorids": "~Tolga_Ergen1;~Mert_Pilanci3", "gender": "M;M", "homepage": "https://tolgaergen.github.io/;https://stanford.edu/~pilanci/", "dblp": "202/7477.html;45/8056", "google_scholar": "https://scholar.google.com.tr/citations?user=T1pWaCsAAAAJ;aSAS-aAAAAAJ", "orcid": "0000-0003-4806-0224;", "linkedin": ";mert-pilanci-ba615743/", "or_profile": "~Tolga_Ergen1;~Mert_Pilanci3", "aff": "Stanford University;Stanford University", "aff_domain": "stanford.edu;stanford.edu", "position": "PhD student;Assistant Professor", "bibtex": "@inproceedings{\nergen2021implicit,\ntitle={Implicit Convex Regularizers of {\\{}CNN{\\}} Architectures: Convex Optimization of Two- and Three-Layer Networks in Polynomial Time},\nauthor={Tolga Ergen and Mert Pilanci},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=0N8jUH4JMv6}\n}", "github": "", "project": "", "reviewers": "AnonReviewer3;AnonReviewer1;AnonReviewer4", "pdf_size": 0, "rating": "6;7;7", "confidence": "4;2;3", "wc_review": "540;411;533", "wc_reply_reviewers": "0;351;0", "wc_reply_authors": "1119;1283;404", "reply_reviewers": "0;2;0", "reply_authors": "2;4;1", "rating_avg": [ 6.666666666666667, 0.4714045207910317 ], "confidence_avg": [ 3.0, 0.816496580927726 ], "wc_review_avg": [ 494.6666666666667, 59.230247527949956 ], "wc_reply_reviewers_avg": [ 117.0, 165.46298679765212 ], "wc_reply_authors_avg": [ 935.3333333333334, 381.62838235936044 ], "reply_reviewers_avg": [ 0.6666666666666666, 0.9428090415820634 ], "reply_authors_avg": [ 2.3333333333333335, 1.247219128924647 ], "replies_avg": [ 13, 0 ], "authors#_avg": [ 2, 0 ], "corr_rating_confidence": -0.8660254037844385, "gs_citation": 54, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=12847966933936756932&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 5, "pdf": "https://openreview.net/pdf?id=0N8jUH4JMv6", "email": "stanford.edu;stanford.edu", "author_num": 2, "aff_unique_index": "0;0", "aff_unique_norm": "Stanford University", "aff_unique_dep": "", "aff_unique_url": "https://www.stanford.edu", "aff_unique_abbr": "Stanford", "aff_campus_unique_index": "0;0", "aff_campus_unique": "Stanford", "aff_country_unique_index": "0;0", "aff_country_unique": "United States" }, { "id": "0NQdxInFWT_", "title": "Active Deep Probabilistic Subsampling", "track": "main", "status": "Reject", "tldr": "", "abstract": 
"Subsampling a signal of interest can reduce costly data transfer, battery drain, radiation exposure and acquisition time in a wide range of problems. The recently proposed Deep Probabilistic Subsampling (DPS) method effectively integrates subsampling in an end-to-end deep learning model, but learns a static pattern for all datapoints. We generalize DPS to a sequential method that actively picks the next sample based on the information acquired so far; dubbed Active-DPS (A-DPS). We validate that A-DPS improves over DPS for MNIST classification at high subsampling rates. We observe that A-DPS learns to actively adapt based on the previously sampled elements, yielding different sampling sequences across the dataset. Moreover, we demonstrate strong performance in active acquisition Magnetic Resonance Image (MRI) reconstruction, outperforming DPS and other deep learning methods.", "keywords": "Compressed Sensing;subsampling;active acquisition;accelerated MRI", "primary_area": "", "supplementary_material": "", "author": "Hans van Gorp;Iris A.M. Huijben;Bastiaan S. Veeling;Nicola Pezzotti;Ruud Van Sloun", "authorids": "~Hans_van_Gorp1;~Iris_A.M._Huijben1;~Bastiaan_S._Veeling1;~Nicola_Pezzotti2;~Ruud_Van_Sloun1", "gender": "M;;F;;F", "homepage": ";https://nicola17.github.io/;https://www.tue.nl/en/research/researchers/ruud-van-sloun;;", "dblp": "296/9596;;162/9715.html;https://dblp.uni-trier.de/pers/hd/v/Veeling:Bastiaan_S=;247/0968", "google_scholar": "S0kwrtQAAAAJ;https://scholar.google.co.uk/citations?user=61To93wAAAAJ;gQQJgocAAAAJ;qStzdQsAAAAJ;https://scholar.google.nl/citations?user=1ReBr6sAAAAJ", "orcid": "0000-0003-4823-2874;;;;0000-0002-2629-3898", "linkedin": "hans-van-gorp-87a954136/;;;;", "or_profile": "~Hans_van_Gorp1;~Nicola_Pezzotti2;~Ruud_Van_Sloun1;~Bastiaan_Veeling1;~Iris_Anne_Marie_Huijben1", "aff": "Philips Research;Philips Research;Eindhoven University of Technology;Philips Research;Eindhoven University of Technology", "aff_domain": "philips.com;philips.com;tue.nl;philips.com;tue.nl", "position": "Researcher;Postdoc;Assistant Professor;Research Scientist;PhD student", "bibtex": "@misc{\ngorp2021active,\ntitle={Active Deep Probabilistic Subsampling},\nauthor={Hans van Gorp and Iris A.M. Huijben and Bastiaan S. 
Veeling and Nicola Pezzotti and Ruud Van Sloun},\nyear={2021},\nurl={https://openreview.net/forum?id=0NQdxInFWT_}\n}", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer3;AnonReviewer2", "site": "https://openreview.net/forum?id=0NQdxInFWT_", "pdf_size": 0, "rating": "6;6;6", "confidence": "4;3;4", "wc_review": "377;472;627", "wc_reply_reviewers": "0;0;0", "wc_reply_authors": "313;468;699", "reply_reviewers": "0;0;0", "reply_authors": "2;2;2", "rating_avg": [ 6.0, 0.0 ], "confidence_avg": [ 3.6666666666666665, 0.4714045207910317 ], "wc_review_avg": [ 492.0, 103.03721010715822 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 493.3333333333333, 158.59872494933734 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 2.0, 0.0 ], "replies_avg": [ 12, 0 ], "authors#_avg": [ 5, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 27, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=17221435126218428736&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 8, "aff_unique_index": "0;0;1;0;1", "aff_unique_norm": "Philips Research;Eindhoven University of Technology", "aff_unique_dep": ";", "aff_unique_url": "https://www.philips.com/research;https://www.tue.nl", "aff_unique_abbr": "Philips Research;TU/e", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0;0;0;0", "aff_country_unique": "Netherlands" }, { "title": "Gradient Origin Networks", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/3137", "id": "0O_cQfw6uEh", "poster": "", "openreview": "https://openreview.net/forum?id=0O_cQfw6uEh", "slides": "https://iclr.cc/virtual/2021/poster/3137", "video": "https://iclr.cc/virtual/2021/poster/3137", "author_site": "Sam Bond-Taylor, Chris G Willcocks", "tldr": "", "abstract": "This paper proposes a new type of generative model that is able to quickly learn a latent representation without an encoder. This is achieved using empirical Bayes to calculate the expectation of the posterior, which is implemented by initialising a latent vector with zeros, then using the gradient of the log-likelihood of the data with respect to this zero vector as new latent points. The approach has similar characteristics to autoencoders, but with a simpler architecture, and is demonstrated in a variational autoencoder equivalent that permits sampling. This also allows implicit representation networks to learn a space of implicit functions without requiring a hypernetwork, retaining their representation advantages across datasets. The experiments show that the proposed method converges faster, with significantly lower reconstruction error than autoencoders, while requiring half the parameters.", "keywords": "Deep Learning;Generative Models;Implicit Representation", "primary_area": "", "supplementary_material": "/attachment/cde873132a2f451f4abacb3d06d819888bb5428b.zip", "author": "Sam Bond-Taylor;Chris G. 
Willcocks", "authorids": "~Sam_Bond-Taylor1;christopher.g.willcocks@durham.ac.uk", "gender": ";", "homepage": "https://www.dur.ac.uk/research/directory/staff/?id=18951;", "dblp": "https://dblp.uni-trier.de/pid/270/0020;", "google_scholar": "https://scholar.google.co.uk/citations?user=xQ4rXyoAAAAJ;", "orcid": "0000-0003-1538-7909;", "linkedin": ";", "or_profile": "~Sam_Bond-Taylor1;christopher.g.willcocks@durham.ac.uk", "aff": "Durham University;", "aff_domain": "durham.ac.uk;", "position": "PhD student;", "bibtex": "@inproceedings{\nbond-taylor2021gradient,\ntitle={Gradient Origin Networks},\nauthor={Sam Bond-Taylor and Chris G. Willcocks},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=0O_cQfw6uEh}\n}", "github": "[![github](/images/github_icon.svg) cwkx/GON](https://github.com/cwkx/GON) + [![Papers with Code](/images/pwc_icon.svg) 3 community implementations](https://paperswithcode.com/paper/?openreview=0O_cQfw6uEh)", "project": "", "reviewers": "AnonReviewer3;AnonReviewer1;AnonReviewer2", "pdf_size": 0, "rating": "5;7;7", "confidence": "4;4;4", "wc_review": "253;395;813", "wc_reply_reviewers": "0;91;0", "wc_reply_authors": "630;521;1084", "reply_reviewers": "0;1;0", "reply_authors": "1;2;2", "rating_avg": [ 6.333333333333333, 0.9428090415820634 ], "confidence_avg": [ 4.0, 0.0 ], "wc_review_avg": [ 487.0, 237.69448177580114 ], "wc_reply_reviewers_avg": [ 30.333333333333332, 42.897811391983886 ], "wc_reply_authors_avg": [ 745.0, 243.80456654186497 ], "reply_reviewers_avg": [ 0.3333333333333333, 0.4714045207910317 ], "reply_authors_avg": [ 1.6666666666666667, 0.4714045207910317 ], "replies_avg": [ 18, 0 ], "authors#_avg": [ 2, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 17, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=861384408190875414&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 9, "pdf": "https://openreview.net/pdf?id=0O_cQfw6uEh", "email": "durham.ac.uk;", "author_num": 2, "aff_unique_index": "0", "aff_unique_norm": "Durham University", "aff_unique_dep": "", "aff_unique_url": "https://www.dur.ac.uk", "aff_unique_abbr": "Durham", "aff_country_unique_index": "0", "aff_country_unique": "United Kingdom" }, { "title": "Learning Parametrised Graph Shift Operators", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/2932", "id": "0OlrLvrsHwQ", "poster": "", "openreview": "https://openreview.net/forum?id=0OlrLvrsHwQ", "slides": "https://iclr.cc/virtual/2021/poster/2932", "video": "https://iclr.cc/virtual/2021/poster/2932", "author_site": "George Dasoulas, Johannes Lutzeyer, Michalis Vazirgiannis", "tldr": "", "abstract": "In many domains data is currently represented as graphs and therefore, the graph representation of this data becomes increasingly important in machine learning. Network data is, implicitly or explicitly, always represented using a graph shift operator (GSO) with the most common choices being the adjacency, Laplacian matrices and their normalisations. In this paper, a novel parametrised GSO (PGSO) is proposed, where specific parameter values result in the most commonly used GSOs and message-passing operators in graph neural network (GNN) frameworks. The PGSO is suggested as a replacement of the standard GSOs that are used in state-of-the-art GNN architectures and the optimisation of the PGSO parameters is seamlessly included in the model training. 
It is proved that the PGSO has real eigenvalues and a set of real eigenvectors independent of the parameter values and spectral bounds on the PGSO are derived. PGSO parameters are shown to adapt to the sparsity of the graph structure in a study on stochastic blockmodel networks, where they are found to automatically replicate the GSO regularisation found in the literature. On several real-world datasets the accuracy of state-of-the-art GNN architectures is improved by the inclusion of the PGSO in both node- and graph-classification tasks. ", "keywords": "graph neural networks;graph shift operators;graph classification;node classification;graph representation learning", "primary_area": "", "supplementary_material": "", "author": "George Dasoulas;Johannes F. Lutzeyer;Michalis Vazirgiannis", "authorids": "~George_Dasoulas1;~Johannes_F._Lutzeyer1;~Michalis_Vazirgiannis1", "gender": ";M;M", "homepage": ";https://johanneslutzeyer.com/;", "dblp": ";253/8868;v/MVazirgiannis", "google_scholar": "WPFAXNAAAAAJ;OfT4ns8AAAAJ;https://scholar.google.gr/citations?user=aWGJYcMAAAAJ", "orcid": ";;", "linkedin": ";johannes-lutzeyer-213b7480/;", "or_profile": "~George_Dasoulas1;~Johannes_F._Lutzeyer1;~Michalis_Vazirgiannis1", "aff": "Ecole polytechnique;Ecole Polytechnique;Ecole Polytechnique, France", "aff_domain": "polytechnique.edu;polytechnique.edu;polytechnique.fr", "position": "PhD student;Postdoc;Full Professor", "bibtex": "@inproceedings{\ndasoulas2021learning,\ntitle={Learning Parametrised Graph Shift Operators},\nauthor={George Dasoulas and Johannes F. Lutzeyer and Michalis Vazirgiannis},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=0OlrLvrsHwQ}\n}", "github": "[![github](/images/github_icon.svg) gdasoulas/pgso](https://github.com/gdasoulas/pgso)", "project": "", "reviewers": "AnonReviewer4;AnonReviewer3;AnonReviewer1;AnonReviewer2", "pdf_size": 0, "rating": "5;7;7;7", "confidence": "4;3;4;4", "wc_review": "143;205;457;379", "wc_reply_reviewers": "512;0;0;0", "wc_reply_authors": "819;443;720;629", "reply_reviewers": "1;0;0;0", "reply_authors": "1;1;1;1", "rating_avg": [ 6.5, 0.8660254037844386 ], "confidence_avg": [ 3.75, 0.4330127018922193 ], "wc_review_avg": [ 296.0, 126.98425099200294 ], "wc_reply_reviewers_avg": [ 128.0, 221.70250336881628 ], "wc_reply_authors_avg": [ 652.75, 138.49255395146702 ], "reply_reviewers_avg": [ 0.25, 0.4330127018922193 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 10, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": -0.3333333333333333, "gs_citation": 32, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=8306422072823409247&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 9, "pdf": "https://openreview.net/pdf?id=0OlrLvrsHwQ", "email": "polytechnique.edu;polytechnique.edu;polytechnique.fr", "author_num": 3, "aff_unique_index": "0;0;0", "aff_unique_norm": "Ecole Polytechnique", "aff_unique_dep": "", "aff_unique_url": "https://www.polytechnique.edu", "aff_unique_abbr": "X", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0;0", "aff_country_unique": "France" }, { "title": "Generalized Energy Based Models", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/3223", "id": "0PtUPB9z6qK", "poster": "", "openreview": "https://openreview.net/forum?id=0PtUPB9z6qK", "slides": "https://iclr.cc/virtual/2021/poster/3223", "video": "https://iclr.cc/virtual/2021/poster/3223", "author_site": "Michael Arbel, Liang 
Zhou, Arthur Gretton", "tldr": "", "abstract": "We introduce the Generalized Energy Based Model (GEBM) for generative modelling. These models combine two trained components: a base distribution (generally an implicit model), which can learn the support of data with low intrinsic dimension in a high dimensional space; and an energy function, to refine the probability mass on the learned support. \nBoth the energy function and base jointly constitute the final model, unlike GANs, which retain only the base distribution (the \"generator\"). \nGEBMs are trained by alternating between learning the energy and the base. \nWe show that both training stages are well-defined: the energy is learned by maximising a generalized likelihood, and the resulting energy-based loss provides informative gradients for learning the base.\nSamples from the posterior on the latent space of the trained model can be obtained via MCMC, thus finding regions in this space that produce better quality samples.\nEmpirically, the GEBM samples on image-generation tasks are of much better quality than those from the learned generator alone, indicating that all else being equal, the GEBM will outperform a GAN of the same complexity. When using normalizing flows as base measures, GEBMs succeed on density modelling tasks returning comparable performance to direct maximum likelihood of the same networks.", "keywords": "Sampling;MCMC;Generative Models;Adversarial training;Optimization;Density estimation", "primary_area": "", "supplementary_material": "/attachment/767e353c4eeb2cd94664ff2e5b0266e17d5aed93.zip", "author": "Michael Arbel;Liang Zhou;Arthur Gretton", "authorids": "~Michael_Arbel1;liang.zhou.18@ucl.ac.uk;~Arthur_Gretton1", "gender": "M;;M", "homepage": "https://michaelarbel.github.io/;;http://www.gatsby.ucl.ac.uk/~gretton/", "dblp": "200/8609;;56/2574", "google_scholar": "NsOqVtkAAAAJ;;OUv7J6QAAAAJ", "orcid": ";;", "linkedin": "michael-arbel-0a38a655/;;", "or_profile": "~Michael_Arbel1;liang.zhou.18@ucl.ac.uk;~Arthur_Gretton1", "aff": "University College London;;University College London", "aff_domain": "ucl.ac.uk;;ucl.ac.uk", "position": "PhD student;;Professor", "bibtex": "@inproceedings{\narbel2021generalized,\ntitle={Generalized Energy Based Models},\nauthor={Michael Arbel and Liang Zhou and Arthur Gretton},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=0PtUPB9z6qK}\n}", "github": "[![github](/images/github_icon.svg) MichaelArbel/GeneralizedEBM](https://github.com/MichaelArbel/GeneralizedEBM)", "project": "", "reviewers": "AnonReviewer4;AnonReviewer2;AnonReviewer1", "pdf_size": 0, "rating": "5;6;6", "confidence": "3;3;4", "wc_review": "279;384;326", "wc_reply_reviewers": "0;538;0", "wc_reply_authors": "596;2241;871", "reply_reviewers": "0;2;0", "reply_authors": "2;4;2", "rating_avg": [ 5.666666666666667, 0.4714045207910317 ], "confidence_avg": [ 3.3333333333333335, 0.4714045207910317 ], "wc_review_avg": [ 329.6666666666667, 42.94440850939994 ], "wc_reply_reviewers_avg": [ 179.33333333333334, 253.615632185575 ], "wc_reply_authors_avg": [ 1236.0, 719.4558128659929 ], "reply_reviewers_avg": [ 0.6666666666666666, 0.9428090415820634 ], "reply_authors_avg": [ 2.6666666666666665, 0.9428090415820634 ], "replies_avg": [ 21, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": 0.4999999999999999, "gs_citation": 145, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=8950051300346719301&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 9, 
"pdf": "https://openreview.net/pdf?id=0PtUPB9z6qK", "email": "ucl.ac.uk;;ucl.ac.uk", "author_num": 3, "aff_unique_index": "0;0", "aff_unique_norm": "University College London", "aff_unique_dep": "", "aff_unique_url": "https://www.ucl.ac.uk", "aff_unique_abbr": "UCL", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0", "aff_country_unique": "United Kingdom" }, { "id": "0SPUQoRMAvc", "title": "Semantic-Guided Representation Enhancement for Self-supervised Monocular Trained Depth Estimation", "track": "main", "status": "Reject", "tldr": "", "abstract": "Self-supervised depth estimation has shown its great effectiveness in producing high quality depth maps given only image sequences as input. However, its performance usually drops when estimating on border areas or objects with thin structures due to the limited depth representation ability. In this paper, we address this problem by proposing a semantic-guided depth representation enhancement method, which promotes both local and global depth feature representations by leveraging rich contextual information. In stead of a single depth network as used in conventional paradigms, we propose an extra semantic segmentation branch to offer extra contextual features for depth estimation. Based on this framework, we enhance the local feature representation by sampling and feeding the point-based features that locate on the semantic edges to an individual Semantic-guided Edge Enhancement module (SEEM), which is specifically designed for promoting depth estimation on the challenging semantic borders. Then, we improve the global feature representation by proposing a semantic-guided multi-level attention mechanism, which enhances the semantic and depth features by exploring pixel-wise correlations in the multi-level depth decoding scheme. Extensive experiments validate the distinct superiority of our method in capturing highly accurate depth on the challenging image areas such as semantic category borders and thin objects. 
Both quantitative and qualitative experiments on KITTI show that our method outperforms the state-of-the-art methods.", "keywords": "Self-supervised depth estimation;semantic-guided depth;multitask learning;semantic-guided attention mechanism", "primary_area": "", "supplementary_material": "", "author": "Rui Li;Qing Mao;Pei Wang;Xiantuo He;Yu Zhu;Jinqiu Sun;Yanning Zhang", "authorids": "~Rui_Li10;maoqing@mail.nwpu.edu.cn;wangpei23@mail.nwpu.edu.cn;xiantuohe@foxmail.com;yuzhu@nwpu.edu.cn;sunjinqiu@nwpu.edu.cn;~Yanning_Zhang1", "gender": ";;;;;;F", "homepage": ";;;;;;http://teacher.nwpu.edu.cn/ynzhang", "dblp": ";;;;;;14/6655", "google_scholar": ";;;;;;", "orcid": ";;;;;;", "linkedin": ";;;;;;", "or_profile": "~Rui_Li10;maoqing@mail.nwpu.edu.cn;wangpei23@mail.nwpu.edu.cn;xiantuohe@foxmail.com;yuzhu@nwpu.edu.cn;sunjinqiu@nwpu.edu.cn;~Yanning_Zhang1", "aff": ";;;;;;Northwestern Polytechnical University", "aff_domain": ";;;;;;nwpu.edu.cn", "position": ";;;;;;Full Professor", "bibtex": "@misc{\nli2021semanticguided,\ntitle={Semantic-Guided Representation Enhancement for Self-supervised Monocular Trained Depth Estimation},\nauthor={Rui Li and Qing Mao and Pei Wang and Xiantuo He and Yu Zhu and Jinqiu Sun and Yanning Zhang},\nyear={2021},\nurl={https://openreview.net/forum?id=0SPUQoRMAvc}\n}", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer4;AnonReviewer2;AnonReviewer3", "site": "https://openreview.net/forum?id=0SPUQoRMAvc", "pdf_size": 0, "rating": "5;5;6;7", "confidence": "5;4;5;4", "wc_review": "462;335;539;573", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "1647;1072;1164;1195", "reply_reviewers": "0;0;0;0", "reply_authors": "3;2;2;2", "rating_avg": [ 5.75, 0.82915619758885 ], "confidence_avg": [ 4.5, 0.5 ], "wc_review_avg": [ 477.25, 91.444997129422 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 1269.5, 222.5943620130573 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 2.25, 0.4330127018922193 ], "replies_avg": [ 15, 0 ], "authors#_avg": [ 7, 0 ], "corr_rating_confidence": -0.30151134457776363, "gs_citation": 8, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=13211674730976814095&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 3, "aff_unique_index": "0", "aff_unique_norm": "Northwestern Polytechnical University", "aff_unique_dep": "", "aff_unique_url": "https://www.nwpu.edu.cn", "aff_unique_abbr": "NWPU", "aff_country_unique_index": "0", "aff_country_unique": "China" }, { "id": "0Su7gvitc1H", "title": "ARMCMC: Online Model Parameters full probability Estimation in Bayesian Paradigm", "track": "main", "status": "Reject", "tldr": "", "abstract": "Although the Bayesian paradigm provides a rigorous framework to estimate the full probability distribution over unknown parameters, its online implementation can be challenging due to heavy computational costs. This paper proposes Adaptive Recursive Markov Chain Monte Carlo (ARMCMC) which estimates full probability density of model parameters while alleviating shortcomings of conventional online approaches. These shortcomings include: being solely able to account for Gaussian noise, being applicable to systems with linear in the parameters (LIP) constraint, or having requirements on persistence excitation (PE). In ARMCMC, we propose a variable jump distribution, which depends on a temporal forgetting factor. This allows one to adjust the trade-off between exploitation and exploration, depending on whether there is an abrupt change to the parameter being estimated. 
We prove that ARMCMC requires fewer samples to achieve the same precision and reliability compared to conventional MCMC approaches. We demonstrate our approach on two challenging benchmarks: the estimation of parameters in a soft bending actuator and the Hunt-Crossley dynamic model. Our method shows at least 70% improvement in parameter point estimation accuracy and approximately 55% reduction in tracking error of the value of interest compared to recursive least squares and conventional MCMC.", "keywords": "Bayesian estimation;Full probability distribution;MCMC;Hybrid non-Gaussian system", "primary_area": "", "supplementary_material": "/attachment/550edfd142cc71998f3be1e0cd82d60d00ae316b.zip", "author": "Pedram Agand;Mo Chen;Hamid D. Taghirad", "authorids": "~Pedram_Agand1;~Mo_Chen1;taghirad@kntu.ac.ir", "gender": "M;M;", "homepage": "https://upaspro.com/pedram-agand/;http://www.sfu.ca/~mochen/;", "dblp": "207/0639;;", "google_scholar": "https://scholar.google.ca/citations?user=URfHnY4AAAAJ;https://scholar.google.ca/citations?user=19UAgLUAAAAJ;", "orcid": ";0000-0001-8506-3665;", "linkedin": "agand/;;", "or_profile": "~Pedram_Agand1;~Mo_Chen1;taghirad@kntu.ac.ir", "aff": "Simon Fraser University;Simon Fraser University;", "aff_domain": "sfu.ca;sfu.ca;", "position": "PhD student;Assistant Professor;", "bibtex": "@misc{\nagand2021armcmc,\ntitle={{\\{}ARMCMC{\\}}: Online Model Parameters full probability Estimation in Bayesian Paradigm},\nauthor={Pedram Agand and Mo Chen and Hamid D. Taghirad},\nyear={2021},\nurl={https://openreview.net/forum?id=0Su7gvitc1H}\n}", "github": "", "project": "", "reviewers": "AnonReviewer3;AnonReviewer2;AnonReviewer1", "site": "https://openreview.net/forum?id=0Su7gvitc1H", "pdf_size": 0, "rating": "5;6;7", "confidence": "4;4;4", "wc_review": "763;1115;522", "wc_reply_reviewers": "0;0;0", "wc_reply_authors": "725;803;714", "reply_reviewers": "0;0;0", "reply_authors": "1;1;1", "rating_avg": [ 6.0, 0.816496580927726 ], "confidence_avg": [ 4.0, 0.0 ], "wc_review_avg": [ 800.0, 243.50085557686788 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 747.3333333333334, 39.617616732402716 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 7, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:JT9rMP3-ZUkJ:scholar.google.com/&scioq=ARMCMC:+Online+Model+Parameters+full+probability+Estimation+in+Bayesian+Paradigm&hl=en&as_sdt=0,33", "gs_version_total": 0, "aff_unique_index": "0;0", "aff_unique_norm": "Simon Fraser University", "aff_unique_dep": "", "aff_unique_url": "https://www.sfu.ca", "aff_unique_abbr": "SFU", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0", "aff_country_unique": "Canada" }, { "id": "0WWj8muw_rj", "title": "Adaptive Gradient Methods Can Be Provably Faster than SGD with Random Shuffling", "track": "main", "status": "Reject", "tldr": "", "abstract": "Adaptive gradient methods have been shown to outperform SGD in many tasks of training neural networks. However, the acceleration effect is yet to be explained in the non-convex setting since the best convergence rate of adaptive gradient methods is worse than that of SGD in literature. 
In this paper, we prove that adaptive gradient methods exhibit an $\\small\\tilde{O}(T^{-1/2})$-convergence rate for finding first-order stationary points under the strong growth condition, which improves previous best convergence results of adaptive gradient methods and random shuffling SGD by factors of $\\small O(T^{-1/4})$ and $\\small O(T^{-1/6})$, respectively. In particular, we study two variants of AdaGrad with random shuffling for finite sum minimization. Our analysis suggests that the combination of random shuffling and adaptive learning rates gives rise to better convergence.", "keywords": "", "primary_area": "", "supplementary_material": "/attachment/7aaef28a6147221a68418053bb3bd8a8f2e4b0d5.zip", "author": "Xunpeng Huang;Vicky Jiaqi Zhang;Hao Zhou;Lei Li", "authorids": "~Xunpeng_Huang1;~Vicky_Jiaqi_Zhang2;zhouhao.nlp@bytedance.com;~Lei_Li11", "gender": "M;;;M", "homepage": ";;;https://www.cs.cmu.edu/~leili", "dblp": "204/2943;;;13/7007-5.html", "google_scholar": ";;;BYXqAlwAAAAJ", "orcid": ";;;0000-0003-3095-9776", "linkedin": ";;;", "or_profile": "~Xunpeng_Huang1;~Vicky_Jiaqi_Zhang2;zhouhao.nlp@bytedance.com;~Lei_Li11", "aff": ";;;ByteDance AI Lab", "aff_domain": ";;;bytedance.com", "position": ";;;Director", "bibtex": "@misc{\nhuang2021adaptive,\ntitle={Adaptive Gradient Methods Can Be Provably Faster than {\\{}SGD{\\}} with Random Shuffling},\nauthor={Xunpeng Huang and Vicky Jiaqi Zhang and Hao Zhou and Lei Li},\nyear={2021},\nurl={https://openreview.net/forum?id=0WWj8muw_rj}\n}", "github": "", "project": "", "reviewers": "AnonReviewer4;AnonReviewer3;AnonReviewer1;AnonReviewer2", "site": "https://openreview.net/forum?id=0WWj8muw_rj", "pdf_size": 0, "rating": "3;4;4;7", "confidence": "4;4;4;4", "wc_review": "191;1620;423;411", "wc_reply_reviewers": "210;981;0;0", "wc_reply_authors": "438;1680;388;218", "reply_reviewers": "1;4;0;0", "reply_authors": "2;5;1;1", "rating_avg": [ 4.5, 1.5 ], "confidence_avg": [ 4.0, 0.0 ], "wc_review_avg": [ 661.25, 561.1873016204127 ], "wc_reply_reviewers_avg": [ 297.75, 403.6832762203557 ], "wc_reply_authors_avg": [ 681.0, 582.5092273947255 ], "reply_reviewers_avg": [ 1.25, 1.6393596310755 ], "reply_authors_avg": [ 2.25, 1.6393596310755 ], "replies_avg": [ 20, 0 ], "authors#_avg": [ 4, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:ePRkxeGtqwQJ:scholar.google.com/&scioq=Adaptive+Gradient+Methods+Can+Be+Provably+Faster+than+SGD+with+Random+Shuffling&hl=en&as_sdt=0,5", "gs_version_total": 0, "aff_unique_index": "0", "aff_unique_norm": "ByteDance", "aff_unique_dep": "AI Lab", "aff_unique_url": "https://www.bytedance.com", "aff_unique_abbr": "ByteDance", "aff_country_unique_index": "0", "aff_country_unique": "China" }, { "title": "Evolving Reinforcement Learning Algorithms", "status": "Oral", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/3056", "id": "0XXpJ4OtjW", "poster": "", "openreview": "https://openreview.net/forum?id=0XXpJ4OtjW", "slides": "https://iclr.cc/virtual/2021/poster/3056", "video": "https://iclr.cc/virtual/2021/poster/3056", "author_site": "John Co-Reyes, Yingjie Miao, Daiyi Peng, Esteban Real, Quoc V Le, Sergey Levine, Honglak Lee, Aleksandra Faust", "tldr": "", "abstract": "We propose a method for meta-learning reinforcement learning algorithms by searching over the space of computational graphs which compute the loss function for a value-based model-free RL agent to optimize. 
The learned algorithms are domain-agnostic and can generalize to new environments not seen during training. Our method can both learn from scratch and bootstrap off known existing algorithms, like DQN, enabling interpretable modifications which improve performance. Learning from scratch on simple classical control and gridworld tasks, our method rediscovers the temporal-difference (TD) algorithm. Bootstrapped from DQN, we highlight two learned algorithms which obtain good generalization performance over other classical control tasks, gridworld type tasks, and Atari games. The analysis of the learned algorithm behavior shows resemblance to recently proposed RL algorithms that address overestimation in value-based methods.", "keywords": "reinforcement learning;evolutionary algorithms;meta-learning;genetic programming", "primary_area": "", "supplementary_material": "/attachment/ee0e0d71258074de1386100e5e43ff26745650de.zip", "author": "John D Co-Reyes;Yingjie Miao;Daiyi Peng;Esteban Real;Quoc V Le;Sergey Levine;Honglak Lee;Aleksandra Faust", "authorids": "~John_D_Co-Reyes1;~Yingjie_Miao1;~Daiyi_Peng1;ereal@google.com;~Quoc_V_Le1;~Sergey_Levine1;~Honglak_Lee2;~Aleksandra_Faust1", "gender": "M;;M;;M;M;;F", "homepage": ";;http://www.daiyip.org;;;https://people.eecs.berkeley.edu/~svlevine/;;http://www.afaust.info", "dblp": "198/1129;22/10043;;;29/6166;80/7594;;135/8420", "google_scholar": ";ScqM05wAAAAJ;_8Egwg8AAAAJ;;;8R35rCwAAAAJ;;RK72t68AAAAJ", "orcid": ";;;;;;;0000-0002-3268-8685", "linkedin": ";yingjiemiao/;;;;;;aleksandrafaust", "or_profile": "~John_D_Co-Reyes1;~Yingjie_Miao1;~Daiyi_Peng1;ereal@google.com;~Quoc_V_Le1;~Sergey_Levine1;~Honglak_Lee2;~Aleksandra_Faust1", "aff": "University of California, Berkeley;Google DeepMind;Google;;Google;Google;;Google Brain", "aff_domain": "berkeley.edu;google.com;google.com;;google.com;google.com;;google.com", "position": "PhD student;Software Engineer;Researcher;;Scientist;Research Scientist;;Principal Researcher", "bibtex": "@inproceedings{\nco-reyes2021evolving,\ntitle={Evolving Reinforcement Learning Algorithms},\nauthor={John D Co-Reyes and Yingjie Miao and Daiyi Peng and Esteban Real and Quoc V Le and Sergey Levine and Honglak Lee and Aleksandra Faust},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=0XXpJ4OtjW}\n}", "github": "[![github](/images/github_icon.svg) google/brain_autorl](https://github.com/google/brain_autorl/tree/main/evolving_rl) + [![Papers with Code](/images/pwc_icon.svg) 4 community implementations](https://paperswithcode.com/paper/?openreview=0XXpJ4OtjW)", "project": "", "reviewers": "AnonReviewer3;AnonReviewer4;AnonReviewer2", "pdf_size": 0, "rating": "6;7;9", "confidence": "3;3;4", "wc_review": "339;180;829", "wc_reply_reviewers": "0;0;0", "wc_reply_authors": "247;258;262", "reply_reviewers": "0;0;0", "reply_authors": "1;1;1", "rating_avg": [ 7.333333333333333, 1.247219128924647 ], "confidence_avg": [ 3.3333333333333335, 0.4714045207910317 ], "wc_review_avg": [ 449.3333333333333, 276.20081261446154 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 255.66666666666666, 6.342099196813483 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 8, 0 ], "authors#_avg": [ 8, 0 ], "corr_rating_confidence": 0.944911182523068, "gs_citation": 99, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=7437762203145145199&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 10, "pdf": 
"https://openreview.net/pdf?id=0XXpJ4OtjW", "email": "berkeley.edu;google.com;google.com;;google.com;google.com;;google.com", "author_num": 8, "aff_unique_index": "0;1;1;1;1;1", "aff_unique_norm": "University of California, Berkeley;Google", "aff_unique_dep": ";Google DeepMind", "aff_unique_url": "https://www.berkeley.edu;https://deepmind.com", "aff_unique_abbr": "UC Berkeley;DeepMind", "aff_campus_unique_index": "0;2;2;2;2", "aff_campus_unique": "Berkeley;;Mountain View", "aff_country_unique_index": "0;1;0;0;0;0", "aff_country_unique": "United States;United Kingdom" }, { "id": "0Zxk3ynq7jE", "title": "An Empirical Exploration of Open-Set Recognition via Lightweight Statistical Pipelines", "track": "main", "status": "Reject", "tldr": "", "abstract": "Machine-learned safety-critical systems need to be self-aware and reliably know their unknowns in the open-world. This is often explored through the lens of anomaly/outlier detection or out-of-distribution modeling. One popular formulation is that of open-set classification, where an image classifier trained for 1-of-$K$ classes should also recognize images belonging to a $(K+1)^{th}$ \"other\" class, not present in the training set. Recent work has shown that, somewhat surprisingly, most if not all existing open-world methods do not work well on high-dimensional open-world images (Shafaei et al. 2019). In this paper, we carry out an empirical exploration of open-set classification, and find that combining classic statistical methods with carefully computed features can dramatically outperform prior work. We extract features from off-the-shelf (OTS) state-of-the-art networks for the underlying $K$-way closed-world task. We leverage insights from the retrieval community for computing feature descriptors that are low-dimensional (via pooling and PCA) and normalized (via L2-normalization), enabling the modeling of training data densities via classic statistical tools such as kmeans and Gaussian Mixture Models (GMMs).", "keywords": "open-set recognition;anomaly detection;statistical models;Gaussian Mixture Models;open-world image classification;open-world semantic segmentation", "primary_area": "", "supplementary_material": "/attachment/7d05c8b54986b9d08fc4f128999e46f7a35ecdf4.zip", "author": "Shu Kong;Deva Ramanan", "authorids": "~Shu_Kong1;~Deva_Ramanan1", "gender": "M;M", "homepage": "https://aimerykong.github.io/;https://www.cs.cmu.edu/~deva/", "dblp": "26/11141;49/488", "google_scholar": "sm9FdLoAAAAJ;9B8PoXUAAAAJ", "orcid": "0000-0002-1362-5937;", "linkedin": "aimerykong/;", "or_profile": "~Shu_Kong1;~Deva_Ramanan1", "aff": "Carnegie Mellon University;School of Computer Science, Carnegie Mellon University", "aff_domain": "cmu.edu;cs.cmu.edu", "position": "Postdoc Fellow;Full Professor", "bibtex": "@misc{\nkong2021an,\ntitle={An Empirical Exploration of Open-Set Recognition via Lightweight Statistical Pipelines},\nauthor={Shu Kong and Deva Ramanan},\nyear={2021},\nurl={https://openreview.net/forum?id=0Zxk3ynq7jE}\n}", "github": "", "project": "", "reviewers": "AnonReviewer3;AnonReviewer1;AnonReviewer2;AnonReviewer4", "site": "https://openreview.net/forum?id=0Zxk3ynq7jE", "pdf_size": 0, "rating": "3;3;4;7", "confidence": "5;3;4;4", "wc_review": "156;413;625;456", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "260;443;523;63", "reply_reviewers": "0;0;0;0", "reply_authors": "1;1;1;1", "rating_avg": [ 4.25, 1.6393596310755 ], "confidence_avg": [ 4.0, 0.7071067811865476 ], "wc_review_avg": [ 412.5, 167.95907239562857 ], 
"wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 322.25, 177.45897413205114 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 9, 0 ], "authors#_avg": [ 2, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 1, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=5776774303471944239&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 0, "aff_unique_index": "0;0", "aff_unique_norm": "Carnegie Mellon University", "aff_unique_dep": "", "aff_unique_url": "https://www.cmu.edu", "aff_unique_abbr": "CMU", "aff_campus_unique_index": "1", "aff_campus_unique": ";Pittsburgh", "aff_country_unique_index": "0;0", "aff_country_unique": "United States" }, { "id": "0_YzHnuthDf", "title": "Invariant Batch Normalization for Multi-source Domain Generalization", "track": "main", "status": "Withdraw", "tldr": "", "abstract": "We consider the domain generalization problem, where the test domain differs from the training domain. For deep neural networks, we show that the batch normalization layer is a highly unstable component under such domain shifts, and we identify two sources for its instability. Based on this observation, we propose a new learning formulation that can learn robust neural networks so that the corresponding batch normalization layers are invariant under domain shifts. Experimental results on three standard domain generalization benchmarks demonstrate that our method can learn neural network models with significantly more stable batch normalization layers on unseen domains, and the improved stability leads to superior generalization performances.", "keywords": "Domain generalization;Invariant learning;Batch Normalization", "primary_area": "", "supplementary_material": "", "author": "Qing LIAN;LIN Yong;Tong Zhang", "authorids": "~Qing_LIAN3;~LIN_Yong1;~Tong_Zhang2", "gender": "M;;M", "homepage": "https://www.lianqing11.github.io;;http://tongzhang-ml.org", "dblp": "234/4406;;07/4227-1", "google_scholar": ";;LurWtuYAAAAJ", "orcid": ";;0000-0002-5511-2558", "linkedin": ";;", "or_profile": "~Qing_LIAN3;~LIN_Yong1;~Tong_Zhang2", "aff": "Hong Kong University of Science and Technology;;Hong Kong University of Science and Technology", "aff_domain": "ust.hk;;ust.hk", "position": "PhD student;;Full Professor", "bibtex": "", "github": "", "project": "", "reviewers": "AnonReviewer2;AnonReviewer1;AnonReviewer4;AnonReviewer3", "site": "https://openreview.net/forum?id=0_YzHnuthDf", "pdf_size": 0, "rating": "4;4;5;5", "confidence": "4;4;4;5", "wc_review": "465;454;251;550", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "0;0;0;0", "reply_reviewers": "0;0;0;0", "reply_authors": "0;0;0;0", "rating_avg": [ 4.5, 0.5 ], "confidence_avg": [ 4.25, 0.4330127018922193 ], "wc_review_avg": [ 430.0, 109.82030777593005 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 0, 0 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 0, 0 ], "replies_avg": [ 6, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": 0.5773502691896257, "gs_citation": 1, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=5324835041390461990&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 0, "aff_unique_index": "0;0", "aff_unique_norm": "Hong Kong University of Science and Technology", "aff_unique_dep": "", "aff_unique_url": "https://www.ust.hk", "aff_unique_abbr": "HKUST", "aff_campus_unique_index": "0;0", "aff_campus_unique": "Hong Kong SAR", "aff_country_unique_index": "0;0", "aff_country_unique": "China" }, { "id": "0_ao8yS2eBw", "title": "Solving 
NP-Hard Problems on Graphs with Extended AlphaGo Zero", "track": "main", "status": "Reject", "tldr": "", "abstract": "There have been increasing challenges to solve combinatorial optimization problems by machine learning. \nKhalil et al. (NeurIPS 2017) proposed an end-to-end reinforcement learning framework, which automatically learns graph embeddings to construct solutions to a wide range of problems.\nHowever, it sometimes performs poorly on graphs having different characteristics than training graphs.\nTo improve its generalization ability to various graphs, we propose a novel learning strategy based on AlphaGo Zero, a Go engine that achieved a superhuman level without the domain knowledge of the game.\nWe redesign AlphaGo Zero for combinatorial optimization problems, taking into account several differences from two-player games.\nIn experiments on five NP-hard problems such as {\\sc MinimumVertexCover} and {\\sc MaxCut}, our method, with only a policy network, shows better generalization than the previous method to various instances that are not used for training, including random graphs, synthetic graphs, and real-world graphs.\nFurthermore, our method is significantly enhanced by a test-time Monte Carlo Tree Search which makes full use of the policy network and value network.\nWe also compare recently-developed graph neural network (GNN) models, with an interesting insight into a suitable choice of GNN models for each task.", "keywords": "Graph neural network;Combinatorial optimization;Reinforcement learning", "primary_area": "", "supplementary_material": "/attachment/0d06ed159522beb55ce96d052686023e7e7dc6b5.zip", "author": "Kenshin Abe;Zijian Xu;Issei Sato;Masashi Sugiyama", "authorids": "~Kenshin_Abe1;~Zijian_Xu1;~Issei_Sato1;~Masashi_Sugiyama1", "gender": ";M;M;M", "homepage": ";;;http://www.ms.k.u-tokyo.ac.jp/sugi/", "dblp": "https://dblp.uni-trier.de/pid/241/9512.html;45/3629-2.html;13/2665;35/1228", "google_scholar": ";;i4t2aUEAAAAJ;https://scholar.google.co.jp/citations?user=GkYIrlIAAAAJ", "orcid": ";;;0000-0001-6658-6743", "linkedin": ";;;", "or_profile": "~Kenshin_Abe1;~Zijian_Xu1;~Issei_Sato1;~Masashi_Sugiyama1", "aff": "The University of Tokyo;The University of Tokyo;the University of Tokyo;The University of Tokyo", "aff_domain": "u-tokyo.ac.jp;u-tokyo.ac.jp;u-tokyo.ac.jp;u-tokyo.ac.jp", "position": "MS student;MS student;Associate Professor;Full Professor", "bibtex": "@misc{\nabe2021solving,\ntitle={Solving {\\{}NP{\\}}-Hard Problems on Graphs with Extended AlphaGo Zero},\nauthor={Kenshin Abe and Zijian Xu and Issei Sato and Masashi Sugiyama},\nyear={2021},\nurl={https://openreview.net/forum?id=0_ao8yS2eBw}\n}", "github": "", "project": "", "reviewers": "AnonReviewer2;AnonReviewer3;AnonReviewer1", "site": "https://openreview.net/forum?id=0_ao8yS2eBw", "pdf_size": 0, "rating": "4;4;5", "confidence": "4;4;4", "wc_review": "1034;236;453", "wc_reply_reviewers": "0;0;0", "wc_reply_authors": "227;200;190", "reply_reviewers": "0;0;0", "reply_authors": "1;1;1", "rating_avg": [ 4.333333333333333, 0.4714045207910317 ], "confidence_avg": [ 4.0, 0.0 ], "wc_review_avg": [ 574.3333333333334, 336.89002491746305 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 205.66666666666666, 15.627610892974722 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 7, 0 ], "authors#_avg": [ 4, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 58, "gs_cited_by_link": 
"https://scholar.google.com/scholar?cites=10975749770621137310&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 5, "aff_unique_index": "0;0;0;0", "aff_unique_norm": "University of Tokyo", "aff_unique_dep": "", "aff_unique_url": "https://www.u-tokyo.ac.jp", "aff_unique_abbr": "UTokyo", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0;0;0", "aff_country_unique": "Japan" }, { "title": "Large-width functional asymptotics for deep Gaussian neural networks", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/2635", "id": "0aW6lYOYB7d", "poster": "", "openreview": "https://openreview.net/forum?id=0aW6lYOYB7d", "slides": "https://iclr.cc/virtual/2021/poster/2635", "video": "https://iclr.cc/virtual/2021/poster/2635", "author_site": "Daniele Bracale, Stefano Favaro, Sandra Fortini, Stefano Peluchetti", "tldr": "", "abstract": "In this paper, we consider fully connected feed-forward deep neural networks where weights and biases are independent and identically distributed according to Gaussian distributions. Extending previous results (Matthews et al., 2018a;b;Yang, 2019) we adopt a function-space perspective, i.e. we look at neural networks as infinite-dimensional random elements on the input space $\\mathbb{R}^I$. Under suitable assumptions on the activation function we show that: i) a network defines a continuous Gaussian process on the input space $\\mathbb{R}^I$; ii) a network with re-scaled weights converges weakly to a continuous Gaussian process in the large-width limit; iii) the limiting Gaussian process has almost surely locally $\\gamma$-H\u00f6lder continuous paths, for $0 < \\gamma <1$. Our results contribute to recent theoretical studies on the interplay between infinitely wide deep neural networks and Gaussian processes by establishing weak convergence in function-space with respect to a stronger metric.", "keywords": "deep learning theory;infinitely wide neural network;Gaussian process;stochastic process", "primary_area": "", "supplementary_material": "/attachment/4471337c84e84a22c7d965a598c7ea256646a232.zip", "author": "Daniele Bracale;Stefano Favaro;Sandra Fortini;Stefano Peluchetti", "authorids": "daniele.bracale@edu.unito.it;~Stefano_Favaro1;sandra.fortini@unibocconi.it;~Stefano_Peluchetti1", "gender": ";M;;M", "homepage": ";https://www.carloalberto.org/person/stefano-favaro/;;https://stefanopeluchetti.com", "dblp": ";148/7052;;128/1385", "google_scholar": ";UjIKIf8AAAAJ;;w3Gi3TEAAAAJ", "orcid": ";0000-0003-0936-9421;;", "linkedin": ";;;stefanopeluchetti/", "or_profile": "daniele.bracale@edu.unito.it;~Stefano_Favaro1;sandra.fortini@unibocconi.it;~Stefano_Peluchetti1", "aff": ";University of Torino;;Cogent Labs", "aff_domain": ";unito.it;;cogent.co.jp", "position": ";Full Professor;;Principal Research Scientist", "bibtex": "@inproceedings{\nbracale2021largewidth,\ntitle={Large-width functional asymptotics for deep Gaussian neural networks},\nauthor={Daniele Bracale and Stefano Favaro and Sandra Fortini and Stefano Peluchetti},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=0aW6lYOYB7d}\n}", "github": "", "project": "", "reviewers": "AnonReviewer4;AnonReviewer1;AnonReviewer2;AnonReviewer3", "pdf_size": 0, "rating": "4;6;7;7", "confidence": "4;4;3;4", "wc_review": "150;512;1882;592", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "282;218;1930;722", "reply_reviewers": "0;0;0;0", "reply_authors": "1;1;3;1", "rating_avg": [ 6.0, 
1.224744871391589 ], "confidence_avg": [ 3.75, 0.4330127018922193 ], "wc_review_avg": [ 784.0, 655.4403100206761 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 788.0, 687.2874216803331 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.5, 0.8660254037844386 ], "replies_avg": [ 12, 0 ], "authors#_avg": [ 4, 0 ], "corr_rating_confidence": -0.4714045207910316, "gs_citation": 20, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=14976254705917922866&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 11, "pdf": "https://openreview.net/pdf?id=0aW6lYOYB7d", "email": ";unito.it;;cogent.co.jp", "author_num": 4, "aff_unique_index": "0;1", "aff_unique_norm": "University of Turin;Cogent Labs", "aff_unique_dep": ";", "aff_unique_url": "https://www.unito.it;https://www.cogentlabs.com", "aff_unique_abbr": "UniTO;", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;1", "aff_country_unique": "Italy;United States" }, { "id": "0aZG2VcWLY", "title": "Signal Coding and Reconstruction using Spike Trains", "track": "main", "status": "Reject", "tldr": "", "abstract": "In many animal sensory pathways, the transformation from external stimuli to spike trains is essentially deterministic. In this context, a new mathematical framework for coding and reconstruction, based on a biologically plausible model of the spiking neuron, is presented. The framework considers encoding of a signal through spike trains generated by an ensemble of neurons via a standard convolve-then-threshold mechanism, albeit with a wide variety of convolution kernels. Neurons are distinguished by their convolution kernels and threshold values. Reconstruction is posited as a convex optimization minimizing energy. Formal conditions under which perfect reconstruction of the signal from the spike trains is possible are then identified. 
Coding experiments on a large audio dataset are presented to demonstrate the strength of the framework.", "keywords": "spike trains;signal encoding;reconstruction;kernel;representer theorem;compression;convolutional matching pursuit;COMP", "primary_area": "", "supplementary_material": "", "author": "Anik Chattopadhyay;Arunava Banerjee", "authorids": "~Anik_Chattopadhyay1;~Arunava_Banerjee2", "gender": "M;", "homepage": ";https://www.cise.ufl.edu/~arunava/", "dblp": ";40/6110", "google_scholar": ";f5CUbRIAAAAJ", "orcid": ";", "linkedin": "crystalonix/;", "or_profile": "~Anik_Chattopadhyay1;~Arunava_Banerjee2", "aff": "University of Florida;University of Florida", "aff_domain": "ufl.edu;ufl.edu", "position": "PhD student;Associate Professor", "bibtex": "@misc{\nchattopadhyay2021signal,\ntitle={Signal Coding and Reconstruction using Spike Trains},\nauthor={Anik Chattopadhyay and Arunava Banerjee},\nyear={2021},\nurl={https://openreview.net/forum?id=0aZG2VcWLY}\n}", "github": "", "project": "", "reviewers": "AnonReviewer4;AnonReviewer2;AnonReviewer3;AnonReviewer1", "site": "https://openreview.net/forum?id=0aZG2VcWLY", "pdf_size": 0, "rating": "3;3;5;7", "confidence": "4;4;4;3", "wc_review": "366;243;587;318", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "837;747;361;0", "reply_reviewers": "0;0;0;0", "reply_authors": "1;1;1;0", "rating_avg": [ 4.5, 1.6583123951777 ], "confidence_avg": [ 3.75, 0.4330127018922193 ], "wc_review_avg": [ 378.5, 128.11030403523364 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 486.25, 332.8448399780294 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 0.75, 0.4330127018922193 ], "replies_avg": [ 11, 0 ], "authors#_avg": [ 2, 0 ], "corr_rating_confidence": -0.8703882797784891, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:K7OvwdqV1E4J:scholar.google.com/&scioq=Signal+Coding+and+Reconstruction+using+Spike+Trains&hl=en&as_sdt=0,5", "gs_version_total": 0, "aff_unique_index": "0;0", "aff_unique_norm": "University of Florida", "aff_unique_dep": "", "aff_unique_url": "https://www.ufl.edu", "aff_unique_abbr": "UF", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0", "aff_country_unique": "United States" }, { "title": "Zero-Cost Proxies for Lightweight NAS", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/2861", "id": "0cmMMy8J5q", "poster": "", "openreview": "https://openreview.net/forum?id=0cmMMy8J5q", "slides": "https://iclr.cc/virtual/2021/poster/2861", "video": "https://iclr.cc/virtual/2021/poster/2861", "author_site": "Mohamed Abdelfattah, Abhinav Mehrotra, \u0141ukasz Dudziak, Nicholas Lane", "tldr": "", "abstract": "Neural Architecture Search (NAS) is quickly becoming the standard methodology to design neural network models. However, NAS is typically compute-intensive because multiple models need to be evaluated before choosing the best one. To reduce the computational power and time needed, a proxy task is often used for evaluating each model instead of full training. In this paper, we evaluate conventional reduced-training proxies and quantify how well they preserve ranking between neural network models during search when compared with the rankings produced by final trained accuracy. We propose a series of zero-cost proxies, based on recent pruning literature, that use just a single minibatch of training data to compute a model's score. 
Our zero-cost proxies use 3 orders of magnitude less computation but can match and even outperform conventional proxies. For example, Spearman's rank correlation coefficient between final validation accuracy and our best zero-cost proxy on NAS-Bench-201 is 0.82, compared to 0.61 for EcoNAS (a recently proposed reduced-training proxy). Finally, we use these zero-cost proxies to enhance existing NAS search algorithms such as random search, reinforcement learning, evolutionary search and predictor-based search. For all search methodologies and across three different NAS datasets, we are able to significantly improve sample efficiency, and thereby decrease computation, by using our zero-cost proxies. For example on NAS-Bench-101, we achieved the same accuracy 4$\\times$ quicker than the best previous result. Our code is made public at: https://github.com/mohsaied/zero-cost-nas.", "keywords": "NAS;AutoML;proxy;pruning;efficient", "primary_area": "", "supplementary_material": "", "author": "Mohamed S Abdelfattah;Abhinav Mehrotra;\u0141ukasz Dudziak;Nicholas Donald Lane", "authorids": "~Mohamed_S_Abdelfattah1;a.mehrotra1@samsung.com;~\u0141ukasz_Dudziak1;~Nicholas_Donald_Lane1", "gender": "M;;M;", "homepage": "https://mohsaied.github.io/;;;", "dblp": "124/7095;;228/7987;", "google_scholar": "https://scholar.google.ca/citations?user=q4wBpWAAAAAJ;;R47NvpoAAAAJ;", "orcid": ";;;", "linkedin": "mabdelfattah/;;;", "or_profile": "~Mohamed_S_Abdelfattah1;a.mehrotra1@samsung.com;~\u0141ukasz_Dudziak1;~Nicholas_Donald_Lane1", "aff": "Samsung AI Center;;Samsung;", "aff_domain": "samsung.com;;samsung.com;", "position": "Principal Scientist;;Software Engineer;", "bibtex": "@inproceedings{\nabdelfattah2021zerocost,\ntitle={Zero-Cost Proxies for Lightweight {\\{}NAS{\\}}},\nauthor={Mohamed S Abdelfattah and Abhinav Mehrotra and {\\L}ukasz Dudziak and Nicholas Donald Lane},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=0cmMMy8J5q}\n}", "github": "[![github](/images/github_icon.svg) mohsaied/zero-cost-nas](https://github.com/mohsaied/zero-cost-nas) + [![Papers with Code](/images/pwc_icon.svg) 1 community implementation](https://paperswithcode.com/paper/?openreview=0cmMMy8J5q)", "project": "", "reviewers": "AnonReviewer4;AnonReviewer1;AnonReviewer2;AnonReviewer3", "pdf_size": 0, "rating": "5;6;6;7", "confidence": "4;4;4;4", "wc_review": "545;405;218;638", "wc_reply_reviewers": "334;0;0;0", "wc_reply_authors": "972;322;272;420", "reply_reviewers": "1;0;0;0", "reply_authors": "2;1;1;1", "rating_avg": [ 6.0, 0.7071067811865476 ], "confidence_avg": [ 4.0, 0.0 ], "wc_review_avg": [ 451.5, 158.27902577410566 ], "wc_reply_reviewers_avg": [ 83.5, 144.62624243200125 ], "wc_reply_authors_avg": [ 496.5, 279.6439700762382 ], "reply_reviewers_avg": [ 0.25, 0.4330127018922193 ], "reply_authors_avg": [ 1.25, 0.4330127018922193 ], "replies_avg": [ 12, 0 ], "authors#_avg": [ 4, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 372, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=9734890465405015230&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 3, "pdf": "https://openreview.net/pdf?id=0cmMMy8J5q", "email": "samsung.com;;samsung.com;", "author_num": 4, "aff_unique_index": "0;0", "aff_unique_norm": "Samsung", "aff_unique_dep": "AI Center", "aff_unique_url": "https://www.samsung.com/global/careers/ai-center/", "aff_unique_abbr": "Samsung AI", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0", 
"aff_country_unique": "South Korea" }, { "id": "0fqoSxXBwI6", "title": "Self-Supervised Multi-View Learning via Auto-Encoding 3D Transformations", "track": "main", "status": "Reject", "tldr": "", "abstract": "3D object representation learning is a fundamental challenge in computer vision to draw inferences about the 3D world. Recent advances in deep learning have shown their efficiency in 3D object recognition, among which view-based methods have performed best so far. However, feature learning of multiple views in existing methods is mostly trained in a supervised fashion, which often requires a large amount of data labels with high cost. Hence, it is critical to learn multi-view feature representations in a self-supervised fashion. To this end, we propose a novel self-supervised learning paradigm of Multi-View Transformation Equivariant Representations (MV-TER), exploiting the equivariant transformations of a 3D object and its projected multiple views. Specifically, we perform a 3D transformation on a 3D object, and obtain multiple views before and after transformation via projection. Then, we self-train a representation learning module to capture the intrinsic 3D object representation by decoding 3D transformation parameters from the fused feature representations of multiple views before and after transformation. Experimental results demonstrate that the proposed MV-TER significantly outperforms the state-of-the-art view-based approaches in 3D object classification and retrieval tasks.", "keywords": "Self-supervised Learning;Multi-View Learning", "primary_area": "", "supplementary_material": "", "author": "Xiang Gao;Wei Hu;Guo-Jun Qi", "authorids": "~Xiang_Gao2;~Wei_Hu6;~Guo-Jun_Qi1", "gender": "M;F;M", "homepage": ";http://www.wict.pku.edu.cn/huwei/;http://maple-lab.net/gqi/", "dblp": ";52/173-3.html;41/943", "google_scholar": ";https://scholar.google.com.hk/citations?user=5oFf8Q4AAAAJ;https://scholar.google.com.tw/citations?user=Nut-uvoAAAAJ", "orcid": "0000-0002-2679-4019;0000-0002-9860-0922;0000-0003-3508-1851", "linkedin": "gyshgx868/;;", "or_profile": "~Xiang_Gao2;~Wei_Hu6;~Guo-Jun_Qi1", "aff": "Peking University;;Futurewei Technologies", "aff_domain": "pku.edu.cn;;futurewei.com", "position": "PhD student;;Chief AI Scientist and Technical VP", "bibtex": "@misc{\ngao2021selfsupervised,\ntitle={Self-Supervised Multi-View Learning via Auto-Encoding 3D Transformations},\nauthor={Xiang Gao and Wei Hu and Guo-Jun Qi},\nyear={2021},\nurl={https://openreview.net/forum?id=0fqoSxXBwI6}\n}", "github": "", "project": "", "reviewers": "AnonReviewer2;AnonReviewer4;AnonReviewer3;AnonReviewer1", "site": "https://openreview.net/forum?id=0fqoSxXBwI6", "pdf_size": 0, "rating": "4;6;6;7", "confidence": "5;2;4;4", "wc_review": "601;187;212;425", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "428;128;363;498", "reply_reviewers": "0;0;0;0", "reply_authors": "1;1;1;1", "rating_avg": [ 5.75, 1.0897247358851685 ], "confidence_avg": [ 3.75, 1.0897247358851685 ], "wc_review_avg": [ 356.25, 168.88069013359697 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 354.25, 139.07619314605932 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 9, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": -0.4736842105263159, "gs_citation": 12, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=15008296666104174976&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 4, "aff_unique_index": "0;1", "aff_unique_norm": "Peking University;Futurewei 
Technologies", "aff_unique_dep": ";", "aff_unique_url": "http://www.pku.edu.cn;https://www.futurewei.com", "aff_unique_abbr": "Peking U;Futurewei", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;1", "aff_country_unique": "China;United States" }, { "id": "0gfSzsRDZFw", "title": "Ablation Path Saliency", "track": "main", "status": "Reject", "tldr": "", "abstract": "We consider the saliency problem for black-box classification. In image classification, this means highlighting the part of the image that is most relevant for the current decision.\nWe cast the saliency problem as finding an optimal ablation path between two images. An ablation path consists of a sequence of ever smaller masks, joining the current image to a reference image in another decision region. The optimal path will stay as long as possible in the current decision region. This approach extends the ablation tests in [Sturmfels et al. (2020)]. The gradient of the corresponding objective function is closely related to the integrated gradient method [Sundararajan et al. (2017)]. In the saturated case (when the classifier outputs a binary value) our method would reduce to the meaningful perturbation approach [Fong & Vedaldi (2017)], since crossing the decision boundary as late as\npossible would then be equivalent to finding the smallest possible mask lying on\nthe decision boundary.\nOur interpretation provides geometric understanding of existing saliency methods, and suggests a novel approach based on ablation path optimisation.", "keywords": "image classification;interpretability;feature attribution;saliency;ablation", "primary_area": "", "supplementary_material": "", "author": "Olivier Verdier;Justus Sagem\u00fcller", "authorids": "~Olivier_Verdier1;~Justus_Sagem\u00fcller1", "gender": ";M", "homepage": "https://www.olivierverdier.com/;https://github.com/leftaroundabout", "dblp": ";245/6186", "google_scholar": "https://scholar.google.co.uk/citations?user=CtXeVOIAAAAJ;", "orcid": "0000-0003-3699-6244;0000-0003-1882-1096", "linkedin": "olivierverdier/;", "or_profile": "~Olivier_Verdier1;~Justus_Sagem\u00fcller1", "aff": ";Western Norway University of Applied Sciences", "aff_domain": ";hvl.no", "position": ";PhD student", "bibtex": "@misc{\nverdier2021ablation,\ntitle={Ablation Path Saliency},\nauthor={Olivier Verdier and Justus Sagem{\\\"u}ller},\nyear={2021},\nurl={https://openreview.net/forum?id=0gfSzsRDZFw}\n}", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer4;AnonReviewer3", "site": "https://openreview.net/forum?id=0gfSzsRDZFw", "pdf_size": 0, "rating": "4;4;6", "confidence": "4;4;3", "wc_review": "452;317;129", "wc_reply_reviewers": "0;0;0", "wc_reply_authors": "423;380;117", "reply_reviewers": "0;0;0", "reply_authors": "1;2;1", "rating_avg": [ 4.666666666666667, 0.9428090415820634 ], "confidence_avg": [ 3.6666666666666665, 0.4714045207910317 ], "wc_review_avg": [ 299.3333333333333, 132.45460438286855 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 306.6666666666667, 135.25860005518646 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.3333333333333333, 0.4714045207910317 ], "replies_avg": [ 9, 0 ], "authors#_avg": [ 2, 0 ], "corr_rating_confidence": -0.9999999999999998, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:euHnA4ik31YJ:scholar.google.com/&scioq=Ablation+Path+Saliency&hl=en&as_sdt=0,5", "gs_version_total": 5, "aff_unique_index": "0", "aff_unique_norm": "Western Norway University of 
Applied Sciences", "aff_unique_dep": "", "aff_unique_url": "https://www.hin.no/en/", "aff_unique_abbr": "WNMUAS", "aff_country_unique_index": "0", "aff_country_unique": "Norway" }, { "id": "0h9cYBqucS6", "title": "Communication-Computation Efficient Secure Aggregation for Federated Learning", "track": "main", "status": "Reject", "tldr": "", "abstract": "Federated learning has been spotlighted as a way to train neural network models using data distributed over multiple clients without a need to share private data. Unfortunately, however, it has been shown that data privacy could not be fully guaranteed as adversaries may be able to extract certain information on local data from the model parameters transmitted during federated learning. A recent solution based on the secure aggregation primitive enables privacy-preserving federated learning, but at the expense of significant extra communication/computational resources. In this paper, we propose communication-computation efficient secure aggregation which reduces the amount of communication/computational resources at least by a factor of $\\sqrt{n/ \\log n}$ relative to the existing secure solution without sacrificing data privacy, where $n$ is the number of clients. The key idea behind the suggested scheme is to design the topology of the secret-sharing nodes (denoted by the assignment graph $G$) as sparse random graphs instead of the complete graph corresponding to the existing solution. We first obtain a sufficient condition on $G$ to guarantee reliable and private federated learning. Afterwards, we suggest using the Erd\\H{o}s-R\u00e9nyi graph as $G$, and provide theoretical guarantees on the reliability/privacy of the proposed scheme. Through extensive real-world experiments, we demonstrate that our scheme, using only 50% of the resources required in the conventional scheme, maintains virtually the same levels of reliability and data privacy in practical federated learning systems.", "keywords": "Federated Learning;Privacy;Graphs;Secure Aggregation;Communication-Efficient;Computation-Efficient", "primary_area": "", "supplementary_material": "/attachment/b9ca4fd1981f7ba9558167c970bda292ac43c8bf.zip", "author": "Beongjun Choi;Jy-yong Sohn;Dong-Jun Han;Jaekyun Moon", "authorids": "bbzang10@kaist.ac.kr;~Jy-yong_Sohn1;~Dong-Jun_Han1;~Jaekyun_Moon2", "gender": ";M;M;M", "homepage": ";https://itml.yonsei.ac.kr/professor;https://sites.google.com/view/djhan930/home?authuser=0;http://comstolab.kaist.ac.kr/people.html", "dblp": ";188/6303;201/0078;78/2744", "google_scholar": ";https://scholar.google.co.kr/citations?user=Cs75s1MAAAAJ;https://scholar.google.co.kr/citations?user=-YR-GxUAAAAJ;", "orcid": ";;;", "linkedin": ";;;", "or_profile": "bbzang10@kaist.ac.kr;~Jy-yong_Sohn1;~Dong-Jun_Han1;~Jaekyun_Moon2", "aff": ";Korea Advanced Institute of Science & Technology;Korea Advanced Institute of Science & Technology;KAIST", "aff_domain": ";kaist.ac.kr;kaist.ac.kr;kaist.edu", "position": ";Postdoc;PhD student;Full Professor", "bibtex": "@misc{\nchoi2021communicationcomputation,\ntitle={Communication-Computation Efficient Secure Aggregation for Federated Learning},\nauthor={Beongjun Choi and Jy-yong Sohn and Dong-Jun Han and Jaekyun Moon},\nyear={2021},\nurl={https://openreview.net/forum?id=0h9cYBqucS6}\n}", "github": "", "project": "", "reviewers": "AnonReviewer4;AnonReviewer2;AnonReviewer1;AnonReviewer3", "site": "https://openreview.net/forum?id=0h9cYBqucS6", "pdf_size": 0, "rating": "3;4;4;6", "confidence": "5;3;4;3", "wc_review": "701;266;643;464", 
"wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "555;287;285;243", "reply_reviewers": "0;0;0;0", "reply_authors": "1;1;1;1", "rating_avg": [ 4.25, 1.0897247358851685 ], "confidence_avg": [ 3.75, 0.82915619758885 ], "wc_review_avg": [ 518.5, 169.95072815377992 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 342.5, 123.93849280994182 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 10, 0 ], "authors#_avg": [ 4, 0 ], "corr_rating_confidence": -0.7608859102526822, "gs_citation": 98, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=18213063719382862298&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 3, "aff_unique_index": "0;0;0", "aff_unique_norm": "Korea Advanced Institute of Science and Technology", "aff_unique_dep": "", "aff_unique_url": "https://www.kaist.ac.kr", "aff_unique_abbr": "KAIST", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0;0", "aff_country_unique": "South Korea" }, { "id": "0hMthVxlS89", "title": "Convergence Proof for Actor-Critic Methods Applied to PPO and RUDDER", "track": "main", "status": "Withdraw", "tldr": "", "abstract": "We prove under commonly used assumptions the convergence of actor-critic reinforcement learning algorithms, which simultaneously learn a policy function, the actor, and a value function, the critic. Both functions can be deep neural networks of arbitrary complexity. Our framework allows showing convergence of the well known Proximal Policy Optimization (PPO) and of the recently introduced RUDDER. For the convergence proof we employ recently introduced techniques from the two time-scale stochastic approximation theory. Our results are valid for actor-critic methods that use episodic samples and that have a policy that becomes more greedy during learning. Previous convergence proofs assume linear function approximation, cannot treat episodic examples, or do not consider that policies become greedy. The latter is relevant since optimal policies are typically deterministic. 
", "keywords": "reinforcement learning;actor critic algorithms;policy gradient methods;stochastic approximation;PPO;RUDDER", "primary_area": "", "supplementary_material": "", "author": "Markus Holzleitner;Lukas Gruber;Jose Arjona-Medina;Johannes Brandstetter;Sepp Hochreiter", "authorids": "~Markus_Holzleitner1;~Lukas_Gruber2;~Jose_Arjona-Medina1;~Johannes_Brandstetter1;~Sepp_Hochreiter1", "gender": ";Not Specified;M;M;M", "homepage": ";https://www.jku.at/en/institute-for-machine-learning/;;https://www.jku.at/en/institute-for-machine-learning/about-us/team/sepp-hochreiter/;http://www.arjonamedina.com", "dblp": "271/0626;18/7703;251/8691;h/SeppHochreiter.html;", "google_scholar": "518MXv8AAAAJ;;KiRvOHcAAAAJ;https://scholar.google.at/citations?user=tvUH3WMAAAAJ;", "orcid": ";;;0000-0001-7449-2528;0000-0002-5033-4725", "linkedin": ";;;https://linkedin.com/in/sepp-hochreiter-41514846;", "or_profile": "~Markus_Holzleitner1;~Lukas_Gruber2;~Johannes_Brandstetter1;~Sepp_Hochreiter1;~Jos\u00e9_Arjona-Medina1", "aff": "Johannes Kepler University Linz;Johannes Kepler University Linz;Johannes Kepler University Linz;Johannes Kepler University Linz;Johannes Kepler Universit\u00e4t Linz", "aff_domain": "jku.at;jku.at;jku.at;jku.at;jku.at", "position": "Postdoc;PhD student;Assistant Professor;Full Professor;Lecturer", "bibtex": "", "github": "", "project": "", "reviewers": "AnonReviewer2;AnonReviewer4;AnonReviewer3;AnonReviewer1", "site": "https://openreview.net/forum?id=0hMthVxlS89", "pdf_size": 0, "rating": "4;4;4;5", "confidence": "3;3;4;4", "wc_review": "513;645;366;1144", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "0;0;0;0", "reply_reviewers": "0;0;0;0", "reply_authors": "0;0;0;0", "rating_avg": [ 4.25, 0.4330127018922193 ], "confidence_avg": [ 3.5, 0.5 ], "wc_review_avg": [ 667.0, 292.54486835355704 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 0, 0 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 0, 0 ], "replies_avg": [ 6, 0 ], "authors#_avg": [ 5, 0 ], "corr_rating_confidence": 0.5773502691896257, "gs_citation": 50, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=7519444300300460296&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 5, "aff_unique_index": "0;0;0;0;1", "aff_unique_norm": "Johannes Kepler University;Johannes Kepler University Linz", "aff_unique_dep": ";", "aff_unique_url": "https://www.jku.at;https://www.jku.at", "aff_unique_abbr": "JKU;JKU", "aff_campus_unique_index": "0;0;0;0;0", "aff_campus_unique": "Linz", "aff_country_unique_index": "0;0;0;0;0", "aff_country_unique": "Austria" }, { "id": "0i0IjXuq6J5", "title": "About contrastive unsupervised representation learning for classification and its convergence", "track": "main", "status": "Withdraw", "tldr": "", "abstract": "Contrastive representation learning has been recently proved to be very efficient for self-supervised training. These methods have been successfully used to train encoders which perform comparably to supervised training on downstream classification tasks. \nA few works have started to build a theoretical framework around contrastive learning in which guarantees for its performance can be proven. We provide extensions of these results to training with multiple negative samples and for multiway classification. 
\nFurthermore, we provide convergence guarantees for the minimization of the contrastive training error with gradient descent of an overparametrized deep neural encoder, and provide some numerical experiments that complement our theoretical findings.", "keywords": "Theoretical guarantees;Unsupervised learning;Contrastive learning;Overparametrized models", "primary_area": "", "supplementary_material": "/attachment/cc3fb27d660c38e854c4348bdefefb4d8dd59b27.zip", "author": "Ibrahim Merad;Yiyang Yu;Emmanuel Bacry;St\u00e9phane Ga\u00efffas", "authorids": "~Ibrahim_Merad1;~Yiyang_Yu1;~Emmanuel_Bacry1;~St\u00e9phane_Ga\u00efffas1", "gender": ";F;M;M", "homepage": ";https://yiyang-yu.github.io;http://www.cmap.polytechnique.fr/~bacry/;https://stephanegaiffas.github.io", "dblp": "279/9971;;71/5652;58/9890", "google_scholar": ";;;", "orcid": ";;;", "linkedin": "ibrahim-merad-b10664109/;;;", "or_profile": "~Ibrahim_Merad1;~Yiyang_Yu1;~Emmanuel_Bacry1;~St\u00e9phane_Ga\u00efffas1", "aff": "Universit\u00e9 Paris Cit\u00e9;CNRS;Univerist\u00e9 Paris-Dauphine;University of Paris", "aff_domain": "u-paris.fr;cnrs.fr;dauphine.fr;lpsm.paris", "position": "PhD student;PhD student;Senior researcher;Full Professor", "bibtex": "", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer4;AnonReviewer2;AnonReviewer3", "site": "https://openreview.net/forum?id=0i0IjXuq6J5", "pdf_size": 0, "rating": "3;4;5;6", "confidence": "5;3;3;2", "wc_review": "255;275;139;214", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "0;0;0;0", "reply_reviewers": "0;0;0;0", "reply_authors": "0;0;0;0", "rating_avg": [ 4.5, 1.118033988749895 ], "confidence_avg": [ 3.25, 1.0897247358851685 ], "wc_review_avg": [ 220.75, 52.06906471216859 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 0, 0 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 0, 0 ], "replies_avg": [ 5, 0 ], "authors#_avg": [ 4, 0 ], "corr_rating_confidence": -0.9233805168766388, "gs_citation": 2, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=4492205505537695978&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 6, "aff_unique_index": "0;1;2;3", "aff_unique_norm": "Universit\u00e9 Paris Cit\u00e9;Centre National de la Recherche Scientifique;Universit\u00e9 Paris-Dauphine;University of Paris", "aff_unique_dep": ";;;", "aff_unique_url": "https://www.universite-paris.fr;https://www.cnrs.fr;https://www.univ-paris-dauphine.fr;https://www.universite-paris.fr", "aff_unique_abbr": "UPC;CNRS;UPD;UP", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0;0;0", "aff_country_unique": "France" }, { "id": "0jPp4dKp3PL", "title": "Integrating linguistic knowledge into DNNs: Application to online grooming detection", "track": "main", "status": "Reject", "tldr": "", "abstract": "Online grooming (OG) of children is a pervasive issue in an increasingly interconnected world. We explore various complementary methods to incorporate Corpus Linguistics (CL) knowledge into accurate and interpretable Deep Learning (DL) models. They provide an implicit text normalisation that adapts embedding spaces to the groomers' usage of language, and they focus the DNN's attention onto the expressions of OG strategies. 
We apply these integration to two architecture types and improve on the state-of-the-art on a new OG corpus.", "keywords": "Machine Learning;Corpus Linguistics", "primary_area": "", "supplementary_material": "/attachment/2eede23a8c7da310411134b0c5afc7f314374e32.zip", "author": "Jay Morgan;Adeline Paiement;Nuria Lorenzo-Dus;Anina Kinzel;Matteo Di Cristofaro", "authorids": "~Jay_Morgan1;~Adeline_Paiement1;n.lorenzo-dus@swansea.ac.uk;a.l.kinzel@swansea.ac.uk;mdc@infogrep.it", "gender": "M;F;;;", "homepage": ";;;;", "dblp": ";;;;", "google_scholar": ";;;;", "orcid": ";;;;", "linkedin": ";;;;", "or_profile": "~Jay_Morgan1;~Adeline_Paiement1;n.lorenzo-dus@swansea.ac.uk;a.l.kinzel@swansea.ac.uk;mdc@infogrep.it", "aff": "Swansea University;University of Toulon;;;", "aff_domain": "swansea.ac.uk;univ-tln.fr;;;", "position": "PhD student;Associate Professor;;;", "bibtex": "@misc{\nmorgan2021integrating,\ntitle={Integrating linguistic knowledge into {\\{}DNN{\\}}s: Application to online grooming detection},\nauthor={Jay Morgan and Adeline Paiement and Nuria Lorenzo-Dus and Anina Kinzel and Matteo Di Cristofaro},\nyear={2021},\nurl={https://openreview.net/forum?id=0jPp4dKp3PL}\n}", "github": "", "project": "", "reviewers": "AnonReviewer3;AnonReviewer2;AnonReviewer1", "site": "https://openreview.net/forum?id=0jPp4dKp3PL", "pdf_size": 0, "rating": "4;5;6", "confidence": "4;4;4", "wc_review": "1007;263;279", "wc_reply_reviewers": "20;30;7", "wc_reply_authors": "2490;897;592", "reply_reviewers": "1;1;1", "reply_authors": "4;3;1", "rating_avg": [ 5.0, 0.816496580927726 ], "confidence_avg": [ 4.0, 0.0 ], "wc_review_avg": [ 516.3333333333334, 347.01520940090734 ], "wc_reply_reviewers_avg": [ 19.0, 9.41629792788369 ], "wc_reply_authors_avg": [ 1326.3333333333333, 832.2044353536107 ], "reply_reviewers_avg": [ 1.0, 0.0 ], "reply_authors_avg": [ 2.6666666666666665, 1.247219128924647 ], "replies_avg": [ 16, 0 ], "authors#_avg": [ 5, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 1, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=6771019546455864582&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 0, "aff_unique_index": "0;1", "aff_unique_norm": "Swansea University;University of Toulon", "aff_unique_dep": ";", "aff_unique_url": "https://www.swansea.ac.uk;https://www.univ-toulon.fr", "aff_unique_abbr": "Swansea;UT", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;1", "aff_country_unique": "United Kingdom;France" }, { "id": "0jqRSnFnmL_", "title": "Alpha-DAG: a reinforcement learning based algorithm to learn Directed Acyclic Graphs", "track": "main", "status": "Withdraw", "tldr": "", "abstract": "Directed acyclic graphs (DAGs) are widely used to model the casual relationships among random variables in many disciplines. One major class of algorithms for DAGs is called `search-and-score', which attempts to maximize some goodness-of-fit measure and returns a DAG with the best score. However, most existing methods highly rely on their model assumptions and cannot be applied to the more general real-world problems. This paper proposes a novel Reinforcement-Learning-based searching algorithm, Alpha-DAG, which gradually finds the optimal order to add edges by learning from the historical searching trajectories. At each decision window, the agent adds the edge with the largest scoring improvement to the current graph. 
The advantage of Alpha-DAG is supported by the numerical comparison against some state-of-the-art competitors in both synthetic and real examples.", "keywords": "Directed acyclic graph;reinforcement learning;Q Learning;Graph Auto-Encoder", "primary_area": "", "supplementary_material": "/attachment/170cf974284927b2f9b96c8427caa4b15988325b.zip", "author": "Fan Zhou;Yifeng Pan;Shenghua Zhu;Xin HE", "authorids": "~Fan_Zhou7;~Yifeng_Pan3;~Shenghua_Zhu1;~Xin_HE6", "gender": ";;;M", "homepage": ";https://github.com/pp-payphone;https://sites.google.com/view/guoqinghe;https://github.com/PrintHelloWorldpy", "dblp": ";;;", "google_scholar": "4QJkjl0AAAAJ;;aduqO4EAAAAJ;", "orcid": ";;;", "linkedin": ";;;", "or_profile": "~Fan_Zhou7;~Yifeng_Pan3;~Xin_HE6;~Zhu_Shenghua1", "aff": "Shanghai University of Finance and Economics;Shanghai University of Finance and Economics;Shanghai University of Finance and Economics;Shanghai University of Finance and Economics", "aff_domain": "shufe.edu;sufe.edu;shufe.edu;sufe.edu", "position": "Associate Professor;MS student;Associate Professor;MS student", "bibtex": "", "github": "", "project": "", "reviewers": "AnonReviewer4;AnonReviewer2;AnonReviewer3;AnonReviewer1", "site": "https://openreview.net/forum?id=0jqRSnFnmL_", "pdf_size": 0, "rating": "4;4;4;5", "confidence": "4;4;3;4", "wc_review": "561;427;520;253", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "0;0;0;0", "reply_reviewers": "0;0;0;0", "reply_authors": "0;0;0;0", "rating_avg": [ 4.25, 0.4330127018922193 ], "confidence_avg": [ 3.75, 0.4330127018922193 ], "wc_review_avg": [ 440.25, 118.51028436384752 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 0, 0 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 0, 0 ], "replies_avg": [ 5, 0 ], "authors#_avg": [ 4, 0 ], "corr_rating_confidence": 0.3333333333333333, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:4FI47_y5ssUJ:scholar.google.com/&scioq=Alpha-DAG:+a+reinforcement+learning+based+algorithm+to+learn+Directed+Acyclic+Graphs&hl=en&as_sdt=0,5", "gs_version_total": 0, "aff_unique_index": "0;0;0;0", "aff_unique_norm": "Shanghai University of Finance and Economics", "aff_unique_dep": "", "aff_unique_url": "http://www.sufe.edu.cn", "aff_unique_abbr": "SUFE", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0;0;0", "aff_country_unique": "China" }, { "id": "0migj5lyUZl", "title": "A Strong On-Policy Competitor To PPO", "track": "main", "status": "Reject", "tldr": "", "abstract": "As a recognized variant and improvement for Trust Region Policy Optimization (TRPO), proximal policy optimization (PPO) has been widely used with several advantages: efficient data utilization, easy implementation and good parallelism. In this paper, a first-order gradient on-policy learning algorithm called Policy Optimization with Penalized Point Probability Distance (POP3D), which is a lower bound to the square of total variance divergence is proposed as another powerful variant. The penalty item has dual effects, prohibiting policy updates from overshooting and encouraging more explorations. 
Carefully controlled experiments on both discrete and continuous benchmarks verify our approach is highly competitive to PPO.", "keywords": "proximal policy optimization;deep reinforcement learning", "primary_area": "", "supplementary_material": "/attachment/519f63fe1dc1eab83f201b4f3c442beed6d68d29.zip", "author": "Xiangxiang Chu", "authorids": "~Xiangxiang_Chu1", "gender": "M", "homepage": "https://cxxgtxy.github.io/", "dblp": "207/8002", "google_scholar": "jn21pUsAAAAJ", "orcid": "0000-0003-2548-0605", "linkedin": "", "or_profile": "~Xiangxiang_Chu1", "aff": "MeiTuan", "aff_domain": "meituan.com", "position": "Senior Engineer", "bibtex": "@misc{\nchu2021a,\ntitle={A Strong On-Policy Competitor To {\\{}PPO{\\}}},\nauthor={Xiangxiang Chu},\nyear={2021},\nurl={https://openreview.net/forum?id=0migj5lyUZl}\n}", "github": "", "project": "", "reviewers": "AnonReviewer4;AnonReviewer1;AnonReviewer2", "site": "https://openreview.net/forum?id=0migj5lyUZl", "pdf_size": 0, "rating": "5;5;5", "confidence": "4;4;4", "wc_review": "452;104;944", "wc_reply_reviewers": "0;0;0", "wc_reply_authors": "785;226;742", "reply_reviewers": "0;0;0", "reply_authors": "1;1;1", "rating_avg": [ 5.0, 0.0 ], "confidence_avg": [ 4.0, 0.0 ], "wc_review_avg": [ 500.0, 344.6041206950375 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 584.3333333333334, 253.98731376892738 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 9, 0 ], "authors#_avg": [ 1, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 2, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=4244597032456705517&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 0, "aff_unique_index": "0", "aff_unique_norm": "Meituan", "aff_unique_dep": "", "aff_unique_url": "https://www.meituan.com", "aff_unique_abbr": "MeiTuan", "aff_country_unique_index": "0", "aff_country_unique": "China" }, { "id": "0n3BaVlNsHI", "title": "DJMix: Unsupervised Task-agnostic Augmentation for Improving Robustness", "track": "main", "status": "Reject", "tldr": "", "abstract": "Convolutional Neural Networks (CNNs) are vulnerable to unseen noise on input images at the test time, and thus improving the robustness is crucial. In this paper, we propose DJMix, a data augmentation method to improve the robustness by mixing each training image and its discretized one. Discretization is done in an unsupervised manner by an autoencoder, and the mixed images are nearly impossible to distinguish from the original images. Therefore, DJMix can easily be adapted to various image recognition tasks. 
We verify the effectiveness of our method using classification, semantic segmentation, and detection using clean and noisy test images.", "keywords": "robustness;uncertainty;discretization;data augmentation", "primary_area": "", "supplementary_material": "", "author": "Ryuichiro Hataya;Hideki Nakayama", "authorids": "~Ryuichiro_Hataya1;~Hideki_Nakayama1", "gender": "Unspecified;M", "homepage": "https://mosko.tokyo;https://www.nlab.ci.i.u-tokyo.ac.jp/index-e.html", "dblp": "238/1068;09/1592", "google_scholar": "https://scholar.google.com/citations?view_op=list_works;lZAYGJoAAAAJ", "orcid": ";", "linkedin": ";", "or_profile": "~Ryuichiro_Hataya1;~Hideki_Nakayama1", "aff": "The University of Tokyo;The University of Tokyo", "aff_domain": "u-tokyo.ac.jp;u-tokyo.ac.jp", "position": "PhD student;Associate Professor", "bibtex": "@misc{\nhataya2021djmix,\ntitle={{\\{}DJM{\\}}ix: Unsupervised Task-agnostic Augmentation for Improving Robustness},\nauthor={Ryuichiro Hataya and Hideki Nakayama},\nyear={2021},\nurl={https://openreview.net/forum?id=0n3BaVlNsHI}\n}", "github": "", "project": "", "reviewers": "AnonReviewer2;AnonReviewer4;AnonReviewer1;AnonReviewer3", "site": "https://openreview.net/forum?id=0n3BaVlNsHI", "pdf_size": 0, "rating": "4;4;5;5", "confidence": "4;4;4;4", "wc_review": "1174;206;620;195", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "332;133;181;166", "reply_reviewers": "0;0;0;0", "reply_authors": "1;1;1;1", "rating_avg": [ 4.5, 0.5 ], "confidence_avg": [ 4.0, 0.0 ], "wc_review_avg": [ 548.75, 399.57188026686765 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 203.0, 76.47548626847691 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 9, 0 ], "authors#_avg": [ 2, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:Bm1w5fe7K9QJ:scholar.google.com/&scioq=DJMix:+Unsupervised+Task-agnostic+Augmentation+for+Improving+Robustness&hl=en&as_sdt=0,5", "gs_version_total": 0, "aff_unique_index": "0;0", "aff_unique_norm": "University of Tokyo", "aff_unique_dep": "", "aff_unique_url": "https://www.u-tokyo.ac.jp", "aff_unique_abbr": "UTokyo", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0", "aff_country_unique": "Japan" }, { "id": "0naHZ3gZSzo", "title": "Optimizing Large-Scale Hyperparameters via Automated Learning Algorithm", "track": "main", "status": "Reject", "tldr": "", "abstract": "Modern machine learning algorithms usually involve tuning multiple (from one to thousands) hyperparameters which play a pivotal role in terms of model generalizability. Globally choosing appropriate values of hyperparameters is extremely computationally challenging. Black-box optimization and gradient-based algorithms are two dominant approaches to hyperparameter optimization while they have totally distinct advantages. How to design a new hyperparameter optimization technique inheriting all benefits from both approaches is still an open problem. To address this challenging problem, in this paper, we propose a new hyperparameter optimization method with zeroth-order hyper-gradients (HOZOG). Specifically, we first exactly formulate hyperparameter optimization as an $\\mathcal{A}$-based constrained optimization problem, where $\\mathcal{A}$ is a black-box optimization algorithm (such as deep neural network). Then, we use the average zeroth-order hyper-gradients to update hyperparameters. 
We provide the feasibility analysis of using HOZOG to achieve hyperparameter optimization. The experimental results on three representative hyperparameter (the size is from 1 to 1250) optimization tasks demonstrate the benefits of HOZOG in terms of \\textit{simplicity, scalability, flexibility, effectiveness and efficiency} compared with the state-of-the-art hyperparameter optimization methods.", "keywords": "", "primary_area": "", "supplementary_material": "/attachment/fe35ca540e449a5c45c7c3c11f6ab35db9c666e4.zip", "author": "Bin Gu;Guodong Liu;Yanfu Zhang;Xiang Geng;Heng Huang", "authorids": "~Bin_Gu1;~Guodong_Liu2;~Yanfu_Zhang1;~Xiang_Geng1;~Heng_Huang1", "gender": "M;M;;M;M", "homepage": "https://mbzuai.ac.ae/study/faculty/bin-gu/;;;;https://www.cs.umd.edu/~heng/", "dblp": "29/1758-1;;;222/7968;03/281", "google_scholar": "Vo8OgCgAAAAJ;Xgwse5AAAAAJ;;n6QnFS0AAAAJ;4OqLaDwAAAAJ", "orcid": "0000-0001-6049-1815;;;;", "linkedin": ";guodong-liu-56a671107/;;;", "or_profile": "~Bin_Gu1;~Guodong_Liu2;~Yanfu_Zhang1;~Xiang_Geng1;~Heng_Huang1", "aff": "Mohamed bin Zayed University of Artificial Intelligence;University of Pittsburgh;;Nanjing University;University of Pittsburgh", "aff_domain": "mbzuai.ac.ae;pitt.edu;;nju.edu.cn;pitt.edu", "position": "Assistant Professor;PhD student;;PhD student;Full Professor", "bibtex": "@misc{\ngu2021optimizing,\ntitle={Optimizing Large-Scale Hyperparameters via Automated Learning Algorithm},\nauthor={Bin Gu and Guodong Liu and Yanfu Zhang and Xiang Geng and Heng Huang},\nyear={2021},\nurl={https://openreview.net/forum?id=0naHZ3gZSzo}\n}", "github": "", "project": "", "reviewers": "AnonReviewer4;AnonReviewer2;AnonReviewer3;AnonReviewer5", "site": "https://openreview.net/forum?id=0naHZ3gZSzo", "pdf_size": 0, "rating": "3;4;4;5", "confidence": "5;4;4;4", "wc_review": "504;690;756;758", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "215;169;150;146", "reply_reviewers": "0;0;0;0", "reply_authors": "1;1;1;1", "rating_avg": [ 4.0, 0.7071067811865476 ], "confidence_avg": [ 4.25, 0.4330127018922193 ], "wc_review_avg": [ 677.0, 103.56157588603989 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 170.0, 27.39525506360545 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 9, 0 ], "authors#_avg": [ 5, 0 ], "corr_rating_confidence": -0.816496580927726, "gs_citation": 21, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=10423514711168198185&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 4, "aff_unique_index": "0;1;2;1", "aff_unique_norm": "Mohamed bin Zayed University of Artificial Intelligence;University of Pittsburgh;Nanjing University", "aff_unique_dep": ";;", "aff_unique_url": "https://mbzuai.ac.ae;https://www.pitt.edu;https://www.nju.edu.cn", "aff_unique_abbr": "MBZUAI;Pitt;Nanjing U", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;1;2;1", "aff_country_unique": "United Arab Emirates;United States;China" }, { "title": "Mastering Atari with Discrete World Models", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/2742", "id": "0oabwyZbOu", "poster": "", "openreview": "https://openreview.net/forum?id=0oabwyZbOu", "slides": "https://iclr.cc/virtual/2021/poster/2742", "video": "https://iclr.cc/virtual/2021/poster/2742", "author_site": "Danijar Hafner, Timothy Lillicrap, Mohammad Norouzi, Jimmy Ba", "tldr": "", "abstract": "Intelligent agents need to generalize from past experience to achieve goals in complex environments. 
World models facilitate such generalization and allow learning behaviors from imagined outcomes to increase sample-efficiency. While learning world models from image inputs has recently become feasible for some tasks, modeling Atari games accurately enough to derive successful behaviors has remained an open challenge for many years. We introduce DreamerV2, a reinforcement learning agent that learns behaviors purely from predictions in the compact latent space of a powerful world model. The world model uses discrete representations and is trained separately from the policy. DreamerV2 constitutes the first agent that achieves human-level performance on the Atari benchmark of 55 tasks by learning behaviors inside a separately trained world model. With the same computational budget and wall-clock time, Dreamer V2 reaches 200M frames and surpasses the final performance of the top single-GPU agents IQN and Rainbow. DreamerV2 is also applicable to tasks with continuous actions, where it learns an accurate world model of a complex humanoid robot and solves stand-up and walking from only pixel inputs.", "keywords": "Atari;world models;model-based reinforcement learning;reinforcement learning;planning;actor critic", "primary_area": "", "supplementary_material": "", "author": "Danijar Hafner;Timothy P Lillicrap;Mohammad Norouzi;Jimmy Ba", "authorids": "~Danijar_Hafner1;~Timothy_P_Lillicrap1;~Mohammad_Norouzi1;~Jimmy_Ba1", "gender": ";M;M;M", "homepage": "https://danijar.com;http://contrastiveconvergence.net/~timothylillicrap/index.php;https://norouzi.github.io/;http://jimmylba.github.io", "dblp": "184/8088;37/10849;https://dblp.org/pers/hd/n/Norouzi_0002:Mohammad;https://dblp.org/pers/b/Ba:Jimmy.html", "google_scholar": "VINmGpYAAAAJ;https://scholar.google.co.uk/citations?user=htPVdRMAAAAJ;Lncr-VoAAAAJ;https://scholar.google.ca/citations?user=ymzxRhAAAAAJ", "orcid": "0000-0002-9534-7271;;;", "linkedin": ";;;", "or_profile": "~Danijar_Hafner1;~Timothy_P_Lillicrap1;~Mohammad_Norouzi1;~Jimmy_Ba1", "aff": "University of Toronto;Google DeepMind;Google Brain;Department of Computer Science, University of Toronto", "aff_domain": "cs.toronto;deepmind.com;google.com;cs.toronto.edu", "position": "PhD student;Research Scientist;Research Scientist;Assistant Professor", "bibtex": "@inproceedings{\nhafner2021mastering,\ntitle={Mastering Atari with Discrete World Models},\nauthor={Danijar Hafner and Timothy P Lillicrap and Mohammad Norouzi and Jimmy Ba},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=0oabwyZbOu}\n}", "github": "[![github](/images/github_icon.svg) danijar/dreamerv2](https://github.com/danijar/dreamerv2) + [![Papers with Code](/images/pwc_icon.svg) 8 community implementations](https://paperswithcode.com/paper/?openreview=0oabwyZbOu)", "project": "", "reviewers": "AnonReviewer2;AnonReviewer1;AnonReviewer3;AnonReviewer4", "pdf_size": 0, "rating": "4;5;8;9", "confidence": "4;4;5;5", "wc_review": "440;510;489;816", "wc_reply_reviewers": "0;167;0;0", "wc_reply_authors": "1041;1419;855;352", "reply_reviewers": "0;1;0;0", "reply_authors": "2;3;2;1", "rating_avg": [ 6.5, 2.0615528128088303 ], "confidence_avg": [ 4.5, 0.5 ], "wc_review_avg": [ 563.75, 147.83500093009098 ], "wc_reply_reviewers_avg": [ 41.75, 72.31312121600062 ], "wc_reply_authors_avg": [ 916.75, 384.2033153162528 ], "reply_reviewers_avg": [ 0.25, 0.4330127018922193 ], "reply_authors_avg": [ 2.0, 0.7071067811865476 ], "replies_avg": [ 14, 0 ], "authors#_avg": [ 4, 0 ], 
"corr_rating_confidence": 0.9701425001453319, "gs_citation": 1065, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=2696098032395844049&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 6, "pdf": "https://openreview.net/pdf?id=0oabwyZbOu", "email": "cs.toronto;deepmind.com;google.com;cs.toronto.edu", "author_num": 4, "aff_unique_index": "0;1;1;0", "aff_unique_norm": "University of Toronto;Google", "aff_unique_dep": ";Google DeepMind", "aff_unique_url": "https://www.utoronto.ca;https://deepmind.com", "aff_unique_abbr": "U of T;DeepMind", "aff_campus_unique_index": "1;2", "aff_campus_unique": ";Mountain View;Toronto", "aff_country_unique_index": "0;1;2;0", "aff_country_unique": "Canada;United Kingdom;United States" }, { "id": "0owsv3F-fM", "title": "Cross-Modal Domain Adaptation for Reinforcement Learning", "track": "main", "status": "Reject", "tldr": "", "abstract": "Domain adaptation is a promising direction for deploying RL agents in real-world applications, where vision-based robotics tasks constitute an important part. Cur-rent methods that train polices on simulated images not only require a delicately crafted simulator, but also add extra burdens to the training process. In this paper, we propose a method that can learn a mapping from high-dimensional images to low-level simulator states, allowing agents trained on the source domain of state input to transfer well to the target domain of image input. By fully leveraging the sequential information in the trajectories and incorporating the policy to guide the training process, our method overcomes the intrinsic ill-posedness in cross-modal domain adaptation when structural constraints from the same modality are unavailable. Experiments on MuJoCo environments show that the policy, once combined with the mapping function, can be deployed directly in the target domain with only a small performance gap, while current methods designed for same-modal domain adaptation fail on this problem.", "keywords": "Domain Adaptation;Reinforcement Learning", "primary_area": "", "supplementary_material": "", "author": "Xiong-Hui Chen;Shengyi Jiang;Feng Xu;Yang Yu", "authorids": "~Xiong-Hui_Chen1;~Shengyi_Jiang2;~Feng_Xu6;~Yang_Yu5", "gender": "M;M;M;M", "homepage": "http://www.lamda.nju.edu.cn/chenxh/;http://www.lamda.nju.edu.cn/jiangsy;;http://www.lamda.nju.edu.cn/yuy", "dblp": "241/7938;67/3929;https://dblp.uni-trier.de/pid/03/2611;46/2181-1", "google_scholar": "H5pguCYAAAAJ;;;PG2lDSwAAAAJ", "orcid": ";0000-0002-4443-0753;0009-0004-6809-1866;", "linkedin": ";;feng-xu-3183b216b;", "or_profile": "~Xiong-Hui_Chen1;~Shengyi_Jiang2;~Feng_Xu6;~Yang_Yu2", "aff": "Nanjing University;Nanjing University;Nanjing University;Nanjing University", "aff_domain": "nju.edu.cn;nju.edu.cn;nju.edu.cn;nju.edu.cn", "position": "PhD student;MS student;PhD student;Professor", "bibtex": "@misc{\nchen2021crossmodal,\ntitle={Cross-Modal Domain Adaptation for Reinforcement Learning},\nauthor={Xiong-Hui Chen and Shengyi Jiang and Feng Xu and Yang Yu},\nyear={2021},\nurl={https://openreview.net/forum?id=0owsv3F-fM}\n}", "github": "", "project": "", "reviewers": "AnonReviewer3;AnonReviewer2;AnonReviewer1;AnonReviewer4", "site": "https://openreview.net/forum?id=0owsv3F-fM", "pdf_size": 0, "rating": "4;5;5;5", "confidence": "4;4;3;3", "wc_review": "399;1310;281;335", "wc_reply_reviewers": "260;268;191;115", "wc_reply_authors": "884;1020;468;705", "reply_reviewers": "1;1;1;1", "reply_authors": "2;2;2;2", "rating_avg": [ 4.75, 0.4330127018922193 ], "confidence_avg": [ 3.5, 
0.5 ], "wc_review_avg": [ 581.25, 422.8122366961486 ], "wc_reply_reviewers_avg": [ 208.5, 61.727222519727874 ], "wc_reply_authors_avg": [ 769.25, 206.71402347204216 ], "reply_reviewers_avg": [ 1.0, 0.0 ], "reply_authors_avg": [ 2.0, 0.0 ], "replies_avg": [ 18, 0 ], "authors#_avg": [ 4, 0 ], "corr_rating_confidence": -0.5773502691896257, "gs_citation": 1, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=4107214539234110623&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 0, "aff_unique_index": "0;0;0;0", "aff_unique_norm": "Nanjing University", "aff_unique_dep": "", "aff_unique_url": "https://www.nju.edu.cn", "aff_unique_abbr": "Nanjing U", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0;0;0", "aff_country_unique": "China" }, { "id": "0p-aRvcVs-U", "title": "$\\alpha$VIL: Learning to Leverage Auxiliary Tasks for Multitask Learning", "track": "main", "status": "Reject", "tldr": "", "abstract": "Multitask Learning is a Machine Learning paradigm that aims to train a range of (usually related) tasks with the help of a shared model. While the goal is often to improve the joint performance of all training tasks, another approach is to focus on the performance of a specific target task, while treating the remaining ones as auxiliary data from which to possibly leverage positive transfer towards the target during training. In such settings, it becomes important to estimate the positive or negative influence auxiliary tasks will have on the target. While many ways have been proposed to estimate task weights before or during training they typically rely on heuristics or extensive search of the weighting space. We propose a novel method called $\\alpha$-Variable Importance Learning ($\\alpha$VIL) that is able to adjust task weights dynamically during model training, by making direct use of task-specific updates of the underlying model's parameters between training epochs. Experiments indicate that $\\alpha$VIL is able to outperform other Multitask Learning approaches in a variety of settings. 
To our knowledge, this is the first attempt at making direct use of model updates for task weight estimation.", "keywords": "multitask learning;meta-optimization;deep learning", "primary_area": "", "supplementary_material": "", "author": "Rafael Kourdis;Gabriel Gordon-Hall;Philip John Gorinski", "authorids": "rafael.kourdis@gmail.com;ggordonhall@gmail.com;~Philip_John_Gorinski1", "gender": ";;M", "homepage": ";;https://philip.gorinski.com", "dblp": ";;165/0736", "google_scholar": ";;https://scholar.google.co.uk/citations?hl=en", "orcid": ";;", "linkedin": ";;pjgorinski/", "or_profile": "rafael.kourdis@gmail.com;ggordonhall@gmail.com;~Philip_John_Gorinski1", "aff": ";;Huawei Noah's Ark Lab", "aff_domain": ";;huawei.com", "position": ";;Research Scientist", "bibtex": "@misc{\nkourdis2021alphavil,\ntitle={{\\$}{\\textbackslash}alpha{\\$}{\\{}VIL{\\}}: Learning to Leverage Auxiliary Tasks for Multitask Learning},\nauthor={Rafael Kourdis and Gabriel Gordon-Hall and Philip John Gorinski},\nyear={2021},\nurl={https://openreview.net/forum?id=0p-aRvcVs-U}\n}", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer3;AnonReviewer4", "site": "https://openreview.net/forum?id=0p-aRvcVs-U", "pdf_size": 0, "rating": "3;4;4", "confidence": "4;4;4", "wc_review": "462;558;253", "wc_reply_reviewers": "0;0;0", "wc_reply_authors": "326;536;106", "reply_reviewers": "0;0;0", "reply_authors": "1;1;1", "rating_avg": [ 3.6666666666666665, 0.4714045207910317 ], "confidence_avg": [ 4.0, 0.0 ], "wc_review_avg": [ 424.3333333333333, 127.33246072999437 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 322.6666666666667, 175.5625877635159 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 8, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:IOH83tycpBcJ:scholar.google.com/&scioq=%24%5Calpha%24VIL:+Learning+to+Leverage+Auxiliary+Tasks+for+Multitask+Learning&hl=en&as_sdt=0,33", "gs_version_total": 3, "aff_unique_index": "0", "aff_unique_norm": "Huawei", "aff_unique_dep": "Noah's Ark Lab", "aff_unique_url": "https://www.huawei.com", "aff_unique_abbr": "Huawei", "aff_country_unique_index": "0", "aff_country_unique": "China" }, { "title": "Monotonic Kronecker-Factored Lattice", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/2667", "id": "0pxiMpCyBtr", "poster": "", "openreview": "https://openreview.net/forum?id=0pxiMpCyBtr", "slides": "https://iclr.cc/virtual/2021/poster/2667", "video": "https://iclr.cc/virtual/2021/poster/2667", "author_site": "William Bakst, Nobuyuki Morioka, Erez Louidor", "tldr": "", "abstract": "It is computationally challenging to learn flexible monotonic functions that guarantee model behavior and provide interpretability beyond a few input features, and in a time where minimizing resource use is increasingly important, we must be able to learn such models that are still efficient. In this paper we show how to effectively and efficiently learn such functions using Kronecker-Factored Lattice ($\\mathrm{KFL}$), an efficient reparameterization of flexible monotonic lattice regression via Kronecker product. Both computational and storage costs scale linearly in the number of input features, which is a significant improvement over existing methods that grow exponentially. We also show that we can still properly enforce monotonicity and other shape constraints. 
The $\\mathrm{KFL}$ function class consists of products of piecewise-linear functions, and the size of the function class can be further increased through ensembling. We prove that the function class of an ensemble of $M$ base $\\mathrm{KFL}$ models strictly increases as $M$ increases up to a certain threshold. Beyond this threshold, every multilinear interpolated lattice function can be expressed. Our experimental results demonstrate that $\\mathrm{KFL}$ trains faster with fewer parameters while still achieving accuracy and evaluation speeds comparable to or better than the baseline methods and preserving monotonicity guarantees on the learned model.", "keywords": "Theory;Regularization;Algorithms;Classification;Regression;Matrix and Tensor Factorization;Fairness;Evaluation;Efficiency;Machine Learning", "primary_area": "", "supplementary_material": "", "author": "William Taylor Bakst;Nobuyuki Morioka;Erez Louidor", "authorids": "~William_Taylor_Bakst1;~Nobuyuki_Morioka1;~Erez_Louidor1", "gender": "M;M;M", "homepage": ";;", "dblp": ";39/3539;98/2397", "google_scholar": "g4LKzDcAAAAJ;;N4hseCwAAAAJ", "orcid": ";;", "linkedin": "wbakst;;", "or_profile": "~William_Taylor_Bakst1;~Nobuyuki_Morioka1;~Erez_Louidor1", "aff": "Google;Google;Google", "aff_domain": "google.com;google.com;google.com", "position": "Research Engineer;Software Engineer;Software Engineer", "bibtex": "@inproceedings{\nbakst2021monotonic,\ntitle={Monotonic Kronecker-Factored Lattice},\nauthor={William Taylor Bakst and Nobuyuki Morioka and Erez Louidor},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=0pxiMpCyBtr}\n}", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer2;AnonReviewer5;AnonReviewer3", "pdf_size": 0, "rating": "6;6;6;7", "confidence": "2;1;3;3", "wc_review": "977;125;216;192", "wc_reply_reviewers": "40;0;105;0", "wc_reply_authors": "208;230;284;249", "reply_reviewers": "1;0;1;0", "reply_authors": "1;1;1;1", "rating_avg": [ 6.25, 0.4330127018922193 ], "confidence_avg": [ 2.25, 0.82915619758885 ], "wc_review_avg": [ 377.5, 347.7243879856574 ], "wc_reply_reviewers_avg": [ 36.25, 42.920711783473486 ], "wc_reply_authors_avg": [ 242.75, 27.887048965424793 ], "reply_reviewers_avg": [ 0.5, 0.5 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 13, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": 0.5222329678670935, "gs_citation": 5, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=5313604418511366990&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 3, "pdf": "https://openreview.net/pdf?id=0pxiMpCyBtr", "email": "google.com;google.com;google.com", "author_num": 3, "aff_unique_index": "0;0;0", "aff_unique_norm": "Google", "aff_unique_dep": "Google", "aff_unique_url": "https://www.google.com", "aff_unique_abbr": "Google", "aff_campus_unique_index": "0;0;0", "aff_campus_unique": "Mountain View", "aff_country_unique_index": "0;0;0", "aff_country_unique": "United States" }, { "id": "0qbEq5UBfGD", "title": "Latent Space Semi-Supervised Time Series Data Clustering", "track": "main", "status": "Reject", "tldr": "", "abstract": "Time series data is abundantly available in the real world, but there is a distinct lack of large, labeled datasets available for many types of learning tasks. Semi-supervised models, which can leverage small amounts of expert-labeled data along with a larger unlabeled dataset, have been shown to improve performance over unsupervised learning models. 
Existing semi-supervised time series clustering algorithms suffer from a lack of scalability, as they are limited to performing learning operations within the original data space. We propose an autoencoder-based semi-supervised learning model along with multiple semi-supervised objective functions which can be used to improve the quality of the autoencoder\u2019s learned latent space via the addition of a small number of labeled examples. Experiments on a variety of datasets show that our methods can usually improve k-Means clustering performance. Our methods achieve a maximum average ARI of 0.897, a 140% increase over an unsupervised CAE model. Our methods also achieve a maximum improvement of 44% over a semi-supervised model.\n", "keywords": "Semi-supervised clustering;clustering;deep learning;autoencoder", "primary_area": "", "supplementary_material": "/attachment/781ba678ddb9841276270ba94a18e1a57e290021.zip", "author": "Andrew Hill;Katerina Kechris;Russell Bowler;Farnoush Kashani", "authorids": "~Andrew_Hill2;katerina.kechris@cuanschutz.edu;bowlerr@njhealth.org;~Farnoush_Kashani1", "gender": ";;;M", "homepage": "https://cse.ucdenver.edu/~bdlab/;;;http://cse.ucdenver.edu/~bdlab/", "dblp": ";;;34/508", "google_scholar": ";;;49T9qwYAAAAJ", "orcid": ";;;0000-0003-4102-9873", "linkedin": "andrew-hill-6b3369157/;;;farnoush-banaei-kashani-3614454/", "or_profile": "~Andrew_Hill2;katerina.kechris@cuanschutz.edu;bowlerr@njhealth.org;~Farnoush_Kashani1", "aff": ";;;University of Colorado, Denver", "aff_domain": ";;;ucdenver.edu", "position": ";;;Associate Professor", "bibtex": "@misc{\nhill2021latent,\ntitle={Latent Space Semi-Supervised Time Series Data Clustering},\nauthor={Andrew Hill and Katerina Kechris and Russell Bowler and Farnoush Kashani},\nyear={2021},\nurl={https://openreview.net/forum?id=0qbEq5UBfGD}\n}", "github": "", "project": "", "reviewers": "AnonReviewer2;AnonReviewer3;AnonReviewer1;AnonReviewer4", "site": "https://openreview.net/forum?id=0qbEq5UBfGD", "pdf_size": 0, "rating": "4;4;5;6", "confidence": "3;5;2;3", "wc_review": "2806;224;188;304", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "888;337;520;264", "reply_reviewers": "0;0;0;0", "reply_authors": "2;1;1;1", "rating_avg": [ 4.75, 0.82915619758885 ], "confidence_avg": [ 3.25, 1.0897247358851685 ], "wc_review_avg": [ 880.5, 1112.480449266413 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 502.25, 241.44810519032862 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.25, 0.4330127018922193 ], "replies_avg": [ 10, 0 ], "authors#_avg": [ 4, 0 ], "corr_rating_confidence": -0.48420012470625223, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:z1HdNuVhlNgJ:scholar.google.com/&scioq=Latent+Space+Semi-Supervised+Time+Series+Data+Clustering&hl=en&as_sdt=0,33", "gs_version_total": 0, "aff_unique_index": "0", "aff_unique_norm": "University of Colorado", "aff_unique_dep": "", "aff_unique_url": "https://www.cu.edu", "aff_unique_abbr": "CU", "aff_campus_unique_index": "0", "aff_campus_unique": "Denver", "aff_country_unique_index": "0", "aff_country_unique": "United States" }, { "id": "0rNLjXgchOC", "title": "Dissecting Hessian: Understanding Common Structure of Hessian in Neural Networks", "track": "main", "status": "Reject", "tldr": "", "abstract": "The Hessian captures important properties of the deep neural network loss landscape. 
We observe that eigenvectors and eigenspaces of the layer-wise Hessian for neural network objective have several interesting structures -- top eigenspaces for different models have high overlap, and top eigenvectors form low rank matrices when they are reshaped into the same shape as the weight matrix of the corresponding layer. These structures, as well as the low rank structure of the Hessian observed in previous studies, can be explained by approximating the Hessian using Kronecker factorization. Our new understanding can also explain why some of these structures become weaker when the network is trained with batch normalization. Finally, we show that the Kronecker factorization can be combined with PAC-Bayes techniques to get better generalization bounds.", "keywords": "Hessian;neural network;Kronecker factorization;PAC-Bayes bound;eigenspace;eigenvalue", "primary_area": "", "supplementary_material": "/attachment/9a118dec4a9ae7cdf07813d9a36143852831345c.zip", "author": "Yikai Wu;Xingyu Zhu;Chenwei Wu;Annie N. Wang;Rong Ge", "authorids": "~Yikai_Wu1;~Xingyu_Zhu1;~Chenwei_Wu1;~Annie_N._Wang1;~Rong_Ge1", "gender": "M;M;M;;M", "homepage": "https://yikai-wu.github.io;;https://users.cs.duke.edu/~cwwu/;;https://users.cs.duke.edu/~rongge/", "dblp": "202/3087-1;132/4210-3.html;https://dblp.uni-trier.de/pers/hd/w/Wu_0002:Chenwei;;89/6869-1.html", "google_scholar": "O-V-kjoAAAAJ;Dlya3HMAAAAJ;WoB6M2cAAAAJ;8Pg7CpwAAAAJ;https://scholar.google.com.tw/citations?user=MVxcjEoAAAAJ", "orcid": "0000-0002-8933-6293;0000-0003-1997-4668;0000-0002-5226-7431;;", "linkedin": ";;chenwei-wu-22754012b/;annie-w-928955101/;", "or_profile": "~Yikai_Wu1;~Xingyu_Zhu1;~Chenwei_Wu1;~Annie_N._Wang1;~Rong_Ge1", "aff": "Duke University;Duke University;Duke University;Duke University;Duke University", "aff_domain": "duke.edu;duke.edu;duke.edu;duke.edu;duke.edu", "position": "Undergrad student;Undergrad student;PhD student;Undergrad student;Assistant Professor", "bibtex": "@misc{\nwu2021dissecting,\ntitle={Dissecting Hessian: Understanding Common Structure of Hessian in Neural Networks},\nauthor={Yikai Wu and Xingyu Zhu and Chenwei Wu and Annie N. 
Wang and Rong Ge},\nyear={2021},\nurl={https://openreview.net/forum?id=0rNLjXgchOC}\n}", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer4;AnonReviewer2;AnonReviewer3", "site": "https://openreview.net/forum?id=0rNLjXgchOC", "pdf_size": 0, "rating": "4;4;4;7", "confidence": "4;5;2;4", "wc_review": "311;1290;413;509", "wc_reply_reviewers": "0;807;0;0", "wc_reply_authors": "656;1921;435;343", "reply_reviewers": "0;1;0;0", "reply_authors": "1;3;1;1", "rating_avg": [ 4.75, 1.299038105676658 ], "confidence_avg": [ 3.75, 1.0897247358851685 ], "wc_review_avg": [ 630.75, 387.0041181951427 ], "wc_reply_reviewers_avg": [ 201.75, 349.441250427021 ], "wc_reply_authors_avg": [ 838.75, 635.1072251990211 ], "reply_reviewers_avg": [ 0.25, 0.4330127018922193 ], "reply_authors_avg": [ 1.5, 0.8660254037844386 ], "replies_avg": [ 13, 0 ], "authors#_avg": [ 5, 0 ], "corr_rating_confidence": 0.13245323570650439, "gs_citation": 26, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=16035966435576849017&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 4, "aff_unique_index": "0;0;0;0;0", "aff_unique_norm": "Duke University", "aff_unique_dep": "", "aff_unique_url": "https://www.duke.edu", "aff_unique_abbr": "Duke", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0;0;0;0", "aff_country_unique": "United States" }, { "id": "0vO-u0sucRF", "title": "Information Theoretic Meta Learning with Gaussian Processes", "track": "main", "status": "Reject", "tldr": "", "abstract": "We formulate meta learning using information theoretic concepts such as mutual information and the information bottleneck. The idea is to learn a stochastic representation or encoding of the task description, given by a training or support set, that is highly informative about predicting the validation set. By making use of variational approximations to the mutual information, we derive a general and tractable framework for meta learning. We particularly develop new memory-based meta learning algorithms based on Gaussian processes and derive extensions that combine memory and gradient-based meta learning. 
We demonstrate our method on few-shot regression and classification by using standard benchmarks such as Omniglot, mini-Imagenet and Augmented Omniglot.\n", "keywords": "Meta Learning;Information Bottleneck;Gaussian Processes;Few-shot learning;Variational Inference", "primary_area": "", "supplementary_material": "", "author": "Michalis Titsias;Sotirios Nikoloutsopoulos;Alexandre Galashov", "authorids": "~Michalis_Titsias1;snikolou@aueb.gr;~Alexandre_Galashov1", "gender": "M;;M", "homepage": "https://mtitsias.github.io/;;https://galashov.com", "dblp": "19/5385;;", "google_scholar": "https://scholar.google.gr/citations?user=B-SbkAwAAAAJ;;", "orcid": ";;", "linkedin": ";;", "or_profile": "~Michalis_Titsias1;snikolou@aueb.gr;~Alexandre_Galashov1", "aff": "Google DeepMind;;Ecole Polytechnique", "aff_domain": "google.com;;polytechnique.edu", "position": "Research Scientist;;MS student", "bibtex": "@misc{\ntitsias2021information,\ntitle={Information Theoretic Meta Learning with Gaussian Processes},\nauthor={Michalis Titsias and Sotirios Nikoloutsopoulos and Alexandre Galashov},\nyear={2021},\nurl={https://openreview.net/forum?id=0vO-u0sucRF}\n}", "github": "", "project": "", "reviewers": "AnonReviewer2;AnonReviewer3;AnonReviewer1;AnonReviewer4", "site": "https://openreview.net/forum?id=0vO-u0sucRF", "pdf_size": 0, "rating": "4;4;5;5", "confidence": "3;3;4;3", "wc_review": "355;1000;738;466", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "459;817;735;587", "reply_reviewers": "0;0;0;0", "reply_authors": "1;1;1;1", "rating_avg": [ 4.5, 0.5 ], "confidence_avg": [ 3.25, 0.4330127018922193 ], "wc_review_avg": [ 639.75, 250.35212701313324 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 649.5, 137.4436248066821 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 9, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": 0.5773502691896257, "gs_citation": 19, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=15477261339965175201&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 9, "aff_unique_index": "0;1", "aff_unique_norm": "Google;Ecole Polytechnique", "aff_unique_dep": "Google DeepMind;", "aff_unique_url": "https://deepmind.com;https://www.polytechnique.edu", "aff_unique_abbr": "DeepMind;X", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;1", "aff_country_unique": "United Kingdom;France" }, { "id": "0xdQXkz69x9", "title": "Attacking Few-Shot Classifiers with Adversarial Support Sets", "track": "main", "status": "Reject", "tldr": "", "abstract": "Few-shot learning systems, especially those based on meta-learning, have recently made significant advances, and are now being considered for real world problems in healthcare, personalization, and science. In this paper, we examine the robustness of such deployed few-shot learning systems when they are fed an imperceptibly perturbed few-shot dataset, showing that the resulting predictions on test inputs can become worse than chance. This is achieved by developing a novel Adversarial Support Set Attack which crafts a poisoned set of examples. When even a small subset of malicious data points is inserted into the support set of a meta-learner, accuracy is significantly reduced. For example, the average classification accuracy of CNAPs on the Aircraft dataset in the META-DATASET benchmark drops from 69.2% to 9.1% when only 20% of the support set is poisoned by imperceptible perturbations. 
We evaluate the new attack on a variety of few-shot classification algorithms including MAML, prototypical networks, and CNAPs, on both small scale (miniImageNet) and large scale (META-DATASET) few-shot classification problems. Interestingly, adversarial support sets produced by attacking a meta-learning based few-shot classifier can also reduce the accuracy of a fine-tuning based few-shot classifier when both models use similar feature extractors.", "keywords": "meta-learning;few-shot learning;adversarial attacks;poisoning", "primary_area": "", "supplementary_material": "/attachment/e29a5ba405da8ff14e5f54486b98e227a99b3143.zip", "author": "Elre Talea Oldewage;John F Bronskill;Richard E Turner", "authorids": "~Elre_Talea_Oldewage1;~John_F_Bronskill1;~Richard_E_Turner1", "gender": "F;M;M", "homepage": "http://mlg.eng.cam.ac.uk/?portfolio=elre-oldewage;;https://rich-turner-group.github.io/", "dblp": ";;40/5352", "google_scholar": ";https://scholar.google.co.nz/citations?user=aH2jZsoAAAAJ;https://scholar.google.co.uk/citations?user=DgLEyZgAAAAJ", "orcid": "0000-0002-0568-8700;;", "linkedin": ";;", "or_profile": "~Elre_Talea_Oldewage1;~John_F_Bronskill1;~Richard_E_Turner1", "aff": "University of Cambridge;University of Cambridge;University of Cambridge", "aff_domain": "cam.ac.uk;cam.ac.uk;cam.ac.uk", "position": "PhD student;Research Associate;Professor", "bibtex": "@misc{\noldewage2021attacking,\ntitle={Attacking Few-Shot Classifiers with Adversarial Support Sets},\nauthor={Elre Talea Oldewage and John F Bronskill and Richard E Turner},\nyear={2021},\nurl={https://openreview.net/forum?id=0xdQXkz69x9}\n}", "github": "", "project": "", "reviewers": "AnonReviewer3;AnonReviewer2;AnonReviewer4;AnonReviewer1", "site": "https://openreview.net/forum?id=0xdQXkz69x9", "pdf_size": 0, "rating": "4;6;6;6", "confidence": "4;3;3;4", "wc_review": "345;406;352;240", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "758;1169;700;1118", "reply_reviewers": "0;0;0;0", "reply_authors": "1;2;1;2", "rating_avg": [ 5.5, 0.8660254037844386 ], "confidence_avg": [ 3.5, 0.5 ], "wc_review_avg": [ 335.75, 60.10979537479728 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 936.25, 209.04111437705262 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.5, 0.5 ], "replies_avg": [ 12, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": -0.5773502691896257, "gs_citation": 1, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=11327364495960871970&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 0, "aff_unique_index": "0;0;0", "aff_unique_norm": "University of Cambridge", "aff_unique_dep": "", "aff_unique_url": "https://www.cam.ac.uk", "aff_unique_abbr": "Cambridge", "aff_campus_unique_index": "0;0;0", "aff_campus_unique": "Cambridge", "aff_country_unique_index": "0;0;0", "aff_country_unique": "United Kingdom" }, { "id": "0z1HScLBEpb", "title": "UneVEn: Universal Value Exploration for Multi-Agent Reinforcement Learning", "track": "main", "status": "Reject", "tldr": "", "abstract": "This paper focuses on cooperative value-based multi-agent reinforcement learning (MARL) in the paradigm of centralized training with decentralized execution (CTDE). Current state-of-the-art value-based MARL methods leverage CTDE to learn a centralized joint-action value function as a monotonic mixing of each agent's utility function, which enables easy decentralization. 
However, this monotonic restriction leads to inefficient exploration in tasks with nonmonotonic returns due to suboptimal approximations of the values of joint actions. To address this, we present a novel MARL approach called Universal Value Exploration (UneVEn), which uses universal successor features (USFs) to learn policies of tasks related to the target task, but with simpler reward functions in a sample efficient manner. UneVEn uses novel action-selection schemes between randomly sampled related tasks during exploration, which enables the monotonic joint-action value function of the target task to place more importance on useful joint actions. Empirical results on a challenging cooperative predator-prey task requiring significant coordination amongst agents show that UneVEn significantly outperforms state-of-the-art baselines.", "keywords": "multi-agent reinforcement learning;deep Q-learning;universal value functions;successor features;relative overgeneralization", "primary_area": "", "supplementary_material": "", "author": "Tarun Gupta;Anuj Mahajan;Bei Peng;Wendelin Boehmer;Shimon Whiteson", "authorids": "~Tarun_Gupta3;~Anuj_Mahajan1;~Bei_Peng2;~Wendelin_Boehmer1;~Shimon_Whiteson1", "gender": "M;M;;M;", "homepage": ";https://anuj-mahajan.github.io/;;https://reinforceAI.net;", "dblp": "38/6099-2;99/3800;;08/9988;https://dblp.uni-trier.de/pers/w/Whiteson:Shimon.html", "google_scholar": "yW1VlzwAAAAJ;https://scholar.google.co.in/citations?user=a3AbXGcAAAAJ;;https://scholar.google.de/citations?user=wI5MV8IAAAAJ;", "orcid": ";;;0000-0002-4398-6792;", "linkedin": "tarun1995gupta/;anuj-m-bb0a26175/;;wendelin-boehmer;", "or_profile": "~Tarun_Gupta3;~Anuj_Mahajan1;~Bei_Peng2;~Wendelin_Boehmer1;~Shimon_Whiteson1", "aff": "University of Oxford;University of Oxford;;Delft University of Technology;University of Oxford", "aff_domain": "ox.ac.uk;ox.ac.uk;;tudelft.nl;ox.ac.uk", "position": "PhD student;PhD student;;Assistant Professor;Professor", "bibtex": "@misc{\ngupta2021uneven,\ntitle={Une{\\{}VE{\\}}n: Universal Value Exploration for Multi-Agent Reinforcement Learning},\nauthor={Tarun Gupta and Anuj Mahajan and Bei Peng and Wendelin Boehmer and Shimon Whiteson},\nyear={2021},\nurl={https://openreview.net/forum?id=0z1HScLBEpb}\n}", "github": "", "project": "", "reviewers": "AnonReviewer3;AnonReviewer2;AnonReviewer1;AnonReviewer4", "site": "https://openreview.net/forum?id=0z1HScLBEpb", "pdf_size": 0, "rating": "3;5;5;6", "confidence": "5;2;4;5", "wc_review": "268;435;546;586", "wc_reply_reviewers": "300;0;50;310", "wc_reply_authors": "1444;109;724;1376", "reply_reviewers": "1;0;1;1", "reply_authors": "3;1;2;2", "rating_avg": [ 4.75, 1.0897247358851685 ], "confidence_avg": [ 4.0, 1.224744871391589 ], "wc_review_avg": [ 458.75, 123.2423932743924 ], "wc_reply_reviewers_avg": [ 165.0, 141.15594213493105 ], "wc_reply_authors_avg": [ 913.25, 542.7860421013054 ], "reply_reviewers_avg": [ 0.75, 0.4330127018922193 ], "reply_authors_avg": [ 2.0, 0.7071067811865476 ], "replies_avg": [ 16, 0 ], "authors#_avg": [ 5, 0 ], "corr_rating_confidence": -0.1873171623163388, "gs_citation": 58, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=5502927213602777519&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 12, "aff_unique_index": "0;0;1;0", "aff_unique_norm": "University of Oxford;Delft University of Technology", "aff_unique_dep": ";", "aff_unique_url": "https://www.ox.ac.uk;https://www.tudelft.nl", "aff_unique_abbr": "Oxford;TU Delft", "aff_campus_unique_index": "", "aff_campus_unique": "", 
"aff_country_unique_index": "0;0;1;0", "aff_country_unique": "United Kingdom;Netherlands" }, { "title": "Undistillable: Making A Nasty Teacher That CANNOT teach students", "status": "Spotlight", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/3114", "id": "0zvfm-nZqQs", "poster": "", "openreview": "https://openreview.net/forum?id=0zvfm-nZqQs", "slides": "https://iclr.cc/virtual/2021/poster/3114", "video": "https://iclr.cc/virtual/2021/poster/3114", "author_site": "Haoyu Ma, Tianlong Chen, Ting-Kuei Hu, Chenyu You, Xiaohui Xie, Zhangyang Wang", "tldr": "", "abstract": "Knowledge Distillation (KD) is a widely used technique to transfer knowledge from pre-trained teacher models to (usually more lightweight) student models. However, in certain situations, this technique is more of a curse than a blessing. For instance, KD poses a potential risk of exposing intellectual properties (IPs): even if a trained machine learning model is released in ``black boxes'' (e.g., as executable software or APIs without open-sourcing code), it can still be replicated by KD through imitating input-output behaviors. To prevent this unwanted effect of KD, this paper introduces and investigates a concept called $\\textit{Nasty Teacher}$: a specially trained teacher network that yields nearly the same performance as a normal one, but would significantly degrade the performance of student models learned by imitating it. We propose a simple yet effective algorithm to build the nasty teacher, called $\\textit{self-undermining knowledge distillation}$. Specifically, we aim to maximize the difference between the output of the nasty teacher and a normal pre-trained network. Extensive experiments on several datasets demonstrate that our method is effective on both standard KD and data-free KD, providing the desirable KD-immunity to model owners for the first time. We hope our preliminary study can draw more awareness and interest in this new practical problem of both social and legal importance. 
Our codes and pre-trained models can be found at: $\\url{https://github.com/VITA-Group/Nasty-Teacher}$.", "keywords": "knowledge distillation;avoid knowledge leaking", "primary_area": "", "supplementary_material": "", "author": "Haoyu Ma;Tianlong Chen;Ting-Kuei Hu;Chenyu You;Xiaohui Xie;Zhangyang Wang", "authorids": "~Haoyu_Ma1;~Tianlong_Chen1;~Ting-Kuei_Hu1;~Chenyu_You1;~Xiaohui_Xie2;~Zhangyang_Wang1", "gender": "M;M;M;M;;M", "homepage": "https://www.ics.uci.edu/~haoyum3/;https://tianlong-chen.github.io;;https://chenyuyou.me/;https://www.ics.uci.edu/~xhx/;https://vita-group.github.io", "dblp": "144/1634;;149/5032;191/9432;;119/4026", "google_scholar": "8jugwosAAAAJ;LE3ctn0AAAAJ;;hy_wB7cAAAAJ;1CR0meYAAAAJ;pxFyKAIAAAAJ", "orcid": "0000-0001-6646-2644;0000-0001-7774-8197;;0000-0001-8365-7822;;", "linkedin": "haoyu-ma-53517915a/;tianlong-chen-783862167/;;chenyu-you-b07475a4/;;", "or_profile": "~Haoyu_Ma1;~Tianlong_Chen1;~Ting-Kuei_Hu1;~Chenyu_You1;~Xiaohui_Xie2;~Zhangyang_Wang1", "aff": "Adobe Research;University of Texas, Austin;;Yale University;University of California, Irvine;University of Texas, Austin", "aff_domain": "adobe.com;utexas.edu;;yale.edu;uci.edu;utexas.edu", "position": "Intern;PhD student;;PhD student;Full Professor;Assistant Professor", "bibtex": "@inproceedings{\nma2021undistillable,\ntitle={Undistillable: Making A Nasty Teacher That {\\{}CANNOT{\\}} teach students},\nauthor={Haoyu Ma and Tianlong Chen and Ting-Kuei Hu and Chenyu You and Xiaohui Xie and Zhangyang Wang},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=0zvfm-nZqQs}\n}", "github": "[![github](/images/github_icon.svg) VITA-Group/Nasty-Teacher](https://github.com/VITA-Group/Nasty-Teacher)", "project": "", "reviewers": "AnonReviewer4;AnonReviewer1;AnonReviewer3;AnonReviewer2", "pdf_size": 0, "rating": "7;7;7;7", "confidence": "4;4;4;4", "wc_review": "312;421;499;296", "wc_reply_reviewers": "18;54;19;24", "wc_reply_authors": "347;305;127;245", "reply_reviewers": "1;1;1;1", "reply_authors": "1;1;1;1", "rating_avg": [ 7.0, 0.0 ], "confidence_avg": [ 4.0, 0.0 ], "wc_review_avg": [ 382.0, 82.92466460589395 ], "wc_reply_reviewers_avg": [ 28.75, 14.7542366796795 ], "wc_reply_authors_avg": [ 256.0, 82.83115355951527 ], "reply_reviewers_avg": [ 1.0, 0.0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 14, 0 ], "authors#_avg": [ 6, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 55, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=3474115554286885687&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 10, "pdf": "https://openreview.net/pdf?id=0zvfm-nZqQs", "email": "adobe.com;utexas.edu;;yale.edu;uci.edu;utexas.edu", "author_num": 6, "aff_unique_index": "0;1;2;3;1", "aff_unique_norm": "Adobe;University of Texas at Austin;Yale University;University of California, Irvine", "aff_unique_dep": "Adobe Research;;;", "aff_unique_url": "https://research.adobe.com;https://www.utexas.edu;https://www.yale.edu;https://www.uci.edu", "aff_unique_abbr": "Adobe;UT Austin;Yale;UCI", "aff_campus_unique_index": "1;2;1", "aff_campus_unique": ";Austin;Irvine", "aff_country_unique_index": "0;0;0;0;0", "aff_country_unique": "United States" }, { "id": "1-Mh-cWROZ", "title": "Fold2Seq: A Joint Sequence(1D)-Fold(3D) Embedding-based Generative Model for Protein Design", "track": "main", "status": "Reject", "tldr": "", "abstract": "Designing novel protein sequences consistent with a desired 3D structure or fold, often referred to as the inverse protein folding 
problem, is a central, but non-trivial, task in protein engineering. It has a wide range of applications in energy, biomedicine, and materials science. However, challenges exist due to the complex sequence-fold relationship and difficulties associated with modeling 3D folds. To overcome these challenges, we propose Fold2Seq, a novel transformer-based generative framework for designing protein sequences conditioned on a specific fold. Our model learns a fold embedding from the density of the secondary structural elements in 3D voxels, and then models the complex sequence-structure relationship by learning a joint sequence-fold embedding. Experiments on high-resolution, complete, and single-structure test sets demonstrate improved performance of Fold2Seq in terms of speed and reliability for sequence design, compared to existing baselines including the state-of-the-art RosettaDesign and other neural net-based approaches. The unique advantages of fold-based Fold2Seq become more evident on diverse real-world test sets comprised of low-resolution, incomplete, or ensemble structures, in comparison to a structure-based model. ", "keywords": "Joint Embedding Learning;Generative Model;Transformer Autoencoder;Inverse Protein Folding;Sequence Design", "primary_area": "", "supplementary_material": "", "author": "Yue Cao;Payel Das;Pin-Yu Chen;Vijil Chenthamarakshan;Igor Melnyk;Yang Shen", "authorids": "~Yue_Cao4;~Payel_Das1;~Pin-Yu_Chen1;~Vijil_Chenthamarakshan1;~Igor_Melnyk1;~Yang_Shen4", "gender": "M;F;M;M;M;", "homepage": ";;http://www.pinyuchen.com;https://researcher.watson.ibm.com/researcher/view.php?person=us-ecvijil;https://imelnyk.github.io/;https://shen-lab.github.io/", "dblp": ";56/7926;39/8969;;;95/5308-1.html", "google_scholar": "Q0f5JRAAAAAJ;;jxwlCUUAAAAJ;g9hboJ0AAAAJ;4vDRTWwAAAAJ;https://scholar.google.com/citations?hl=en", "orcid": ";;0000-0003-1039-8369;;;0000-0002-1703-7796", "linkedin": ";;pin-yu-chen-940062a2;;;", "or_profile": "~Yue_Cao4;~Payel_Das1;~Pin-Yu_Chen1;~Vijil_Chenthamarakshan1;~Igor_Melnyk1;~Yang_Shen4", "aff": "Texas A&M;IBM, International Business Machines;International Business Machines;International Business Machines;International Business Machines;Texas A&M University - College Station", "aff_domain": "tamu.edu;us.ibm.com;ibm.com;ibm.com;ibm.com;tamu.edu", "position": "PhD student;Principal Researcher;Research Staff Member;Senior Technical Staff member;Researcher;Assistant Professor", "bibtex": "@misc{\ncao2021foldseq,\ntitle={Fold2Seq: A Joint Sequence(1D)-Fold(3D) Embedding-based Generative Model for Protein Design},\nauthor={Yue Cao and Payel Das and Pin-Yu Chen and Vijil Chenthamarakshan and Igor Melnyk and Yang Shen},\nyear={2021},\nurl={https://openreview.net/forum?id=1-Mh-cWROZ}\n}", "github": "", "project": "", "reviewers": "AnonReviewer3;AnonReviewer1;AnonReviewer5;AnonReviewer2", "site": "https://openreview.net/forum?id=1-Mh-cWROZ", "pdf_size": 0, "rating": "5;6;6;7", "confidence": "5;4;4;5", "wc_review": "871;354;339;391", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "0;0;0;0", "reply_reviewers": "0;0;0;0", "reply_authors": "0;0;0;0", "rating_avg": [ 6.0, 0.7071067811865476 ], "confidence_avg": [ 4.5, 0.5 ], "wc_review_avg": [ 488.75, 221.50211624271222 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 0, 0 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 0, 0 ], "replies_avg": [ 5, 0 ], "authors#_avg": [ 6, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 29, "gs_cited_by_link": 
"https://scholar.google.com/scholar?cites=9442126458531954169&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 10, "aff_unique_index": "0;1;2;2;2;0", "aff_unique_norm": "Texas A&M University;International Business Machines;International Business Machines Corporation", "aff_unique_dep": ";;", "aff_unique_url": "https://www.tamu.edu;https://www.ibm.com;https://www.ibm.com", "aff_unique_abbr": "TAMU;IBM;IBM", "aff_campus_unique_index": "1", "aff_campus_unique": ";College Station", "aff_country_unique_index": "0;0;0;0;0;0", "aff_country_unique": "United States" }, { "id": "1-j4VLSHApJ", "title": "Learn2Weight: Weights Transfer Defense against Similar-domain Adversarial Attacks", "track": "main", "status": "Reject", "tldr": "", "abstract": "Recent work in black-box adversarial attacks for NLP systems has attracted attention. Prior black-box attacks assume that attackers can observe output labels from target models based on selected inputs. In this work, inspired by adversarial transferability, we propose a new type of black-box NLP adversarial attack that an attacker can choose a similar domain and transfer the adversarial examples to the target domain and cause poor performance in target model. Based on domain adaptation theory, we then propose a defensive strategy, called Learn2Weight, which trains to predict the weight adjustments for target model in order to defense the attack of similar-domain adversarial examples. Using Amazon multi-domain sentiment classification dataset, we empirically show that Learn2Weight model is effective against the attack compared to standard black-box defense methods such as adversarial training and defense distillation. This work contributes to the growing literature on machine learning safety.", "keywords": "adversarial attack;robustness;domain adaptation;privacy-preserving machine learning", "primary_area": "", "supplementary_material": "", "author": "Siddhartha Datta", "authorids": "~Siddhartha_Datta1", "gender": "", "homepage": "http://siddharthadatta.ml/", "dblp": "", "google_scholar": "", "orcid": "", "linkedin": "", "or_profile": "~Siddhartha_Datta1", "aff": "University of Oxford", "aff_domain": "ox.ac.uk", "position": "PhD student", "bibtex": "@misc{\ndatta2021learnweight,\ntitle={Learn2Weight: Weights Transfer Defense against Similar-domain Adversarial Attacks},\nauthor={Siddhartha Datta},\nyear={2021},\nurl={https://openreview.net/forum?id=1-j4VLSHApJ}\n}", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer5;AnonReviewer2", "site": "https://openreview.net/forum?id=1-j4VLSHApJ", "pdf_size": 0, "rating": "3;4;5", "confidence": "3;3;5", "wc_review": "434;764;497", "wc_reply_reviewers": "0;0;0", "wc_reply_authors": "641;750;703", "reply_reviewers": "0;0;0", "reply_authors": "1;1;1", "rating_avg": [ 4.0, 0.816496580927726 ], "confidence_avg": [ 3.6666666666666665, 0.9428090415820634 ], "wc_review_avg": [ 565.0, 143.04544732356916 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 698.0, 44.63929509598765 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 7, 0 ], "authors#_avg": [ 1, 0 ], "corr_rating_confidence": 0.8660254037844387, "gs_citation": 2, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=11636176265782745179&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 0, "aff_unique_index": "0", "aff_unique_norm": "University of Oxford", "aff_unique_dep": "", "aff_unique_url": "https://www.ox.ac.uk", "aff_unique_abbr": "Oxford", "aff_country_unique_index": "0", 
"aff_country_unique": "United Kingdom" }, { "id": "10XWPuAro86", "title": "Hamiltonian Q-Learning: Leveraging Importance-sampling for Data Efficient RL", "track": "main", "status": "Reject", "tldr": "", "abstract": "Model-free reinforcement learning (RL), in particular $Q$-learning is widely used to learn optimal policies for a variety of planning and control problems. However, when the underlying state-transition dynamics are stochastic and high-dimensional, $Q$-learning requires a large amount of data and incurs a prohibitively high computational cost. In this paper, we introduce Hamiltonian $Q$-Learning, a data efficient modification of the $Q$-learning approach, which adopts an importance-sampling based technique for computing the $Q$ function. To exploit stochastic structure of the state-transition dynamics, we employ Hamiltonian Monte Carlo to update $Q$ function estimates by approximating the expected future rewards using $Q$ values associated with a subset of next states. Further, to exploit the latent low-rank structure of the dynamic system, Hamiltonian $Q$-Learning uses a matrix completion algorithm to reconstruct the updated $Q$ function from $Q$ value updates over a much smaller subset of state-action pairs. By providing an efficient way to apply $Q$-learning in stochastic, high-dimensional problems, the proposed approach broadens the scope of RL algorithms for real-world applications, including classical control tasks and environmental monitoring.", "keywords": "Data efficient RL;$Q$-Learning;Hamiltonian Monte Carlo", "primary_area": "", "supplementary_material": "", "author": "Udari Madhushani;Biswadip Dey;Naomi Leonard;Amit Chakraborty", "authorids": "~Udari_Madhushani1;~Biswadip_Dey2;~Naomi_Leonard1;~Amit_Chakraborty2", "gender": "F;M;F;M", "homepage": "https://udarimadhu.github.io/;https://d-biswa.github.io/;https://www.princeton.edu/~naomi/;", "dblp": ";;;02/3815", "google_scholar": "sN7grTMAAAAJ;jdLBoY8AAAAJ;;https://scholar.google.com/citations?hl=en", "orcid": ";0000-0003-1140-1363;;", "linkedin": ";biswadip-dey/;;", "or_profile": "~Udari_Madhushani1;~Biswadip_Dey2;~Naomi_Leonard1;~Amit_Chakraborty2", "aff": "Princeton University;Siemens Corporate Research;Princeton University;Siemens Corporate Research", "aff_domain": "princeton.edu;siemens.com;princeton.edu;siemens.com", "position": "PhD student;Senior Key Expert;Full Professor;Head, Predictive Analytics", "bibtex": "@misc{\nmadhushani2021hamiltonian,\ntitle={Hamiltonian Q-Learning: Leveraging Importance-sampling for Data Efficient {\\{}RL{\\}}},\nauthor={Udari Madhushani and Biswadip Dey and Naomi Leonard and Amit Chakraborty},\nyear={2021},\nurl={https://openreview.net/forum?id=10XWPuAro86}\n}", "github": "", "project": "", "reviewers": "AnonReviewer3;AnonReviewer1;AnonReviewer4;AnonReviewer2", "site": "https://openreview.net/forum?id=10XWPuAro86", "pdf_size": 0, "rating": "5;5;6;6", "confidence": "4;4;3;4", "wc_review": "616;344;142;181", "wc_reply_reviewers": "0;46;53;34", "wc_reply_authors": "476;690;379;855", "reply_reviewers": "0;1;2;1", "reply_authors": "2;3;3;2", "rating_avg": [ 5.5, 0.5 ], "confidence_avg": [ 3.75, 0.4330127018922193 ], "wc_review_avg": [ 320.75, 186.54406315935117 ], "wc_reply_reviewers_avg": [ 33.25, 20.363877332178173 ], "wc_reply_authors_avg": [ 600.0, 185.29840798020905 ], "reply_reviewers_avg": [ 1.0, 0.7071067811865476 ], "reply_authors_avg": [ 2.5, 0.5 ], "replies_avg": [ 20, 0 ], "authors#_avg": [ 4, 0 ], "corr_rating_confidence": -0.5773502691896257, "gs_citation": 2, 
"gs_cited_by_link": "https://scholar.google.com/scholar?cites=11025055666696072169&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 0, "aff_unique_index": "0;1;0;1", "aff_unique_norm": "Princeton University;Siemens AG", "aff_unique_dep": ";Corporate Research", "aff_unique_url": "https://www.princeton.edu;https://www.siemens.com/research", "aff_unique_abbr": "Princeton;Siemens", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;1;0;1", "aff_country_unique": "United States;Germany" }, { "id": "14nC8HNd4Ts", "title": "Synthesising Realistic Calcium Traces of Neuronal Populations Using GAN", "track": "main", "status": "Reject", "tldr": "", "abstract": "Calcium imaging has become a powerful and popular technique to monitor the activity of large populations of neurons in vivo. However, for ethical considerations and despite recent technical developments, recordings are still constrained to a limited number of trials and animals. This limits the amount of data available from individual experiments and hinders the development of analysis techniques and models for more realistic sizes of neuronal populations. The ability to artificially synthesize realistic neuronal calcium signals could greatly alleviate this problem by scaling up the number of trials. Here, we propose a Generative Adversarial Network (GAN) model to generate realistic calcium signals as seen in neuronal somata with calcium imaging. To this end, we propose CalciumGAN, a model based on the WaveGAN architecture and train it on calcium fluorescent signals with the Wasserstein distance. We test the model on artificial data with known ground-truth and show that the distribution of the generated signals closely resembles the underlying data distribution. Then, we train the model on real calcium traces recorded from the primary visual cortex of behaving mice and confirm that the deconvolved spike trains match the statistics of the recorded data. Together, these results demonstrate that our model can successfully generate realistic calcium traces, thereby providing the means to augment existing datasets of neuronal activity for enhanced data exploration and modelling.", "keywords": "calcium imaging;calcium traces;generative adversarial networks;spike train analysis", "primary_area": "", "supplementary_material": "/attachment/93863f9547cd6030e11a8f6b15cb53298223ea18.zip", "author": "Bryan M. Li;Theoklitos Amvrosiadis;Nathalie Rochefort;Arno Onken", "authorids": "~Bryan_M._Li1;t.amvrosiadis@ed.ac.uk;n.rochefort@ed.ac.uk;~Arno_Onken1", "gender": "M;;;M", "homepage": "https://bryanli.io;;;https://homepages.inf.ed.ac.uk/aonken/", "dblp": "213/8145;;;15/2035", "google_scholar": "QQrzFdAAAAAJ;;;JQh31ekAAAAJ", "orcid": "0000-0003-3144-4838;;;0000-0001-7387-5535", "linkedin": ";;;", "or_profile": "~Bryan_M._Li1;t.amvrosiadis@ed.ac.uk;n.rochefort@ed.ac.uk;~Arno_Onken1", "aff": "University of Edinburgh;;;University of Edinburgh", "aff_domain": "ed.ac.uk;;;ed.ac.uk", "position": "PhD student;;;Assistant Professor", "bibtex": "@misc{\nli2021synthesising,\ntitle={Synthesising Realistic Calcium Traces of Neuronal Populations Using {\\{}GAN{\\}}},\nauthor={Bryan M. 
Li and Theoklitos Amvrosiadis and Nathalie Rochefort and Arno Onken},\nyear={2021},\nurl={https://openreview.net/forum?id=14nC8HNd4Ts}\n}", "github": "", "project": "", "reviewers": "AnonReviewer2;AnonReviewer4;AnonReviewer3", "site": "https://openreview.net/forum?id=14nC8HNd4Ts", "pdf_size": 0, "rating": "3;4;5", "confidence": "5;4;4", "wc_review": "373;303;421", "wc_reply_reviewers": "0;0;0", "wc_reply_authors": "302;632;411", "reply_reviewers": "0;0;0", "reply_authors": "1;1;1", "rating_avg": [ 4.0, 0.816496580927726 ], "confidence_avg": [ 4.333333333333333, 0.4714045207910317 ], "wc_review_avg": [ 365.6666666666667, 48.45157949495099 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 448.3333333333333, 137.28397171151803 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 7, 0 ], "authors#_avg": [ 4, 0 ], "corr_rating_confidence": -0.8660254037844385, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:4-i918fP1cIJ:scholar.google.com/&scioq=Synthesising+Realistic+Calcium+Traces+of+Neuronal+Populations+Using+GAN&hl=en&as_sdt=0,33", "gs_version_total": 4, "aff_unique_index": "0;0", "aff_unique_norm": "University of Edinburgh", "aff_unique_dep": "", "aff_unique_url": "https://www.ed.ac.uk", "aff_unique_abbr": "Edinburgh", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0", "aff_country_unique": "United Kingdom" }, { "id": "160xFQdp7HR", "title": "Self-Organizing Intelligent Matter: A blueprint for an AI generating algorithm", "track": "main", "status": "Reject", "tldr": "", "abstract": "We propose an artificial life framework aimed at facilitating the emergence of intelligent organisms. In this framework there is no explicit notion of an agent: instead there is an environment made of atomic elements. These elements contain neural operations and interact through exchanges of information and through physics-like rules contained in the environment. We discuss how an evolutionary process can lead to the emergence of different organisms made of many such atomic elements which can coexist and thrive in the environment. We discuss how this forms the basis of a general AI generating algorithm. 
We provide a simplified implementation of such system and discuss what advances need to be made to scale it up further.", "keywords": "Artificial Life;AI Generating Algorithms", "primary_area": "", "supplementary_material": "", "author": "Karol Gregor;Frederic Besse", "authorids": "~Karol_Gregor1;~Frederic_Besse1", "gender": ";", "homepage": ";", "dblp": "51/7660;128/7851", "google_scholar": ";", "orcid": ";", "linkedin": ";", "or_profile": "~Karol_Gregor1;~Frederic_Besse1", "aff": "Google;", "aff_domain": "google.com;", "position": "Researcher;", "bibtex": "@misc{\ngregor2021selforganizing,\ntitle={Self-Organizing Intelligent Matter: A blueprint for an {\\{}AI{\\}} generating algorithm},\nauthor={Karol Gregor and Frederic Besse},\nyear={2021},\nurl={https://openreview.net/forum?id=160xFQdp7HR}\n}", "github": "", "project": "", "reviewers": "AnonReviewer4;AnonReviewer2;AnonReviewer1;AnonReviewer3", "site": "https://openreview.net/forum?id=160xFQdp7HR", "pdf_size": 0, "rating": "3;4;5;8", "confidence": "4;3;4;1", "wc_review": "735;359;281;421", "wc_reply_reviewers": "479;0;50;0", "wc_reply_authors": "777;309;354;267", "reply_reviewers": "1;0;1;0", "reply_authors": "2;1;2;1", "rating_avg": [ 5.0, 1.8708286933869707 ], "confidence_avg": [ 3.0, 1.224744871391589 ], "wc_review_avg": [ 449.0, 172.41229654522905 ], "wc_reply_reviewers_avg": [ 132.25, 201.23416086738354 ], "wc_reply_authors_avg": [ 426.75, 204.54385226645167 ], "reply_reviewers_avg": [ 0.5, 0.5 ], "reply_authors_avg": [ 1.5, 0.5 ], "replies_avg": [ 13, 0 ], "authors#_avg": [ 2, 0 ], "corr_rating_confidence": -0.8728715609439694, "gs_citation": 12, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=8969881027657710876&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 3, "aff_unique_index": "0", "aff_unique_norm": "Google", "aff_unique_dep": "Google", "aff_unique_url": "https://www.google.com", "aff_unique_abbr": "Google", "aff_campus_unique_index": "0", "aff_campus_unique": "Mountain View", "aff_country_unique_index": "0", "aff_country_unique": "United States" }, { "id": "16WMyqeYgw", "title": "Leveraging the Variance of Return Sequences for Exploration Policy", "track": "main", "status": "Withdraw", "tldr": "", "abstract": "This paper introduces a novel method for constructing an upper bound for exploration policy using either the weighted variance of return sequences or the weighted temporal difference (TD) error. We demonstrate that the variance of the return sequence for a specific state-action pair is an important information source that can be leveraged to guide exploration in reinforcement learning. The intuition is that fluctuation in the return sequence indicates greater uncertainty in the near future returns. This divergence occurs because of the cyclic nature of value-based reinforcement learning; the evolving value function begets policy improvements which in turn modify the value function. Although both variance and TD errors capture different aspects of this uncertainty, our analysis shows that both can be valuable to guide exploration. 
We propose a two-stream network architecture to estimate weighted variance/TD errors within DQN agents for our exploration method and show that it outperforms the baseline on a wide range of Atari games.", "keywords": "Reinforcement Learning;Deep Reinforcement Learning;Exploration;Temporal Difference Error;Variance", "primary_area": "", "supplementary_material": "/attachment/4b038aabe32bad01ab7021946c8b357ec0152ea9.zip", "author": "Zerong Xi;Gita Sukthankar", "authorids": "~Zerong_Xi1;~Gita_Sukthankar1", "gender": "M;F", "homepage": ";http://www.eecs.ucf.edu/~gitars/", "dblp": ";54/1919", "google_scholar": ";087P6LMAAAAJ", "orcid": "0000-0002-6905-7860;0000-0002-6863-6609", "linkedin": "zerong-xi-08097974/;", "or_profile": "~Zerong_Xi1;~Gita_Reese_Sukthankar1", "aff": "University of Central Florida;Computer Science Department, University of Central Florida", "aff_domain": "ucf.edu;ucf.edu", "position": "PhD student;Associate Professor", "bibtex": "", "github": "", "project": "", "reviewers": "AnonReviewer4;AnonReviewer2;AnonReviewer1;AnonReviewer3", "site": "https://openreview.net/forum?id=16WMyqeYgw", "pdf_size": 0, "rating": "2;4;5;5", "confidence": "5;4;4;3", "wc_review": "427;345;678;359", "wc_reply_reviewers": "132;0;0;0", "wc_reply_authors": "996;0;0;624", "reply_reviewers": "1;0;0;0", "reply_authors": "2;0;0;1", "rating_avg": [ 4.0, 1.224744871391589 ], "confidence_avg": [ 4.0, 0.7071067811865476 ], "wc_review_avg": [ 452.25, 133.97644382502472 ], "wc_reply_reviewers_avg": [ 33.0, 57.15767664977295 ], "wc_reply_authors_avg": [ 405.0, 425.820384669405 ], "reply_reviewers_avg": [ 0.25, 0.4330127018922193 ], "reply_authors_avg": [ 0.75, 0.82915619758885 ], "replies_avg": [ 9, 0 ], "authors#_avg": [ 2, 0 ], "corr_rating_confidence": -0.8660254037844386, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:lL_P-92PJT8J:scholar.google.com/&scioq=Leveraging+the+Variance+of+Return+Sequences+for+Exploration+Policy&hl=en&as_sdt=0,5", "gs_version_total": 4, "aff_unique_index": "0;0", "aff_unique_norm": "University of Central Florida", "aff_unique_dep": "", "aff_unique_url": "https://www.ucf.edu", "aff_unique_abbr": "UCF", "aff_campus_unique_index": "1", "aff_campus_unique": ";Orlando", "aff_country_unique_index": "0;0", "aff_country_unique": "United States" }, { "title": "Probing BERT in Hyperbolic Spaces", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/2905", "id": "17VnwXYZyhH", "poster": "", "openreview": "https://openreview.net/forum?id=17VnwXYZyhH", "slides": "https://iclr.cc/virtual/2021/poster/2905", "video": "https://iclr.cc/virtual/2021/poster/2905", "author_site": "Boli Chen, Yao Fu, Guangwei Xu, Pengjun Xie, Chuanqi Tan, Mosha Chen, Liping Jing", "tldr": "", "abstract": "Recently, a variety of probing tasks are proposed to discover linguistic properties learned in contextualized word embeddings. Many of these works implicitly assume these embeddings lay in certain metric spaces, typically the Euclidean space. This work considers a family of geometrically special spaces, the hyperbolic spaces, that exhibit better inductive biases for hierarchical structures and may better reveal linguistic hierarchies encoded in contextualized representations. We introduce a $\\textit{Poincar\u00e9 probe}$, a structural probe projecting these embeddings into a Poincar\u00e9 subspace with explicitly defined hierarchies. 
We focus on two probing objectives: (a) dependency trees where the hierarchy is defined as head-dependent structures; (b) lexical sentiments where the hierarchy is defined as the polarity of words (positivity and negativity). We argue that a key desideratum of a probe is its sensitivity to the existence of linguistic structures. We apply our probes on BERT, a typical contextualized embedding model. In a syntactic subspace, our probe better recovers tree structures than Euclidean probes, revealing the possibility that the geometry of BERT syntax may not necessarily be Euclidean. In a sentiment subspace, we reveal two possible meta-embeddings for positive and negative sentiments and show how lexically-controlled contextualization would change the geometric localization of embeddings. We demonstrate the findings with our Poincar\u00e9 probe via extensive experiments and visualization. Our results can be reproduced at https://github.com/FranxYao/PoincareProbe", "keywords": "Hyperbolic;BERT;Probe;Syntax;Sentiment", "primary_area": "", "supplementary_material": "/attachment/bcd581e19eda1598ab01d192fa7310a574512e1e.zip", "author": "Boli Chen;Yao Fu;Guangwei Xu;Pengjun Xie;Chuanqi Tan;Mosha Chen;Liping Jing", "authorids": "~Boli_Chen1;~Yao_Fu3;kunka.xgw@taobao.com;chengchen.xpj@taobao.com;chuanqi.tcq@alibaba-inc.com;chenmosha.cms@alibaba-inc.com;~Liping_Jing3", "gender": "M;M;;;;;", "homepage": ";https://franxyao.github.io/;;;;;", "dblp": "143/5757;;;;;;", "google_scholar": "P3IMdZ4AAAAJ;liSP4cEAAAAJ;;;;;", "orcid": ";;;;;;", "linkedin": ";;;;;;", "or_profile": "~Boli_Chen1;~Yao_Fu3;kunka.xgw@taobao.com;chengchen.xpj@taobao.com;chuanqi.tcq@alibaba-inc.com;chenmosha.cms@alibaba-inc.com;~Liping_Jing3", "aff": "Beijing Jiaotong University;University of Edinburgh;;;;;", "aff_domain": "bjtu.edu.cn;ed.ac.uk;;;;;", "position": "MS student;PhD student;;;;;", "bibtex": "@inproceedings{\nchen2021probing,\ntitle={Probing {\\{}BERT{\\}} in Hyperbolic Spaces},\nauthor={Boli Chen and Yao Fu and Guangwei Xu and Pengjun Xie and Chuanqi Tan and Mosha Chen and Liping Jing},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=17VnwXYZyhH}\n}", "github": "[![github](/images/github_icon.svg) FranxYao/PoincareProbe](https://github.com/FranxYao/PoincareProbe)", "project": "", "reviewers": "AnonReviewer4;AnonReviewer3;AnonReviewer1;AnonReviewer2", "pdf_size": 0, "rating": "5;6;6;7", "confidence": "4;3;3;3", "wc_review": "473;451;397;620", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "575;1215;972;745", "reply_reviewers": "0;0;0;0", "reply_authors": "1;2;2;1", "rating_avg": [ 6.0, 0.7071067811865476 ], "confidence_avg": [ 3.25, 0.4330127018922193 ], "wc_review_avg": [ 485.25, 82.56626126935869 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 876.75, 240.7782953258038 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.5, 0.5 ], "replies_avg": [ 11, 0 ], "authors#_avg": [ 7, 0 ], "corr_rating_confidence": -0.816496580927726, "gs_citation": 61, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=17283548434643857820&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 3, "pdf": "https://openreview.net/pdf?id=17VnwXYZyhH", "email": "bjtu.edu.cn;ed.ac.uk;;;;;", "author_num": 7, "aff_unique_index": "0;1", "aff_unique_norm": "Beijing Jiao Tong University;University of Edinburgh", "aff_unique_dep": ";", "aff_unique_url": "http://www.njtu.edu.cn/en;https://www.ed.ac.uk", "aff_unique_abbr": "BJTU;Edinburgh", 
"aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;1", "aff_country_unique": "China;United Kingdom" }, { "title": "No Cost Likelihood Manipulation at Test Time for Making Better Mistakes in Deep Networks", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/3041", "id": "193sEnKY1ij", "poster": "", "openreview": "https://openreview.net/forum?id=193sEnKY1ij", "slides": "https://iclr.cc/virtual/2021/poster/3041", "video": "https://iclr.cc/virtual/2021/poster/3041", "author_site": "Shyamgopal Karthik, Ameya Prabhu, Puneet Dokania, Vineet Gandhi", "tldr": "", "abstract": "There has been increasing interest in building deep hierarchy-aware classifiers that aim to quantify and reduce the severity of mistakes, and not just reduce the number of errors. The idea is to exploit the label hierarchy (e.g., the WordNet ontology) and consider graph distances as a proxy for mistake severity. Surprisingly, on examining mistake-severity distributions of the top-1 prediction, we find that current state-of-the-art hierarchy-aware deep classifiers do not always show practical improvement over the standard cross-entropy baseline in making better mistakes. The reason for the reduction in average mistake-severity can be attributed to the increase in low-severity mistakes, which may also explain the noticeable drop in their accuracy. To this end, we use the classical Conditional Risk Minimization (CRM) framework for hierarchy-aware classification. Given a cost matrix and a reliable estimate of likelihoods (obtained from a trained network), CRM simply amends mistakes at inference time; it needs no extra hyperparameters and requires adding just a few lines of code to the standard cross-entropy baseline. It significantly outperforms the state-of-the-art and consistently obtains large reductions in the average hierarchical distance of top-$k$ predictions across datasets, with very little loss in accuracy. CRM, because of its simplicity, can be used with any off-the-shelf trained model that provides reliable likelihood estimates.", "keywords": "Hierarchy-Aware Classification;Conditional Risk Minimization;Post-Hoc Correction", "primary_area": "", "supplementary_material": "", "author": "Shyamgopal Karthik;Ameya Prabhu;Puneet K. Dokania;Vineet Gandhi", "authorids": "~Shyamgopal_Karthik1;~Ameya_Prabhu1;~Puneet_K._Dokania1;~Vineet_Gandhi2", "gender": ";M;M;M", "homepage": "https://sgk98.github.io/;https://drimpossible.github.io/;http://puneetkdokania.github.io/;https://faculty.iiit.ac.in/~vgandhi/", "dblp": "251/8983;181/4512;150/4211;117/2021", "google_scholar": "MofhemMAAAAJ;0kK7sSAAAAAJ;https://scholar.google.fr/citations?user=WsM7ybkAAAAJ;https://scholar.google.fr/citations?user=PVlBz8oAAAAJ", "orcid": ";;;", "linkedin": ";;;", "or_profile": "~Shyamgopal_Karthik1;~Ameya_Prabhu1;~Puneet_Dokania1;~Vineet_Gandhi1", "aff": "International Institute of Information Technology Hyderabad;University of Oxford;University of Oxford;International Institute of Information Technology Hyderabad", "aff_domain": "iiit.ac.in;ox.ac.uk;oxford.ac.uk;iiit.ac.in", "position": "MS student;PhD student;Senior Researcher;Assistant Professor", "bibtex": "@inproceedings{\nkarthik2021no,\ntitle={No Cost Likelihood Manipulation at Test Time for Making Better Mistakes in Deep Networks},\nauthor={Shyamgopal Karthik and Ameya Prabhu and Puneet K. 
Dokania and Vineet Gandhi},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=193sEnKY1ij}\n}", "github": "", "project": "", "reviewers": "AnonReviewer3;AnonReviewer4;AnonReviewer1;AnonReviewer2", "pdf_size": 0, "rating": "6;6;7;8", "confidence": "3;4;2;4", "wc_review": "544;213;198;296", "wc_reply_reviewers": "212;140;31;0", "wc_reply_authors": "914;290;196;709", "reply_reviewers": "1;1;1;0", "reply_authors": "2;1;1;2", "rating_avg": [ 6.75, 0.82915619758885 ], "confidence_avg": [ 3.25, 0.82915619758885 ], "wc_review_avg": [ 312.75, 138.63148091252577 ], "wc_reply_reviewers_avg": [ 95.75, 84.90104534103217 ], "wc_reply_authors_avg": [ 527.25, 295.2214211401334 ], "reply_reviewers_avg": [ 0.75, 0.4330127018922193 ], "reply_authors_avg": [ 1.5, 0.5 ], "replies_avg": [ 14, 0 ], "authors#_avg": [ 4, 0 ], "corr_rating_confidence": 0.0909090909090909, "gs_citation": 27, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=7455201941557048589&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 7, "pdf": "https://openreview.net/pdf?id=193sEnKY1ij", "email": "iiit.ac.in;ox.ac.uk;oxford.ac.uk;iiit.ac.in", "author_num": 4, "aff_unique_index": "0;1;1;0", "aff_unique_norm": "International Institute of Information Technology;University of Oxford", "aff_unique_dep": ";", "aff_unique_url": "https://www.iiit.ac.in;https://www.ox.ac.uk", "aff_unique_abbr": "IIIT Hyderabad;Oxford", "aff_campus_unique_index": "0;0", "aff_campus_unique": "Hyderabad;", "aff_country_unique_index": "0;1;1;0", "aff_country_unique": "India;United Kingdom" }, { "id": "19drPzGV691", "title": "Distributional Reinforcement Learning for Risk-Sensitive Policies", "track": "main", "status": "Reject", "tldr": "", "abstract": "We address the problem of learning a risk-sensitive policy based on the CVaR risk measure using distributional reinforcement learning. In particular, we show that applying the distributional Bellman optimality operator with respect to a risk-based action-selection strategy overestimates the dynamic, Markovian CVaR. The resulting policies can however still be overly conservative and one often prefers to learn an optimal policy based on the static, non-Markovian CVaR. To this end, we propose a modification to the existing algorithm and show that it can indeed learn a proper CVaR-optimized policy. Our proposed approach is a simple extension of standard distributional RL algorithms and can therefore take advantage of many of the recent advances in deep RL.
On both synthetic and real data, we empirically show that our proposed algorithm is able to produce a family of risk-averse policies that achieves a better tradeoff between risk and the expected return.\n", "keywords": "", "primary_area": "", "supplementary_material": "/attachment/3bb9ab33ce7d20453dfa26ea46257ed7507f2016.zip", "author": "Shiau Hong Lim;Ilyas Malik", "authorids": "~Shiau_Hong_Lim1;malikilyas1996@gmail.com", "gender": ";", "homepage": ";", "dblp": "53/3777;", "google_scholar": ";", "orcid": ";", "linkedin": ";", "or_profile": "~Shiau_Hong_Lim1;malikilyas1996@gmail.com", "aff": "IBM Research;", "aff_domain": "ibm.com;", "position": "Research staff member;", "bibtex": "@misc{\nlim2021distributional,\ntitle={Distributional Reinforcement Learning for Risk-Sensitive Policies},\nauthor={Shiau Hong Lim and Ilyas Malik},\nyear={2021},\nurl={https://openreview.net/forum?id=19drPzGV691}\n}", "github": "", "project": "", "reviewers": "AnonReviewer4;AnonReviewer1;AnonReviewer2;AnonReviewer3", "site": "https://openreview.net/forum?id=19drPzGV691", "pdf_size": 0, "rating": "5;5;5;7", "confidence": "3;4;3;4", "wc_review": "228;215;334;298", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "152;125;247;173", "reply_reviewers": "0;0;0;0", "reply_authors": "1;1;1;1", "rating_avg": [ 5.5, 0.8660254037844386 ], "confidence_avg": [ 3.5, 0.5 ], "wc_review_avg": [ 268.75, 49.14964394581104 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 174.25, 45.31762902006238 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 11, 0 ], "authors#_avg": [ 2, 0 ], "corr_rating_confidence": 0.5773502691896257, "gs_citation": 30, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=6407034545221004599&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 5, "aff_unique_index": "0", "aff_unique_norm": "IBM", "aff_unique_dep": "IBM Research", "aff_unique_url": "https://www.ibm.com/research", "aff_unique_abbr": "IBM", "aff_country_unique_index": "0", "aff_country_unique": "United States" }, { "title": "GANs Can Play Lottery Tickets Too", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/2934", "id": "1AoMhc_9jER", "poster": "", "openreview": "https://openreview.net/forum?id=1AoMhc_9jER", "slides": "https://iclr.cc/virtual/2021/poster/2934", "video": "https://iclr.cc/virtual/2021/poster/2934", "author_site": "Xuxi Chen, Zhenyu Zhang, Yongduo Sui, Tianlong Chen", "tldr": "", "abstract": "Deep generative adversarial networks (GANs) have gained growing popularity in numerous scenarios, while usually suffering from high parameter complexities for resource-constrained real-world applications. However, the compression of GANs has been less explored. A few works show that heuristically applying compression techniques normally leads to unsatisfactory results, due to the notorious training instability of GANs. In parallel, the lottery ticket hypothesis shows prevailing success on discriminative models, in locating sparse matching subnetworks capable of training in isolation to full model performance. In this work, we for the first time study the existence of such trainable matching subnetworks in deep GANs. For a range of GANs, we certainly find matching subnetworks at $67\%$-$74\%$ sparsity. We observe that pruning the discriminator or not has a minor effect on the existence and quality of matching subnetworks, while the initialization weights used in the discriminator play a significant role.
We then show the powerful transferability of these subnetworks to unseen tasks. Furthermore, extensive experimental results demonstrate that our found subnetworks substantially outperform previous state-of-the-art GAN compression approaches in both image generation (e.g. SNGAN) and image-to-image translation GANs (e.g. CycleGAN). Codes available at https://github.com/VITA-Group/GAN-LTH.", "keywords": "lottery tickets;GAN compression;generative adversarial networks", "primary_area": "", "supplementary_material": "/attachment/e4c32249929dbb1120b1f1ef4f9162325b0e9b6f.zip", "author": "Xuxi Chen;Zhenyu Zhang;Yongduo Sui;Tianlong Chen", "authorids": "~Xuxi_Chen1;~Zhenyu_Zhang4;~Yongduo_Sui1;~Tianlong_Chen1", "gender": "Unspecified;M;M;M", "homepage": ";https://zhenyu.gallery;https://yongduosui.github.io/;https://tianlong-chen.github.io", "dblp": "267/9662;01/1844-15;277/5175;", "google_scholar": "afsDlKYAAAAJ;ZLyJRxoAAAAJ;VD9g6ogAAAAJ;LE3ctn0AAAAJ", "orcid": ";;0000-0003-4492-147X;0000-0001-7774-8197", "linkedin": ";zhenyu-allen-zhang-a9b1391a3/;yongduosui/;tianlong-chen-783862167/", "or_profile": "~Xuxi_Chen1;~Zhenyu_Zhang4;~Yongduo_Sui1;~Tianlong_Chen1", "aff": ";University of Science and Technology of China;University of Science and Technology of China;University of Texas, Austin", "aff_domain": ";ustc.edu;ustc.edu.cn;utexas.edu", "position": ";MS student;MS student;PhD student", "bibtex": "@inproceedings{\nchen2021gans,\ntitle={{\\{}GAN{\\}}s Can Play Lottery Tickets Too},\nauthor={Xuxi Chen and Zhenyu Zhang and Yongduo Sui and Tianlong Chen},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=1AoMhc_9jER}\n}", "github": "[![github](/images/github_icon.svg) VITA-Group/GAN-LTH](https://github.com/VITA-Group/GAN-LTH)", "project": "", "reviewers": "AnonReviewer1;AnonReviewer4;AnonReviewer5;AnonReviewer2", "pdf_size": 0, "rating": "6;6;6;8", "confidence": "3;4;4;3", "wc_review": "231;430;1634;216", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "497;824;2268;195", "reply_reviewers": "0;0;0;0", "reply_authors": "1;2;4;1", "rating_avg": [ 6.5, 0.8660254037844386 ], "confidence_avg": [ 3.5, 0.5 ], "wc_review_avg": [ 627.75, 587.0674471472593 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 946.0, 795.0110062131215 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 2.0, 1.224744871391589 ], "replies_avg": [ 14, 0 ], "authors#_avg": [ 4, 0 ], "corr_rating_confidence": -0.5773502691896257, "gs_citation": 66, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=1236790394387307114&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 4, "pdf": "https://openreview.net/pdf?id=1AoMhc_9jER", "email": ";ustc.edu;ustc.edu.cn;utexas.edu", "author_num": 4, "aff_unique_index": "0;0;1", "aff_unique_norm": "University of Science and Technology of China;University of Texas at Austin", "aff_unique_dep": ";", "aff_unique_url": "http://www.ustc.edu.cn;https://www.utexas.edu", "aff_unique_abbr": "USTC;UT Austin", "aff_campus_unique_index": "1", "aff_campus_unique": ";Austin", "aff_country_unique_index": "0;0;1", "aff_country_unique": "China;United States" }, { "id": "1AyPW2Emp6", "title": "Tight Second-Order Certificates for Randomized Smoothing", "track": "main", "status": "Reject", "tldr": "", "abstract": "Randomized smoothing is a popular way of providing robustness guarantees against adversarial attacks: randomly-smoothed functions have a universal Lipschitz-like bound, allowing for robustness certificates to be 
easily computed. In this work, we show that there also exists a universal curvature-like bound for Gaussian random smoothing: given the exact value and gradient of a smoothed function, we compute a lower bound on the distance of a point to its closest adversarial example, called the Second-order Smoothing (SoS) robustness certificate. In addition to proving the correctness of this novel certificate, we show that SoS certificates are realizable and therefore tight. Interestingly, we show that the maximum achievable benefits, in terms of certified robustness, from using the additional information of the gradient norm are relatively small: because our bounds are tight, this is a fundamental negative result. The gain of SoS certificates further diminishes if we consider the estimation error of the gradient norms, for which we have developed an estimator. We therefore additionally develop a variant of Gaussian smoothing, called Gaussian dipole smoothing, which provides similar bounds to randomized smoothing with gradient information, but with much-improved sample efficiency. This allows us to achieve (marginally) improved robustness certificates on high-dimensional datasets such as CIFAR-10 and ImageNet.", "keywords": "certificates;adversarial;robustness;defenses;smoothing;curvature", "primary_area": "", "supplementary_material": "/attachment/4631f9792e3c5ef7eb745234d58acf53077262a5.zip", "author": "Alexander Levine;Aounon Kumar;Tom Goldstein;Soheil Feizi", "authorids": "~Alexander_Levine2;aounon@umd.edu;~Tom_Goldstein1;~Soheil_Feizi2", "gender": ";;M;M", "homepage": ";;https://www.cs.umd.edu/~tomg/;https://www.cs.umd.edu/~sfeizi/", "dblp": ";;25/8184;57/2132", "google_scholar": ";;KmSuVtgAAAAJ;lptAmrMAAAAJ", "orcid": ";;;", "linkedin": ";;;", "or_profile": "~Alexander_Levine2;aounon@umd.edu;~Tom_Goldstein1;~Soheil_Feizi2", "aff": ";;University of Maryland, College Park;University of Maryland, College Park", "aff_domain": ";;umd.edu;umd.edu", "position": ";;Associate Professor;Assistant Professor", "bibtex": "@misc{\nlevine2021tight,\ntitle={Tight Second-Order Certificates for Randomized Smoothing},\nauthor={Alexander Levine and Aounon Kumar and Tom Goldstein and Soheil Feizi},\nyear={2021},\nurl={https://openreview.net/forum?id=1AyPW2Emp6}\n}", "github": "", "project": "", "reviewers": "AnonReviewer2;AnonReviewer1;AnonReviewer3", "site": "https://openreview.net/forum?id=1AyPW2Emp6", "pdf_size": 0, "rating": "4;5;6", "confidence": "3;5;3", "wc_review": "385;254;448", "wc_reply_reviewers": "0;55;0", "wc_reply_authors": "259;291;250", "reply_reviewers": "0;1;0", "reply_authors": "1;2;1", "rating_avg": [ 5.0, 0.816496580927726 ], "confidence_avg": [ 3.6666666666666665, 0.9428090415820634 ], "wc_review_avg": [ 362.3333333333333, 80.80566536794036 ], "wc_reply_reviewers_avg": [ 18.333333333333332, 25.927248643506744 ], "wc_reply_authors_avg": [ 266.6666666666667, 17.594190960528863 ], "reply_reviewers_avg": [ 0.3333333333333333, 0.4714045207910317 ], "reply_authors_avg": [ 1.3333333333333333, 0.4714045207910317 ], "replies_avg": [ 10, 0 ], "authors#_avg": [ 4, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 16, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=5774367725598345954&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 3, "aff_unique_index": "0;0", "aff_unique_norm": "University of Maryland", "aff_unique_dep": "", "aff_unique_url": "https://www.umd.edu", "aff_unique_abbr": "UMD", "aff_campus_unique_index": "0;0", "aff_campus_unique": "College Park",
"aff_country_unique_index": "0;0", "aff_country_unique": "United States" }, { "id": "1EVb8XRBDNr", "title": "RMIX: Risk-Sensitive Multi-Agent Reinforcement Learning", "track": "main", "status": "Reject", "tldr": "", "abstract": "Centralized training with decentralized execution (CTDE) has become an important paradigm in multi-agent reinforcement learning (MARL). Current CTDE-based methods rely on restrictive decompositions of the centralized value function across agents, which decomposes the global Q-value into individual Q values to guide individuals' behaviours. However, such expected, i.e., risk-neutral, Q value decomposition is not sufficient even with CTDE due to the randomness of rewards and the uncertainty in environments, which causes the failure of these methods to train coordinating agents in complex environments. To address these issues, we propose RMIX, a novel cooperative MARL method with the Conditional Value at Risk (CVaR) measure over the learned distributions of individuals' Q values. Our main contributions are in three folds: (i) We first learn the return distributions of individuals to analytically calculate CVaR for decentralized execution; (ii) We then propose a dynamic risk level predictor for CVaR calculation to handle the temporal nature of the stochastic outcomes during executions; (iii) We finally propose risk-sensitive Bellman equation along with Individual-Global-MAX (IGM) for MARL training. Empirically, we show that our method significantly outperforms state-of-the-art methods on many challenging StarCraft II tasks, demonstrating significantly enhanced coordination and high sample efficiency.", "keywords": "Risk-sensitive learning;cooperative multi-agent reinforcement learning;reinforcement learning", "primary_area": "", "supplementary_material": "", "author": "Wei Qiu;Xinrun Wang;Runsheng Yu;Xu He;Rundong Wang;Bo An;Svetlana Obraztsova;Zinovi Rabinovich", "authorids": "~Wei_Qiu3;~Xinrun_Wang1;runshengyu@gmail.com;hexu0003@e.ntu.edu.sg;~Rundong_Wang1;~Bo_An2;~Svetlana_Obraztsova1;~Zinovi_Rabinovich1", "gender": "M;M;;;M;M;F;M", "homepage": ";https://rainwangphy.github.io/;;;;https://personal.ntu.edu.sg/boan/;https://sites.google.com/site/svobraztsova/;http://zinovi.zinovi.net", "dblp": "11/5166-1;199/6413;;;254/1228;42/6178-1.html;;93/4009", "google_scholar": "gszGlZIAAAAJ;ROANfPUAAAAJ;;;JEVpgE8AAAAJ;PEEpuNwAAAAJ;https://scholar.google.com.tw/citations?user=aorQUi0AAAAJ;https://scholar.google.com.tw/citations?user=JwJRnmAAAAAJ", "orcid": ";;;;;0000-0002-7064-7438;;", "linkedin": ";;;;;;;", "or_profile": "~Wei_Qiu3;~Xinrun_Wang1;runshengyu@gmail.com;hexu0003@e.ntu.edu.sg;~Rundong_Wang1;~Bo_An2;~Svetlana_Obraztsova1;~Zinovi_Rabinovich1", "aff": "Nanyang Technological University;Nanyang Technological University;;;Nanyang Technological University;Nanyang Technological University;Nanyang Technological University;Nanyang Technological University", "aff_domain": "ntu.edu.sg;ntu.edu.sg;;;ntu.edu.sg;ntu.edu.sg;ntu.edu.sg;ntu.edu.sg", "position": "PhD student;Postdoc;;;PhD student;Full Professor;Assistant Professor;Assistant Professor", "bibtex": "@misc{\nqiu2021rmix,\ntitle={{\\{}RMIX{\\}}: Risk-Sensitive Multi-Agent Reinforcement Learning},\nauthor={Wei Qiu and Xinrun Wang and Runsheng Yu and Xu He and Rundong Wang and Bo An and Svetlana Obraztsova and Zinovi Rabinovich},\nyear={2021},\nurl={https://openreview.net/forum?id=1EVb8XRBDNr}\n}", "github": "", "project": "", "reviewers": "AnonReviewer2;AnonReviewer3;AnonReviewer4;AnonReviewer1", "site": 
"https://openreview.net/forum?id=1EVb8XRBDNr", "pdf_size": 0, "rating": "4;6;6;7", "confidence": "4;2;3;4", "wc_review": "375;334;259;610", "wc_reply_reviewers": "613;0;4;77", "wc_reply_authors": "3334;882;800;1235", "reply_reviewers": "3;0;1;1", "reply_authors": "7;2;2;2", "rating_avg": [ 5.75, 1.0897247358851685 ], "confidence_avg": [ 3.25, 0.82915619758885 ], "wc_review_avg": [ 394.5, 131.18784242451738 ], "wc_reply_reviewers_avg": [ 173.5, 255.5900037168903 ], "wc_reply_authors_avg": [ 1562.75, 1035.6102971195294 ], "reply_reviewers_avg": [ 1.25, 1.0897247358851685 ], "reply_authors_avg": [ 3.25, 2.165063509461097 ], "replies_avg": [ 33, 0 ], "authors#_avg": [ 8, 0 ], "corr_rating_confidence": -0.20751433915982243, "gs_citation": 2, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=861224197576803379&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 0, "aff_unique_index": "0;0;0;0;0;0", "aff_unique_norm": "Nanyang Technological University", "aff_unique_dep": "", "aff_unique_url": "https://www.ntu.edu.sg", "aff_unique_abbr": "NTU", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0;0;0;0;0", "aff_country_unique": "Singapore" }, { "title": "Towards Faster and Stabilized GAN Training for High-fidelity Few-shot Image Synthesis", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/3048", "id": "1Fqg133qRaI", "poster": "", "openreview": "https://openreview.net/forum?id=1Fqg133qRaI", "slides": "https://iclr.cc/virtual/2021/poster/3048", "video": "https://iclr.cc/virtual/2021/poster/3048", "author_site": "Bingchen Liu, Yizhe Zhu, Kunpeng Song, Ahmed Elgammal", "tldr": "", "abstract": "Training Generative Adversarial Networks (GAN) on high-fidelity images usually requires large-scale GPU-clusters and a vast number of training images. In this paper, we study the few-shot image synthesis task for GAN with minimum computing cost. We propose a light-weight GAN structure that gains superior quality on 1024^2 resolution. Notably, the model converges from scratch with just a few hours of training on a single RTX-2080 GPU, and has a consistent performance, even with less than 100 training samples. Two technique designs constitute our work, a skip-layer channel-wise excitation module and a self-supervised discriminator trained as a feature-encoder. 
With thirteen datasets covering a wide variety of image domains (The datasets and code are available at https://github.com/odegeasslbc/FastGAN-pytorch), we show our model's superior performance compared to the state-of-the-art StyleGAN2, when data and computing budget are limited.", "keywords": "deep learning;generative model;image synthesis;few-shot learning;generative adversarial network;self-supervised learning;unsupervised learning", "primary_area": "", "supplementary_material": "/attachment/f73e180ab3012dc8fc0b3e6ed46b882081eafe78.zip", "author": "Bingchen Liu;Yizhe Zhu;Kunpeng Song;Ahmed Elgammal", "authorids": "~Bingchen_Liu2;~Yizhe_Zhu2;~Kunpeng_Song1;~Ahmed_Elgammal1", "gender": "M;M;M;M", "homepage": ";http://yzzhu.net/;https://kunpengsong.github.io/;https://www.cs.rutgers.edu/~elgammal/Home.html", "dblp": ";http://dblp.uni-trier.de/pers/hd/z/Zhu:Yizhe;;e/AhmedMElgammal", "google_scholar": "uKdv6SUAAAAJ;hPXUR0cAAAAJ;JjHzj5cAAAAJ;https://scholar.google.com.tw/citations?user=DxQiCiIAAAAJ", "orcid": "0000-0002-2886-8915;;;", "linkedin": "bingchen-liu-71b38611a/;yizhe-ethan-zhu-171a06126/;;", "or_profile": "~Bingchen_Liu2;~Yizhe_Zhu2;~Kunpeng_Song1;~Ahmed_Elgammal1", "aff": "Rutgers University;;Rutgers University;Rutgers University, new brunswick", "aff_domain": "rutgers.edu;;rutgers.edu;rutgers.edu", "position": "PhD student;;PhD student;Full Professor", "bibtex": "@inproceedings{\nliu2021towards,\ntitle={Towards Faster and Stabilized {\\{}GAN{\\}} Training for High-fidelity Few-shot Image Synthesis},\nauthor={Bingchen Liu and Yizhe Zhu and Kunpeng Song and Ahmed Elgammal},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=1Fqg133qRaI}\n}", "github": "[![github](/images/github_icon.svg) odegeasslbc/FastGAN-pytorch](https://github.com/odegeasslbc/FastGAN-pytorch) + [![Papers with Code](/images/pwc_icon.svg) 6 community implementations](https://paperswithcode.com/paper/?openreview=1Fqg133qRaI)", "project": "", "reviewers": "AnonReviewer3;AnonReviewer2;AnonReviewer1;AnonReviewer4", "pdf_size": 0, "rating": "7;7;7;7", "confidence": "5;4;3;4", "wc_review": "636;1342;366;688", "wc_reply_reviewers": "155;699;0;0", "wc_reply_authors": "1421;2597;0;49", "reply_reviewers": "1;5;0;0", "reply_authors": "3;7;0;1", "rating_avg": [ 7.0, 0.0 ], "confidence_avg": [ 4.0, 0.7071067811865476 ], "wc_review_avg": [ 758.0, 358.6446709488376 ], "wc_reply_reviewers_avg": [ 213.5, 287.3573559176796 ], "wc_reply_authors_avg": [ 1016.75, 1075.979640838989 ], "reply_reviewers_avg": [ 1.5, 2.0615528128088303 ], "reply_authors_avg": [ 2.75, 2.680951323690902 ], "replies_avg": [ 24, 0 ], "authors#_avg": [ 4, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 328, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=1230561477008611475&as_sdt=400005&sciodt=0,14&hl=en", "gs_version_total": 8, "pdf": "https://openreview.net/pdf?id=1Fqg133qRaI", "email": "rutgers.edu;;rutgers.edu;rutgers.edu", "author_num": 4, "aff_unique_index": "0;0;0", "aff_unique_norm": "Rutgers University", "aff_unique_dep": "", "aff_unique_url": "https://www.rutgers.edu", "aff_unique_abbr": "Rutgers", "aff_campus_unique_index": "1", "aff_campus_unique": ";New Brunswick", "aff_country_unique_index": "0;0;0", "aff_country_unique": "United States" }, { "title": "Is Attention Better Than Matrix Decomposition?", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/3364", "id": "1FvkSpWosOl", "poster": "", "openreview": 
"https://openreview.net/forum?id=1FvkSpWosOl", "slides": "https://iclr.cc/virtual/2021/poster/3364", "video": "https://iclr.cc/virtual/2021/poster/3364", "author_site": "Zhengyang Geng, Meng-Hao Guo, Hongxu None Chen, Xia Li, Ke Wei, Zhouchen Lin", "tldr": "", "abstract": "As an essential ingredient of modern deep learning, attention mechanism, especially self-attention, plays a vital role in the global correlation discovery. However, is hand-crafted attention irreplaceable when modeling the global context? Our intriguing finding is that self-attention is not better than the matrix decomposition~(MD) model developed 20 years ago regarding the performance and computational cost for encoding the long-distance dependencies. We model the global context issue as a low-rank completion problem and show that its optimization algorithms can help design global information blocks. This paper then proposes a series of Hamburgers, in which we employ the optimization algorithms for solving MDs to factorize the input representations into sub-matrices and reconstruct a low-rank embedding. Hamburgers with different MDs can perform favorably against the popular global context module self-attention when carefully coping with gradients back-propagated through MDs. Comprehensive experiments are conducted in the vision tasks where it is crucial to learn the global context, including semantic segmentation and image generation, demonstrating significant improvements over self-attention and its variants. Code is available at https://github.com/Gsunshine/Enjoy-Hamburger.", "keywords": "attention models;matrix decomposition;computer vision", "primary_area": "", "supplementary_material": "", "author": "Zhengyang Geng;Meng-Hao Guo;Hongxu Chen;Xia Li;Ke Wei;Zhouchen Lin", "authorids": "~Zhengyang_Geng1;~Meng-Hao_Guo1;~Hongxu_Chen2;~Xia_Li3;~Ke_Wei1;~Zhouchen_Lin1", "gender": ";M;M;;M;M", "homepage": "https://gsunshine.github.io/;https://github.com/MenghaoGuo;https://github.com/NPCzzz;;https://makwei.github.io/;https://zhouchenlin.github.io", "dblp": "250/2651.html;281/7258;;;;l/ZhouchenLin", "google_scholar": "lNkw3QYAAAAJ;DnXVAgcAAAAJ;;;9nr1fe8AAAAJ;https://scholar.google.com.tw/citations?user=TanjFwoAAAAJ", "orcid": ";;;;;0000-0003-1493-7569", "linkedin": ";;;;;", "or_profile": "~Zhengyang_Geng1;~Meng-Hao_Guo1;~Hongxu_Chen2;~Xia_Li3;~Ke_Wei1;~Zhouchen_Lin1", "aff": "Peking University;Tsinghua University;Fudan University;;Fudan University;Peking University", "aff_domain": "pku.edu.cn;tsinghua.edu.cn;fudan.edu.cn;;fudan.edu.cn;pku.edu.cn", "position": "Visiting student;PhD student;PhD student;;Associate Professor;Professor", "bibtex": "@inproceedings{\ngeng2021is,\ntitle={Is Attention Better Than Matrix Decomposition?},\nauthor={Zhengyang Geng and Meng-Hao Guo and Hongxu Chen and Xia Li and Ke Wei and Zhouchen Lin},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=1FvkSpWosOl}\n}", "github": "[![github](/images/github_icon.svg) Gsunshine/Enjoy-Hamburger](https://github.com/Gsunshine/Enjoy-Hamburger) + [![Papers with Code](/images/pwc_icon.svg) 1 community implementation](https://paperswithcode.com/paper/?openreview=1FvkSpWosOl)", "project": "", "reviewers": "AnonReviewer1;AnonReviewer2;AnonReviewer3;AnonReviewer4", "pdf_size": 0, "rating": "6;7;8;8", "confidence": "3;4;4;4", "wc_review": "352;142;787;265", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "1756;133;790;1000", "reply_reviewers": "0;0;0;0", "reply_authors": "5;1;1;2", "rating_avg": 
[ 7.25, 0.82915619758885 ], "confidence_avg": [ 3.75, 0.4330127018922193 ], "wc_review_avg": [ 386.5, 242.96759043131658 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 919.75, 579.1296810732463 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 2.25, 1.6393596310755 ], "replies_avg": [ 16, 0 ], "authors#_avg": [ 6, 0 ], "corr_rating_confidence": 0.8703882797784891, "gs_citation": 199, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=14362607193647727267&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 8, "pdf": "https://openreview.net/pdf?id=1FvkSpWosOl", "email": "pku.edu.cn;tsinghua.edu.cn;fudan.edu.cn;;fudan.edu.cn;pku.edu.cn", "author_num": 6, "aff_unique_index": "0;1;2;2;0", "aff_unique_norm": "Peking University;Tsinghua University;Fudan University", "aff_unique_dep": ";;", "aff_unique_url": "http://www.pku.edu.cn;https://www.tsinghua.edu.cn;https://www.fudan.edu.cn", "aff_unique_abbr": "Peking U;THU;Fudan", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0;0;0;0", "aff_country_unique": "China" }, { "id": "1GTma8HwlYp", "title": "AUXILIARY TASK UPDATE DECOMPOSITION: THE GOOD, THE BAD AND THE NEUTRAL", "track": "main", "status": "Poster", "tldr": "", "abstract": "While deep learning has been very beneficial in data-rich settings, tasks with smaller training set\noften resort to pre-training or multitask learning to leverage data from other tasks. In this case,\ncareful consideration is needed to select tasks and model parameterizations such that updates from\nthe auxiliary tasks actually help the primary task. We seek to alleviate this burden by formulating a model-agnostic framework that performs fine-grained manipulation of the auxiliary task gradients. We propose to decompose auxiliary updates into directions which help, damage or leave the primary task loss unchanged. This allows weighting the update directions \ndifferently depending on their impact on the problem of interest. We present a novel and efficient algorithm for that\npurpose and show its advantage in practice. Our method leverages efficient automatic differentiation \nprocedures and randomized singular value decomposition for scalability. We show that our framework is \ngeneric and encompasses some prior work as particular cases. Our approach consistently outperforms strong and widely used baselines when leveraging out-of-distribution data for Text and Image classification tasks.", "keywords": "pre-training;multitask learning;deeplearning;gradient decomposition", "primary_area": "", "supplementary_material": "/attachment/0f0d441cd3fd43ccb236d60f9e2f0bb5385e32ff.zip", "author": "Lucio M. Dery;Yann Dauphin;David Grangier", "authorids": "~Lucio_M._Dery1;~Yann_Dauphin1;~David_Grangier1", "gender": "M;M;M", "homepage": "https://ldery.github.io/;https://www.dauphin.io;http://david.grangier.info/", "dblp": "211/7773;22/9988;57/1192", "google_scholar": "ggFzw0MAAAAJ;XSforroAAAAJ;CIQEGCYAAAAJ", "orcid": ";;0000-0002-8847-9532", "linkedin": ";;davidgrangier/", "or_profile": "~Lucio_M._Dery1;~Yann_Dauphin1;~David_Grangier1", "aff": "Carnegie Mellon University;Google;Google", "aff_domain": "cmu.edu;google.com;google.com", "position": "PhD student;Researcher;Researcher", "bibtex": "@inproceedings{\ndery2021auxiliary,\ntitle={{\\{}AUXILIARY{\\}} {\\{}TASK{\\}} {\\{}UPDATE{\\}} {\\{}DECOMPOSITION{\\}}: {\\{}THE{\\}} {\\{}GOOD{\\}}, {\\{}THE{\\}} {\\{}BAD{\\}} {\\{}AND{\\}} {\\{}THE{\\}} {\\{}NEUTRAL{\\}}},\nauthor={Lucio M. 
Dery and Yann Dauphin and David Grangier},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=1GTma8HwlYp}\n}", "github": "[![github](/images/github_icon.svg) ldery/ATTITTUD](https://github.com/ldery/ATTITTUD)", "project": "", "reviewers": "AnonReviewer4;AnonReviewer2;AnonReviewer3;AnonReviewer1", "site": "https://openreview.net/forum?id=1GTma8HwlYp", "pdf_size": 0, "rating": "5;6;6;6", "confidence": "2;5;3;3", "wc_review": "209;445;421;565", "wc_reply_reviewers": "0;213;0;0", "wc_reply_authors": "110;853;187;605", "reply_reviewers": "0;1;0;0", "reply_authors": "1;2;1;1", "rating_avg": [ 5.75, 0.4330127018922193 ], "confidence_avg": [ 3.25, 1.0897247358851685 ], "wc_review_avg": [ 410.0, 128.23026163897507 ], "wc_reply_reviewers_avg": [ 53.25, 92.23170550304272 ], "wc_reply_authors_avg": [ 438.75, 304.4243543148281 ], "reply_reviewers_avg": [ 0.25, 0.4330127018922193 ], "reply_authors_avg": [ 1.25, 0.4330127018922193 ], "replies_avg": [ 12, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": 0.6622661785325219, "gs_citation": 30, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=5872379773640363834&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 5, "aff_unique_index": "0;1;1", "aff_unique_norm": "Carnegie Mellon University;Google", "aff_unique_dep": ";Google", "aff_unique_url": "https://www.cmu.edu;https://www.google.com", "aff_unique_abbr": "CMU;Google", "aff_campus_unique_index": "1;1", "aff_campus_unique": ";Mountain View", "aff_country_unique_index": "0;0;0", "aff_country_unique": "United States" }, { "id": "1IBgFQbj7y", "title": "Maximum Categorical Cross Entropy (MCCE): A noise-robust alternative loss function to mitigate racial bias in Convolutional Neural Networks (CNNs) by reducing overfitting", "track": "main", "status": "Reject", "tldr": "", "abstract": "Categorical Cross Entropy (CCE) is the most commonly used loss function in deep neural networks such as Convolutional Neural Networks (CNNs) for multi-class classification problems. In spite of the fact that CCE is highly susceptible to noise; CNN models trained without accounting for the unique noise characteristics of the input data, or noise introduced during model training, invariably suffer from overfitting affecting model generalizability. The lack of generalizability becomes especially apparent in the context of ethnicity/racial image classification problems encountered in the domain of computer vision. One such problem is the unintended discriminatory racial bias that CNN models trained using CCE fail to adequately address. In other words, CNN models trained using CCE offer a skewed representation of classification performance favoring lighter skin tones.\n\nIn this paper, we propose and empirically validate a novel noise-robust extension to the existing CCE loss function called Maximum Categorical Cross-Entropy (MCCE), which utilizes CCE loss and a novel reconstruction loss, calculated using the Maximum Entropy (ME) measures of the convolutional kernel weights and input training dataset. We compare the use of MCCE with CCE-trained models on two benchmarking datasets, colorFERET and UTKFace, using a Residual Network (ResNet) CNN architecture. MCCE-trained models reduce overfitting by 5.85% and 4.3% on colorFERET and UTKFace datasets respectively. In cross-validation testing, MCCE-trained models outperform CCE-trained models by 8.8% and 25.16% on the colorFERET and UTKFace datasets respectively. 
MCCE addresses and mitigates the persistent problem of inadvertent racial bias for facial recognition problems in the domain of computer vision.", "keywords": "", "primary_area": "", "supplementary_material": "/attachment/336a3e22416872b1679c1e01d45acd7c48a3b0d8.zip", "author": "Nidhi Gowdra;Roopak Sinha;Stephen MacDonell;WeiQi Yan", "authorids": "~Nidhi_Gowdra1;rsinha@aut.ac.nz;stephen.macdonell@aut.ac.nz;weiqi.yan@aut.ac.nz", "gender": ";;;", "homepage": ";;;", "dblp": ";;;", "google_scholar": ";;;", "orcid": ";;;", "linkedin": ";;;", "or_profile": "~Nidhi_Gowdra1;rsinha@aut.ac.nz;stephen.macdonell@aut.ac.nz;weiqi.yan@aut.ac.nz", "aff": ";;;", "aff_domain": ";;;", "position": ";;;", "bibtex": "@misc{\ngowdra2021maximum,\ntitle={Maximum Categorical Cross Entropy ({\\{}MCCE{\\}}): A noise-robust alternative loss function to mitigate racial bias in Convolutional Neural Networks ({\\{}CNN{\\}}s) by reducing overfitting},\nauthor={Nidhi Gowdra and Roopak Sinha and Stephen MacDonell and WeiQi Yan},\nyear={2021},\nurl={https://openreview.net/forum?id=1IBgFQbj7y}\n}", "github": "", "project": "", "reviewers": "AnonReviewer3;AnonReviewer2;AnonReviewer1;AnonReviewer4", "site": "https://openreview.net/forum?id=1IBgFQbj7y", "pdf_size": 0, "rating": "3;4;5;5", "confidence": "4;5;4;3", "wc_review": "389;381;268;123", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "0;0;0;0", "reply_reviewers": "0;0;0;0", "reply_authors": "0;0;0;0", "rating_avg": [ 4.25, 0.82915619758885 ], "confidence_avg": [ 4.0, 0.7071067811865476 ], "wc_review_avg": [ 290.25, 107.76682003288396 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 0, 0 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 0, 0 ], "replies_avg": [ 6, 0 ], "authors#_avg": [ 4, 0 ], "corr_rating_confidence": -0.42640143271122083, "gs_citation": 24, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=13485036809231672755&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 0 }, { "title": "Uncertainty in Gradient Boosting via Ensembles", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/2928", "id": "1Jv6b0Zq3qi", "poster": "", "openreview": "https://openreview.net/forum?id=1Jv6b0Zq3qi", "slides": "https://iclr.cc/virtual/2021/poster/2928", "video": "https://iclr.cc/virtual/2021/poster/2928", "author_site": "Andrey Malinin, Liudmila Prokhorenkova, Aleksei Ustimenko", "tldr": "", "abstract": "For many practical, high-risk applications, it is essential to quantify uncertainty in a model's predictions to avoid costly mistakes. While predictive uncertainty is widely studied for neural networks, the topic seems to be under-explored for models based on gradient boosting. However, gradient boosting often achieves state-of-the-art results on tabular data. This work examines a probabilistic ensemble-based framework for deriving uncertainty estimates in the predictions of gradient boosting classification and regression models. We conducted experiments on a range of synthetic and real datasets and investigated the applicability of ensemble approaches to gradient boosting models that are themselves ensembles of decision trees. Our analysis shows that ensembles of gradient boosting models successfully detect anomalous inputs while having limited ability to improve the predicted total uncertainty. Importantly, we also propose a concept of a virtual ensemble to get the benefits of an ensemble via only one gradient boosting model, which significantly reduces complexity. 
", "keywords": "uncertainty;ensembles;gradient boosting;decision trees;knowledge uncertainty", "primary_area": "", "supplementary_material": "", "author": "Andrey Malinin;Liudmila Prokhorenkova;Aleksei Ustimenko", "authorids": "~Andrey_Malinin1;~Liudmila_Prokhorenkova1;austimenko@yandex-team.ru", "gender": "M;F;", "homepage": ";;", "dblp": "174/5705;45/11468;", "google_scholar": ";https://scholar.google.ru/citations?user=6JyZlSEAAAAJ;", "orcid": ";;", "linkedin": ";;", "or_profile": "~Andrey_Malinin1;~Liudmila_Prokhorenkova1;austimenko@yandex-team.ru", "aff": "Yandex;Moscow Institute of Physics and Technology;", "aff_domain": "yandex.ru;mipt.edu;", "position": "Principal Researcher;Researcher;", "bibtex": "@inproceedings{\nmalinin2021uncertainty,\ntitle={Uncertainty in Gradient Boosting via Ensembles},\nauthor={Andrey Malinin and Liudmila Prokhorenkova and Aleksei Ustimenko},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=1Jv6b0Zq3qi}\n}", "github": "", "project": "", "reviewers": "AnonReviewer3;AnonReviewer1;AnonReviewer4;AnonReviewer2", "pdf_size": 0, "rating": "6;6;7;7", "confidence": "3;4;4;4", "wc_review": "385;191;321;242", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "580;284;311;59", "reply_reviewers": "0;0;0;0", "reply_authors": "2;2;2;2", "rating_avg": [ 6.5, 0.5 ], "confidence_avg": [ 3.75, 0.4330127018922193 ], "wc_review_avg": [ 284.75, 74.12953190193501 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 308.5, 184.77621600195195 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 2.0, 0.0 ], "replies_avg": [ 14, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": 0.5773502691896257, "gs_citation": 124, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=8376414671015226008&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 4, "pdf": "https://openreview.net/pdf?id=1Jv6b0Zq3qi", "email": "yandex.ru;mipt.edu;", "author_num": 3, "aff_unique_index": "0;1", "aff_unique_norm": "Yandex;Moscow Institute of Physics and Technology", "aff_unique_dep": ";", "aff_unique_url": "https://yandex.com;https://www.mipt.ru/en", "aff_unique_abbr": "Yandex;MIPT", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0", "aff_country_unique": "Russian Federation" }, { "id": "1Kxxduqpd3E", "title": "Rotograd: Dynamic Gradient Homogenization for Multitask Learning", "track": "main", "status": "Reject", "tldr": "", "abstract": "GradNorm (Chen et al., 2018) is a broadly used gradient-based approach for training multitask networks, where different tasks share, and thus compete during learning, for the network parameters. GradNorm eases the fitting of all individual tasks by dynamically equalizing the contribution of each task to the overall gradient magnitude. However, it does not prevent the individual tasks\u2019 gradients from conflicting, i.e., pointing towards opposite directions, and thus resulting in a poor multitask performance. In this work we propose Rotograd, an extension to GradNorm that addresses this problem by dynamically homogenizing not only the gradient magnitudes but also their directions across tasks. For this purpose,Rotograd adds a layer of task-specific rotation matrices that aligns all the task gradients. Importantly, we then analyze Rotograd (and its predecessor) through the lens of game theory, providing theoretical guarantees on the algorithm stability and convergence. 
Finally, our experiments on several real-world datasets and network architectures show that Rotograd outperforms previous approaches for multitask learning.\n\n", "keywords": "multitask learning;deep learning;gradnorm", "primary_area": "", "supplementary_material": "", "author": "Adri\u00e1n Javaloy;Isabel Valera", "authorids": "~Adri\u00e1n_Javaloy1;~Isabel_Valera1", "gender": "M;F", "homepage": "https://adrianjav.github.io;https://ivaleram.github.io/", "dblp": "259/2011;126/1768.html", "google_scholar": "ne3evXwAAAAJ;https://scholar.google.es/citations?user=cpdQqpsAAAAJ", "orcid": "0000-0002-5184-4460;", "linkedin": "adrian-javaloy;", "or_profile": "~Adri\u00e1n_Javaloy1;~Isabel_Valera1", "aff": "Saarland University, Saarland University;Universit\u00e4t des Saarlandes", "aff_domain": "cs.uni-saarland.de;uni-saarland.de", "position": "PhD student;Full Professor", "bibtex": "@misc{\njavaloy2021rotograd,\ntitle={Rotograd: Dynamic Gradient Homogenization for Multitask Learning},\nauthor={Adri{\\'a}n Javaloy and Isabel Valera},\nyear={2021},\nurl={https://openreview.net/forum?id=1Kxxduqpd3E}\n}", "github": "", "project": "", "reviewers": "AnonReviewer4;AnonReviewer1;AnonReviewer3", "site": "https://openreview.net/forum?id=1Kxxduqpd3E", "pdf_size": 0, "rating": "4;4;4", "confidence": "4;4;4", "wc_review": "893;272;635", "wc_reply_reviewers": "0;0;0", "wc_reply_authors": "703;474;277", "reply_reviewers": "0;0;0", "reply_authors": "1;1;1", "rating_avg": [ 4.0, 0.0 ], "confidence_avg": [ 4.0, 0.0 ], "wc_review_avg": [ 600.0, 254.7273051716286 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 484.6666666666667, 174.07724977402674 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 10, 0 ], "authors#_avg": [ 2, 0 ], "corr_rating_confidence": 0.0, "gs_citation": -1, "gs_cited_by_link": "", "gs_version_total": -1, "aff_unique_index": "0;1", "aff_unique_norm": "Saarland University;Universit\u00e4t des Saarlandes", "aff_unique_dep": ";", "aff_unique_url": "https://www.uni-saarland.de;https://www.uni-saarland.de", "aff_unique_abbr": "UdS;UDS", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0", "aff_country_unique": "Germany" }, { "id": "1MJPtHogkwX", "title": "A Multi-Modal and Multitask Benchmark in the Clinical Domain", "track": "main", "status": "Reject", "tldr": "", "abstract": "Healthcare represents one of the most promising application areas for machine learning algorithms, including modern methods based on deep learning. Modern deep learning algorithms perform best on large datasets and on unstructured modalities such as text or image data; advances in deep learning have often been driven by the availability of such large datasets. Here, we introduce Multi-Modal Multitask MIMIC-III (M3) \u2014 a dataset and benchmark for evaluating machine learning algorithms in the healthcare domain. This dataset contains multi-modal patient data collected from intensive care units \u2014 including physiological time series, clinical notes, ECG waveforms, and tabular inputs \u2014 and defines six clinical tasks \u2014 including predicting mortality, decompensation, readmission, and other outcomes \u2014 which serve as benchmarks for comparing algorithms. We introduce new multi-modal and multitask models for this dataset, and show that they outperform previous state-of-the-art results that only rely on a subset of all tasks and modalities. 
This highlights the potential of multitask and multi-modal learning to improve the performance of algorithms in the healthcare domain. More generally, we envision M3 as a general resource that will help accelerate research in applying machine learning to healthcare.", "keywords": "multi-modal;multitask;machine learning in healthcare;benchmark", "primary_area": "", "supplementary_material": "", "author": "Yong Huang;Edgar Mariano Marroquin;Volodymyr Kuleshov", "authorids": "~Yong_Huang3;~Edgar_Mariano_Marroquin1;vk379@cornell.edu", "gender": "M;M;", "homepage": ";https://www.cs.cornell.edu/~emarro/;", "dblp": ";;", "google_scholar": ";;", "orcid": ";;", "linkedin": ";;", "or_profile": "~Yong_Huang3;~Edgar_Mariano_Marroquin1;vk379@cornell.edu", "aff": ";Department of Computer Science, Cornell University;", "aff_domain": ";cs.cornell.edu;", "position": ";PhD student;", "bibtex": "@misc{\nhuang2021a,\ntitle={A Multi-Modal and Multitask Benchmark in the Clinical Domain},\nauthor={Yong Huang and Edgar Mariano Marroquin and Volodymyr Kuleshov},\nyear={2021},\nurl={https://openreview.net/forum?id=1MJPtHogkwX}\n}", "github": "", "project": "", "reviewers": "AnonReviewer4;AnonReviewer2;AnonReviewer3", "site": "https://openreview.net/forum?id=1MJPtHogkwX", "pdf_size": 0, "rating": "5;5;5", "confidence": "5;4;4", "wc_review": "703;698;440", "wc_reply_reviewers": "0;0;0", "wc_reply_authors": "684;848;786", "reply_reviewers": "0;0;0", "reply_authors": "1;1;1", "rating_avg": [ 5.0, 0.0 ], "confidence_avg": [ 4.333333333333333, 0.4714045207910317 ], "wc_review_avg": [ 613.6666666666666, 122.81784162825133 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 772.6666666666666, 67.6132794320432 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 7, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:eSa1kxVmIpAJ:scholar.google.com/&scioq=A+Multi-Modal+and+Multitask+Benchmark+in+the+Clinical+Domain&hl=en&as_sdt=0,5", "gs_version_total": 0, "aff_unique_index": "0", "aff_unique_norm": "Cornell University", "aff_unique_dep": "Department of Computer Science", "aff_unique_url": "https://www.cornell.edu", "aff_unique_abbr": "Cornell", "aff_country_unique_index": "0", "aff_country_unique": "United States" }, { "id": "1NRMmEUyXMu", "title": "World Model as a Graph: Learning Latent Landmarks for Planning", "track": "main", "status": "Reject", "tldr": "", "abstract": "Planning, the ability to analyze the structure of a problem in the large and decompose it into interrelated subproblems, is a hallmark of human intelligence. While deep reinforcement learning (RL) has shown great promise for solving relatively straightforward control tasks, it remains an open problem how to best incorporate planning into existing deep RL paradigms to handle increasingly complex environments. One prominent framework, Model-Based RL, learns a world model and plans using step-by-step virtual rollouts. This type of world model quickly diverges from reality when the planning horizon increases, thus struggling at long-horizon planning. How can we learn world models that endow agents with the ability to do temporally extended reasoning? In this work, we propose to learn graph-structured world models composed of sparse, multi-step transitions. We devise a novel algorithm to learn latent landmarks that are scattered (in terms of reachability) across the goal space as the nodes on the graph. 
In this same graph, the edges are the reachability estimates distilled from Q-functions. On a variety of high-dimensional continuous control tasks ranging from robotic manipulation to navigation, we demonstrate that our method, named L^{3}P, significantly outperforms prior work, and is oftentimes the only method capable of leveraging both the robustness of model-free RL and generalization of graph-search algorithms. We believe our work is an important step towards scalable planning in reinforcement learning. ", "keywords": "Reinforcement Learning;Planning", "primary_area": "", "supplementary_material": "", "author": "Lunjun Zhang;Ge Yang;Bradly C Stadie", "authorids": "~Lunjun_Zhang1;~Ge_Yang1;~Bradly_C_Stadie1", "gender": ";M;M", "homepage": "https://lunjunzhang.github.io/;http://www.episodeyang.com;", "dblp": "274/6535;48/4561-3;", "google_scholar": "OqD5GcgAAAAJ;vaQcF6kAAAAJ;", "orcid": ";0000-0001-7520-7055;", "linkedin": ";;", "or_profile": "~Lunjun_Zhang1;~Ge_Yang1;~Bradly_C_Stadie1", "aff": "Engineering Science, University of Toronto;Massachusetts Institute of Technology;", "aff_domain": "mail.utoronto.ca;mit.edu;", "position": "Undergrad student;Postdoc;", "bibtex": "@misc{\nzhang2021world,\ntitle={World Model as a Graph: Learning Latent Landmarks for Planning},\nauthor={Lunjun Zhang and Ge Yang and Bradly C Stadie},\nyear={2021},\nurl={https://openreview.net/forum?id=1NRMmEUyXMu}\n}", "github": "", "project": "", "reviewers": "AnonReviewer3;AnonReviewer1;AnonReviewer2;AnonReviewer4", "site": "https://openreview.net/forum?id=1NRMmEUyXMu", "pdf_size": 0, "rating": "5;5;6;7", "confidence": "4;5;4;3", "wc_review": "1067;795;410;591", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "438;661;435;313", "reply_reviewers": "0;0;0;0", "reply_authors": "1;1;1;1", "rating_avg": [ 5.75, 0.82915619758885 ], "confidence_avg": [ 4.0, 0.7071067811865476 ], "wc_review_avg": [ 715.75, 244.28607717182737 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 461.75, 125.60528452258687 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 9, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": -0.8528028654224417, "gs_citation": 89, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=11617385762396360333&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 8, "aff_unique_index": "0;1", "aff_unique_norm": "University of Toronto;Massachusetts Institute of Technology", "aff_unique_dep": "Engineering Science;", "aff_unique_url": "https://www.utoronto.ca;https://web.mit.edu", "aff_unique_abbr": "U of T;MIT", "aff_campus_unique_index": "0", "aff_campus_unique": "Toronto;", "aff_country_unique_index": "0;1", "aff_country_unique": "Canada;United States" }, { "title": "On the Dynamics of Training Attention Models", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/2692", "id": "1OCTOShAmqB", "poster": "", "openreview": "https://openreview.net/forum?id=1OCTOShAmqB", "slides": "https://iclr.cc/virtual/2021/poster/2692", "video": "https://iclr.cc/virtual/2021/poster/2692", "author_site": "Haoye Lu, Yongyi Mao, Amiya Nayak", "tldr": "", "abstract": "The attention mechanism has been widely used in deep neural networks as a model component. By now, it has become a critical building block in many state-of-the-art natural language models. Despite its great success established empirically, the working mechanism of attention has not been investigated at a sufficient theoretical depth to date. 
In this paper, we set up a simple text classification task and study the dynamics of training a simple attention-based classification model using gradient descent. In this setting, we show that, for the discriminative words that the model should attend to, a persisting identity exists relating its embedding and the inner product of its key and the query. This allows us to prove that training must converge to attending to the discriminative words when the attention output is classified by a linear classifier. Experiments are performed, which validate our theoretical analysis and provide further insights.", "keywords": "", "primary_area": "", "supplementary_material": "", "author": "Haoye Lu;Yongyi Mao;Amiya Nayak", "authorids": "~Haoye_Lu1;~Yongyi_Mao2;nayak@uottawa.ca", "gender": "M;;", "homepage": "https://haoyelu.github.io;;", "dblp": ";;", "google_scholar": "https://scholar.google.com/citations?hl=en;;", "orcid": "0000-0003-0933-2370;;", "linkedin": ";;", "or_profile": "~Haoye_Lu1;~Yongyi_Mao2;nayak@uottawa.ca", "aff": "University of Waterloo;;", "aff_domain": "uwaterloo.ca;;", "position": "PhD student;;", "bibtex": "@inproceedings{\nlu2021on,\ntitle={On the Dynamics of Training Attention Models},\nauthor={Haoye Lu and Yongyi Mao and Amiya Nayak},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=1OCTOShAmqB}\n}", "github": "[![github](/images/github_icon.svg) haoyelyu/On_the_Dynamics_of_Training_Attention_Models](https://github.com/haoyelyu/On_the_Dynamics_of_Training_Attention_Models)", "project": "", "reviewers": "AnonReviewer3;AnonReviewer1;AnonReviewer2;AnonReviewer4", "pdf_size": 0, "rating": "4;6;7;8", "confidence": "3;3;2;3", "wc_review": "253;881;320;598", "wc_reply_reviewers": "0;667;0;74", "wc_reply_authors": "331;2835;230;948", "reply_reviewers": "0;1;0;1", "reply_authors": "1;5;1;3", "rating_avg": [ 6.25, 1.479019945774904 ], "confidence_avg": [ 2.75, 0.4330127018922193 ], "wc_review_avg": [ 513.0, 248.74585423680935 ], "wc_reply_reviewers_avg": [ 185.25, 279.77435104026245 ], "wc_reply_authors_avg": [ 1086.0, 1046.5187528181232 ], "reply_reviewers_avg": [ 0.5, 0.5 ], "reply_authors_avg": [ 2.5, 1.6583123951777 ], "replies_avg": [ 18, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": -0.29277002188455997, "gs_citation": 7, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=15911549968138902325&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 3, "pdf": "https://openreview.net/pdf?id=1OCTOShAmqB", "email": "uwaterloo.ca;;", "author_num": 3, "aff_unique_index": "0", "aff_unique_norm": "University of Waterloo", "aff_unique_dep": "", "aff_unique_url": "https://uwaterloo.ca", "aff_unique_abbr": "UW", "aff_country_unique_index": "0", "aff_country_unique": "Canada" }, { "id": "1OCwJdJSnSA", "title": "Disentangled cyclic reconstruction for domain adaptation", "track": "main", "status": "Reject", "tldr": "", "abstract": "The domain adaptation problem involves learning a unique classification or regression model capable of performing on both a source and a target domain. Although the labels for the source data are available during training, the labels in the target domain are unknown. An effective way to tackle this problem lies in extracting insightful features invariant to the source and target domains. In this work, we propose splitting the information for each domain into a task-related representation and its complimentary context representation. 
We propose an original method to disentangle these two representations in the single-domain supervised case. We then adapt this method to the unsupervised domain adaptation problem. In particular, our method allows disentanglement in the target domain, despite the absence of training labels. This enables the isolation of task-specific information from both domains and a projection into a common representation. The task-specific representation allows efficient transfer of knowledge acquired from the source domain to the target domain. We validate the proposed method on several classical domain adaptation benchmarks and illustrate the benefits of disentanglement for domain adaptation.", "keywords": "Domain adaptation;Disentanglement", "primary_area": "", "supplementary_material": "", "author": "David Bertoin;Emmanuel Rachelson", "authorids": "~David_Bertoin1;~Emmanuel_Rachelson1", "gender": "M;M", "homepage": "https://davidbert.github.io/;https://personnel.isae-supaero.fr/emmanuel-rachelson", "dblp": ";52/6241", "google_scholar": "oAZZ-o4AAAAJ;https://scholar.google.fr/citations?user=KtG9BSgAAAAJ", "orcid": ";0000-0002-8559-1617", "linkedin": ";emmanuelrachelson/", "or_profile": "~David_Bertoin1;~Emmanuel_Rachelson1", "aff": "Institut Sup\u00e9rieur de l'A\u00e9ronautique et de l'Espace;Institut Sup\u00e9rieur de l'A\u00e9ronautique et de l'Espace", "aff_domain": "isae-supaero.fr;isae-supaero.fr", "position": "PhD student;Full Professor", "bibtex": "@misc{\nbertoin2021disentangled,\ntitle={Disentangled cyclic reconstruction for domain adaptation},\nauthor={David Bertoin and Emmanuel Rachelson},\nyear={2021},\nurl={https://openreview.net/forum?id=1OCwJdJSnSA}\n}", "github": "", "project": "", "reviewers": "AnonReviewer3;AnonReviewer4;AnonReviewer2", "site": "https://openreview.net/forum?id=1OCwJdJSnSA", "pdf_size": 0, "rating": "4;5;6", "confidence": "3;3;3", "wc_review": "437;405;365", "wc_reply_reviewers": "0;0;0", "wc_reply_authors": "905;644;457", "reply_reviewers": "0;0;0", "reply_authors": "4;4;4", "rating_avg": [ 5.0, 0.816496580927726 ], "confidence_avg": [ 3.0, 0.0 ], "wc_review_avg": [ 402.3333333333333, 29.4542960458327 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 668.6666666666666, 183.72503609712228 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 4.0, 0.0 ], "replies_avg": [ 17, 0 ], "authors#_avg": [ 2, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:Ywtk23pogYIJ:scholar.google.com/&scioq=Disentangled+cyclic+reconstruction+for+domain+adaptation&hl=en&as_sdt=0,33", "gs_version_total": 0, "aff_unique_index": "0;0", "aff_unique_norm": "Institut Sup\u00e9rieur de l'A\u00e9ronautique et de l'Espace", "aff_unique_dep": "", "aff_unique_url": "https://www.isae-supaero.fr", "aff_unique_abbr": "ISAE-SUPAERO", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0", "aff_country_unique": "France" }, { "id": "1OP1kReyL56", "title": "Model Selection for Cross-Lingual Transfer using a Learned Scoring Function", "track": "main", "status": "Reject", "tldr": "", "abstract": "Transformers that are pre-trained on multilingual text corpora, such as, mBERT and XLM-RoBERTa, have achieved impressive cross-lingual transfer learning results. In the zero-shot cross-lingual transfer setting, only English training data is assumed, and the fine-tuned model is evaluated on another target language. 
No target-language validation data is assumed in this setting, however substantial variance has been observed in target language performance between different fine-tuning runs. Prior work has relied on English validation/development data to select among models that are fine-tuned with different learning rates, number of steps and other hyperparameters, often resulting in suboptimal choices. In this paper, we show that it is possible to select consistently better models when small amounts of annotated data are available in an auxiliary pivot language.\nWe propose a machine learning approach to model selection that uses the fine-tuned model's own internal representations to predict its cross-lingual capabilities. In extensive experiments we find that our approach consistently selects better models than English validation data across five languages and five well-studied NLP tasks, achieving results that are comparable to small amounts of target language development data.", "keywords": "", "primary_area": "", "supplementary_material": "", "author": "Yang Chen;Alan Ritter", "authorids": "~Yang_Chen10;~Alan_Ritter1", "gender": ";M", "homepage": "https://edchengg.github.io/;http://aritter.github.io/", "dblp": "48/4792-13;47/3133", "google_scholar": "o-oBMWEAAAAJ;https://scholar.google.com/citations?hl=en", "orcid": ";", "linkedin": ";", "or_profile": "~Yang_Chen10;~Alan_Ritter1", "aff": "Georgia Institute of Technology;Georgia Institute of Technology", "aff_domain": "gatech.edu;gatech.edu", "position": "PhD student;Associate Professor", "bibtex": "@misc{\nchen2021model,\ntitle={Model Selection for Cross-Lingual Transfer using a Learned Scoring Function},\nauthor={Yang Chen and Alan Ritter},\nyear={2021},\nurl={https://openreview.net/forum?id=1OP1kReyL56}\n}", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer3;AnonReviewer2;AnonReviewer4", "site": "https://openreview.net/forum?id=1OP1kReyL56", "pdf_size": 0, "rating": "6;7;7;7", "confidence": "4;4;3;4", "wc_review": "930;690;260;358", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "481;269;171;136", "reply_reviewers": "0;0;0;0", "reply_authors": "1;1;1;1", "rating_avg": [ 6.75, 0.4330127018922193 ], "confidence_avg": [ 3.75, 0.4330127018922193 ], "wc_review_avg": [ 559.5, 266.7409792289141 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 264.25, 134.30073529210478 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 11, 0 ], "authors#_avg": [ 2, 0 ], "corr_rating_confidence": -0.3333333333333333, "gs_citation": 7, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=12683465531725996388&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 0, "aff_unique_index": "0;0", "aff_unique_norm": "Georgia Institute of Technology", "aff_unique_dep": "", "aff_unique_url": "https://www.gatech.edu", "aff_unique_abbr": "Georgia Tech", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0", "aff_country_unique": "United States" }, { "id": "1OQ90khuUGZ", "title": "Action Guidance: Getting the Best of Sparse Rewards and Shaped Rewards for Real-time Strategy Games", "track": "main", "status": "Reject", "tldr": "", "abstract": "Training agents using Reinforcement Learning in games with sparse rewards is a challenging problem, since large amounts of exploration are required to retrieve even the first reward. To tackle this problem, a common approach is to use reward shaping to help exploration. 
However, an important drawback of reward shaping is that agents sometimes learn to optimize the shaped reward instead of the true objective. In this paper, we present a novel technique that we call action guidance that successfully trains agents to eventually optimize the true objective in games with sparse rewards while maintaining most of the sample efficiency that comes with reward shaping. We evaluate our approach in a simplified real-time strategy (RTS) game simulator called $\\mu$RTS. ", "keywords": "reinforcement learning;real-time strategy games;sparse rewards;shaped rewards;policy gradient;sample-efficiency", "primary_area": "", "supplementary_material": "/attachment/825421c56194e27cd273f77b3a0f41b64086392d.zip", "author": "Shengyi Huang;Santiago Ontanon", "authorids": "~Shengyi_Huang1;~Santiago_Ontanon1", "gender": "M;", "homepage": "https://costa.sh/;https://sites.google.com/site/santiagoontanonvillar/", "dblp": "251/8731;https://dblp.org/pers/o/Onta=ntilde==oacute=n:Santiago.html", "google_scholar": "kl9YcpEAAAAJ;aS-DrOwAAAAJ", "orcid": ";", "linkedin": "costa-huang/;", "or_profile": "~Shengyi_Huang1;~Santiago_Ontanon1", "aff": "Drexel University;Drexel University", "aff_domain": "drexel.edu;drexel.edu", "position": "PhD student;Associate Professor", "bibtex": "@misc{\nhuang2021action,\ntitle={Action Guidance: Getting the Best of Sparse Rewards and Shaped Rewards for Real-time Strategy Games},\nauthor={Shengyi Huang and Santiago Ontanon},\nyear={2021},\nurl={https://openreview.net/forum?id=1OQ90khuUGZ}\n}", "github": "", "project": "", "reviewers": "AnonReviewer3;AnonReviewer2;AnonReviewer4;AnonReviewer1", "site": "https://openreview.net/forum?id=1OQ90khuUGZ", "pdf_size": 0, "rating": "4;4;6;6", "confidence": "4;4;4;3", "wc_review": "385;338;442;332", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "826;594;670;393", "reply_reviewers": "0;0;0;0", "reply_authors": "1;1;1;1", "rating_avg": [ 5.0, 1.0 ], "confidence_avg": [ 3.75, 0.4330127018922193 ], "wc_review_avg": [ 374.25, 44.17224807500746 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 620.75, 155.83545007475033 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 9, 0 ], "authors#_avg": [ 2, 0 ], "corr_rating_confidence": -0.5773502691896257, "gs_citation": 13, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=16540646293767954155&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 6, "aff_unique_index": "0;0", "aff_unique_norm": "Drexel University", "aff_unique_dep": "", "aff_unique_url": "https://www.drexel.edu", "aff_unique_abbr": "Drexel", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0", "aff_country_unique": "United States" }, { "id": "1P2KAvsE59b", "title": "Robustness to Pruning Predicts Generalization in Deep Neural Networks", "track": "main", "status": "Reject", "tldr": "", "abstract": "Why over-parameterized neural networks generalize as well as they do is a central concern of theoretical analysis in machine learning today. Following Occam's razor, it has long been suggested that simpler networks generalize better than more complex ones. 
Successfully quantifying this principle has proved difficult given that many measures of simplicity, such as parameter norms, grow with the size of the network and thus fail to capture the observation that larger networks tend to generalize better in practice.\nIn this paper, we introduce a new, theoretically motivated measure of a network's simplicity: the smallest fraction of the network's parameters that can be kept while pruning without adversely affecting its training loss. We show that this measure is highly predictive of a model's generalization performance across a large set of convolutional networks trained on CIFAR-10. Lastly, we study the mutual information between the predictions of our new measure and strong existing measures based on models' margin, flatness of minima and optimization speed. We show that our new measure is similar to -- but more predictive than -- existing flatness-based measures.", "keywords": "Generalization;Pruning;Generalization Measures", "primary_area": "", "supplementary_material": "", "author": "Lorenz Kuhn;Clare Lyle;Aidan Gomez;Jonas Rothfuss;Yarin Gal", "authorids": "~Lorenz_Kuhn1;~Clare_Lyle1;~Aidan_Gomez1;~Jonas_Rothfuss1;~Yarin_Gal1", "gender": ";;Unspecified;M;", "homepage": "https://www.lorenzkuhn.com/;;http://gom.ai;https://las.inf.ethz.ch/people/jonas-rothfuss;http://www.cs.ox.ac.uk/people/yarin.gal/website//", "dblp": ";192/1910;;213/7319.html;67/9076", "google_scholar": "3Si7JPAAAAAJ;;https://scholar.google.ca/citations?user=2oq9614AAAAJ;EfLpX8QAAAAJ;https://scholar.google.co.uk/citations?user=SIayDoQAAAAJ", "orcid": ";;;;", "linkedin": ";;;;", "or_profile": "~Lorenz_Kuhn1;~Clare_Lyle1;~Aidan_Gomez1;~Jonas_Rothfuss1;~Yarin_Gal1", "aff": "University of Oxford;University of Oxford;;Swiss Federal Institute of Technology;University of Oxford", "aff_domain": "ox.ac.uk;ox.ac.uk;;ethz.ch;ox.ac.uk", "position": "PhD student;PhD student;;PhD student;Associate Professor", "bibtex": "@misc{\nkuhn2021robustness,\ntitle={Robustness to Pruning Predicts Generalization in Deep Neural Networks},\nauthor={Lorenz Kuhn and Clare Lyle and Aidan Gomez and Jonas Rothfuss and Yarin Gal},\nyear={2021},\nurl={https://openreview.net/forum?id=1P2KAvsE59b}\n}", "github": "", "project": "", "reviewers": "AnonReviewer3;AnonReviewer1;AnonReviewer4;AnonReviewer2", "site": "https://openreview.net/forum?id=1P2KAvsE59b", "pdf_size": 0, "rating": "5;5;5;7", "confidence": "3;4;4;4", "wc_review": "269;741;645;1015", "wc_reply_reviewers": "212;0;250;251", "wc_reply_authors": "783;949;1205;537", "reply_reviewers": "1;0;1;1", "reply_authors": "2;3;3;2", "rating_avg": [ 5.5, 0.8660254037844386 ], "confidence_avg": [ 3.75, 0.4330127018922193 ], "wc_review_avg": [ 667.5, 267.14555957380236 ], "wc_reply_reviewers_avg": [ 178.25, 104.10661602415094 ], "wc_reply_authors_avg": [ 868.5, 243.36957492669455 ], "reply_reviewers_avg": [ 0.75, 0.4330127018922193 ], "reply_authors_avg": [ 2.5, 0.5 ], "replies_avg": [ 19, 0 ], "authors#_avg": [ 5, 0 ], "corr_rating_confidence": 0.3333333333333333, "gs_citation": 15, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=15031798279700701056&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 3, "aff_unique_index": "0;0;1;0", "aff_unique_norm": "University of Oxford;Swiss Federal Institute of Technology", "aff_unique_dep": ";", "aff_unique_url": "https://www.ox.ac.uk;https://www.ethz.ch", "aff_unique_abbr": "Oxford;ETH Zurich", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0;1;0", 
"aff_country_unique": "United Kingdom;Switzerland" }, { "id": "1Q-CqRjUzf", "title": "On the Reproducibility of Neural Network Predictions", "track": "main", "status": "Reject", "tldr": "", "abstract": "Standard training techniques for neural networks involve multiple sources of randomness, e.g., initialization, mini-batch ordering and in some cases data augmentation. Given that neural networks are heavily over-parameterized in practice, such randomness can cause {\\em churn} -- disagreements between predictions of the two models independently trained by the same algorithm, contributing to the `reproducibility challenges' in modern machine learning. In this paper, we study this problem of churn, identify factors that cause it, and propose two simple means of mitigating it. We first demonstrate that churn is indeed an issue, even for standard image classification tasks (CIFAR and ImageNet), and study the role of the different sources of training randomness that cause churn. By analyzing the relationship between churn and prediction confidences, we pursue an approach with two components for churn reduction. First, we propose using \\emph{minimum entropy regularizers} to increase prediction confidences. Second, we present a novel variant of co-distillation approach~\\citep{anil2018large} to increase model agreement and reduce churn. We present empirical results showing the effectiveness of both techniques in reducing churn while improving the accuracy of the underlying model.", "keywords": "reproducibility;churn;confidence", "primary_area": "", "supplementary_material": "", "author": "Srinadh Bhojanapalli;Kimberly Jenney Wilber;Andreas Veit;Ankit Singh Rawat;Seungyeon Kim;Aditya Krishna Menon;Sanjiv Kumar", "authorids": "~Srinadh_Bhojanapalli1;~Kimberly_Jenney_Wilber1;~Andreas_Veit1;~Ankit_Singh_Rawat1;~Seungyeon_Kim1;~Aditya_Krishna_Menon1;~Sanjiv_Kumar1", "gender": "M;Transfemme;;M;;;M", "homepage": "https://bsrinadh.github.io/;http://kjwilber.org;http://andreasveit.eu/;https://ankitsrawat.github.io/home/;https://www.seungyeon.ai;http://www.sanjivk.com/;https://akmenon.github.io/", "dblp": "131/6700;;133/1801;https://dblp.org/pers/hd/r/Rawat:Ankit_Singh;74/7997-1.html;;89/3514", "google_scholar": "bpSF_9EAAAAJ;;UA9Hb2EAAAAJ;http://scholar.google.com/citations?user=U0_ab4cAAAAJ;zbcN_QIAAAAJ;https://scholar.google.com/citations?hl=en;", "orcid": ";;;;;;", "linkedin": ";;;;;;", "or_profile": "~Srinadh_Bhojanapalli1;~Kimberly_Jenney_Wilber1;~Andreas_Veit1;~Ankit_Singh_Rawat1;~Seungyeon_Kim1;~Sanjiv_Kumar1;~Aditya_Menon1", "aff": "Google;;Google;Google;Google;Google;Australian National University", "aff_domain": "google.com;;google.com;google.com;google.com;google.com;anu.edu.au", "position": "Research Scientist;;Senior Research Scientist;Research Scientist;Researcher;Research Scientist;Fellow", "bibtex": "@misc{\nbhojanapalli2021on,\ntitle={On the Reproducibility of Neural Network Predictions},\nauthor={Srinadh Bhojanapalli and Kimberly Jenney Wilber and Andreas Veit and Ankit Singh Rawat and Seungyeon Kim and Aditya Krishna Menon and Sanjiv Kumar},\nyear={2021},\nurl={https://openreview.net/forum?id=1Q-CqRjUzf}\n}", "github": "", "project": "", "reviewers": "AnonReviewer3;AnonReviewer4;AnonReviewer1", "site": "https://openreview.net/forum?id=1Q-CqRjUzf", "pdf_size": 0, "rating": "4;5;5", "confidence": "3;2;4", "wc_review": "532;602;1229", "wc_reply_reviewers": "0;0;0", "wc_reply_authors": "487;446;675", "reply_reviewers": "0;0;0", "reply_authors": "1;1;1", "rating_avg": [ 4.666666666666667, 
0.4714045207910317 ], "confidence_avg": [ 3.0, 0.816496580927726 ], "wc_review_avg": [ 787.6666666666666, 313.37552907370133 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 536.0, 99.70289196741821 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 8, 0 ], "authors#_avg": [ 7, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 45, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=14352289318737476197&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 5, "aff_unique_index": "0;0;0;0;0;1", "aff_unique_norm": "Google;Australian National University", "aff_unique_dep": "Google;", "aff_unique_url": "https://www.google.com;https://www.anu.edu.au", "aff_unique_abbr": "Google;ANU", "aff_campus_unique_index": "0;0;0;0;0", "aff_campus_unique": "Mountain View;", "aff_country_unique_index": "0;0;0;0;0;1", "aff_country_unique": "United States;Australia" }, { "id": "1TIrbngpW0x", "title": "Transformers with Competitive Ensembles of Independent Mechanisms", "track": "main", "status": "Reject", "tldr": "", "abstract": "An important development in deep learning from the earliest MLPs has been a move towards architectures with structural inductive biases which enable the model to keep distinct sources of information and routes of processing well-separated. This structure is linked to the notion of independent mechanisms from the causality literature, in which a mechanism is able to retain the same processing as irrelevant aspects of the world are changed. For example, convnets enable separation over positions, while attention-based architectures (especially Transformers) learn which combination of positions to process dynamically. In this work we explore a way in which the Transformer architecture is deficient: it represents each position with a large monolithic hidden representation and a single set of parameters which are applied over the entire hidden representation. This potentially throws unrelated sources of information together, and limits the Transformer's ability to capture independent mechanisms. To address this, we propose Transformers with Independent Mechanisms (TIM), a new Transformer layer which divides the hidden representation and parameters into multiple mechanisms, which only exchange information through attention. Additionally, we propose a competition mechanism which encourages these mechanisms to specialize over time steps, and thus be more independent. We study TIM on a large scale BERT model, on the Image Transformer, and on speech enhancement and find evidence for semantically meaningful specialization as well as improved performance. 
", "keywords": "transformer;mechanism;modularity;modules;independence", "primary_area": "", "supplementary_material": "/attachment/e7d59851e0e114f54e3cbef14665cc6907f94fa0.zip", "author": "Alex Lamb;Di He;Anirudh Goyal;Guolin Ke;Chien-Feng Liao;Mirco Ravanelli;Yoshua Bengio", "authorids": "~Alex_Lamb1;~Di_He1;~Anirudh_Goyal1;~Guolin_Ke3;~Chien-Feng_Liao1;~Mirco_Ravanelli1;~Yoshua_Bengio1", "gender": "M;M;M;M;M;M;M", "homepage": "https://dihe-pku.github.io/;https://anirudh9119.github.io/;;http://yoshuabengio.org;;https://guolinke.github.io;https://sites.google.com/site/mircoravanelli/", "dblp": "74/184;172/1039;223/5933;56/953;;190/7810;138/0284", "google_scholar": "https://scholar.google.co.jp/citations?user=orVoz4IAAAAJ;krrh6OUAAAAJ;;kukA0LcAAAAJ;https://scholar.google.ca/citations?user=BFzFy1YAAAAJ;M2qJgtoAAAAJ;-6Pj3IYAAAAJ", "orcid": ";;;;;;", "linkedin": ";;;yoshuabengio/?originalSubdomain=ca;;;mirco-ravanelli-489b692a/", "or_profile": "~Di_He1;~Anirudh_Goyal1;~Chien-Feng_Liao1;~Yoshua_Bengio1;~Alex_Matthew_Lamb1;~guolin_ke1;~Mirco_Ravanellu1", "aff": "Microsoft;University of Montreal;;University of Montreal;University of Montreal;Microsoft;Montreal Institute for Learning Algorithms, University of Montreal, University of Montreal", "aff_domain": "microsoft.com;umontreal.ca;;umontreal.ca;umontreal.ca;microsoft.com;mila.umontreal.ca", "position": "Senior Researcher;PhD student;;Full Professor;PhD student;Senior Researcher;Postdoc", "bibtex": "@misc{\nlamb2021transformers,\ntitle={Transformers with Competitive Ensembles of Independent Mechanisms},\nauthor={Alex Lamb and Di He and Anirudh Goyal and Guolin Ke and Chien-Feng Liao and Mirco Ravanelli and Yoshua Bengio},\nyear={2021},\nurl={https://openreview.net/forum?id=1TIrbngpW0x}\n}", "github": "", "project": "", "reviewers": "AnonReviewer4;AnonReviewer2;AnonReviewer1;AnonReviewer3", "site": "https://openreview.net/forum?id=1TIrbngpW0x", "pdf_size": 0, "rating": "4;4;5;7", "confidence": "4;4;5;4", "wc_review": "223;325;346;157", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "0;0;0;0", "reply_reviewers": "0;0;0;0", "reply_authors": "0;0;0;0", "rating_avg": [ 5.0, 1.224744871391589 ], "confidence_avg": [ 4.25, 0.4330127018922193 ], "wc_review_avg": [ 262.75, 76.7605855892202 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 0, 0 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 0, 0 ], "replies_avg": [ 5, 0 ], "authors#_avg": [ 7, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 24, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=11357506413306313437&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 4, "aff_unique_index": "0;1;1;1;0;1", "aff_unique_norm": "Microsoft;University of Montreal", "aff_unique_dep": "Microsoft Corporation;", "aff_unique_url": "https://www.microsoft.com;https://wwwumontreal.ca", "aff_unique_abbr": "Microsoft;UM", "aff_campus_unique_index": "1", "aff_campus_unique": ";Montreal", "aff_country_unique_index": "0;1;1;1;0;1", "aff_country_unique": "United States;Canada" }, { "id": "1UtnrqVUeNE", "title": "Detecting Misclassification Errors in Neural Networks with a Gaussian Process Model", "track": "main", "status": "Reject", "tldr": "", "abstract": "As neural network classifiers are deployed in real-world applications, it is crucial that their predictions are not just accurate, but trustworthy as well. One practical solution is to assign confidence scores to each prediction, then filter out low-confidence predictions. 
However, existing confidence metrics are not yet sufficiently reliable for this role. This paper presents a new framework that produces more reliable confidence scores for detecting misclassification errors. This framework, RED, calibrates the classifier's inherent confidence indicators and estimates uncertainty of the calibrated confidence scores using Gaussian Processes. Empirical comparisons with other confidence estimation methods on 125 UCI datasets demonstrate that this approach is effective. An experiment on a vision task with a large deep learning architecture further confirms that the method can scale up, and a case study involving out-of-distribution and adversarial samples shows potential of the proposed method to improve robustness of neural network classifiers more broadly in the future.", "keywords": "Neural Network Classifier;Error Detection;AI safety", "primary_area": "", "supplementary_material": "", "author": "Xin Qiu;Risto Miikkulainen", "authorids": "~Xin_Qiu1;~Risto_Miikkulainen1", "gender": "M;", "homepage": "https://vsonicv.github.io/;http://www.cs.utexas.edu/users/risto", "dblp": "83/7479-1;m/RistoMiikkulainen", "google_scholar": "https://scholar.google.com/citations?hl=en;", "orcid": ";", "linkedin": "xin-qiu-4a5ba0116/;", "or_profile": "~Xin_Qiu1;~Risto_Miikkulainen1", "aff": "Cognizant;The University of Texas, Austin", "aff_domain": "cognizant.com;cs.utexas.edu", "position": "Associate Director;Full Professor", "bibtex": "@misc{\nqiu2021detecting,\ntitle={Detecting Misclassification Errors in Neural Networks with a Gaussian Process Model},\nauthor={Xin Qiu and Risto Miikkulainen},\nyear={2021},\nurl={https://openreview.net/forum?id=1UtnrqVUeNE}\n}", "github": "", "project": "", "reviewers": "AnonReviewer3;AnonReviewer4;AnonReviewer2;AnonReviewer1", "site": "https://openreview.net/forum?id=1UtnrqVUeNE", "pdf_size": 0, "rating": "6;6;6;6", "confidence": "3;2;3;3", "wc_review": "618;221;245;216", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "1160;731;918;720", "reply_reviewers": "0;0;0;0", "reply_authors": "2;1;2;1", "rating_avg": [ 6.0, 0.0 ], "confidence_avg": [ 2.75, 0.4330127018922193 ], "wc_review_avg": [ 325.0, 169.5184355756034 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 882.25, 178.62303182960477 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.5, 0.5 ], "replies_avg": [ 13, 0 ], "authors#_avg": [ 2, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 13, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=16863815237308541611&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 7, "aff_unique_index": "0;1", "aff_unique_norm": "Cognizant;University of Texas at Austin", "aff_unique_dep": ";", "aff_unique_url": "https://www.cognizant.com;https://www.utexas.edu", "aff_unique_abbr": "Cognizant;UT Austin", "aff_campus_unique_index": "1", "aff_campus_unique": ";Austin", "aff_country_unique_index": "0;0", "aff_country_unique": "United States" }, { "id": "1WF-fPvY_jQ", "title": "Cross-lingual Transfer Learning for Pre-trained Contextualized Language Models", "track": "main", "status": "Withdraw", "tldr": "", "abstract": "Though the pre-trained contextualized language model (PrLM) has made a significant impact on NLP, training PrLMs in languages other than English can be impractical for two reasons: other languages often lack corpora sufficient for training powerful PrLMs, and because of the commonalities among human languages, computationally expensive PrLM training for different languages is somewhat redundant.\nIn 
this work, building upon the recent works connecting cross-lingual transfer learning and neural machine translation, we thus propose a novel cross-lingual transfer learning framework for PrLMs: \\textsc{TreLM}. \nTo handle the symbol order and sequence length differences between languages, we propose an intermediate ``TRILayer\" structure that learns from these differences and creates a better transfer in our primary translation direction, as well as a new cross-lingual language modeling objective for transfer training. \nAdditionally, we showcase an embedding aligning that adversarially adapts a PrLM's non-contextualized embedding space and the TRILayer structure to learn a text transformation network across languages, which addresses the vocabulary difference between languages. \nExperiments on both language understanding and structure parsing tasks show the proposed framework significantly outperforms language models trained from scratch with limited data in both performance and efficiency. \nMoreover, despite an insignificant performance loss compared to pre-training from scratch in resource-rich scenarios, our transfer learning framework is significantly more economical.", "keywords": "transfer learning;pre-trained language models;contextualized language models", "primary_area": "", "supplementary_material": "", "author": "Zuchao Li;Kevin Barry Parnow;hai zhao;Zhuosheng Zhang;Rui Wang;Masao Utiyama;Eiichiro Sumita", "authorids": "~Zuchao_Li1;~Kevin_Barry_Parnow1;~hai_zhao1;~Zhuosheng_Zhang1;~Rui_Wang10;~Masao_Utiyama2;~Eiichiro_Sumita1", "gender": "M;M;M;M;M;;M", "homepage": "https://zcli-charlie.github.io/;;http://bcmi.sjtu.edu.cn/~zhaohai/;https://bcmi.sjtu.edu.cn/~zhangzs/;http://www2.nict.go.jp/astrec-att/member/mutiyama/;;https://wangruinlp.github.io/", "dblp": "198/9339;;25/1145-1.html;06/9708;76/5745.html;95/5465;w/RuiWang15", "google_scholar": "PyzBf5oAAAAJ;;https://scholar.google.com.tw/citations?user=4dU5KS0AAAAJ;https://scholar.google.co.jp/citations?user=63LTQhgAAAAJ;artIO6gAAAAJ;;oTU0v5IAAAAJ", "orcid": ";;;0000-0002-4183-3645;;;0000-0001-8007-2503", "linkedin": ";;;;;;", "or_profile": "~Zuchao_Li1;~Kevin_Barry_Parnow1;~hai_zhao1;~Zhuosheng_Zhang1;~Masao_Utiyama2;~Eiichiro_Sumita1;~Rui_Wang7", "aff": "Shanghai Jiaotong University;Shanghai Jiaotong University;Shanghai Jiaotong University;Shanghai Jiaotong University;National Institute of Information and Communications Technology (NICT), National Institute of Advanced Industrial Science and Technology;;Shanghai Jiaotong University", "aff_domain": "sjtu.edu.cn;sjtu.edu.cn;sjtu.edu.cn;sjtu.edu.cn;nict.go.jp;;sjtu.edu.cn", "position": "PhD student;MS student;Full Professor;PhD student;Researcher;;Associate Professor", "bibtex": "", "github": "", "project": "", "reviewers": "AnonReviewer4;AnonReviewer3;AnonReviewer1;AnonReviewer2", "site": "https://openreview.net/forum?id=1WF-fPvY_jQ", "pdf_size": 0, "rating": "4;4;4;4", "confidence": "4;4;4;3", "wc_review": "629;321;693;372", "wc_reply_reviewers": "0;0;0;33", "wc_reply_authors": "1345;751;1916;942", "reply_reviewers": "0;0;0;1", "reply_authors": "2;2;4;3", "rating_avg": [ 4.0, 0.0 ], "confidence_avg": [ 3.75, 0.4330127018922193 ], "wc_review_avg": [ 503.75, 159.88961035664576 ], "wc_reply_reviewers_avg": [ 8.25, 14.289419162443238 ], "wc_reply_authors_avg": [ 1238.5, 446.07090243592444 ], "reply_reviewers_avg": [ 0.25, 0.4330127018922193 ], "reply_authors_avg": [ 2.75, 0.82915619758885 ], "replies_avg": [ 20, 0 ], "authors#_avg": [ 7, 0 ], "corr_rating_confidence": 0.0, 
"gs_citation": 1, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=17626598956404723984&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 0, "aff_unique_index": "0;0;0;0;1;0", "aff_unique_norm": "Shanghai Jiao Tong University;National Institute of Information and Communications Technology", "aff_unique_dep": ";", "aff_unique_url": "https://www.sjtu.edu.cn;https://www.nict.go.jp/", "aff_unique_abbr": "SJTU;NICT", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0;0;0;1;0", "aff_country_unique": "China;Japan" }, { "title": "Learning from Protein Structure with Geometric Vector Perceptrons", "status": "Spotlight", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/3102", "id": "1YLJDvSx6J4", "poster": "", "openreview": "https://openreview.net/forum?id=1YLJDvSx6J4", "slides": "https://iclr.cc/virtual/2021/poster/3102", "video": "https://iclr.cc/virtual/2021/poster/3102", "author_site": "Bowen Jing, Stephan Eismann, Patricia Suriana, Raphael J Townshend, Ron Dror", "tldr": "", "abstract": "Learning on 3D structures of large biomolecules is emerging as a distinct area in machine learning, but there has yet to emerge a unifying network architecture that simultaneously leverages the geometric and relational aspects of the problem domain. To address this gap, we introduce geometric vector perceptrons, which extend standard dense layers to operate on collections of Euclidean vectors. Graph neural networks equipped with such layers are able to perform both geometric and relational reasoning on efficient representations of macromolecules. We demonstrate our approach on two important problems in learning from protein structure: model quality assessment and computational protein design. Our approach improves over existing classes of architectures on both problems, including state-of-the-art convolutional neural networks and graph neural networks. 
We release our code at https://github.com/drorlab/gvp.", "keywords": "structural biology;graph neural networks;proteins;geometric deep learning", "primary_area": "", "supplementary_material": "", "author": "Bowen Jing;Stephan Eismann;Patricia Suriana;Raphael John Lamarre Townshend;Ron Dror", "authorids": "~Bowen_Jing1;~Stephan_Eismann1;psuriana@stanford.edu;~Raphael_John_Lamarre_Townshend1;~Ron_Dror1", "gender": ";;;M;", "homepage": ";;;;", "dblp": ";;;223/6101;", "google_scholar": ";;;2nR71vcAAAAJ;", "orcid": ";;;;", "linkedin": ";;;;", "or_profile": "~Bowen_Jing1;~Stephan_Eismann1;psuriana@stanford.edu;~Raphael_John_Lamarre_Townshend1;~Ron_Dror1", "aff": ";;;Atomic AI;", "aff_domain": ";;;atomic.ai;", "position": ";;;CEO;", "bibtex": "@inproceedings{\njing2021learning,\ntitle={Learning from Protein Structure with Geometric Vector Perceptrons},\nauthor={Bowen Jing and Stephan Eismann and Patricia Suriana and Raphael John Lamarre Townshend and Ron Dror},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=1YLJDvSx6J4}\n}", "github": "[![github](/images/github_icon.svg) drorlab/gvp-pytorch](https://github.com/drorlab/gvp-pytorch) + [![Papers with Code](/images/pwc_icon.svg) 2 community implementations](https://paperswithcode.com/paper/?openreview=1YLJDvSx6J4)", "project": "", "reviewers": "AnonReviewer2;AnonReviewer3;AnonReviewer1;AnonReviewer4", "pdf_size": 0, "rating": "6;6;7;10", "confidence": "4;4;3;4", "wc_review": "343;231;308;199", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "421;262;343;128", "reply_reviewers": "0;0;0;0", "reply_authors": "1;1;1;1", "rating_avg": [ 7.25, 1.6393596310755 ], "confidence_avg": [ 3.75, 0.4330127018922193 ], "wc_review_avg": [ 270.25, 57.73809401080018 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 288.5, 108.38473139700075 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 12, 0 ], "authors#_avg": [ 5, 0 ], "corr_rating_confidence": 0.08804509063256237, "gs_citation": 569, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=151372908751868472&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 7, "pdf": "https://openreview.net/pdf?id=1YLJDvSx6J4", "email": ";;;atomic.ai;", "author_num": 5, "aff_unique_index": "0", "aff_unique_norm": "Atomic AI", "aff_unique_dep": "", "aff_unique_url": "", "aff_unique_abbr": "", "aff_country_unique_index": "0", "aff_country_unique": "United States" }, { "id": "1cEEqSp9kXV", "title": "Constructing Multiple High-Quality Deep Neural Networks: A TRUST-TECH Based Approach", "track": "main", "status": "Reject", "tldr": "", "abstract": "The success of deep neural networks relied heavily on efficient stochastic gradient descent-like training methods. However, these methods are sensitive to initialization and hyper-parameters. \nIn this paper, a systematical method for finding multiple high-quality local optimal deep neural networks from a single training session, using the TRUST-TECH (TRansformation Under Stability-reTaining Equilibria Characterization) method, is introduced. \nTo realize effective TRUST-TECH searches to train deep neural networks on large datasets, a dynamic search paths (DSP) method is proposed to provide an improved search guidance in TRUST-TECH method. 
\nThe proposed DSP-TT method is implemented such that the computation graph remains constant during the search process, with only minor GPU memory overhead and requires just one training session to obtain multiple local optimal solutions (LOS). To take advantage of these LOSs, we also propose an improved ensemble method. Experiments on image classification datasets show that our method improves the testing performance by a substantial margin. Specifically, our fully-trained DSP-TT ResNet ensmeble improves the SGD baseline by 20\\% (CIFAR10) and 15\\%(CIFAR100). Furthermore, our method shows several advantages over other ensembling methods. ", "keywords": "Nonlinear Dynamical Systems;Global Optimization;Deep Neural Networks;Ensemble.", "primary_area": "", "supplementary_material": "", "author": "Zhiyong Hao;Hsiao-Dong Chiang;Bin Wang", "authorids": "~Zhiyong_Hao1;~Hsiao-Dong_Chiang1;bw297@cornell.edu", "gender": "M;M;", "homepage": ";https://www.engineering.cornell.edu/faculty-directory/hsiao-dong-chiang;", "dblp": ";;", "google_scholar": ";;", "orcid": ";;", "linkedin": "zhiyong-hao/;;", "or_profile": "~Zhiyong_Hao1;~Hsiao-Dong_Chiang1;bw297@cornell.edu", "aff": "Cornell University;Cornell University;", "aff_domain": "cornell.edu;cornell.edu;", "position": "PhD student;Full Professor;", "bibtex": "@misc{\nhao2021constructing,\ntitle={Constructing Multiple High-Quality Deep Neural Networks: A {\\{}TRUST{\\}}-{\\{}TECH{\\}} Based Approach},\nauthor={Zhiyong Hao and Hsiao-Dong Chiang and Bin Wang},\nyear={2021},\nurl={https://openreview.net/forum?id=1cEEqSp9kXV}\n}", "github": "", "project": "", "reviewers": "AnonReviewer2;AnonReviewer3;AnonReviewer4;AnonReviewer1", "site": "https://openreview.net/forum?id=1cEEqSp9kXV", "pdf_size": 0, "rating": "5;5;6;6", "confidence": "3;3;3;4", "wc_review": "451;230;760;94", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "662;361;564;91", "reply_reviewers": "0;0;0;0", "reply_authors": "1;1;2;1", "rating_avg": [ 5.5, 0.5 ], "confidence_avg": [ 3.25, 0.4330127018922193 ], "wc_review_avg": [ 383.75, 251.83365045203948 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 419.5, 218.52974625894754 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.25, 0.4330127018922193 ], "replies_avg": [ 12, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": 0.5773502691896257, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:bK5n8QVI7E4J:scholar.google.com/&scioq=Constructing+Multiple+High-Quality+Deep+Neural+Networks:+A+TRUST-TECH+Based+Approach&hl=en&as_sdt=0,33", "gs_version_total": 0, "aff_unique_index": "0;0", "aff_unique_norm": "Cornell University", "aff_unique_dep": "", "aff_unique_url": "https://www.cornell.edu", "aff_unique_abbr": "Cornell", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0", "aff_country_unique": "United States" }, { "id": "1dm_j4ciZp", "title": "How much progress have we made in neural network training? A New Evaluation Protocol for Benchmarking Optimizers", "track": "main", "status": "Reject", "tldr": "", "abstract": "Many optimizers have been proposed for training deep neural networks, and they often have multiple hyperparameters, which make it tricky to benchmark their performance. 
In this work, we propose a new benchmarking protocol to evaluate both end-to-end efficiency (training a model from scratch without knowing the best hyperparameter) and data-addition training efficiency (the previously selected hyperparameters are used for periodically re-training the model with newly collected data). For end-to-end efficiency, unlike previous work that assumes random hyperparameter tuning, which over-emphasizes the tuning time, we propose to evaluate with a bandit hyperparameter tuning strategy. A human study is conducted to show our evaluation protocol matches human tuning behavior better than the random search. For data-addition training, we propose a new protocol for assessing the hyperparameter sensitivity to data shift. We then apply the proposed benchmarking framework to 7 optimizers and various tasks, including computer vision, natural language processing, reinforcement learning, and graph mining. Our results show that there is no clear winner across all the tasks. \n", "keywords": "deep learning;optimization;benchmarking", "primary_area": "", "supplementary_material": "", "author": "Yuanhao Xiong;Xuanqing Liu;Li-Cheng Lan;Yang You;Si Si;Cho-Jui Hsieh", "authorids": "~Yuanhao_Xiong1;~Xuanqing_Liu1;~Li-Cheng_Lan1;~Yang_You1;~Si_Si1;~Cho-Jui_Hsieh1", "gender": "M;M;M;M;F;M", "homepage": "https://xyh97.github.io/;;https://lan-lc.github.io/;https://www.comp.nus.edu.sg/~youy/;;http://web.cs.ucla.edu/~chohsieh/index.html", "dblp": "232/1248;205/2594;200/8672;33/8167-1.html;03/7627;14/2770", "google_scholar": "DVKxiMkAAAAJ;;https://scholar.google.com.tw/citations?view_op=list_works;jF4dPZwAAAAJ;;Wy89g4IAAAAJ", "orcid": ";;;;;", "linkedin": ";;;yang-you-0b92914b/;;", "or_profile": "~Yuanhao_Xiong1;~Xuanqing_Liu1;~Li-Cheng_Lan1;~Yang_You1;~Si_Si1;~Cho-Jui_Hsieh1", "aff": "University of California, Los Angeles;University of California, Los Angeles;University of California, Los Angeles;National University of Singapore;Google;University of California, Los Angeles", "aff_domain": "cs.ucla.edu;ucla.edu;ucla.edu;nus.edu.sg;google.com;ucla.edu", "position": "PhD student;PhD student;PhD student;Professor;research scientist;Assistant Professor", "bibtex": "@misc{\nxiong2021how,\ntitle={How much progress have we made in neural network training? 
A New Evaluation Protocol for Benchmarking Optimizers},\nauthor={Yuanhao Xiong and Xuanqing Liu and Li-Cheng Lan and Yang You and Si Si and Cho-Jui Hsieh},\nyear={2021},\nurl={https://openreview.net/forum?id=1dm_j4ciZp}\n}", "github": "", "project": "", "reviewers": "AnonReviewer4;AnonReviewer2;AnonReviewer3;AnonReviewer1", "site": "https://openreview.net/forum?id=1dm_j4ciZp", "pdf_size": 0, "rating": "5;6;6;7", "confidence": "3;4;4;4", "wc_review": "491;743;263;927", "wc_reply_reviewers": "0;383;0;0", "wc_reply_authors": "544;1375;436;1124", "reply_reviewers": "0;1;0;0", "reply_authors": "1;2;1;2", "rating_avg": [ 6.0, 0.7071067811865476 ], "confidence_avg": [ 3.75, 0.4330127018922193 ], "wc_review_avg": [ 606.0, 251.33841727837788 ], "wc_reply_reviewers_avg": [ 95.75, 165.84386482472001 ], "wc_reply_authors_avg": [ 869.75, 391.8458721232112 ], "reply_reviewers_avg": [ 0.25, 0.4330127018922193 ], "reply_authors_avg": [ 1.5, 0.5 ], "replies_avg": [ 13, 0 ], "authors#_avg": [ 6, 0 ], "corr_rating_confidence": 0.816496580927726, "gs_citation": 1, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=5528494941477390947&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 3, "aff_unique_index": "0;0;0;1;2;0", "aff_unique_norm": "University of California, Los Angeles;National University of Singapore;Google", "aff_unique_dep": ";;Google", "aff_unique_url": "https://www.ucla.edu;https://www.nus.edu.sg;https://www.google.com", "aff_unique_abbr": "UCLA;NUS;Google", "aff_campus_unique_index": "0;0;0;2;0", "aff_campus_unique": "Los Angeles;;Mountain View", "aff_country_unique_index": "0;0;0;1;0;0", "aff_country_unique": "United States;Singapore" }, { "id": "1eKz1kjHO1", "title": "Contextual Image Parsing via Panoptic Segment Sorting", "track": "main", "status": "Reject", "tldr": "", "abstract": "Visual context is versatile and hard to describe or label precisely. We aim to leverage the densely labeled task, image parsing, a.k.a panoptic segmentation, to learn a model that encodes and discovers object-centric context. Most existing approaches based on deep learning tackle image parsing via fusion of pixel-wise classification and instance masks from two sub-networks. Such approaches isolate things from stuff and fuse the semantic and instance masks in the later stage. To encode object-centric context inherently, we propose a metric learning framework, Panoptic Segment Sorting, that is directly trained with stuff and things jointly. Our key insight is to make the panoptic embeddings separate every instance so that the model automatically learns to leverage visual context as many instances across different images appear similar. We show that the context of our model's retrieved instances is more consistent relatively by 13.7%, further demonstrating its ability to discover novel context unsupervisedly. Our overall framework also achieves competitive performance across standard panoptic segmentation metrics amongst the state-of-the-art methods on two large datasets, Cityscapes and PASCAL VOC. 
These promising results suggest that pixel-wise embeddings can not only inject new understanding into panoptic segmentation but potentially serve for other tasks such as modeling instance relationships.", "keywords": "metric learning;context encoding;context discovery;image parsing;panoptic segmentation", "primary_area": "", "supplementary_material": "", "author": "Jyh-Jing Hwang;Tsung-Wei Ke;Stella Yu", "authorids": "~Jyh-Jing_Hwang1;~Tsung-Wei_Ke2;~Stella_Yu2", "gender": "M;;F", "homepage": "http://jyhjinghwang.github.io/;https://twke18.github.io/;http://www.eecs.umich.edu/~stellayu", "dblp": "156/0239;173/4984;58/5089", "google_scholar": "ClTTUWkAAAAJ;WTEFsHMAAAAJ;https://scholar.google.com/citations?hl=en", "orcid": ";;", "linkedin": ";;", "or_profile": "~Jyh-Jing_Hwang1;~Tsung-Wei_Ke2;~Stella_Yu2", "aff": "Waymo;University of California, Berkeley;University of California, Berkeley", "aff_domain": "waymo.com;berkeley.edu;berkeley.edu", "position": "Researcher;PhD student;Director, ICSI Vision Group", "bibtex": "@misc{\nhwang2021contextual,\ntitle={Contextual Image Parsing via Panoptic Segment Sorting},\nauthor={Jyh-Jing Hwang and Tsung-Wei Ke and Stella Yu},\nyear={2021},\nurl={https://openreview.net/forum?id=1eKz1kjHO1}\n}", "github": "", "project": "", "reviewers": "AnonReviewer4;AnonReviewer2;AnonReviewer3;AnonReviewer1", "site": "https://openreview.net/forum?id=1eKz1kjHO1", "pdf_size": 0, "rating": "5;5;6;6", "confidence": "4;4;3;3", "wc_review": "577;161;253;241", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "791;423;224;583", "reply_reviewers": "0;0;0;0", "reply_authors": "1;1;1;1", "rating_avg": [ 5.5, 0.5 ], "confidence_avg": [ 3.5, 0.5 ], "wc_review_avg": [ 308.0, 159.28276742949942 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 505.25, 208.305514809378 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 9, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": -1.0, "gs_citation": 1, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=7676169371742120420&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 2, "aff_unique_index": "0;1;1", "aff_unique_norm": "Waymo;University of California, Berkeley", "aff_unique_dep": ";", "aff_unique_url": "https://www.waymo.com;https://www.berkeley.edu", "aff_unique_abbr": "Waymo;UC Berkeley", "aff_campus_unique_index": "1;1", "aff_campus_unique": ";Berkeley", "aff_country_unique_index": "0;0;0", "aff_country_unique": "United States" }, { "id": "1fLunL_hDj_", "title": "Information-theoretic Vocabularization via Optimal Transport for Machine Translation", "track": "main", "status": "Withdraw", "tldr": "", "abstract": "It is well accepted that the choice of token vocabulary largely affects the performance of machine translation.\nOne dominant approach to construct a good vocabulary is the Byte Pair Encoding method (BPE). \nHowever, due to expensive trial costs, most previous studies only conduct simple trials with commonly used vocabulary sizes. \nThis paper finds an exciting relation between an information-theoretic feature and BLEU scores with a given vocabulary. \nWith this observation, we formulate the quest of vocabularization -- finding the best token dictionary with a proper size -- as an optimal transport problem. We then propose Info-VOT, a simple and efficient solution without the full and costly trial training. 
\nWe evaluate our approach on multiple machine translation tasks, including WMT-14 English-German translation, TED bilingual translation, and TED multilingual translation. Empirical results show that Info-VOT can generate well-performing vocabularies on diverse scenarios. Also, one advantage of the proposed approach lies in its low consumption of computation resources. On TED bilingual translation, Info-VOT only spends a few CPU hours generating vocabularies, while the traditional BPE-Search solution takes hundreds of GPU hours. ", "keywords": "Vocabulary Construction;NLP", "primary_area": "", "supplementary_material": "/attachment/a749424db5779d33f0a383aeb746377539aba906.zip", "author": "Jingjing Xu;Hao Zhou;Chun Gan;Zaixiang Zheng;Lei Li", "authorids": "~Jingjing_Xu1;zhouhao.nlp@bytedance.com;ganchun@bytedance.com;~Zaixiang_Zheng2;~Lei_Li11", "gender": "F;;;M;M", "homepage": ";;;https://zhengzx-nlp.github.io/;https://www.cs.cmu.edu/~leili", "dblp": "25/624;;;205/2769;13/7007-5.html", "google_scholar": ";;;JPSrehMAAAAJ;BYXqAlwAAAAJ", "orcid": ";;;;0000-0003-3095-9776", "linkedin": ";;;;", "or_profile": "~Jingjing_Xu1;zhouhao.nlp@bytedance.com;ganchun@bytedance.com;~Zaixiang_Zheng2;~Lei_Li11", "aff": ";;;Dept. of Computer Science, Nanjing University;ByteDance AI Lab", "aff_domain": ";;;nju.edu.cn;bytedance.com", "position": ";;;PhD student;Director", "bibtex": "", "github": "", "project": "", "reviewers": "AnonReviewer4;AnonReviewer2;AnonReviewer3;AnonReviewer1", "site": "https://openreview.net/forum?id=1fLunL_hDj_", "pdf_size": 0, "rating": "3;3;4;4", "confidence": "4;5;2;5", "wc_review": "1512;820;198;1346", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "823;699;217;728", "reply_reviewers": "0;0;0;0", "reply_authors": "1;1;1;1", "rating_avg": [ 3.5, 0.5 ], "confidence_avg": [ 4.0, 1.224744871391589 ], "wc_review_avg": [ 969.0, 513.2299679480924 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 616.75, 235.30870680873667 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 10, 0 ], "authors#_avg": [ 5, 0 ], "corr_rating_confidence": -0.40824829046386296, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:_NVEGi09-0IJ:scholar.google.com/&scioq=Information-theoretic+Vocabularization+via+Optimal+Transport+for+Machine+Translation&hl=en&as_sdt=0,33", "gs_version_total": 0, "aff_unique_index": "0;1", "aff_unique_norm": "Nanjing University;ByteDance", "aff_unique_dep": "Dept. of Computer Science;AI Lab", "aff_unique_url": "http://www.nju.edu.cn;https://www.bytedance.com", "aff_unique_abbr": "Nanjing U;ByteDance", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0", "aff_country_unique": "China" }, { "id": "1flmvXGGJaa", "title": "NAS-Bench-301 and the Case for Surrogate Benchmarks for Neural Architecture Search", "track": "main", "status": "Reject", "tldr": "", "abstract": "The most significant barrier to the advancement of Neural Architecture Search (NAS) is its demand for large computational resources, which hinders scientifically sound empirical evaluations. As a remedy, several tabular NAS benchmarks were proposed to simulate runs of NAS methods in seconds. However, all existing tabular NAS benchmarks are limited to extremely small architectural spaces since they rely on exhaustive evaluations of the space. This leads to unrealistic results that do not transfer to larger search spaces. 
To overcome this fundamental limitation, we propose NAS-Bench-301, the first surrogate NAS benchmark, using a search space containing $10^{18}$ architectures, many orders of magnitude larger than any previous tabular NAS benchmark. After motivating the benefits of a surrogate benchmark over a tabular one, we fit various regression models on our dataset, which consists of $\\sim$60k architecture evaluations, and build surrogates via deep ensembles to model uncertainty. We benchmark a wide range of NAS algorithms using NAS-Bench-301 and obtain comparable results to the true benchmark at a fraction of the real cost. Finally, we show how NAS-Bench-301 can be used to generate new scientific insights.", "keywords": "Neural Architecture Search;Benchmarking;Performance Prediction;Deep Learning", "primary_area": "", "supplementary_material": "/attachment/883253d8cd317fc55948adbd9303c9acb1450bdc.zip", "author": "Julien Niklas Siems;Lucas Zimmer;Arber Zela;Jovita Lukasik;Margret Keuper;Frank Hutter", "authorids": "~Julien_Niklas_Siems1;~Lucas_Zimmer1;~Arber_Zela1;~Jovita_Lukasik1;~Margret_Keuper1;~Frank_Hutter1", "gender": "M;M;M;F;F;M", "homepage": "https://juliensiems.github.io;;https://ml.informatik.uni-freiburg.de/people/zela/index.html;https://www.uni-mannheim.de/dws/people/researchers/phd-students/jovita-lukasik/;https://www.vc.informatik.uni-siegen.de/en/keuper-margret;http://ml.informatik.uni-freiburg.de/~hutter/", "dblp": "257/3075;;;255/4833;95/7589;89/5383", "google_scholar": "https://scholar.google.de/citations?user=rKgTTh8AAAAJ;;hD_6YioAAAAJ;https://scholar.google.de/citations?user=TpsZenwAAAAJ;https://scholar.google.de/citations?user=KMqMQAcAAAAJ;https://scholar.google.de/citations?user=YUrxwrkAAAAJ", "orcid": ";0000-0002-5167-2929;;;0000-0002-8437-7993;0000-0002-2037-3694", "linkedin": "julien-niklas-siems/;lucas-z-5369ba170/;https://de.linkedin.com/in/arber-zela-ba85a2145;;;frank-hutter-9190b24b/", "or_profile": "~Julien_Niklas_Siems1;~Lucas_Zimmer1;~Arber_Zela1;~Jovita_Lukasik1;~Margret_Keuper1;~Frank_Hutter1", "aff": "Department of Informatics, University of Zurich, University of Zurich;;University of Freiburg;University of Mannheim;Universit\u00e4t Mannheim;Albert-Ludwigs-Universit\u00e4t Freiburg", "aff_domain": "ifi.uzh.ch;;uni-freiburg.de;uni-mannheim.de;uni-mannheim.de;uni-freiburg.de", "position": "Researcher;;PhD student;PhD student;Assistant Professor;Full Professor", "bibtex": "@misc{\nsiems2021nasbench,\ntitle={{\\{}NAS{\\}}-Bench-301 and the Case for Surrogate Benchmarks for Neural Architecture Search},\nauthor={Julien Niklas Siems and Lucas Zimmer and Arber Zela and Jovita Lukasik and Margret Keuper and Frank Hutter},\nyear={2021},\nurl={https://openreview.net/forum?id=1flmvXGGJaa}\n}", "github": "", "project": "", "reviewers": "AnonReviewer2;AnonReviewer3;AnonReviewer1;AnonReviewer4", "site": "https://openreview.net/forum?id=1flmvXGGJaa", "pdf_size": 0, "rating": "3;5;7;8", "confidence": "5;4;4;5", "wc_review": "717;485;308;710", "wc_reply_reviewers": "0;0;0;168", "wc_reply_authors": "1853;1316;361;1479", "reply_reviewers": "0;0;0;2", "reply_authors": "4;4;1;4", "rating_avg": [ 5.75, 1.920286436967152 ], "confidence_avg": [ 4.5, 0.5 ], "wc_review_avg": [ 555.0, 170.42447007398908 ], "wc_reply_reviewers_avg": [ 42.0, 72.74613391789285 ], "wc_reply_authors_avg": [ 1252.25, 550.1606015519468 ], "reply_reviewers_avg": [ 0.5, 0.8660254037844386 ], "reply_authors_avg": [ 3.25, 1.299038105676658 ], "replies_avg": [ 21, 0 ], "authors#_avg": [ 6, 0 ], "corr_rating_confidence": 
-0.13018891098082386, "gs_citation": 204, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=15089154778780039166&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 7, "aff_unique_index": "0;1;2;2;3", "aff_unique_norm": "University of Zurich;University of Freiburg;University of Mannheim;Albert-Ludwigs-Universit\u00e4t Freiburg", "aff_unique_dep": "Department of Informatics;;;", "aff_unique_url": "https://www.uzh.ch;https://www.uni-freiburg.de;https://www.uni-mannheim.de;https://www.uni-freiburg.de", "aff_unique_abbr": "UZH;UoF;UM;Albert-Ludwigs-Universit\u00e4t", "aff_campus_unique_index": "1", "aff_campus_unique": ";Freiburg", "aff_country_unique_index": "0;1;1;1;1", "aff_country_unique": "Switzerland;Germany" }, { "id": "1hkYtDXAgOZ", "title": "Feature Integration and Group Transformers for Action Proposal Generation", "track": "main", "status": "Reject", "tldr": "", "abstract": "The task of temporal action proposal generation (TAPG) aims to provide high-quality video segments, i.e., proposals that potentially contain action events. The performance of tackling the TAPG task heavily depends on two key issues, feature representation and scoring mechanism. To simultaneously take account of both aspects, we introduce an attention-based model, termed as FITS, to address the issues for retrieving high-quality proposals. We first propose a novel Feature-Integration (FI) module to seamlessly fuse two-stream features concerning their interaction to yield a robust video segment representation. We then design a group of Transformer-driven Scorers (TS) to gain the temporal contextual supports over the representations for estimating the starting or ending boundary of an action event. Unlike most previous work to estimate action boundaries without considering the long-range temporal neighborhood, the proposed action-boundary co-estimation mechanism in TS leverages the bi-directional contextual supports for such boundary estimation, which shows the advantage of removing several false-positive boundary predictions. We conduct experiments on two challenging datasets, ActivityNet-1.3 and THUMOS-14. 
The experimental results demonstrate that the proposed FITS model consistently outperforms state-of-the-art TAPG methods.", "keywords": "temporal action proposal;transformer;video analysis", "primary_area": "", "supplementary_material": "", "author": "He-Yen Hsieh;Ding-Jie Chen;Tung-Ying Lee;Tyng-Luh Liu", "authorids": "~He-Yen_Hsieh1;~Ding-Jie_Chen1;rilylee@berry-ai.com;~Tyng-Luh_Liu1", "gender": ";M;;", "homepage": ";;;https://www.iis.sinica.edu.tw/pages/liutyng/index_en.html", "dblp": "209/1822.html;123/2959;;68/2368", "google_scholar": ";6nxRMzEAAAAJ;;https://scholar.google.com.tw/citations?user=20N2rlkAAAAJ", "orcid": ";;;0000-0002-8366-5213", "linkedin": ";;;", "or_profile": "~He-Yen_Hsieh1;~Ding-Jie_Chen1;rilylee@berry-ai.com;~Tyng-Luh_Liu1", "aff": "IIS, Academia Sinica;Academia Sinica;;Academia Sinica", "aff_domain": "iis.sinica.edu.tw;sinica.edu.tw;;sinica.edu.tw", "position": "Research Assistant;Postdoc;;Principal Researcher", "bibtex": "@misc{\nhsieh2021feature,\ntitle={Feature Integration and Group Transformers for Action Proposal Generation},\nauthor={He-Yen Hsieh and Ding-Jie Chen and Tung-Ying Lee and Tyng-Luh Liu},\nyear={2021},\nurl={https://openreview.net/forum?id=1hkYtDXAgOZ}\n}", "github": "", "project": "", "reviewers": "AnonReviewer4;AnonReviewer1;AnonReviewer3;AnonReviewer2", "site": "https://openreview.net/forum?id=1hkYtDXAgOZ", "pdf_size": 0, "rating": "5;5;5;6", "confidence": "3;3;5;4", "wc_review": "665;411;571;197", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "553;653;590;392", "reply_reviewers": "0;0;0;0", "reply_authors": "1;1;1;1", "rating_avg": [ 5.25, 0.4330127018922193 ], "confidence_avg": [ 3.75, 0.82915619758885 ], "wc_review_avg": [ 461.0, 177.42040468897596 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 547.0, 96.36648795094693 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 9, 0 ], "authors#_avg": [ 4, 0 ], "corr_rating_confidence": 0.17407765595569782, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:_38-rbJJThEJ:scholar.google.com/&scioq=Feature+Integration+and+Group+Transformers+for+Action+Proposal+Generation&hl=en&as_sdt=0,5", "gs_version_total": 0, "aff_unique_index": "0;0;0", "aff_unique_norm": "Academia Sinica", "aff_unique_dep": "Institute of Information Science", "aff_unique_url": "https://www.iis.sinica.edu.tw", "aff_unique_abbr": "IIS", "aff_campus_unique_index": "0;0;0", "aff_campus_unique": "Taiwan", "aff_country_unique_index": "0;0;0", "aff_country_unique": "China" }, { "id": "1ibNKMp8SKc", "title": "On Disentangled Representations Learned From Correlated Data", "track": "main", "status": "Reject", "tldr": "", "abstract": "Despite impressive progress in the last decade, it still remains an open challenge to build models that generalize well across multiple tasks and datasets. One path to achieve this is to learn meaningful and compact representations, in which different semantic aspects of data are structurally disentangled. The focus of disentanglement approaches has been on separating independent factors of variation despite the fact that real-world observations are often not structured into meaningful independent causal variables. In this work, we bridge the gap to real-world scenarios by analyzing the behavior of most prominent methods and disentanglement scores on correlated data in a large scale empirical study (including 4260 models). 
We show that systematically induced correlations in the dataset are being learned and reflected in the latent representations, while widely used disentanglement scores fall short of capturing these latent correlations. Finally, we demonstrate how to disentangle these latent correlations using weak supervision, even if we constrain this supervision to be causally plausible. Our results thus support the argument to learn independent mechanisms rather than independent factors of variations.", "keywords": "representation learning;disentanglement", "primary_area": "", "supplementary_material": "", "author": "Frederik Tr\u00e4uble;Elliot Creager;Niki Kilbertus;Anirudh Goyal;Francesco Locatello;Bernhard Sch\u00f6lkopf;Stefan Bauer", "authorids": "~Frederik_Tr\u00e4uble1;~Elliot_Creager1;~Niki_Kilbertus1;~Anirudh_Goyal1;~Francesco_Locatello1;~Bernhard_Sch\u00f6lkopf1;~Stefan_Bauer1", "gender": "M;M;;M;M;;", "homepage": "https://ei.is.tuebingen.mpg.de/person/ftraeuble;https://ecreager.github.io/;;https://anirudh9119.github.io/;https://twitter.com/FrancescoLocat8;;https://cifar.ca/bios/stefan-bauer/", "dblp": ";182/2055;202/1966;172/1039;195/6074;;", "google_scholar": "https://scholar.google.de/citations?user=oc2OOyMAAAAJ;boebIUcAAAAJ;uQZjTq4AAAAJ;krrh6OUAAAAJ;;;O-oICE8AAAAJ", "orcid": ";0009-0004-7122-3866;;;;;", "linkedin": ";;;;;;", "or_profile": "~Frederik_Tr\u00e4uble1;~Elliot_Creager1;~Niki_Kilbertus1;~Anirudh_Goyal1;~Francesco_Locatello1;~Bernhard_Sch\u00f6lkopf1;~Stefan_Bauer1", "aff": "Max Planck Institute for Intelligent Systems;University of Toronto;Helmholtz AI;University of Montreal;Amazon;;Max Planck Institute for Intelligent Systems, Max-Planck Institute", "aff_domain": "is.tuebingen.mpg.de;toronto.edu;helmholtz-muenchen.de;umontreal.ca;amazon.com;;tuebingen.mpg.de", "position": "PhD student;PhD student;Group Leader;PhD student;Senior Applied Scientist;;Research Group Leader", "bibtex": "@misc{\ntr{\\\"a}uble2021on,\ntitle={On Disentangled Representations Learned From Correlated Data},\nauthor={Frederik Tr{\\\"a}uble and Elliot Creager and Niki Kilbertus and Anirudh Goyal and Francesco Locatello and Bernhard Sch{\\\"o}lkopf and Stefan Bauer},\nyear={2021},\nurl={https://openreview.net/forum?id=1ibNKMp8SKc}\n}", "github": "", "project": "", "reviewers": "AnonReviewer3;AnonReviewer4;AnonReviewer2", "site": "https://openreview.net/forum?id=1ibNKMp8SKc", "pdf_size": 0, "rating": "3;6;7", "confidence": "4;5;3", "wc_review": "242;984;205", "wc_reply_reviewers": "0;563;0", "wc_reply_authors": "846;1313;59", "reply_reviewers": "0;2;0", "reply_authors": "2;4;1", "rating_avg": [ 5.333333333333333, 1.699673171197595 ], "confidence_avg": [ 4.0, 0.816496580927726 ], "wc_review_avg": [ 477.0, 358.8212182503519 ], "wc_reply_reviewers_avg": [ 187.66666666666666, 265.40074520535086 ], "wc_reply_authors_avg": [ 739.3333333333334, 517.4696985224244 ], "reply_reviewers_avg": [ 0.6666666666666666, 0.9428090415820634 ], "reply_authors_avg": [ 2.3333333333333335, 1.247219128924647 ], "replies_avg": [ 13, 0 ], "authors#_avg": [ 7, 0 ], "corr_rating_confidence": -0.2401922307076307, "gs_citation": 142, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=10644866140945749570&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 8, "aff_unique_index": "0;1;2;3;4;0", "aff_unique_norm": "Max Planck Institute for Intelligent Systems;University of Toronto;Helmholtz Association of German Research Centres;University of Montreal;Amazon", "aff_unique_dep": "Intelligent Systems;;Helmholtz AI;;Amazon.com, Inc.", 
"aff_unique_url": "https://www.mpi-is.mpg.de;https://www.utoronto.ca;https://www.helmholtz-ai.de;https://wwwumontreal.ca;https://www.amazon.com", "aff_unique_abbr": "MPI-IS;U of T;Helmholtz AI;UM;Amazon", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;1;0;1;2;0", "aff_country_unique": "Germany;Canada;United States" }, { "id": "1qJtBS8QF9", "title": "Graph View-Consistent Learning Network", "track": "main", "status": "Reject", "tldr": "", "abstract": "Recent years, methods based on neural networks have made great achievements in solving large and complex graph problems. However, high efficiency of these methods depends on large training and validation sets, while the acquisition of ground-truth labels is expensive and time-consuming. In this paper, a graph view-consistent learning network (GVCLN) is specially designed for the semi-supervised learning when the number of the labeled samples is very small. We fully exploit the neighborhood aggregation capability of GVCLN and use dual views to obtain different representations. Although the two views have different viewing angles, their observation objects are the same, so their observation representations need to be consistent. For view-consistent representations between two views, two loss functions are designed besides a supervised loss. The supervised loss uses the known labeled set, while a view-consistent loss is applied to the two views to obtain the consistent representation and a pseudo-label loss is designed by using the common high-confidence predictions. GVCLN with these loss functions can obtain the view-consistent representations of the original feature. We also find that preprocessing the node features with specific filter before training is good for subsequent classification tasks. Related experiments have been done on the three citation network datasets of Cora, Citeseer, and PubMed. 
On several node classification tasks, GVCLN achieves state-of-the-art performance.", "keywords": "", "primary_area": "", "supplementary_material": "/attachment/0690ecf654706f560eac7692fe5387aece799984.zip", "author": "Zhuolin Liao;Kun Zhan", "authorids": "liaozl20@lzu.edu.cn;~Kun_Zhan1", "gender": ";M", "homepage": ";https://kunzhan.github.io/", "dblp": ";46/8462", "google_scholar": ";sk7TcGAAAAAJ", "orcid": ";0000-0002-8000-5682", "linkedin": ";", "or_profile": "liaozl20@lzu.edu.cn;~Kun_Zhan1", "aff": ";Lanzhou University", "aff_domain": ";lzu.edu.cn", "position": ";Associate Professor", "bibtex": "@misc{\nliao2021graph,\ntitle={Graph View-Consistent Learning Network},\nauthor={Zhuolin Liao and Kun Zhan},\nyear={2021},\nurl={https://openreview.net/forum?id=1qJtBS8QF9}\n}", "github": "", "project": "", "reviewers": "AnonReviewer2;AnonReviewer3;AnonReviewer1;AnonReviewer4;AnonReviewer5", "site": "https://openreview.net/forum?id=1qJtBS8QF9", "pdf_size": 0, "rating": "3;3;4;4;5", "confidence": "4;4;5;4;5", "wc_review": "212;213;230;362;159", "wc_reply_reviewers": "0;0;0;0;0", "wc_reply_authors": "57;100;95;171;179", "reply_reviewers": "0;0;0;0;0", "reply_authors": "1;1;1;1;1", "rating_avg": [ 3.8, 0.7483314773547882 ], "confidence_avg": [ 4.4, 0.4898979485566356 ], "wc_review_avg": [ 235.2, 67.73891053154014 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 120.4, 47.06421145626473 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 11, 0 ], "authors#_avg": [ 2, 0 ], "corr_rating_confidence": 0.7637626158259733, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:biV9vAAIQZMJ:scholar.google.com/&scioq=Graph+View-Consistent+Learning+Network&hl=en&as_sdt=0,33", "gs_version_total": 0, "aff_unique_index": "0", "aff_unique_norm": "Lanzhou University", "aff_unique_dep": "", "aff_unique_url": "https://www.lzu.edu.cn", "aff_unique_abbr": "LZU", "aff_country_unique_index": "0", "aff_country_unique": "China" }, { "title": "Drop-Bottleneck: Learning Discrete Compressed Representation for Noise-Robust Exploration", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/3127", "id": "1rxHOBjeDUW", "poster": "", "openreview": "https://openreview.net/forum?id=1rxHOBjeDUW", "slides": "https://iclr.cc/virtual/2021/poster/3127", "video": "https://iclr.cc/virtual/2021/poster/3127", "author_site": "Jaekyeom Kim, Minjung Kim, Dongyeon Woo, Gunhee Kim", "tldr": "", "abstract": "We propose a novel information bottleneck (IB) method named Drop-Bottleneck, which discretely drops features that are irrelevant to the target variable. Drop-Bottleneck not only enjoys a simple and tractable compression objective but also additionally provides a deterministic compressed representation of the input variable, which is useful for inference tasks that require consistent representation. Moreover, it can jointly learn a feature extractor and select features considering each feature dimension's relevance to the target task, which is unattainable by most neural network-based IB methods. We propose an exploration method based on Drop-Bottleneck for reinforcement learning tasks. In a multitude of noisy and reward sparse maze navigation tasks in VizDoom (Kempka et al., 2016) and DMLab (Beattie et al., 2016), our exploration method achieves state-of-the-art performance. 
As a new IB framework, we demonstrate that Drop-Bottleneck outperforms Variational Information Bottleneck (VIB) (Alemi et al., 2017) in multiple aspects including adversarial robustness and dimensionality reduction.", "keywords": "Reinforcement learning;Information bottleneck", "primary_area": "", "supplementary_material": "/attachment/d44f61058c2223bf909e4a7cc9dcd6ba0ae38909.zip", "author": "Jaekyeom Kim;Minjung Kim;Dongyeon Woo;Gunhee Kim", "authorids": "~Jaekyeom_Kim1;~Minjung_Kim2;~Dongyeon_Woo1;~Gunhee_Kim1", "gender": "M;F;M;M", "homepage": "https://jaekyeom.github.io/;https://minnjung.github.io/;;http://vision.snu.ac.kr/gunhee/", "dblp": "228/6696;92/4738-1;;45/115", "google_scholar": "8PR-AaoAAAAJ;nSwyFAMAAAAJ;dKGgnbMAAAAJ;https://scholar.google.co.kr/citations?user=CiSdOV0AAAAJ", "orcid": ";0009-0009-8038-3710;;0000-0002-9543-7453", "linkedin": "jaekyeom-kim-14157428;minjung-kim-45148722a/;;", "or_profile": "~Jaekyeom_Kim1;~Minjung_Kim2;~Dongyeon_Woo1;~Gunhee_Kim1", "aff": "Seoul National University;Seoul National University;Seoul National University;Seoul National University", "aff_domain": "snu.ac.kr;snu.ac.kr;snu.ac.kr;snu.ac.kr", "position": "PhD student;PhD student;PhD student;Full Professor", "bibtex": "@inproceedings{\nkim2021dropbottleneck,\ntitle={Drop-Bottleneck: Learning Discrete Compressed Representation for Noise-Robust Exploration},\nauthor={Jaekyeom Kim and Minjung Kim and Dongyeon Woo and Gunhee Kim},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=1rxHOBjeDUW}\n}", "github": "[![github](/images/github_icon.svg) jaekyeom/drop-bottleneck](https://github.com/jaekyeom/drop-bottleneck)", "project": "", "reviewers": "AnonReviewer1;AnonReviewer4;AnonReviewer2;AnonReviewer3", "pdf_size": 0, "rating": "6;6;6;7", "confidence": "4;3;3;4", "wc_review": "453;433;203;339", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "642;326;202;347", "reply_reviewers": "0;0;0;0", "reply_authors": "1;1;1;1", "rating_avg": [ 6.25, 0.4330127018922193 ], "confidence_avg": [ 3.5, 0.5 ], "wc_review_avg": [ 357.0, 98.78258955909183 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 379.25, 161.50135448348414 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 10, 0 ], "authors#_avg": [ 4, 0 ], "corr_rating_confidence": 0.5773502691896257, "gs_citation": 25, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=4970327572686173895&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 3, "pdf": "https://openreview.net/pdf?id=1rxHOBjeDUW", "email": "snu.ac.kr;snu.ac.kr;snu.ac.kr;snu.ac.kr", "author_num": 4, "aff_unique_index": "0;0;0;0", "aff_unique_norm": "Seoul National University", "aff_unique_dep": "", "aff_unique_url": "https://www.snu.ac.kr", "aff_unique_abbr": "SNU", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0;0;0", "aff_country_unique": "South Korea" }, { "id": "1s1T7xHc5l6", "title": "FILTRA: Rethinking Steerable CNN by Filter Transform", "track": "main", "status": "Reject", "tldr": "", "abstract": "Steerable CNN imposes the prior knowledge of transformation invariance or equivariance in the network architecture to enhance the the network robustness on geometry transformation of data and reduce overfitting. Filter transform has been an intuitive and widely used technique to construct steerable CNN in the past decades. 
Recently, group representation theory has been used to analyze steerable CNNs and reveals the function space structure of a steerable kernel function. However, it is not yet clear how this theory relates to the filter transform technique. In this paper, we show that kernels constructed by filter transform can also be interpreted within group representation theory. Meanwhile, we show that filter transformed kernels can be used to convolve input/output features in different group representations. This interpretation helps complete the puzzle of steerable CNN theory and provides a novel and simple approach to implementing steerable convolution operators. Experiments are executed on multiple datasets to verify the feasibility of the proposed approach.\n", "keywords": "", "primary_area": "", "supplementary_material": "/attachment/e179622b1c65c98bfb9b2ddf98d648401162a0ee.zip", "author": "Bo Li;Qili Wang;Gim Hee Lee", "authorids": "~Bo_Li16;~Qili_Wang1;~Gim_Hee_Lee1", "gender": "M;;", "homepage": "https://prclibo.github.io;https://github.com/gobigrassland;https://www.comp.nus.edu.sg/~leegh/", "dblp": "50/3402-18;;49/9455", "google_scholar": "dqLdMyAAAAAJ;https://github.com/gobigrassland;https://scholar.google.com.sg/citations?user=7hNKrPsAAAAJ", "orcid": ";;0000-0002-1583-0475", "linkedin": ";;", "or_profile": "~Bo_Li16;~Qili_Wang1;~Gim_Hee_Lee1", "aff": ";;National University of Singapore", "aff_domain": ";;nus.edu.sg", "position": ";;Assistant Professor", "bibtex": "@misc{\nli2021filtra,\ntitle={{\\{}FILTRA{\\}}: Rethinking Steerable {\\{}CNN{\\}} by Filter Transform},\nauthor={Bo Li and Qili Wang and Gim Hee Lee},\nyear={2021},\nurl={https://openreview.net/forum?id=1s1T7xHc5l6}\n}", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer4;AnonReviewer2;AnonReviewer3", "site": "https://openreview.net/forum?id=1s1T7xHc5l6", "pdf_size": 0, "rating": "5;6;6;6", "confidence": "5;4;3;4", "wc_review": "821;470;272;354", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "706;217;639;465", "reply_reviewers": "0;0;0;0", "reply_authors": "1;1;1;1", "rating_avg": [ 5.75, 0.4330127018922193 ], "confidence_avg": [ 4.0, 0.7071067811865476 ], "wc_review_avg": [ 479.25, 209.47478965259756 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 506.75, 189.00314150828288 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 10, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": -0.816496580927726, "gs_citation": 4, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=12773800134537729615&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 7, "aff_unique_index": "0", "aff_unique_norm": "National University of Singapore", "aff_unique_dep": "", "aff_unique_url": "https://www.nus.edu.sg", "aff_unique_abbr": "NUS", "aff_country_unique_index": "0", "aff_country_unique": "Singapore" }, { "id": "1sJWR4y1lG", "title": "Deep Learning Is Composite Kernel Learning", "track": "main", "status": "Reject", "tldr": "", "abstract": "Recent works have connected deep learning and kernel methods. In this paper, we show that architectural choices such as convolutional layers with pooling and skip connections make deep learning a composite kernel learning method, where the kernel is an (architecture-dependent) composition of base kernels: even before training, standard deep networks have in-built structural properties that ensure their success. 
In particular, we build on the recently developed `neural path' framework that characterises the role of gates/masks in fully connected deep networks with ReLU activations. ", "keywords": "deep learning;kernel methods", "primary_area": "", "supplementary_material": "/attachment/64343178c8600b23a4adad19340f3148224b257d.zip", "author": "Chandra Shekar Lakshminarayanan;Amit Vikram Singh", "authorids": "~Chandra_Shekar_Lakshminarayanan2;~Amit_Vikram_Singh1", "gender": "M;M", "homepage": "https://iitpkd.ac.in/people/cnarayanan;", "dblp": "143/7535;", "google_scholar": ";", "orcid": ";", "linkedin": ";amitadvaita/", "or_profile": "~Chandra_Shekar_Lakshminarayanan2;~Amit_Vikram_Singh1", "aff": "Indian Institute of Technology;", "aff_domain": "iitpkd.ac.in;", "position": "Assistant Professor;", "bibtex": "@misc{\nlakshminarayanan2021deep,\ntitle={Deep Learning Is Composite Kernel Learning},\nauthor={Chandra Shekar Lakshminarayanan and Amit Vikram Singh},\nyear={2021},\nurl={https://openreview.net/forum?id=1sJWR4y1lG}\n}", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer2;AnonReviewer4;AnonReviewer3", "site": "https://openreview.net/forum?id=1sJWR4y1lG", "pdf_size": 0, "rating": "4;6;6;8", "confidence": "4;1;2;2", "wc_review": "361;137;1043;131", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "668;105;271;5", "reply_reviewers": "0;0;0;0", "reply_authors": "1;1;1;1", "rating_avg": [ 6.0, 1.4142135623730951 ], "confidence_avg": [ 2.25, 1.0897247358851685 ], "wc_review_avg": [ 418.0, 372.5600622718436 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 262.25, 252.79178685234217 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 9, 0 ], "authors#_avg": [ 2, 0 ], "corr_rating_confidence": -0.6488856845230502, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:qrAidipbS8sJ:scholar.google.com/&scioq=Deep+Learning+Is+Composite+Kernel+Learning&hl=en&as_sdt=0,5", "gs_version_total": 0, "aff_unique_index": "0", "aff_unique_norm": "Indian Institute of Technology", "aff_unique_dep": "", "aff_unique_url": "https://www.iit.edu", "aff_unique_abbr": "IIT", "aff_country_unique_index": "0", "aff_country_unique": "India" }, { "id": "1toB0Fo9CZy", "title": "Neural Architecture Search of SPD Manifold Networks", "track": "main", "status": "Reject", "tldr": "", "abstract": "In this paper, we propose a new neural architecture search (NAS) problem of Symmetric Positive Definite (SPD) manifold networks. Unlike the conventional NAS problem, our problem requires to search for a unique computational cell called the SPD cell. This SPD cell serves as a basic building block of SPD neural architectures. An efficient solution to our problem is important to minimize the extraneous manual effort in the SPD neural architecture design. To accomplish this goal, we first introduce a geometrically rich and diverse SPD neural architecture search space for an efficient SPD cell design. Further, we model our new NAS problem using the supernet strategy, which models the architecture search problem as a one-shot training process of a single supernet. Based on the supernet modeling, we exploit a differentiable NAS algorithm on our relaxed continuous search space for SPD neural architecture search. Statistical evaluation of our method on drone, action, and emotion recognition tasks mostly provides better results than the state-of-the-art SPD networks and NAS algorithms. 
Empirical results show that our algorithm excels in discovering better SPD network design and providing models that are more than 3 times lighter than searched by state-of-the-art NAS algorithms.", "keywords": "Neural Architecture Search;AutoML", "primary_area": "", "supplementary_material": "", "author": "Rhea Sanjay Sukthanker;Zhiwu Huang;Suryansh Kumar;Erik Goron;Yan Wu;Luc Van Gool", "authorids": "~Rhea_Sanjay_Sukthanker1;~Zhiwu_Huang1;~Suryansh_Kumar1;~Erik_Goron1;~Yan_Wu4;~Luc_Van_Gool1", "gender": "F;M;M;M;F;", "homepage": ";https://zhiwu-huang.github.io;https://suryanshkumar.github.io/;;https://wuyan01.github.io/;", "dblp": ";47/7711.html;124/2783;;;61/5017", "google_scholar": "OsamqmMAAAAJ;https://scholar.google.ch/citations?user=yh6t92AAAAAJ;wbk0QAcAAAAJ;;dv1IuQUAAAAJ;https://scholar.google.be/citations?user=TwMib_QAAAAJ", "orcid": ";;;;;", "linkedin": ";;;Https://www.linkedin.com/in/erikgoron/;%E8%89%B3-%E5%90%B4-660b9b181/;", "or_profile": "~Rhea_Sanjay_Sukthanker1;~Zhiwu_Huang1;~Suryansh_Kumar1;~Erik_Goron1;~Yan_Wu4;~Luc_Van_Gool1", "aff": "Swiss Federal Institute of Technology;Swiss Federal Institute of Technology (ETH Zurich);Swiss Federal Institute of Technology;Swiss Federal Institute of Technology;Swiss Federal Institute of Technology;KU Leuven", "aff_domain": "ethz.ch;ethz.ch;ethz.ch;ethz.ch;ethz.ch;kuleuven.be", "position": "MS student;Postdoc;Researcher;MS student;MS student;Emeritus", "bibtex": "@misc{\nsukthanker2021neural,\ntitle={Neural Architecture Search of {\\{}SPD{\\}} Manifold Networks},\nauthor={Rhea Sanjay Sukthanker and Zhiwu Huang and Suryansh Kumar and Erik Goron and Yan Wu and Luc Van Gool},\nyear={2021},\nurl={https://openreview.net/forum?id=1toB0Fo9CZy}\n}", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer2;AnonReviewer3;AnonReviewer4", "site": "https://openreview.net/forum?id=1toB0Fo9CZy", "pdf_size": 0, "rating": "4;4;6;7", "confidence": "3;5;5;4", "wc_review": "166;522;382;151", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "595;2232;892;847", "reply_reviewers": "0;0;0;0", "reply_authors": "1;4;2;2", "rating_avg": [ 5.25, 1.299038105676658 ], "confidence_avg": [ 4.25, 0.82915619758885 ], "wc_review_avg": [ 305.25, 154.9635037678227 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 1141.5, 639.6938720982092 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 2.25, 1.0897247358851685 ], "replies_avg": [ 17, 0 ], "authors#_avg": [ 6, 0 ], "corr_rating_confidence": 0.17407765595569782, "gs_citation": 17, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=480888072018022846&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 12, "aff_unique_index": "0;1;0;0;0;2", "aff_unique_norm": "Swiss Federal Institute of Technology;ETH Zurich;Katholieke Universiteit Leuven", "aff_unique_dep": ";;", "aff_unique_url": "https://www.ethz.ch;https://www.ethz.ch;https://www.kuleuven.be", "aff_unique_abbr": "ETH Zurich;ETH;KU Leuven", "aff_campus_unique_index": "1", "aff_campus_unique": ";Zurich", "aff_country_unique_index": "0;0;0;0;0;1", "aff_country_unique": "Switzerland;Belgium" }, { "id": "1wtC_X12XXC", "title": "Activation Relaxation: A Local Dynamical Approximation to Backpropagation in the Brain", "track": "main", "status": "Reject", "tldr": "", "abstract": "The backpropagation of error algorithm (backprop) has been instrumental in the recent success of deep learning. However, a key question remains as to whether backprop can be formulated in a manner suitable for implementation in neural circuitry. 
The primary challenge is to ensure that any candidate formulation uses only local information, rather than relying on global signals as in standard backprop. Recently several algorithms for approximating backprop using only local signals have been proposed. However, these algorithms typically impose other requirements which challenge biological plausibility: for example, requiring complex and precise connectivity schemes, or multiple sequential backwards phases with information being stored across phases. Here, we propose a novel algorithm, Activation Relaxation (AR), which is motivated by constructing the backpropagation gradient as the equilibrium point of a dynamical system. Our algorithm converges rapidly and robustly to the correct backpropagation gradients, requires only a single type of computational unit, utilises only a single parallel backwards relaxation phase, and can operate on arbitrary computation graphs. We illustrate these properties by training deep neural networks on visual classification tasks, and describe simplifications to the algorithm which remove further obstacles to neurobiological implementation (for example, the weight-transport problem, and the use of nonlinear derivatives), while preserving performance.", "keywords": "Neural Networks;Biological Plausibility;Backprop", "primary_area": "", "supplementary_material": "/attachment/13df83da62eb47f68929085543ac5d052309626b.zip", "author": "Beren Millidge;Alexander Tschantz;Anil K Seth;Christopher Buckley", "authorids": "~Beren_Millidge1;~Alexander_Tschantz1;~Anil_K_Seth1;~Christopher_Buckley1", "gender": "M;M;M;M", "homepage": "http://beren.io/;;http://www.anilseth.com;https://christopherlbuckley.com/", "dblp": "244/9967;254/2125;;37/3540.html", "google_scholar": "3GGkFTkAAAAJ;5NbVgO0AAAAJ;https://scholar.google.co.uk/citations?user=3eJCZCkAAAAJ;https://scholar.google.co.uk/citations?user=nWuZ0XcAAAAJ", "orcid": ";;0000-0002-1421-6051;0000-0002-8551-9121", "linkedin": "beren-millidge-377065142/;;profanilseth/?originalSubdomain=uk;", "or_profile": "~Beren_Millidge1;~Alexander_Tschantz1;~Anil_K_Seth1;~Christopher_Buckley1", "aff": "University of Oxford;University of Sussex;University of Sussex;", "aff_domain": "ox.ac.uk;sussex.ac.uk;sussex.ac.uk;", "position": "Postdoc;PhD student;Full Professor;", "bibtex": "@misc{\nmillidge2021activation,\ntitle={Activation Relaxation: A Local Dynamical Approximation to Backpropagation in the Brain},\nauthor={Beren Millidge and Alexander Tschantz and Anil K Seth and Christopher Buckley},\nyear={2021},\nurl={https://openreview.net/forum?id=1wtC_X12XXC}\n}", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer2;AnonReviewer3;AnonReviewer4", "site": "https://openreview.net/forum?id=1wtC_X12XXC", "pdf_size": 0, "rating": "4;4;7;8", "confidence": "4;4;4;2", "wc_review": "689;457;996;142", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "357;403;340;125", "reply_reviewers": "0;0;0;0", "reply_authors": "1;1;1;1", "rating_avg": [ 5.75, 1.7853571071357126 ], "confidence_avg": [ 3.5, 0.8660254037844386 ], "wc_review_avg": [ 571.0, 312.8841638689948 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 306.25, 107.15263645846517 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 9, 0 ], "authors#_avg": [ 4, 0 ], "corr_rating_confidence": -0.7276068751089989, "gs_citation": 18, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=9614583989177709244&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 3, 
"aff_unique_index": "0;1;1", "aff_unique_norm": "University of Oxford;University of Sussex", "aff_unique_dep": ";", "aff_unique_url": "https://www.ox.ac.uk;https://www.sussex.ac.uk", "aff_unique_abbr": "Oxford;Sussex", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0;0", "aff_country_unique": "United Kingdom" }, { "id": "1yDrpckYHnN", "title": "Self-supervised and Supervised Joint Training for Resource-rich Machine Translation", "track": "main", "status": "Reject", "tldr": "", "abstract": "Self-supervised pre-training of text representations has been successfully applied to low-resource Neural Machine Translation (NMT). However, it usually fails to achieve notable gains on resource-rich NMT. In this paper, we propose a joint training approach, $F_2$-XEnDec, to combine self-supervised and supervised learning to optimize NMT models. To exploit complementary self-supervised signals for supervised learning, NMT models are trained on examples that are interbred from monolingual and parallel sentences through a new process called crossover encoder-decoder. Experiments on two resource-rich translation bench-marks, WMT\u201914 English-German and WMT\u201914 English-French, demonstrate that our approach achieves substantial improvements over a vanilla Transformer and obtains a new state of the art of 46 BLEU on English-French. Results also show that our approach is capable of improving model robustness against input perturbations which is known as a key weakness in contemporary NMT systems.", "keywords": "resource-rich machine translation;neural machine translation;pre-training;self-supervised learning;joint training", "primary_area": "", "supplementary_material": "", "author": "Yong Cheng;Wei Wang.;Lu Jiang;Wolfgang Macherey", "authorids": "~Yong_Cheng3;~Wei_Wang.1;~Lu_Jiang1;~Wolfgang_Macherey1", "gender": "M;M;M;M", "homepage": ";http://www.lujiang.info/;;https://research.google/people/106159/", "dblp": "34/6276.html;22/752-4;88/4457;w/WeiWang6", "google_scholar": "rZ0mlMYAAAAJ;jIKjjSYAAAAJ;;", "orcid": ";0000-0003-0286-8439;;", "linkedin": ";roadjiang/;;", "or_profile": "~Yong_Cheng3;~Lu_Jiang1;~Wolfgang_Macherey1;~Wei_Wang36", "aff": "Google;Google Research;Google;Google Research", "aff_domain": "google.com;google.com;google.com;google.com", "position": "Researcher;Researcher;Research Scientist;Research Scientist", "bibtex": "@misc{\ncheng2021selfsupervised,\ntitle={Self-supervised and Supervised Joint Training for Resource-rich Machine Translation},\nauthor={Yong Cheng and Wei Wang. 
and Lu Jiang and Wolfgang Macherey},\nyear={2021},\nurl={https://openreview.net/forum?id=1yDrpckYHnN}\n}", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer3;AnonReviewer2;AnonReviewer4", "site": "https://openreview.net/forum?id=1yDrpckYHnN", "pdf_size": 0, "rating": "5;5;5;7", "confidence": "5;5;4;4", "wc_review": "1125;447;286;1045", "wc_reply_reviewers": "0;182;0;0", "wc_reply_authors": "1401;534;601;1060", "reply_reviewers": "0;1;0;0", "reply_authors": "2;2;1;3", "rating_avg": [ 5.5, 0.8660254037844386 ], "confidence_avg": [ 4.5, 0.5 ], "wc_review_avg": [ 725.75, 364.8296691608291 ], "wc_reply_reviewers_avg": [ 45.5, 78.80831174438391 ], "wc_reply_authors_avg": [ 899.0, 353.53712676322976 ], "reply_reviewers_avg": [ 0.25, 0.4330127018922193 ], "reply_authors_avg": [ 2.0, 0.7071067811865476 ], "replies_avg": [ 17, 0 ], "authors#_avg": [ 4, 0 ], "corr_rating_confidence": -0.5773502691896257, "gs_citation": 18, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=157816153944296093&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 6, "aff_unique_index": "0;0;0;0", "aff_unique_norm": "Google", "aff_unique_dep": "Google", "aff_unique_url": "https://www.google.com", "aff_unique_abbr": "Google", "aff_campus_unique_index": "0;0;0;0", "aff_campus_unique": "Mountain View", "aff_country_unique_index": "0;0;0;0", "aff_country_unique": "United States" }, { "id": "1yXhko8GZEE", "title": "Precondition Layer and Its Use for GANs", "track": "main", "status": "Reject", "tldr": "", "abstract": "One of the major challenges when training generative adversarial nets (GANs) is instability. To address this instability spectral normalization (SN) is remarkably successful. However, SN-GAN still suffers from training instabilities, especially when working with higher-dimensional data. We find that those instabilities are accompanied by large condition numbers of the discriminator weight matrices. To improve training stability we study common linear-algebra practice and employ preconditioning. Specifically, we introduce a preconditioning layer (PC-layer)that performs a low-degree polynomial preconditioning. We use this PC-layer in two ways: 1) fixed preconditioning (FPC) adds a fixed PC-layer to all layers, and 2) adaptive preconditioning (APC) adaptively controls the strength of preconditioning. Empirically, we show that FPC and APC stabilize the training of un-conditional GANs using classical architectures. 
On LSUN256\u00d7256 data, APC improves FID scores by around 5 points over baselines.", "keywords": "GAN;Preconditioning;Condition Number", "primary_area": "", "supplementary_material": "/attachment/47acf56558d011a54715c22799b345399043b04c.zip", "author": "Tiantian Fang;Alex Schwing;Ruoyu Sun", "authorids": "~Tiantian_Fang1;~Alex_Schwing1;~Ruoyu_Sun1", "gender": "F;Unspecified;", "homepage": ";https://ece.illinois.edu/directory/profile/aschwing;https://ruoyus.github.io/", "dblp": ";79/9775;30/9879-1", "google_scholar": ";3B2c31wAAAAJ;PsfzbCMAAAAJ", "orcid": ";;", "linkedin": "tiantian-fang/;;", "or_profile": "~Tiantian_Fang1;~Alex_Schwing1;~Ruoyu_Sun1", "aff": ";University of Illinois, Urbana Champaign;University of Illinois, Urbana-Champaign", "aff_domain": ";illinois.edu;uiuc.edu", "position": ";Assistant Professor;Assistant Professor", "bibtex": "@misc{\nfang2021precondition,\ntitle={Precondition Layer and Its Use for {\\{}GAN{\\}}s},\nauthor={Tiantian Fang and Alex Schwing and Ruoyu Sun},\nyear={2021},\nurl={https://openreview.net/forum?id=1yXhko8GZEE}\n}", "github": "", "project": "", "reviewers": "AnonReviewer2;AnonReviewer3;AnonReviewer4;AnonReviewer1", "site": "https://openreview.net/forum?id=1yXhko8GZEE", "pdf_size": 0, "rating": "4;5;6;7", "confidence": "3;3;3;4", "wc_review": "500;358;341;881", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "1289;1301;988;689", "reply_reviewers": "0;0;0;0", "reply_authors": "2;2;2;1", "rating_avg": [ 5.5, 1.118033988749895 ], "confidence_avg": [ 3.25, 0.4330127018922193 ], "wc_review_avg": [ 520.0, 217.37410149325518 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 1066.75, 251.57739862714217 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.75, 0.4330127018922193 ], "replies_avg": [ 13, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": 0.7745966692414834, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:0rmm88fc414J:scholar.google.com/&scioq=Precondition+Layer+and+Its+Use+for+GANs&hl=en&as_sdt=0,5", "gs_version_total": 0, "aff_unique_index": "0;1", "aff_unique_norm": "University of Illinois Urbana-Champaign;University of Illinois", "aff_unique_dep": ";", "aff_unique_url": "https://illinois.edu;https://illinois.edu", "aff_unique_abbr": "UIUC;UIUC", "aff_campus_unique_index": "0;0", "aff_campus_unique": "Urbana-Champaign", "aff_country_unique_index": "0;0", "aff_country_unique": "United States" }, { "id": "1z_Hg9oBCtY", "title": "MAS-GAN: Adversarial Calibration of Multi-Agent Market Simulators.", "track": "main", "status": "Reject", "tldr": "", "abstract": "We look at the problem of how the simulation of a financial market should be configured so that it most accurately emulates the behavior of a real market. In particular, we address agent-based simulations of markets that are composed of many hundreds or thousands of trading agents. A solution to this problem is important because it provides a credible test bed for evaluating potential trading algorithms (e.g., execution strategies). Simple backtesting of such algorithms suffers from a critical weaknesses, chiefly that the overall market is not responsive to the candidate trading algorithm. Multi-agent simulations address this weakness by simulating {\\it market impact} via interaction between market participants. Calibration of such multi-agent simulators to ensure realism, however, is a challenge. 
In this paper, we present MAS-GAN -- a multi-agent simulator calibration method that tunes simulator parameters and supports more accurate evaluation of candidate trading algorithms. Our calibration focuses on high-level parameters such as the relative proportions of the various types of agents that populate the simulation.\nMAS-GAN is a two-step approach: first, we train a discriminator that is able to distinguish between ``real'' and ``fake'' market data as part of a GAN with self-attention, and then utilize it within an optimization framework to refine simulation parameters. The paper concludes with quantitative examples of applying MAS-GAN to improve simulator realism.", "keywords": "Generative adversarial networks;multi-agent systems.", "primary_area": "", "supplementary_material": "", "author": "Victor Storchan;Svitlana Vyetrenko;Tucker Balch", "authorids": "victor.storchan@jpmchase.com;~Svitlana_Vyetrenko1;~Tucker_Balch2", "gender": ";;M", "homepage": ";;", "dblp": ";26/8396.html;", "google_scholar": ";;jM1cT4QAAAAJ", "orcid": ";;0000-0002-5148-2033", "linkedin": ";;", "or_profile": "victor.storchan@jpmchase.com;~Svitlana_Vyetrenko1;~Tucker_Balch2", "aff": ";J.P. Morgan Chase;J.P. Morgan Chase", "aff_domain": ";jpmorgan.com;jpmorgan.com", "position": ";AI Research Director;Managing Director", "bibtex": "", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer4;AnonReviewer3", "site": "https://openreview.net/forum?id=1z_Hg9oBCtY", "pdf_size": 0, "rating": "3;5;7", "confidence": "4;5;4", "wc_review": "305;372;415", "wc_reply_reviewers": "0;0;0", "wc_reply_authors": "200;483;112", "reply_reviewers": "0;0;0", "reply_authors": "1;1;1", "rating_avg": [ 5.0, 1.632993161855452 ], "confidence_avg": [ 4.333333333333333, 0.4714045207910317 ], "wc_review_avg": [ 364.0, 45.2621990922521 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 265.0, 158.2803420095707 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 9, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 6, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=2397133330658771585&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 0, "aff_unique_index": "0;0", "aff_unique_norm": "JPMorgan Chase & Co.", "aff_unique_dep": "", "aff_unique_url": "https://www.jpmorganchase.com", "aff_unique_abbr": "JPM", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0", "aff_country_unique": "United States" }, { "id": "20qC5K2ICZL", "title": "Robust Learning via Golden Symmetric Loss of (un)Trusted Labels", "track": "main", "status": "Reject", "tldr": "", "abstract": "Learning robust deep models against noisy labels becomes ever more critical as today's data is commonly collected from open platforms and subject to adversarial corruption. The information on the label corruption process, i.e., the corruption matrix, can greatly enhance the robustness of deep models but still falls short in combating hard classes. In this paper, we propose to construct a golden symmetric loss (GSL) based on the estimated confusion matrix so as to avoid overfitting to noisy labels and to learn effectively from hard classes. GSL is the weighted sum of the corrected regular cross entropy and reverse cross entropy. By leveraging a small fraction of trusted clean data, we estimate the corruption matrix and use it to correct the loss as well as to determine the weights of GSL. 
We theoretically prove the robustness of the proposed loss function in the presence of dirty labels. We provide a heuristic to adaptively tune the loss weights of GSL according to the noise rate and diversity measured from the dataset. We evaluate our proposed golden symmetric loss on both vision and natural language deep models subject to different types of label noise patterns. Empirical results show that GSL can significantly outperform existing robust training methods on different noise patterns, with accuracy improvements of up to 18% on CIFAR-100 and 1% on the real-world noisy dataset Clothing1M. ", "keywords": "", "primary_area": "", "supplementary_material": "", "author": "Amirmasoud Ghiassi;Robert Birke;Lydia Y. Chen", "authorids": "~Amirmasoud_Ghiassi1;~Robert_Birke1;~Lydia_Y._Chen1", "gender": "M;;F", "homepage": ";;https://www.lydiaychen.com/", "dblp": ";;https://dblp.uni-trier.de/pers/c/Chen:Lydia_Y=.html", "google_scholar": "rYlHxR4AAAAJ;;https://scholar.google.ch/citations?hl=en", "orcid": ";;", "linkedin": ";;", "or_profile": "~Amirmasoud_Ghiassi1;~Robert_Birke1;~Lydia_Y._Chen1", "aff": "Delft University of Technology;;Delft University of Technology", "aff_domain": "tudelft.nl;;tudelft.nl", "position": "PhD student;;Associate Professor", "bibtex": "@misc{\nghiassi2021robust,\ntitle={Robust Learning via Golden Symmetric Loss of (un)Trusted Labels},\nauthor={Amirmasoud Ghiassi and Robert Birke and Lydia Y. Chen},\nyear={2021},\nurl={https://openreview.net/forum?id=20qC5K2ICZL}\n}", "github": "", "project": "", "reviewers": "AnonReviewer3;AnonReviewer2;AnonReviewer4;AnonReviewer1", "site": "https://openreview.net/forum?id=20qC5K2ICZL", "pdf_size": 0, "rating": "3;4;4;5", "confidence": "5;4;3;4", "wc_review": "424;337;521;477", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "646;241;702;222", "reply_reviewers": "0;0;0;0", "reply_authors": "1;1;1;1", "rating_avg": [ 4.0, 0.7071067811865476 ], "confidence_avg": [ 4.0, 0.7071067811865476 ], "wc_review_avg": [ 439.75, 68.54697294556486 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 452.75, 222.23565757996622 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 10, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": -0.5, "gs_citation": 1, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=1103314907122660066&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 6, "aff_unique_index": "0;0", "aff_unique_norm": "Delft University of Technology", "aff_unique_dep": "", "aff_unique_url": "https://www.tudelft.nl", "aff_unique_abbr": "TU Delft", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0", "aff_country_unique": "Netherlands" }, { "id": "21aG-pxQWa", "title": "Counterfactual Fairness through Data Preprocessing", "track": "main", "status": "Reject", "tldr": "", "abstract": "Machine learning has become more important in real-life decision-making, but people are concerned about the ethical problems it may bring when used improperly. Recent work brings the discussion of machine learning fairness into the causal framework and elaborates on the concept of Counterfactual Fairness. In this paper, we develop the Fair Learning through dAta Preprocessing (FLAP) algorithm to learn counterfactually fair decisions from biased training data and formalize the conditions under which different data preprocessing procedures should be used to guarantee counterfactual fairness. 
We also show that Counterfactual Fairness is equivalent to the conditional independence of the decisions and the sensitive attributes given the processed non-sensitive attributes, which enables us to detect discrimination in the original decision using the processed data. The performance of our algorithm is illustrated using simulated data and real-world applications.", "keywords": "Counterfactual fairness;data preprocessing;fairness test;discrimination detection;affirmative action", "primary_area": "", "supplementary_material": "/attachment/e7cb202d30fe382da2f67ac843e53ccb933135f5.zip", "author": "Haoyu Chen;Wenbin Lu;Rui Song;Pulak Ghosh", "authorids": "~Haoyu_Chen4;~Wenbin_Lu1;~Rui_Song2;pulak.ghosh@iimb.ac.in", "gender": "M;M;;", "homepage": ";https://statistics.sciences.ncsu.edu/people/wlu4/;https://song-ray.github.io/;", "dblp": ";;01/2743-6.html;", "google_scholar": "9Bsb3k0AAAAJ;;;", "orcid": "0009-0007-2608-6382;;0000-0003-1875-2115;", "linkedin": "chen-haoyu/;;;", "or_profile": "~Haoyu_Chen4;~Wenbin_Lu1;~Rui_Song2;pulak.ghosh@iimb.ac.in", "aff": "North Carolina State University;North Carolina State University;North Carolina State University;", "aff_domain": "ncsu.edu;ncsu.edu;ncsu.edu;", "position": "PhD student;Full Professor;Full Professor;", "bibtex": "@misc{\nchen2021counterfactual,\ntitle={Counterfactual Fairness through Data Preprocessing},\nauthor={Haoyu Chen and Wenbin Lu and Rui Song and Pulak Ghosh},\nyear={2021},\nurl={https://openreview.net/forum?id=21aG-pxQWa}\n}", "github": "", "project": "", "reviewers": "AnonReviewer3;AnonReviewer1;AnonReviewer4", "site": "https://openreview.net/forum?id=21aG-pxQWa", "pdf_size": 0, "rating": "4;5;5", "confidence": "2;3;4", "wc_review": "279;365;265", "wc_reply_reviewers": "0;0;0", "wc_reply_authors": "1003;543;699", "reply_reviewers": "0;0;0", "reply_authors": "2;1;1", "rating_avg": [ 4.666666666666667, 0.4714045207910317 ], "confidence_avg": [ 3.0, 0.816496580927726 ], "wc_review_avg": [ 303.0, 44.21161235090467 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 748.3333333333334, 191.0066898188531 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.3333333333333333, 0.4714045207910317 ], "replies_avg": [ 8, 0 ], "authors#_avg": [ 4, 0 ], "corr_rating_confidence": 0.8660254037844385, "gs_citation": 2, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=1661434113176282809&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 0, "aff_unique_index": "0;0;0", "aff_unique_norm": "North Carolina State University", "aff_unique_dep": "", "aff_unique_url": "https://www.ncsu.edu", "aff_unique_abbr": "NCSU", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0;0", "aff_country_unique": "United States" }, { "id": "2234Pp-9ikZ", "title": "Don't be picky, all students in the right family can learn from good teachers", "track": "main", "status": "Reject", "tldr": "", "abstract": "State-of-the-art results in deep learning have been improving steadily, in good part due to the use of larger models. However, widespread use is constrained by device hardware limitations, resulting in a substantial performance gap between state-of-the-art models and those that can be effectively deployed on small devices. \n\nWhile Knowledge Distillation (KD) theoretically enables small student models to emulate larger teacher models, in practice selecting a good student architecture requires considerable human expertise. 
Neural Architecture Search (NAS) appears as a natural solution to this problem but most approaches can be inefficient, as most of the computation is spent comparing architectures sampled from the same distribution, with negligible differences in performance. \n\nIn this paper, we propose to instead search for a family of student architectures sharing the property of being good at learning from a given teacher. \nOur approach AutoKD, powered by Bayesian Optimization, explores a flexible graph-based search space, enabling us to automatically learn the optimal student architecture distribution and KD parameters, while being 20x more sample-efficient than the existing state-of-the-art. We evaluate our method on 3 datasets; on large images specifically, we reach the teacher's performance while using 3x less memory and 10x fewer parameters. Finally, while AutoKD uses the traditional KD loss, it outperforms more advanced KD variants using hand-designed students.", "keywords": "knowledge distillation;neural architecture search;nas;automl;knowledge transfer;model compression", "primary_area": "", "supplementary_material": "", "author": "Roy Henha Eyono;Fabio Maria Carlucci;Pedro M Esperan\u00e7a;Binxin Ru;Philip Torr", "authorids": "~Roy_Henha_Eyono1;~Fabio_Maria_Carlucci2;~Pedro_M_Esperan\u00e7a1;~Binxin_Ru1;~Philip_Torr1", "gender": ";M;M;;M", "homepage": "https://fmcarlucci.github.io/;;;http://www.robots.ox.ac.uk/~tvg/;https://mila.quebec/en/person/roy-eyono/", "dblp": ";;;;", "google_scholar": ";https://scholar.google.co.uk/citations?user=ralB4sUAAAAJ;https://scholar.google.co.uk/citations?user=4piw-XMAAAAJ;;a_AGzTgAAAAJ", "orcid": "0000-0003-4916-5706;;;;", "linkedin": ";;;;", "or_profile": "~Fabio_Maria_Carlucci2;~Pedro_M_Esperan\u00e7a1;~Binxin_Ru1;~Philip_Torr1;~Roy_Pavel_Samuel_henha_Eyono1", "aff": "Huawei Technologies Ltd.;Huawei Technologies Ltd.;University of Oxford;University of Oxford;McGill University & Mila-Quebec AI Institute", "aff_domain": "huawei.com;huawei.com;ox.ac.uk;ox.ac.uk;mail.mcgill.ca", "position": "Researcher;Researcher;PhD student;Full Professor;PhD student", "bibtex": "@misc{\neyono2021dont,\ntitle={Don't be picky, all students in the right family can learn from good teachers},\nauthor={Roy Henha Eyono and Fabio Maria Carlucci and Pedro M Esperan{\\c{c}}a and Binxin Ru and Philip Torr},\nyear={2021},\nurl={https://openreview.net/forum?id=2234Pp-9ikZ}\n}", "github": "", "project": "", "reviewers": "AnonReviewer2;AnonReviewer1;AnonReviewer3", "site": "https://openreview.net/forum?id=2234Pp-9ikZ", "pdf_size": 0, "rating": "3;3;5", "confidence": "4;3;4", "wc_review": "357;354;388", "wc_reply_reviewers": "0;0;0", "wc_reply_authors": "541;460;604", "reply_reviewers": "0;0;0", "reply_authors": "1;2;2", "rating_avg": [ 3.6666666666666665, 0.9428090415820634 ], "confidence_avg": [ 3.6666666666666665, 0.4714045207910317 ], "wc_review_avg": [ 366.3333333333333, 15.369522511198005 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 535.0, 58.9406481131655 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.6666666666666667, 0.4714045207910317 ], "replies_avg": [ 11, 0 ], "authors#_avg": [ 5, 0 ], "corr_rating_confidence": 0.5, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:yQNC39msDUkJ:scholar.google.com/&scioq=Don%27t+be+picky,+all+students+in+the+right+family+can+learn+from+good+teachers&hl=en&as_sdt=0,5", "gs_version_total": 0, "aff_unique_index": "0;0;1;1;2", "aff_unique_norm": "Huawei;University of Oxford;McGill 
University", "aff_unique_dep": "Huawei Technologies;;", "aff_unique_url": "https://www.huawei.com;https://www.ox.ac.uk;https://www.mcgill.ca", "aff_unique_abbr": "Huawei;Oxford;McGill", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0;1;1;2", "aff_country_unique": "China;United Kingdom;Canada" }, { "title": "Scalable Transfer Learning with Expert Models", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/2545", "id": "23ZjUGpjcc", "poster": "", "openreview": "https://openreview.net/forum?id=23ZjUGpjcc", "slides": "https://iclr.cc/virtual/2021/poster/2545", "video": "https://iclr.cc/virtual/2021/poster/2545", "author_site": "Joan Puigcerver i Perez, Carlos Riquelme, Basil Mustafa, Cedric Renggli, Andr\u00e9 Susano Pinto, Sylvain Gelly, Daniel Keysers, Neil Houlsby", "tldr": "", "abstract": "Transfer of pre-trained representations can improve sample efficiency and reduce computational requirements for new tasks. However, representations used for transfer are usually generic, and are not tailored to a particular distribution of downstream tasks. We explore the use of expert representations for transfer with a simple, yet effective, strategy. We train a diverse set of experts by exploiting existing label structures, and use cheap-to-compute performance proxies to select the relevant expert for each target task. This strategy scales the process of transferring to new tasks, since it does not revisit the pre-training data during transfer. Accordingly, it requires little extra compute per target task, and results in a speed-up of 2-3 orders of magnitude compared to competing approaches. Further, we provide an adapter-based architecture able to compress many experts into a single model. 
We evaluate our approach on two different data sources and demonstrate that it outperforms baselines on over 20 diverse vision tasks in both cases.", "keywords": "Transfer Learning;Expert Models;Few Shot", "primary_area": "", "supplementary_material": "/attachment/becde7d25ac5ab6249e62c7f925a62f2c953ec05.zip", "author": "Joan Puigcerver;Carlos Riquelme Ruiz;Basil Mustafa;Cedric Renggli;Andr\u00e9 Susano Pinto;Sylvain Gelly;Daniel Keysers;Neil Houlsby", "authorids": "~Joan_Puigcerver1;~Carlos_Riquelme_Ruiz1;basilm@google.com;~Cedric_Renggli1;~Andr\u00e9_Susano_Pinto1;~Sylvain_Gelly1;~Daniel_Keysers2;~Neil_Houlsby1", "gender": "M;M;;;M;M;M;M", "homepage": "http://www.jpuigcerver.net;https://rikel.github.io/;;https://rengglic.github.io/;;;http://www.keysers.net/daniel;https://neilhoulsby.github.io/", "dblp": "155/3271;https://dblp.uni-trier.de/pers/hd/r/Riquelme:Carlos;;;73/10264;;02/6955;91/10669", "google_scholar": "https://scholar.google.com/citations?hl=en;Es2BBeYAAAAJ;;-gquq44AAAAJ;pTYo1vYAAAAJ;https://scholar.google.ch/citations?user=m7LvuTkAAAAJ;nZO3qCcAAAAJ;https://scholar.google.com/citations?hl=en", "orcid": ";;;;;;;", "linkedin": ";;;;;;daniel-keysers-14b9511/;", "or_profile": "~Joan_Puigcerver1;~Carlos_Riquelme_Ruiz1;basilm@google.com;~Cedric_Renggli1;~Andr\u00e9_Susano_Pinto1;~Sylvain_Gelly1;~Daniel_Keysers2;~Neil_Houlsby1", "aff": "Google;Google;;ETHZ - ETH Zurich;Google DeepMind;Google Brain;Google;Google", "aff_domain": "google.com;google.com;;ethz.ch;google.com;google.com;google.com;google.com", "position": "Software Engineer in Research;Researcher;;PhD student;Software Engineer;Software Engineer;Software Engineer;Researcher", "bibtex": "@inproceedings{\npuigcerver2021scalable,\ntitle={Scalable Transfer Learning with Expert Models},\nauthor={Joan Puigcerver and Carlos Riquelme Ruiz and Basil Mustafa and Cedric Renggli and Andr{\\'e} Susano Pinto and Sylvain Gelly and Daniel Keysers and Neil Houlsby},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=23ZjUGpjcc}\n}", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer2;AnonReviewer4;AnonReviewer3", "pdf_size": 0, "rating": "5;6;7;7", "confidence": "4;3;4;2", "wc_review": "242;224;551;229", "wc_reply_reviewers": "28;121;17;0", "wc_reply_authors": "340;620;319;395", "reply_reviewers": "1;2;1;0", "reply_authors": "1;2;1;1", "rating_avg": [ 6.25, 0.82915619758885 ], "confidence_avg": [ 3.25, 0.82915619758885 ], "wc_review_avg": [ 311.5, 138.43139094872956 ], "wc_reply_reviewers_avg": [ 41.5, 46.97073557013984 ], "wc_reply_authors_avg": [ 418.5, 119.60037625358876 ], "reply_reviewers_avg": [ 1.0, 0.7071067811865476 ], "reply_authors_avg": [ 1.25, 0.4330127018922193 ], "replies_avg": [ 14, 0 ], "authors#_avg": [ 8, 0 ], "corr_rating_confidence": -0.4545454545454545, "gs_citation": 67, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=16267389337622022287&as_sdt=400005&sciodt=0,14&hl=en", "gs_version_total": 5, "pdf": "https://openreview.net/pdf?id=23ZjUGpjcc", "email": "google.com;google.com;;ethz.ch;google.com;google.com;google.com;google.com", "author_num": 8, "aff_unique_index": "0;0;1;0;0;0;0", "aff_unique_norm": "Google;ETH Zurich", "aff_unique_dep": "Google;", "aff_unique_url": "https://www.google.com;https://www.ethz.ch", "aff_unique_abbr": "Google;ETHZ", "aff_campus_unique_index": "0;0;0;0;0", "aff_campus_unique": "Mountain View;", "aff_country_unique_index": "0;0;1;2;0;0;0", "aff_country_unique": "United 
States;Switzerland;United Kingdom" }, { "id": "24-DxeAe2af", "title": "Accurate and fast detection of copy number variations from short-read whole-genome sequencing with deep convolutional neural network", "track": "main", "status": "Reject", "tldr": "", "abstract": "A copy number variant (CNV) is a type of genetic mutation where a stretch of DNA is lost or duplicated once or multiple times. CNVs play important roles in the development of diseases and complex traits. CNV detection with short-read DNA sequencing technology is challenging because CNVs significantly vary in size and are similar to DNA sequencing artifacts. Many methods have been developed but still yield unsatisfactory results with high computational costs. Here, we propose CNV-Net, a novel approach for CNV detection using a six-layer convolutional neural network. We encode DNA sequencing information into RGB images and train the convolutional neural network with these images. The fitted convolutional neural network can then be used to predict CNVs from DNA sequencing data. We benchmark CNV-Net with two high-quality whole-genome sequencing datasets available from the Genome in a Bottle Consortium, considered as gold standard benchmarking datasets for CNV detection. We demonstrate that CNV-Net is more accurate and efficient in CNV detection than current tools.", "keywords": "copy number variation;deep learning;convolutional neural network;computational biology;DNA sequencing", "primary_area": "", "supplementary_material": "", "author": "Jiajin Li;Stephen Hwang;Luke Zhang;Jae Hoon Sul", "authorids": "~Jiajin_Li3;sjhwang@ucsc.edu;zhanglucasjifeng@gmail.com;~Jae_Hoon_Sul1", "gender": "M;;;", "homepage": ";;;", "dblp": ";;;", "google_scholar": ";;;", "orcid": ";;;", "linkedin": "albertleejiajinli/;;;", "or_profile": "~Jiajin_Li3;sjhwang@ucsc.edu;zhanglucasjifeng@gmail.com;~Jae_Hoon_Sul1", "aff": "University of California, Los Angeles;;;", "aff_domain": "ucla.edu;;;", "position": "PhD student;;;", "bibtex": "@misc{\nli2021accurate,\ntitle={Accurate and fast detection of copy number variations from short-read whole-genome sequencing with deep convolutional neural network},\nauthor={Jiajin Li and Stephen Hwang and Luke Zhang and Jae Hoon Sul},\nyear={2021},\nurl={https://openreview.net/forum?id=24-DxeAe2af}\n}", "github": "", "project": "", "reviewers": "AnonReviewer3;AnonReviewer4;AnonReviewer1;AnonReviewer2", "site": "https://openreview.net/forum?id=24-DxeAe2af", "pdf_size": 0, "rating": "2;2;3;5", "confidence": "4;5;4;3", "wc_review": "308;317;615;230", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "0;0;0;0", "reply_reviewers": "0;0;0;0", "reply_authors": "0;0;0;0", "rating_avg": [ 3.0, 1.224744871391589 ], "confidence_avg": [ 4.0, 0.7071067811865476 ], "wc_review_avg": [ 367.5, 146.8443053032701 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 0, 0 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 0, 0 ], "replies_avg": [ 5, 0 ], "authors#_avg": [ 4, 0 ], "corr_rating_confidence": -0.8660254037844386, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:wcHi9TXZv7gJ:scholar.google.com/&scioq=Accurate+and+fast+detection+of+copy+number+variations+from+short-read+whole-genome+sequencing+with+deep+convolutional+neural+network&hl=en&as_sdt=0,5", "gs_version_total": 0, "aff_unique_index": "0", "aff_unique_norm": "University of California, Los Angeles", "aff_unique_dep": "", "aff_unique_url": "https://www.ucla.edu", "aff_unique_abbr": "UCLA", "aff_campus_unique_index": "0", 
"aff_campus_unique": "Los Angeles", "aff_country_unique_index": "0", "aff_country_unique": "United States" }, { "id": "25OSRH9H0Gi", "title": "Putting Theory to Work: From Learning Bounds to Meta-Learning Algorithms", "track": "main", "status": "Reject", "tldr": "", "abstract": "Most of existing deep learning models rely on excessive amounts of labeled training data in order to achieve state-of-the-art results, even though these data can be hard or costly to get in practice. One attractive alternative is to learn with little supervision, commonly referred to as few-shot learning (FSL), and, in particular, meta-learning that learns to learn with few data from related tasks. Despite the practical success of meta-learning, many of its algorithmic solutions proposed in the literature are based on sound intuitions, but lack a solid theoretical analysis of the expected performance on the test task. In this paper, we review the recent advances in meta-learning theory and show how they can be used in practice both to better understand the behavior of popular meta-learning algorithms and to improve their generalization capacity. This latter is achieved by integrating the theoretical assumptions ensuring efficient meta-learning in the form of regularization terms into several popular meta-learning algorithms for which we provide a large study of their behavior on classic few-shot classification benchmarks. To the best of our knowledge, this is the first contribution that puts the most recent learning bounds of meta-learning theory into practice for the popular task of few-shot classification. ", "keywords": "meta-learning;few-shot learning", "primary_area": "", "supplementary_material": "/attachment/d62ec4e6aa55c3b618a2c3d9992dbda236e705b6.zip", "author": "Quentin Bouniot;Ievgen Redko;Romaric Audigier;Ang\u00e9lique Loesch;Amaury Habrard", "authorids": "~Quentin_Bouniot1;~Ievgen_Redko2;romaric.audigier@cea.fr;angelique.loesch@cea.fr;~Amaury_Habrard1", "gender": "M;;;;M", "homepage": "https://qbouniot.github.io/;;;;http://perso.univ-st-etienne.fr/habrarda/", "dblp": "271/7069;150/3980;;;22/2297.html", "google_scholar": "https://scholar.google.com/citations?hl=fr;https://scholar.google.fr/citations?user=qJ1-XewAAAAJ;;;https://scholar.google.fr/citations?user=oPemAuMAAAAJ", "orcid": "0000-0002-0982-372X;;;;", "linkedin": "quentin-bouniot/;;;;amaury-habrard-0375145", "or_profile": "~Quentin_Bouniot1;~Ievgen_Redko2;romaric.audigier@cea.fr;angelique.loesch@cea.fr;~Amaury_Habrard1", "aff": "CEA;University Lyon;;;Universit\u00e9 Saint-Etienne, Laboratoire Hubert Curien", "aff_domain": "cea.fr;univ-st-etienne.fr;;;univ-st-etienne.fr", "position": "PhD student;Associate Professor;;;Full Professor", "bibtex": "@misc{\nbouniot2021putting,\ntitle={Putting Theory to Work: From Learning Bounds to Meta-Learning Algorithms},\nauthor={Quentin Bouniot and Ievgen Redko and Romaric Audigier and Ang{\\'e}lique Loesch and Amaury Habrard},\nyear={2021},\nurl={https://openreview.net/forum?id=25OSRH9H0Gi}\n}", "github": "", "project": "", "reviewers": "AnonReviewer3;AnonReviewer2;AnonReviewer1;AnonReviewer4", "site": "https://openreview.net/forum?id=25OSRH9H0Gi", "pdf_size": 0, "rating": "4;4;5;5", "confidence": "4;4;4;3", "wc_review": "204;225;260;357", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "512;456;479;602", "reply_reviewers": "0;0;0;0", "reply_authors": "1;1;1;2", "rating_avg": [ 4.5, 0.5 ], "confidence_avg": [ 3.75, 0.4330127018922193 ], "wc_review_avg": [ 261.5, 58.653644388051454 ], 
"wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 512.25, 55.50844530339505 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.25, 0.4330127018922193 ], "replies_avg": [ 12, 0 ], "authors#_avg": [ 5, 0 ], "corr_rating_confidence": -0.5773502691896257, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:AaMFeBcFmB0J:scholar.google.com/&scioq=Putting+Theory+to+Work:+From+Learning+Bounds+to+Meta-Learning+Algorithms&hl=en&as_sdt=0,5", "gs_version_total": 8, "aff_unique_index": "0;1;2", "aff_unique_norm": "Commissariat \u00e0 l'\u00c9nergie Atomique et aux \u00c9nergies Alternatives;University of Lyon;Universit\u00e9 Saint-Etienne", "aff_unique_dep": ";;Laboratoire Hubert Curien", "aff_unique_url": "https://www cea fr;https://www.universite-lyon.fr;https://www.univ-st-etienne.fr", "aff_unique_abbr": "CEA;UCBL;", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0;0", "aff_country_unique": "France" }, { "id": "26WnoE4hjS", "title": "Measuring and mitigating interference in reinforcement learning", "track": "main", "status": "Reject", "tldr": "", "abstract": "Catastrophic interference is common in many network-based learning systems, and many proposals exist for mitigating it. But, before we overcome interference we must understand it better. In this work, we first provide a definition and novel measure of interference for value-based control methods such as Fitted Q Iteration and DQN. We systematically evaluate our measure of interference, showing that it correlates with forgetting, across a variety of network architectures. Our new interference measure allows us to ask novel scientific questions about commonly used deep learning architectures and develop new learning algorithms. In particular we show that updates on the last layer result in significantly higher interference than updates internal to the network. 
Lastly, we introduce a novel online-aware representation learning algorithm to minimize interference, and we empirically demonstrate that it improves stability and has lower interference.", "keywords": "Reinforcement Learning;Representation Learning", "primary_area": "", "supplementary_material": "/attachment/e082d7b880944a5d6339c0b1cab831470f2fb1e5.zip", "author": "Vincent Liu;Adam M White;Hengshuai Yao;Martha White", "authorids": "~Vincent_Liu3;~Adam_M_White1;~Hengshuai_Yao2;~Martha_White1", "gender": ";M;F;M", "homepage": ";http://adamwhite.ca;http://marthawhite.ca;https://hengshuaiyao.github.io/", "dblp": ";91/10481;60/7057;25/4960", "google_scholar": "https://scholar.google.ca/citations?hl=en;https://scholar.google.ca/citations?user=1GqGhcsAAAAJ;t5zdD_IAAAAJ;R_wcnUgAAAAJ", "orcid": ";;0000-0002-5356-2950;", "linkedin": ";;;", "or_profile": "~Vincent_Liu3;~Adam_M_White1;~Martha_White1;~hengshuai_yao1", "aff": "University of Alberta;Deepmind;University of Alberta;Huawei Technologies Ltd.", "aff_domain": "ualberta.ca;google.com;ualberta.ca;huawei.com", "position": "PhD student;Research Scientist;Associate Professor;Principal Researcher", "bibtex": "@misc{\nliu2021measuring,\ntitle={Measuring and mitigating interference in reinforcement learning},\nauthor={Vincent Liu and Adam M White and Hengshuai Yao and Martha White},\nyear={2021},\nurl={https://openreview.net/forum?id=26WnoE4hjS}\n}", "github": "", "project": "", "reviewers": "AnonReviewer2;AnonReviewer4;AnonReviewer1;AnonReviewer3", "site": "https://openreview.net/forum?id=26WnoE4hjS", "pdf_size": 0, "rating": "4;5;5;6", "confidence": "3;4;4;2", "wc_review": "261;529;650;282", "wc_reply_reviewers": "0;0;213;0", "wc_reply_authors": "462;861;1026;254", "reply_reviewers": "0;0;1;0", "reply_authors": "1;2;2;1", "rating_avg": [ 5.0, 0.7071067811865476 ], "confidence_avg": [ 3.25, 0.82915619758885 ], "wc_review_avg": [ 430.5, 164.82187354838555 ], "wc_reply_reviewers_avg": [ 53.25, 92.23170550304272 ], "wc_reply_authors_avg": [ 650.75, 307.4307848931203 ], "reply_reviewers_avg": [ 0.25, 0.4330127018922193 ], "reply_authors_avg": [ 1.5, 0.5 ], "replies_avg": [ 12, 0 ], "authors#_avg": [ 4, 0 ], "corr_rating_confidence": -0.4264014327112209, "gs_citation": 5, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=11121611407438707131&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 4, "aff_unique_index": "0;1;0;2", "aff_unique_norm": "University of Alberta;DeepMind;Huawei", "aff_unique_dep": ";;Huawei Technologies", "aff_unique_url": "https://www.ualberta.ca;https://deepmind.com;https://www.huawei.com", "aff_unique_abbr": "UAlberta;DeepMind;Huawei", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;1;0;2", "aff_country_unique": "Canada;United Kingdom;China" }, { "title": "Neural ODE Processes", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/2925", "id": "27acGyyI1BY", "poster": "", "openreview": "https://openreview.net/forum?id=27acGyyI1BY", "slides": "https://iclr.cc/virtual/2021/poster/2925", "video": "https://iclr.cc/virtual/2021/poster/2925", "author_site": "Alexander Norcliffe, Cristian Bodnar, Ben Day, Jacob Moss, Pietro Li\u00f2", "tldr": "", "abstract": "Neural Ordinary Differential Equations (NODEs) use a neural network to model the instantaneous rate of change in the state of a system. However, despite their apparent suitability for dynamics-governed time-series, NODEs present a few disadvantages. 
First, they are unable to adapt to incoming data-points, a fundamental requirement for real-time applications imposed by the natural direction of time. Second, time-series are often composed of a sparse set of measurements that could be explained by many possible underlying dynamics. NODEs do not capture this uncertainty. In contrast, Neural Processes (NPs) are a new class of stochastic processes providing uncertainty estimation and fast data-adaptation, but lack an explicit treatment of the flow of time. To address these problems, we introduce Neural ODE Processes (NDPs), a new class of stochastic processes determined by a distribution over Neural ODEs. By maintaining an adaptive data-dependent distribution over the underlying ODE, we show that our model can successfully capture the dynamics of low-dimensional systems from just a few data-points. At the same time, we demonstrate that NDPs scale up to challenging high-dimensional time-series with unknown latent dynamics such as rotating MNIST digits. ", "keywords": "differential equations;neural processes;dynamics;deep learning;neural ode", "primary_area": "", "supplementary_material": "", "author": "Alexander Norcliffe;Cristian Bodnar;Ben Day;Jacob Moss;Pietro Li\u00f2", "authorids": "alex.norcliffe98@gmail.com;~Cristian_Bodnar1;~Ben_Day1;jm2311@cam.ac.uk;~Pietro_Li\u00f21", "gender": ";M;;;", "homepage": ";https://crisbodnar.github.io/;;;", "dblp": ";220/3234;;;l/PietroLio", "google_scholar": ";pSmh9tkAAAAJ;;;", "orcid": ";;;;", "linkedin": ";;;;", "or_profile": "alex.norcliffe98@gmail.com;~Cristian_Bodnar1;~Ben_Day1;jm2311@cam.ac.uk;~Pietro_Li\u00f21", "aff": ";University of Cambridge;;;", "aff_domain": ";cam.ac.uk;;;", "position": ";PhD student;;;", "bibtex": "@inproceedings{\nnorcliffe2021neural,\ntitle={Neural {\\{}ODE{\\}} Processes},\nauthor={Alexander Norcliffe and Cristian Bodnar and Ben Day and Jacob Moss and Pietro Li{\\`o}},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=27acGyyI1BY}\n}", "github": "[![github](/images/github_icon.svg) crisbodnar/ndp](https://github.com/crisbodnar/ndp) + [![Papers with Code](/images/pwc_icon.svg) 1 community implementation](https://paperswithcode.com/paper/?openreview=27acGyyI1BY)", "project": "", "reviewers": "AnonReviewer1;AnonReviewer3;AnonReviewer4;AnonReviewer2", "pdf_size": 0, "rating": "7;7;7;7", "confidence": "4;3;2;3", "wc_review": "457;384;38;508", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "840;821;59;661", "reply_reviewers": "0;0;0;0", "reply_authors": "2;2;1;2", "rating_avg": [ 7.0, 0.0 ], "confidence_avg": [ 3.0, 0.7071067811865476 ], "wc_review_avg": [ 346.75, 183.6237661633156 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 595.25, 317.3140203331709 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.75, 0.4330127018922193 ], "replies_avg": [ 14, 0 ], "authors#_avg": [ 5, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 84, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=12135997685697455587&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 13, "pdf": "https://openreview.net/pdf?id=27acGyyI1BY", "email": ";cam.ac.uk;;;", "author_num": 5, "aff_unique_index": "0", "aff_unique_norm": "University of Cambridge", "aff_unique_dep": "", "aff_unique_url": "https://www.cam.ac.uk", "aff_unique_abbr": "Cambridge", "aff_campus_unique_index": "0", "aff_campus_unique": "Cambridge", "aff_country_unique_index": "0", "aff_country_unique": "United Kingdom" }, { "id": 
"28AnM10CHyO", "title": "A Spectral Perspective of Neural Networks Robustness to Label Noise", "track": "main", "status": "Withdraw", "tldr": "", "abstract": "Deep networks usually require a massive amount of labeled data for their training. Yet, such data may include some mistakes in the labels. Interestingly, networks have been shown to be robust to such errors. This work uses a spectral (Fourier) analysis of their learned mapping to provide an explanation for their robustness. In particular, we relate the smoothness regularization that usually exists in conventional training to attenuation of high frequencies, which mainly characterize noise. By using a connection between the smoothness and the spectral norm of the network weights, we suggest that one may further improve robustness via spectral normalization. Empirical experiments validate our claims and show the advantage of this normalization for classification with label noise.", "keywords": "Label noise;Neural network robustness;Regularization methods;Spectral normalization;Fourier analysis", "primary_area": "", "supplementary_material": "/attachment/41529a16423d3502f92ea9e17253918d31fc9725.zip", "author": "Oshrat Bar;Amnon Drory;Raja Giryes", "authorids": "~Oshrat_Bar1;~Amnon_Drory1;~Raja_Giryes1", "gender": ";M;M", "homepage": ";;https://www.giryes.sites.tau.ac.il/", "dblp": ";;50/7998", "google_scholar": ";;https://scholar.google.co.il/citations?user=9aQUYVQAAAAJ", "orcid": ";;0000-0002-2830-0297", "linkedin": ";;raja-giryes-0818935/", "or_profile": "~Oshrat_Bar1;~Amnon_Drory1;~Raja_Giryes1", "aff": "Tel Aviv University;Tel Aviv University;Tel Aviv University", "aff_domain": "tau.ac.il;tau.ac.il;tauex.tau.ac.il", "position": "MS student;PhD student;Associate Professor", "bibtex": "", "github": "", "project": "", "reviewers": "AnonReviewer3;AnonReviewer2;AnonReviewer4;AnonReviewer1", "site": "https://openreview.net/forum?id=28AnM10CHyO", "pdf_size": 0, "rating": "3;3;4;5", "confidence": "4;4;4;4", "wc_review": "739;319;255;434", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "393;284;305;412", "reply_reviewers": "0;0;0;0", "reply_authors": "1;1;1;1", "rating_avg": [ 3.75, 0.82915619758885 ], "confidence_avg": [ 4.0, 0.0 ], "wc_review_avg": [ 436.75, 185.91715224798384 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 348.5, 54.92039693957064 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 9, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:xLaVQNwsHk0J:scholar.google.com/&scioq=A+Spectral+Perspective+of+Neural+Networks+Robustness+to+Label+Noise&hl=en&as_sdt=0,5", "gs_version_total": 0, "aff_unique_index": "0;0;0", "aff_unique_norm": "Tel Aviv University", "aff_unique_dep": "", "aff_unique_url": "https://www.tau.ac.il", "aff_unique_abbr": "TAU", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0;0", "aff_country_unique": "Israel" }, { "title": "Towards Robust Neural Networks via Close-loop Control", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/2963", "id": "2AL06y9cDE-", "poster": "", "openreview": "https://openreview.net/forum?id=2AL06y9cDE-", "slides": "https://iclr.cc/virtual/2021/poster/2963", "video": "https://iclr.cc/virtual/2021/poster/2963", "author_site": "Zhuotong Chen, Qianxiao Li, Zheng Zhang", "tldr": "", "abstract": "Despite their success in massive engineering applications, deep neural 
networks are vulnerable to various perturbations due to their black-box nature. Recent studies have shown that a deep neural network can misclassify the data even if the input data is perturbed by an imperceptible amount. In this paper, we address the robustness issue of neural networks by a novel close-loop control method from the perspective of dynamic systems. Instead of modifying the parameters in a fixed neural network architecture, a close-loop control process is added to generate control signals adaptively for the perturbed or corrupted data. We connect the robustness of neural networks with optimal control using the geometrical information of underlying data to design the control objective. The detailed analysis shows how the embedding manifolds of state trajectory affect error estimation of the proposed method. Our approach can simultaneously maintain the performance on clean data and improve the robustness against many types of data perturbations. It can also further improve the performance of robustly trained neural networks against different perturbations. To the best of our knowledge, this is the first work that improves the robustness of neural networks with close-loop control.", "keywords": "neural network robustness;optimal control;dynamical system", "primary_area": "", "supplementary_material": "", "author": "Zhuotong Chen;Qianxiao Li;Zheng Zhang", "authorids": "~Zhuotong_Chen1;~Qianxiao_Li1;~Zheng_Zhang2", "gender": "M;M;M", "homepage": ";https://blog.nus.edu.sg/qianxiaoli/;https://web.ece.ucsb.edu/~zhengzhang/", "dblp": "284/8038;172/0930.html;181/2621-5", "google_scholar": "OVs7TPUAAAAJ;https://scholar.google.com.sg/citations?user=zLgReYoAAAAJ;qeahx5QAAAAJ", "orcid": ";0000-0002-3903-3737;", "linkedin": ";;", "or_profile": "~Zhuotong_Chen1;~Qianxiao_Li1;~Zheng_Zhang2", "aff": "University of California, Santa Barbara;National University of Singapore;UC Santa Barbara", "aff_domain": "ucsb.edu;nus.edu.sg;ucsb.edu", "position": "PhD student;Assistant Professor;Assistant Professor", "bibtex": "@inproceedings{\nchen2021towards,\ntitle={Towards Robust Neural Networks via Close-loop Control},\nauthor={Zhuotong Chen and Qianxiao Li and Zheng Zhang},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=2AL06y9cDE-}\n}", "github": "[![github](/images/github_icon.svg) zhuotongchen/Towards-Robust-Neural-Networks-via-Close-loop-Control](https://github.com/zhuotongchen/Towards-Robust-Neural-Networks-via-Close-loop-Control)", "project": "", "reviewers": "AnonReviewer1;AnonReviewer3;AnonReviewer4;AnonReviewer2", "pdf_size": 0, "rating": "6;7;7;7", "confidence": "4;4;3;3", "wc_review": "498;393;236;284", "wc_reply_reviewers": "10;0;0;0", "wc_reply_authors": "729;544;364;551", "reply_reviewers": "1;0;0;0", "reply_authors": "1;1;1;1", "rating_avg": [ 6.75, 0.4330127018922193 ], "confidence_avg": [ 3.5, 0.5 ], "wc_review_avg": [ 352.75, 101.3345326135173 ], "wc_reply_reviewers_avg": [ 2.5, 4.330127018922194 ], "wc_reply_authors_avg": [ 547.0, 129.07168550848013 ], "reply_reviewers_avg": [ 0.25, 0.4330127018922193 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 11, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": -0.5773502691896257, "gs_citation": 34, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=3798545379660922122&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 5, "pdf": "https://openreview.net/pdf?id=2AL06y9cDE-", "email": "ucsb.edu;nus.edu.sg;ucsb.edu", "author_num": 3, "aff_unique_index": 
"0;1;0", "aff_unique_norm": "University of California, Santa Barbara;National University of Singapore", "aff_unique_dep": ";", "aff_unique_url": "https://www.ucsb.edu;https://www.nus.edu.sg", "aff_unique_abbr": "UCSB;NUS", "aff_campus_unique_index": "0;0", "aff_campus_unique": "Santa Barbara;", "aff_country_unique_index": "0;1;0", "aff_country_unique": "United States;Singapore" }, { "title": "SkipW: Resource Adaptable RNN with Strict Upper Computational Limit", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/3160", "id": "2CjEVW-RGOJ", "poster": "", "openreview": "https://openreview.net/forum?id=2CjEVW-RGOJ", "slides": "https://iclr.cc/virtual/2021/poster/3160", "video": "https://iclr.cc/virtual/2021/poster/3160", "author_site": "Tsiry MAYET, Anne Lambert, Pascal Le Guyadec, Francoise Le Bolzer, Fran\u00e7ois Schnitzler", "tldr": "", "abstract": "We introduce Skip-Window, a method to allow recurrent neural networks (RNNs) to trade off accuracy for computational cost during the analysis of a sequence. Similarly to existing approaches, Skip-Window extends existing RNN cells by adding a mechanism to encourage the model to process fewer inputs. Unlike existing approaches, Skip-Window is able to respect a strict computational budget, making this model more suitable for limited hardware. We evaluate this approach on two datasets: a human activity recognition task and adding task. Our results show that Skip-Window is able to exceed the accuracy of existing approaches for a lower computational cost while strictly limiting said cost.", "keywords": "Recurrent neural networks;Flexibility;Computational resources", "primary_area": "", "supplementary_material": "", "author": "Tsiry Mayet;Anne Lambert;Pascal Leguyadec;Francoise Le Bolzer;Fran\u00e7ois Schnitzler", "authorids": "mayet.tsiry@gmail.com;anne.lambert@interdigital.com;pascal.leguyadec@interdigital.com;francoise.lebolzer@interdigital.com;~Fran\u00e7ois_Schnitzler1", "gender": ";;;;M", "homepage": ";;;;https://sites.google.com/site/francoisschnitzler", "dblp": ";;;;18/9854", "google_scholar": ";;;;https://scholar.google.fr/citations?user=IW2QsUYAAAAJ", "orcid": ";;;;0000-0003-1304-2157", "linkedin": ";;;;francois-schnitzler", "or_profile": "mayet.tsiry@gmail.com;anne.lambert@interdigital.com;pascal.leguyadec@interdigital.com;francoise.lebolzer@interdigital.com;~Fran\u00e7ois_Schnitzler1", "aff": ";;;;InterDigital", "aff_domain": ";;;;interdigital.com", "position": ";;;;Senior Scientist", "bibtex": "@inproceedings{\nmayet2021skipw,\ntitle={SkipW: Resource Adaptable {\\{}RNN{\\}} with Strict Upper Computational Limit},\nauthor={Tsiry Mayet and Anne Lambert and Pascal Leguyadec and Francoise Le Bolzer and Fran{\\c{c}}ois Schnitzler},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=2CjEVW-RGOJ}\n}", "github": "", "project": "", "reviewers": "AnonReviewer2;AnonReviewer1;AnonReviewer4;AnonReviewer3", "pdf_size": 0, "rating": "5;6;6;6", "confidence": "5;5;2;3", "wc_review": "467;1088;570;250", "wc_reply_reviewers": "759;811;0;0", "wc_reply_authors": "1476;961;877;463", "reply_reviewers": "3;3;0;0", "reply_authors": "3;3;1;1", "rating_avg": [ 5.75, 0.4330127018922193 ], "confidence_avg": [ 3.75, 1.299038105676658 ], "wc_review_avg": [ 593.75, 307.84604512645603 ], "wc_reply_reviewers_avg": [ 392.5, 392.9303373372944 ], "wc_reply_authors_avg": [ 944.25, 360.26474640186484 ], "reply_reviewers_avg": [ 1.5, 1.5 ], "reply_authors_avg": [ 2.0, 1.0 ], 
"replies_avg": [ 20, 0 ], "authors#_avg": [ 5, 0 ], "corr_rating_confidence": -0.5555555555555555, "gs_citation": 3, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=3618159677812504202&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 5, "pdf": "https://openreview.net/pdf?id=2CjEVW-RGOJ", "email": ";;;;interdigital.com", "author_num": 5, "aff_unique_index": "0", "aff_unique_norm": "InterDigital", "aff_unique_dep": "", "aff_unique_url": "https://www.interdigital.com", "aff_unique_abbr": "InterDigital", "aff_country_unique_index": "0", "aff_country_unique": "United States" }, { "id": "2Ey_1FeNtOC", "title": "Minimum Description Length Recurrent Neural Networks", "track": "main", "status": "Reject", "tldr": "", "abstract": "Recurrent neural networks (RNNs) face two well-known challenges: (a) the difficulty of such networks to generalize appropriately as opposed to memorizing, especially from very short input sequences (generalization); and (b) the difficulty for us to understand the knowledge that the network has attained (transparency). We explore the implications to these challenges of employing a general search through neural architectures using a genetic algorithm with Minimum Description Length (MDL) as an objective function. We find that MDL leads the networks to reach adequate levels of generalization from very small corpora, improving over backpropagation-based alternatives. We demonstrate this approach by evolving networks which perform tasks of increasing complexity with absolute correctness. The resulting networks are small, easily interpretable, and unlike classical RNNs, are provably appropriate for sequences of arbitrary length even when trained on very limited corpora. One case study is addition, for which our system grows a network with just four cells, reaching 100% accuracy (and at least .999 certainty) for arbitrary large numbers.", "keywords": "recurrent neural network;neural network;language modeling;minimum description length;genetic algorithm;semantics;syntax", "primary_area": "", "supplementary_material": "", "author": "Nur Lan;Emmanuel Chemla;Roni Katzir", "authorids": "nurlan@mail.tau.ac.il;chemla@ens.fr;~Roni_Katzir1", "gender": ";;", "homepage": ";;https://taucompling.github.io/", "dblp": ";;45/5156", "google_scholar": ";;BlRjL4gAAAAJ", "orcid": ";;0000-0002-0241-1896", "linkedin": ";;", "or_profile": "nurlan@mail.tau.ac.il;chemla@ens.fr;~Roni_Katzir1", "aff": ";;", "aff_domain": ";;", "position": ";;", "bibtex": "@misc{\nlan2021minimum,\ntitle={Minimum Description Length Recurrent Neural Networks},\nauthor={Nur Lan and Emmanuel Chemla and Roni Katzir},\nyear={2021},\nurl={https://openreview.net/forum?id=2Ey_1FeNtOC}\n}", "github": "", "project": "", "reviewers": "AnonReviewer4;AnonReviewer3;AnonReviewer2;AnonReviewer1", "site": "https://openreview.net/forum?id=2Ey_1FeNtOC", "pdf_size": 0, "rating": "3;4;4;6", "confidence": "3;3;5;3", "wc_review": "714;450;89;563", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "405;358;213;269", "reply_reviewers": "0;0;0;0", "reply_authors": "1;1;1;1", "rating_avg": [ 4.25, 1.0897247358851685 ], "confidence_avg": [ 3.5, 0.8660254037844386 ], "wc_review_avg": [ 454.0, 230.60897640811817 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 311.25, 74.85444208595773 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 9, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": -0.13245323570650439, "gs_citation": 22, "gs_cited_by_link": 
"https://scholar.google.com/scholar?cites=6315172319950330390&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 12 }, { "id": "2G9u-wu2tXP", "title": "Continual learning using hash-routed convolutional neural networks", "track": "main", "status": "Reject", "tldr": "", "abstract": "Continual learning could shift the machine learning paradigm from data centric to model centric. A continual learning model needs to scale efficiently to handle semantically different datasets, while avoiding unnecessary growth. We introduce hash-routed convolutional neural networks: a group of convolutional units where data flows dynamically. Feature maps are compared using feature hashing and similar data is routed to the same units. A hash-routed network provides excellent plasticity thanks to its routed nature, while generating stable features through the use of orthogonal feature hashing. Each unit evolves separately and new units can be added (to be used only when necessary). Hash-routed networks achieve excellent performance across a variety of typical continual learning benchmarks without storing raw data and train using only gradient descent. Besides providing a continual learning framework for supervised tasks with encouraging results, our model can be used for unsupervised or reinforcement learning.", "keywords": "Lifelong learning;continual learning;feature hashing", "primary_area": "", "supplementary_material": "/attachment/2bbbf67903e6d351ab354beffa0e223b76854278.zip", "author": "Ahmad Berjaoui", "authorids": "~Ahmad_Berjaoui1", "gender": "M", "homepage": "", "dblp": "https://dblp.uni-trier.de/pid/228/9118.html", "google_scholar": "UWUptKgAAAAJ", "orcid": "", "linkedin": "ahmad-berjaoui-a960b4133/", "or_profile": "~Ahmad_Berjaoui1", "aff": "", "aff_domain": "", "position": "", "bibtex": "@misc{\nberjaoui2021continual,\ntitle={Continual learning using hash-routed convolutional neural networks},\nauthor={Ahmad Berjaoui},\nyear={2021},\nurl={https://openreview.net/forum?id=2G9u-wu2tXP}\n}", "github": "", "project": "", "reviewers": "AnonReviewer3;AnonReviewer2;AnonReviewer1;AnonReviewer4", "site": "https://openreview.net/forum?id=2G9u-wu2tXP", "pdf_size": 0, "rating": "4;4;6;6", "confidence": "4;4;4;3", "wc_review": "1729;981;500;124", "wc_reply_reviewers": "823;69;0;0", "wc_reply_authors": "308;416;188;89", "reply_reviewers": "1;1;0;0", "reply_authors": "1;1;1;1", "rating_avg": [ 5.0, 1.0 ], "confidence_avg": [ 3.75, 0.4330127018922193 ], "wc_review_avg": [ 833.5, 599.6434357182608 ], "wc_reply_reviewers_avg": [ 223.0, 347.553592989628 ], "wc_reply_authors_avg": [ 250.25, 123.17137451534752 ], "reply_reviewers_avg": [ 0.5, 0.5 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 12, 0 ], "authors#_avg": [ 1, 0 ], "corr_rating_confidence": -0.5773502691896257, "gs_citation": 1, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=16430686646717562876&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 11 }, { "id": "2HLTMwxOxwe", "title": "Learn what you can't learn: Regularized Ensembles for Transductive out-of-distribution detection", "track": "main", "status": "Reject", "tldr": "", "abstract": "Machine learning models are often used in practice once they achieve good generalization results on in-distribution (ID) holdout data. To predict test sets in the wild, they should detect samples they cannot predict well. 
We show that current out-of-distribution (OOD) detection algorithms for neural networks produce unsatisfactory results in a variety of OOD detection scenarios, e.g.\\ when OOD data consists of unseen classes or corrupted measurements. This paper studies how such ``hard'' OOD scenarios can benefit from tuning the detection method after observing a batch of the test data. This \\emph{transductive} setting is relevant when the advantage of even a slightly delayed OOD detection outweighs the financial cost for additional tuning. We propose a novel method that uses an artificial labeling scheme for the test data and early stopping regularization to obtain ensembles of models that produce contradictory predictions only on the OOD samples in a test batch. We show via comprehensive experiments that our approach is indeed able to significantly outperform both inductive and transductive baselines on difficult OOD detection scenarios, such as unseen classes on CIFAR-10/CIFAR-100, severe corruptions (CIFAR-C), and strong covariate shift ImageNet vs ObjectNet.", "keywords": "out-of-distribution detection;transductive;predictive uncertainty;ensembles;ensemble diversity;outlier detection", "primary_area": "", "supplementary_material": "", "author": "Alexandru \u021aifrea;Eric Petru Stavarache;Fanny Yang", "authorids": "~Alexandru_\u021aifrea1;ericst@student.ethz.ch;~Fanny_Yang1", "gender": ";;", "homepage": ";;http://www.fanny-yang.de", "dblp": ";;126/4852", "google_scholar": ";;BfDKicQAAAAJ", "orcid": ";;", "linkedin": ";;", "or_profile": "~Alexandru_\u021aifrea1;ericst@student.ethz.ch;~Fanny_Yang1", "aff": ";;Swiss Federal Institute of Technology", "aff_domain": ";;ethz.ch", "position": ";;Professor", "bibtex": "@misc{\n\u021bifrea2021learn,\ntitle={Learn what you can't learn: Regularized Ensembles for Transductive out-of-distribution detection},\nauthor={Alexandru \u021aifrea and Eric Petru Stavarache and Fanny Yang},\nyear={2021},\nurl={https://openreview.net/forum?id=2HLTMwxOxwe}\n}", "github": "", "project": "", "reviewers": "AnonReviewer2;AnonReviewer3;AnonReviewer1;AnonReviewer4", "site": "https://openreview.net/forum?id=2HLTMwxOxwe", "pdf_size": 0, "rating": "4;6;6;8", "confidence": "4;2;3;3", "wc_review": "852;696;668;157", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "708;404;1413;134", "reply_reviewers": "0;0;0;0", "reply_authors": "1;1;3;1", "rating_avg": [ 6.0, 1.4142135623730951 ], "confidence_avg": [ 3.0, 0.7071067811865476 ], "wc_review_avg": [ 593.25, 261.44346903298236 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 664.75, 477.34545928499205 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.5, 0.8660254037844386 ], "replies_avg": [ 13, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": -0.5, "gs_citation": 1, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=1278906471923039999&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 0, "aff_unique_index": "0", "aff_unique_norm": "Swiss Federal Institute of Technology", "aff_unique_dep": "", "aff_unique_url": "https://www.ethz.ch", "aff_unique_abbr": "ETH Zurich", "aff_country_unique_index": "0", "aff_country_unique": "Switzerland" }, { "id": "2Id6XxTjz7c", "title": "Post-Training Weighted Quantization of Neural Networks for Language Models", "track": "main", "status": "Reject", "tldr": "", "abstract": "As a practical model compression technique, parameter quantization is effective especially for language models associated with a large memory footprint. 
Neural network quantization is usually performed to reduce quantization loss assuming that quantization error of each parameter equally contributes to the overall training loss. The importance of each parameter, however, may highly differ such that for the same number of quantization bits, certain parameters lead to higher training loss than the others after quantization. In this paper, we consider a non-uniform quantization scheme, specifically binary-coding-based quantization, for high compression ratio and efficient computations while avoiding large accuracy degradation by uniform quantization (e.g., INT8). Then, we derive quantization optimization methods to take into account the importance of each parameter. We demonstrate that for post-training quantization, weight magnitude can represent importance and improve model accuracy significantly compared to the previous schemes lacking importance considerations. For various language models including BERT, DistilBERT, AWD-LSTM, and Transformer, we achieve 2-4 bits per weight by our proposed post-training quantization with reasonable accuracy degradation.", "keywords": "Model Compression;Non-uniform Quantization;Post-training Quantization;Language Model", "primary_area": "", "supplementary_material": "", "author": "Se Jung Kwon;Dongsoo Lee;Yongkweon Jeon;Byeongwook Kim;Bae Seong Park;Yeonju Ro", "authorids": "~Se_Jung_Kwon1;~Dongsoo_Lee1;~Yongkweon_Jeon1;~Byeongwook_Kim1;~Bae_Seong_Park1;~Yeonju_Ro1", "gender": "M;M;;;M;F", "homepage": ";;;;https://baeseong.tistory.com/;https://sites.google.com/view/hey-yeonju", "dblp": "119/5676;11/9680;;220/5405;241/6925.html;232/0146", "google_scholar": "https://scholar.google.co.kr/citations?user=8eTxKOkAAAAJ;ALiieEkAAAAJ;;https://scholar.google.co.kr/citations?user=OjfC7gUAAAAJ;https://scholar.google.co.kr/citations?user=RMmyMJsAAAAJ;https://scholar.google.com/citations?hl=en", "orcid": ";;;;;0009-0002-9034-0377", "linkedin": "se-jung-kwon-305503175/;;;;baeseong-park/;yeonju-ro-a938728b/", "or_profile": "~Se_Jung_Kwon1;~Dongsoo_Lee1;~Yongkweon_Jeon1;~Byeongwook_Kim1;~Bae_Seong_Park1;~Yeonju_Ro1", "aff": "Samsung Research;Samsung Research;;Samsung Research;Samsung Research;Samsung Research", "aff_domain": "samsung.com;samsung.com;;research.samsung.com;research.samsung.com;samsung.com", "position": "Staff Engineer;Principal Engineer;;Software Engineer;Software Engineer;Researcher", "bibtex": "@misc{\nkwon2021posttraining,\ntitle={Post-Training Weighted Quantization of Neural Networks for Language Models},\nauthor={Se Jung Kwon and Dongsoo Lee and Yongkweon Jeon and Byeongwook Kim and Bae Seong Park and Yeonju Ro},\nyear={2021},\nurl={https://openreview.net/forum?id=2Id6XxTjz7c}\n}", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer3;AnonReviewer2;AnonReviewer4", "site": "https://openreview.net/forum?id=2Id6XxTjz7c", "pdf_size": 0, "rating": "4;5;6;6", "confidence": "5;4;3;4", "wc_review": "366;220;313;228", "wc_reply_reviewers": "272;1022;0;0", "wc_reply_authors": "1753;1636;520;52", "reply_reviewers": "1;4;0;0", "reply_authors": "4;4;1;1", "rating_avg": [ 5.25, 0.82915619758885 ], "confidence_avg": [ 4.0, 0.7071067811865476 ], "wc_review_avg": [ 281.75, 60.77982806820039 ], "wc_reply_reviewers_avg": [ 323.5, 418.28787933670753 ], "wc_reply_authors_avg": [ 990.25, 724.60829935904 ], "reply_reviewers_avg": [ 1.25, 1.6393596310755 ], "reply_authors_avg": [ 2.5, 1.5 ], "replies_avg": [ 21, 0 ], "authors#_avg": [ 6, 0 ], "corr_rating_confidence": -0.8528028654224417, "gs_citation": 3, 
"gs_cited_by_link": "https://scholar.google.com/scholar?cites=15221476406826998906&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 0, "aff_unique_index": "0;0;0;0;0", "aff_unique_norm": "Samsung", "aff_unique_dep": "Samsung Research", "aff_unique_url": "https://research.samsung.com", "aff_unique_abbr": "Samsung", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0;0;0;0", "aff_country_unique": "South Korea" }, { "id": "2K5WDVL2KI", "title": "Information Condensing Active Learning", "track": "main", "status": "Reject", "tldr": "", "abstract": "We introduce Information Condensing Active Learning (ICAL), a batch mode model agnostic Active Learning (AL) method targeted at Deep Bayesian Active Learning that focuses on acquiring labels for points which have as much information as possible about the still unacquired points. ICAL uses the Hilbert Schmidt Independence Criterion (HSIC) to measure the strength of the dependency between a candidate batch of points and the unlabeled set. We develop key optimizations that allow us to scale our method to large unlabeled sets. We show significant improvements in terms of model accuracy and negative log likelihood (NLL) on several image datasets compared to state of the art batch mode AL methods for deep learning.", "keywords": "active learning", "primary_area": "", "supplementary_material": "/attachment/8c55bc17e51abaa1bfc65a949a17c30b12e173af.zip", "author": "Siddhartha Jain;Ge Liu;David Gifford", "authorids": "~Siddhartha_Jain1;~Ge_Liu2;~David_Gifford1", "gender": "M;F;M", "homepage": "https://tmfs10.github.io/;http://www.mit.edu/~geliu/;http://giffordlab.mit.edu", "dblp": "81/8212;;g/DavidKGifford", "google_scholar": "mBJIa8cAAAAJ;P6EahzcAAAAJ;", "orcid": ";0000-0001-9383-5186;", "linkedin": ";;", "or_profile": "~Siddhartha_Jain1;~Ge_Liu2;~David_Gifford1", "aff": "Amazon;Amazon AWS AI;Massachusetts Institute of Technology", "aff_domain": "amazon.com;amazon.com;mit.edu", "position": "Applied Scientist;Researcher;Full Professor", "bibtex": "@misc{\njain2021information,\ntitle={Information Condensing Active Learning},\nauthor={Siddhartha Jain and Ge Liu and David Gifford},\nyear={2021},\nurl={https://openreview.net/forum?id=2K5WDVL2KI}\n}", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer2;AnonReviewer3;AnonReviewer4", "site": "https://openreview.net/forum?id=2K5WDVL2KI", "pdf_size": 0, "rating": "6;6;6;8", "confidence": "3;4;3;4", "wc_review": "256;271;674;469", "wc_reply_reviewers": "0;0;263;0", "wc_reply_authors": "173;294;1041;387", "reply_reviewers": "0;0;2;0", "reply_authors": "1;1;4;1", "rating_avg": [ 6.5, 0.8660254037844386 ], "confidence_avg": [ 3.5, 0.5 ], "wc_review_avg": [ 417.5, 170.28578918982055 ], "wc_reply_reviewers_avg": [ 65.75, 113.88234059765368 ], "wc_reply_authors_avg": [ 473.75, 336.1765719082756 ], "reply_reviewers_avg": [ 0.5, 0.8660254037844386 ], "reply_authors_avg": [ 1.75, 1.299038105676658 ], "replies_avg": [ 14, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": 0.5773502691896257, "gs_citation": 3, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=427288786869855412&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 5, "aff_unique_index": "0;0;1", "aff_unique_norm": "Amazon;Massachusetts Institute of Technology", "aff_unique_dep": "Amazon.com, Inc.;", "aff_unique_url": "https://www.amazon.com;https://web.mit.edu", "aff_unique_abbr": "Amazon;MIT", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0;0", 
"aff_country_unique": "United States" }, { "id": "2KSsaPGemn2", "title": "Non-Linear Rewards For Successor Features", "track": "main", "status": "Reject", "tldr": "", "abstract": "Reinforcement Learning algorithms have reached new heights in performance, often overtaking humans on several challenging tasks such as Atari and Go. However, the resulting models learn fragile policies that are unable to transfer between tasks without full retraining. Successor features aim to improve this situation by decomposing the policy into two components: one capturing environmental dynamics and the other modelling reward. Under this framework, transfer between related tasks requires only training the reward component. However, successor features builds upon the limiting assumption that the current reward can be predicted from a linear combination of state features. This paper proposes a novel improvement to the successor feature framework, where we instead assume that the reward function is a non-linear function of the state features, thereby increasing its representational power. After derivation of the new state-action value function, the decomposition includes a second term that learns the auto-correlation matrix between state features. Experimentally, we show this term explicitly models the environment's stochasticity and can also be used in place of $\\epsilon$-greedy exploration methods during transfer. The performance of the proposed improvements to the successor feature framework is validated empirically on navigation tasks and control of a simulated robotic arm.", "keywords": "", "primary_area": "", "supplementary_material": "", "author": "Norman L Tasfi;Miriam Capretz", "authorids": "~Norman_L_Tasfi1;~Miriam_Capretz1", "gender": "M;F", "homepage": "http://343hz.com;", "dblp": ";", "google_scholar": ";", "orcid": ";", "linkedin": ";", "or_profile": "~Norman_L_Tasfi1;~Miriam_Capretz1", "aff": "University of Western Ontario;University of Western Ontario", "aff_domain": "uwo.ca;uwo.ca", "position": "PhD student;Full Professor", "bibtex": "@misc{\ntasfi2021nonlinear,\ntitle={Non-Linear Rewards For Successor Features},\nauthor={Norman L Tasfi and Miriam Capretz},\nyear={2021},\nurl={https://openreview.net/forum?id=2KSsaPGemn2}\n}", "github": "", "project": "", "reviewers": "AnonReviewer3;AnonReviewer4;AnonReviewer2;AnonReviewer1", "site": "https://openreview.net/forum?id=2KSsaPGemn2", "pdf_size": 0, "rating": "4;4;4;4", "confidence": "4;4;4;5", "wc_review": "582;850;962;471", "wc_reply_reviewers": "0;0;0;393", "wc_reply_authors": "350;825;848;1115", "reply_reviewers": "0;0;0;1", "reply_authors": "1;1;1;2", "rating_avg": [ 4.0, 0.0 ], "confidence_avg": [ 4.25, 0.4330127018922193 ], "wc_review_avg": [ 716.25, 197.7705425486819 ], "wc_reply_reviewers_avg": [ 98.25, 170.1739918436422 ], "wc_reply_authors_avg": [ 784.5, 275.5417391249464 ], "reply_reviewers_avg": [ 0.25, 0.4330127018922193 ], "reply_authors_avg": [ 1.25, 0.4330127018922193 ], "replies_avg": [ 11, 0 ], "authors#_avg": [ 2, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 1, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=10631631221620610676&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 2, "aff_unique_index": "0;0", "aff_unique_norm": "University of Western Ontario", "aff_unique_dep": "", "aff_unique_url": "https://www.uwo.ca", "aff_unique_abbr": "UWO", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0", "aff_country_unique": "Canada" }, { "id": "2LBhynkS2SC", "title": 
"Individuality in the hive - Learning to embed lifetime social behaviour of honey bees", "track": "main", "status": "Reject", "tldr": "", "abstract": "Honey bees are a popular model for complex social systems, in which global behavior emerges from the actions and interactions of thousands of individuals. While the average life of a bee is organized as a sequence of tasks roughly determined by age, there is substantial variation at the individual level. For example, young bees can become foragers early in life, depending on the colony\u2019s needs. Using a unique dataset containing lifetime trajectories of all individuals over multiple generations in two honey bee colonies, we propose a new temporal matrix factorization model that jointly learns the average developmental path and structured variations of individuals in the social network over their entire lives. Our method yields inherently interpretable embeddings that are biologically plausible and consistent over time, which allow one to compare individuals regardless of when, or in which colony, they lived. Our method provides a quantitative framework for understanding behavioral heterogeneity in complex social systems applicable in fields such as behavioral biology, social sciences, neuroscience, and information science.", "keywords": "matrix factorization;honey bees;explainable;social networks;implicit bias;dataset", "primary_area": "", "supplementary_material": "/attachment/33402958c558295be85d24aa705b42f07f93d250.zip", "author": "Benjamin Wild;David Dormagen;Michael L Smith;Tim Landgraf", "authorids": "~Benjamin_Wild1;david.dormagen@fu-berlin.de;msmith@ab.mpg.de;~Tim_Landgraf1", "gender": "M;;;", "homepage": ";;;", "dblp": "131/9500;;;04/10008", "google_scholar": ";;;https://scholar.google.de/citations?user=ChX0opIAAAAJ", "orcid": "0000-0002-7492-8448;;;0000-0003-4951-5235", "linkedin": ";;;", "or_profile": "~Benjamin_Wild1;david.dormagen@fu-berlin.de;msmith@ab.mpg.de;~Tim_Landgraf1", "aff": "Freie Universitaet Berlin;;;Freie Universit\u00e4t Berlin", "aff_domain": "fu-berlin.de;;;fu-berlin.de", "position": "PhD student;;;Assistant Professor", "bibtex": "@misc{\nwild2021individuality,\ntitle={Individuality in the hive - Learning to embed lifetime social behaviour of honey bees},\nauthor={Benjamin Wild and David Dormagen and Michael L Smith and Tim Landgraf},\nyear={2021},\nurl={https://openreview.net/forum?id=2LBhynkS2SC}\n}", "github": "", "project": "", "reviewers": "AnonReviewer3;AnonReviewer2;AnonReviewer4;AnonReviewer1", "site": "https://openreview.net/forum?id=2LBhynkS2SC", "pdf_size": 0, "rating": "5;5;6;6", "confidence": "3;4;4;3", "wc_review": "568;366;552;932", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "1167;700;1130;1229", "reply_reviewers": "0;0;0;0", "reply_authors": "2;1;2;2", "rating_avg": [ 5.5, 0.5 ], "confidence_avg": [ 3.5, 0.5 ], "wc_review_avg": [ 604.5, 205.0774244035652 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 1056.5, 208.8426441127386 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.75, 0.4330127018922193 ], "replies_avg": [ 12, 0 ], "authors#_avg": [ 4, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:aG_7N0dYQOYJ:scholar.google.com/&scioq=Individuality+in+the+hive+-+Learning+to+embed+lifetime+social+behaviour+of+honey+bees&hl=en&as_sdt=0,33", "gs_version_total": 2, "aff_unique_index": "0;1", "aff_unique_norm": "Freie Universitaet Berlin;Freie Universit\u00e4t Berlin", "aff_unique_dep": ";", 
"aff_unique_url": "https://www.fu-berlin.de;https://www.fu-berlin.de", "aff_unique_abbr": "FU Berlin;FU Berlin", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0", "aff_country_unique": "Germany" }, { "id": "2LiGI26kRdt", "title": "Progressively Stacking 2.0: A Multi-stage Layerwise Training Method for BERT Training Speedup", "track": "main", "status": "Reject", "tldr": "", "abstract": "Pre-trained language models, such as BERT, have achieved significant accuracy gain in many natural language processing tasks. Despite its effectiveness, the huge number of parameters makes training a BERT model computationally very challenging. In this paper, we propose an efficient multi-stage layerwise training (MSLT) approach to reduce the training time of BERT. We decompose the whole training process into several stages. The training is started from a small model with only a few encoder layers and we gradually increase the depth of the model by adding new encoder layers. At each stage, we only train the top (near the output layer) few encoder layers which are newly added. The parameters of the other layers which have been trained in the previous stages will not be updated in the current stage. In BERT training, the backward calculation is much more time-consuming than the forward calculation, especially in the distributed training setting in which the backward calculation time further includes the communication time for gradient synchronization. In the proposed training strategy, only top few layers participate backward calculation, while most layers only participate forward calculation. Hence both the computation and communication efficiencies are greatly improved. Experimental results show that the proposed method can greatly reduce the training time without significant performance degradation.", "keywords": "BERT;Training speedup;Multi-stage training;Natural language processing", "primary_area": "", "supplementary_material": "", "author": "Cheng Yang;Shengnan Wang;Chao Yang;Yuechuan Li;Ru He;Jingqiao Zhang", "authorids": "~Cheng_Yang3;~Shengnan_Wang2;~Chao_Yang7;~Yuechuan_Li1;~Ru_He1;~Jingqiao_Zhang1", "gender": "M;M;;;M;M", "homepage": ";;;;;", "dblp": ";;;151/4807;;", "google_scholar": "5QdPzoAAAAAJ;TFZviW8AAAAJ;;;QSVXSmIAAAAJ;", "orcid": ";;;;;", "linkedin": ";;%E8%B6%85-%E6%9D%A8-400740153/;;ru-he-0b66b117/;zhangj/", "or_profile": "~Cheng_Yang3;~Shengnan_Wang2;~Chao_Yang7;~Yuechuan_Li1;~Ru_He1;~Jingqiao_Zhang1", "aff": "Alibaba Group;Zhejiang University;;;Alibaba Group;Alibaba Group", "aff_domain": "alibaba-inc.com;zju.edu.cn;;;alibaba-inc.com;alibaba-inc.com", "position": "Researcher;Researcher;;;Staff Algorithm Engineer;Researcher", "bibtex": "@misc{\nyang2021progressively,\ntitle={Progressively Stacking 2.0: A Multi-stage Layerwise Training Method for {\\{}BERT{\\}} Training Speedup},\nauthor={Cheng Yang and Shengnan Wang and Chao Yang and Yuechuan Li and Ru He and Jingqiao Zhang},\nyear={2021},\nurl={https://openreview.net/forum?id=2LiGI26kRdt}\n}", "github": "", "project": "", "reviewers": "AnonReviewer4;AnonReviewer1;AnonReviewer2;AnonReviewer3", "site": "https://openreview.net/forum?id=2LiGI26kRdt", "pdf_size": 0, "rating": "5;5;6;6", "confidence": "5;4;4;4", "wc_review": "229;186;256;229", "wc_reply_reviewers": "0;91;0;0", "wc_reply_authors": "186;476;246;218", "reply_reviewers": "0;1;0;0", "reply_authors": "1;2;1;1", "rating_avg": [ 5.5, 0.5 ], "confidence_avg": [ 4.25, 0.4330127018922193 ], "wc_review_avg": [ 225.0, 25.06990227344335 ], 
"wc_reply_reviewers_avg": [ 22.75, 39.40415587219196 ], "wc_reply_authors_avg": [ 281.5, 114.28363837400347 ], "reply_reviewers_avg": [ 0.25, 0.4330127018922193 ], "reply_authors_avg": [ 1.25, 0.4330127018922193 ], "replies_avg": [ 12, 0 ], "authors#_avg": [ 6, 0 ], "corr_rating_confidence": -0.5773502691896257, "gs_citation": 25, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=17735071194556288578&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 3, "aff_unique_index": "0;1;0;0", "aff_unique_norm": "Alibaba Group;Zhejiang University", "aff_unique_dep": ";", "aff_unique_url": "https://www.alibaba.com;https://www.zju.edu.cn", "aff_unique_abbr": "Alibaba;ZJU", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0;0;0", "aff_country_unique": "China" }, { "id": "2NHl-ETnHxk", "title": "Adversarial Privacy Preservation in MRI Scans of the Brain", "track": "main", "status": "Reject", "tldr": "", "abstract": "De-identification of magnetic resonance imagery (MRI) is intrinsically difficult since, even with all metadata removed, a person's face can easily be rendered and matched against a database. Existing de-identification methods tackle this task by obfuscating or removing parts of the face, but they either fail to reliably hide the patient's identity or they remove so much information that they adversely affect further analyses in the 3D space surrounding the face. In this work, we describe a new class of MRI de-identification techniques that remodel privacy-sensitive facial features as opposed to removing them.To accomplish this, we propose a conditional, multi-scale, 3D GAN architecture that takes a patient's MRI scan as input and generates a 3D volume in which the brain is not modified but the face has been de-identified. 
Compared to the classical removal-based techniques, our deep learning framework preserves privacy more reliably without adversely affecting downstream medical analyses on the brain, including segmentation and age prediction.", "keywords": "medical imaging;generative modeling;privacy;de-identification", "primary_area": "", "supplementary_material": "/attachment/68331d2025cda3e7dba7ef6f0ddbb10fa3f778b8.zip", "author": "Lennart Alexander Van der Goten;Tobias Hepp;Zeynep Akata;Kevin Smith", "authorids": "~Lennart_Alexander_Van_der_Goten1;~Tobias_Hepp1;~Zeynep_Akata1;~Kevin_Smith1", "gender": ";M;F;", "homepage": ";https://www.is.mpg.de/person/thepp;https://eml-unitue.de/people/zeynep-akata;", "dblp": ";;117/4838;", "google_scholar": ";;jQl9RtkAAAAJ;", "orcid": ";;0000-0002-1432-7747;", "linkedin": ";;zeynep-akata-36182045/?ppe=1;", "or_profile": "~Lennart_Alexander_Van_der_Goten1;~Tobias_Hepp1;~Zeynep_Akata1;~Kevin_Smith1", "aff": ";Max Planck Institute for Intelligent Systems, Max-Planck Institute;University of T\u00fcbingen;", "aff_domain": ";tuebingen.mpg.de;uni-tuebingen.de;", "position": ";PhD student;Full Professor;", "bibtex": "@misc{\ngoten2021adversarial,\ntitle={Adversarial Privacy Preservation in {\\{}MRI{\\}} Scans of the Brain},\nauthor={Lennart Alexander Van der Goten and Tobias Hepp and Zeynep Akata and Kevin Smith},\nyear={2021},\nurl={https://openreview.net/forum?id=2NHl-ETnHxk}\n}", "github": "", "project": "", "reviewers": "AnonReviewer2;AnonReviewer5;AnonReviewer1;AnonReviewer3;AnonReviewer4", "site": "https://openreview.net/forum?id=2NHl-ETnHxk", "pdf_size": 0, "rating": "3;3;6;6;7", "confidence": "4;4;4;4;3", "wc_review": "175;335;104;190;417", "wc_reply_reviewers": "0;0;0;0;0", "wc_reply_authors": "430;642;424;385;335", "reply_reviewers": "0;0;0;0;0", "reply_authors": "1;1;1;1;1", "rating_avg": [ 5.0, 1.6733200530681511 ], "confidence_avg": [ 3.8, 0.39999999999999997 ], "wc_review_avg": [ 244.2, 114.44369794794294 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 443.2, 105.03218554328953 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 11, 0 ], "authors#_avg": [ 4, 0 ], "corr_rating_confidence": -0.5976143046671969, "gs_citation": 2, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=11776389548247383643&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 0, "aff_unique_index": "0;1", "aff_unique_norm": "Max Planck Institute for Intelligent Systems;University of T\u00fcbingen", "aff_unique_dep": "Intelligent Systems;", "aff_unique_url": "https://www.mpi-is.mpg.de;https://www.uni-tuebingen.de/", "aff_unique_abbr": "MPI-IS;Uni T\u00fcbingen", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0", "aff_country_unique": "Germany" }, { "id": "2NU7a9AHo-6", "title": "AUL is a better optimization metric in PU learning", "track": "main", "status": "Reject", "tldr": "", "abstract": "Traditional binary classification models are trained and evaluated with fully labeled data which is not common in real life. In non-ideal dataset, only a small fraction of positive data are labeled. Training a model from such partially labeled data is named as positive-unlabeled (PU) learning. A naive solution of PU learning is treating unlabeled samples as negative. However, using biased data, the trained model may converge to non-optimal point and its real performance cannot be well estimated. 
Recent works try to recover the unbiased result by estimating the proportion of positive samples with mixture proportion estimation (MPE) algorithms, but the model performance is still limited and heavy computational cost is introduced (particularly for big datasets). In this work, we theoretically prove that Area Under Lift curve (AUL) is an unbiased metric in the PU learning scenario, and the experimental evaluation on 9 datasets shows that the average absolute error of AUL estimation is only 1/6 of AUC estimation. By experiments we also find that, compared with the state-of-the-art AUC-optimization algorithm, the AUL-optimization algorithm can not only significantly save the computational cost, but also improve the model performance by up to 10%.", "keywords": "", "primary_area": "", "supplementary_material": "", "author": "Shangchuan Huang;Songtao Wang;Dan Li;Liwei Jiang", "authorids": "~Shangchuan_Huang1;~Songtao_Wang1;tolidan@tsinghua.edu.cn;lavender_jlw@126.com", "gender": ";M;;", "homepage": ";;;", "dblp": ";128/5091;;", "google_scholar": ";;;", "orcid": ";0000-0002-2073-8235;;", "linkedin": ";;;", "or_profile": "~Shangchuan_Huang1;~Songtao_Wang1;tolidan@tsinghua.edu.cn;lavender_jlw@126.com", "aff": ";Tsinghua University;;", "aff_domain": ";tsinghua.edu.cn;;", "position": ";Postdoc;;", "bibtex": "@misc{\nhuang2021aul,\ntitle={{\\{}AUL{\\}} is a better optimization metric in {\\{}PU{\\}} learning},\nauthor={Shangchuan Huang and Songtao Wang and Dan Li and Liwei Jiang},\nyear={2021},\nurl={https://openreview.net/forum?id=2NU7a9AHo-6}\n}", "github": "", "project": "", "reviewers": "AnonReviewer4;AnonReviewer1;AnonReviewer6", "site": "https://openreview.net/forum?id=2NU7a9AHo-6", "pdf_size": 0, "rating": "3;5;5", "confidence": "3;3;4", "wc_review": "823;431;201", "wc_reply_reviewers": "0;0;0", "wc_reply_authors": "0;0;0", "reply_reviewers": "0;0;0", "reply_authors": "0;0;0", "rating_avg": [ 4.333333333333333, 0.9428090415820634 ], "confidence_avg": [ 3.3333333333333335, 0.4714045207910317 ], "wc_review_avg": [ 485.0, 256.785253989918 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 0, 0 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 0, 0 ], "replies_avg": [ 4, 0 ], "authors#_avg": [ 4, 0 ], "corr_rating_confidence": 0.49999999999999983, "gs_citation": 1, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=8444775600735572399&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 0, "aff_unique_index": "0", "aff_unique_norm": "Tsinghua University", "aff_unique_dep": "", "aff_unique_url": "https://www.tsinghua.edu.cn", "aff_unique_abbr": "THU", "aff_country_unique_index": "0", "aff_country_unique": "China" }, { "id": "2OcEd8jSvR", "title": "An Euler-based GAN for time series", "track": "main", "status": "Withdraw", "tldr": "", "abstract": "A new model of generative adversarial networks for time series based on Euler scheme and Wasserstein distances including Sinkhorn divergence is proposed. Euler scheme improves stability of learning, provides meaningful learning parameters such as drift and volatility while allowing the representation of a large class of processes. We test our Euler GAN generations with usual Monte Carlo simulations in one-dimension and in a multi-dimensional case. We show how the proposed methodology can be combined with transfer learning to include the latest historical dataset features. The approach is tested on financial indicators computation on S\\&P500 and on an option hedging problem. 
", "keywords": "Euler GAN;GAN;time series;Wasserstein;Sinkhorn divergence;transfer learning", "primary_area": "", "supplementary_material": "", "author": "Carl Remlinger;Joseph Mickael;Romuald Elie", "authorids": "~Carl_Remlinger1;joseph.mikael@edf.fr;~Romuald_Elie1", "gender": ";;M", "homepage": "http://lama.u-pem.fr/membres/remlinger.carl;;", "dblp": ";;https://dblp.uni-trier.de/pers/hd/e/Elie:Romuald", "google_scholar": ";;", "orcid": ";;", "linkedin": ";;", "or_profile": "~Carl_Remlinger1;joseph.mikael@edf.fr;~Romuald_Elie1", "aff": "EPFL - EPF Lausanne;;Google", "aff_domain": "epfl.ch;;google.com", "position": "Researcher;;Research Scientist", "bibtex": "", "github": "", "project": "", "reviewers": "AnonReviewer3;AnonReviewer2;AnonReviewer1;AnonReviewer4;AnonReviewer5", "site": "https://openreview.net/forum?id=2OcEd8jSvR", "pdf_size": 0, "rating": "3;3;3;5;5", "confidence": "4;3;5;4;3", "wc_review": "516;201;562;224;292", "wc_reply_reviewers": "0;0;0;0;0", "wc_reply_authors": "0;0;0;0;0", "reply_reviewers": "0;0;0;0;0", "reply_authors": "0;0;0;0;0", "rating_avg": [ 3.8, 0.9797958971132712 ], "confidence_avg": [ 3.8, 0.7483314773547882 ], "wc_review_avg": [ 359.0, 150.68908387803015 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 0, 0 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 0, 0 ], "replies_avg": [ 6, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": -0.3273268353539886, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:qrsz6D3CF0UJ:scholar.google.com/&scioq=An+Euler-based+GAN+for+time+series&hl=en&as_sdt=0,5", "gs_version_total": 0, "aff_unique_index": "0;1", "aff_unique_norm": "EPFL;Google", "aff_unique_dep": ";Google", "aff_unique_url": "https://www.epfl.ch;https://www.google.com", "aff_unique_abbr": "EPFL;Google", "aff_campus_unique_index": "0;1", "aff_campus_unique": "Lausanne;Mountain View", "aff_country_unique_index": "0;1", "aff_country_unique": "Switzerland;United States" }, { "id": "2V1ATRzaZQU", "title": "Characterizing Lookahead Dynamics of Smooth Games", "track": "main", "status": "Reject", "tldr": "", "abstract": "As multi-agent systems proliferate in machine learning research, games have attracted much attention as a framework to understand optimization of multiple interacting objectives. However, a key challenge in game optimization is that, in general, there is no guarantee for usual gradient-based methods to converge to a local solution of the game. The latest work by Chavdarova et al. (2020) report that Lookahead optimizer (Zhang et al. (2019)) significantly improves the performance of Generative Adversarial Networks (GANs) and reduces the rotational force of bilinear games. While promising, their observations were purely empirical, and Lookahead optimization of smooth games still lacks theoretical understanding. In this paper, we fill this gap by theoretically characterizing Lookahead dynamics of smooth games. We provide an intuitive geometric explanation on how and when Lookahead can improve game dynamics in terms of stability and convergence. Furthermore, we present sufficient conditions under which Lookahead optimization of bilinear games provably stabilizes or accelerates convergence to a Nash equilibrium of the game. Finally, we show that Lookahead optimizer preserves locally asymptotically stable equilibria of base dynamics and can either stabilize or accelerate the local convergence to a given equilibrium with proper assumptions. 
We verify each of our theoretical predictions by conducting numerical experiments on two-player zero-sum (non-linear) games.", "keywords": "Lookahead optimizer;game dynamics;smooth game", "primary_area": "", "supplementary_material": "/attachment/3a6cccefdfda72d8e92d1bc08e3235eaf55854c7.zip", "author": "Junsoo Ha;Gunhee Kim", "authorids": "~Junsoo_Ha1;~Gunhee_Kim1", "gender": "M;M", "homepage": "http://hajunsoo.org/resume;http://vision.snu.ac.kr/gunhee/", "dblp": ";45/115", "google_scholar": ";https://scholar.google.co.kr/citations?user=CiSdOV0AAAAJ", "orcid": ";0000-0002-9543-7453", "linkedin": ";", "or_profile": "~Junsoo_Ha1;~Gunhee_Kim1", "aff": "Seoul National University;Seoul National University", "aff_domain": "snu.ac.kr;snu.ac.kr", "position": "MS student;Full Professor", "bibtex": "@misc{\nha2021characterizing,\ntitle={Characterizing Lookahead Dynamics of Smooth Games},\nauthor={Junsoo Ha and Gunhee Kim},\nyear={2021},\nurl={https://openreview.net/forum?id=2V1ATRzaZQU}\n}", "github": "", "project": "", "reviewers": "AnonReviewer2;AnonReviewer1;AnonReviewer3;AnonReviewer4", "site": "https://openreview.net/forum?id=2V1ATRzaZQU", "pdf_size": 0, "rating": "4;4;7;9", "confidence": "4;3;4;4", "wc_review": "581;344;856;198", "wc_reply_reviewers": "0;0;127;0", "wc_reply_authors": "729;594;1174;47", "reply_reviewers": "0;0;1;0", "reply_authors": "1;1;3;1", "rating_avg": [ 6.0, 2.1213203435596424 ], "confidence_avg": [ 3.75, 0.4330127018922193 ], "wc_review_avg": [ 494.75, 249.36256234647573 ], "wc_reply_reviewers_avg": [ 31.75, 54.99261314031185 ], "wc_reply_authors_avg": [ 636.0, 402.11254643445284 ], "reply_reviewers_avg": [ 0.25, 0.4330127018922193 ], "reply_authors_avg": [ 1.5, 0.8660254037844386 ], "replies_avg": [ 13, 0 ], "authors#_avg": [ 2, 0 ], "corr_rating_confidence": 0.5443310539518174, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:MAS220FyT-kJ:scholar.google.com/&scioq=Characterizing+Lookahead+Dynamics+of+Smooth+Games&hl=en&as_sdt=0,5", "gs_version_total": 0, "aff_unique_index": "0;0", "aff_unique_norm": "Seoul National University", "aff_unique_dep": "", "aff_unique_url": "https://www.snu.ac.kr", "aff_unique_abbr": "SNU", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0", "aff_country_unique": "South Korea" }, { "title": "Learning with Instance-Dependent Label Noise: A Sample Sieve Approach", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/2755", "id": "2VXyy9mIyU3", "poster": "", "openreview": "https://openreview.net/forum?id=2VXyy9mIyU3", "slides": "https://iclr.cc/virtual/2021/poster/2755", "video": "https://iclr.cc/virtual/2021/poster/2755", "author_site": "Hao Cheng, Zhaowei Zhu, Xingyu Li, Yifei Gong, Xing Sun, Yang Liu", "tldr": "", "abstract": "Human-annotated labels are often prone to noise, and the presence of such noise will degrade the performance of the resulting deep neural network (DNN) models. Much of the literature (with several recent exceptions) of learning with noisy labels focuses on the case when the label noise is independent of features. Practically, annotations errors tend to be instance-dependent and often depend on the difficulty levels of recognizing a certain task. Applying existing results from instance-independent settings would require a significant amount of estimation of noise rates. Therefore, providing theoretically rigorous solutions for learning with instance-dependent label noise remains a challenge. 
In this paper, we propose CORES$^{2}$ (COnfidence REgularized Sample Sieve), which progressively sieves out corrupted examples. The implementation of CORES$^{2}$ does not require specifying noise rates and yet we are able to provide theoretical guarantees of CORES$^{2}$ in filtering out the corrupted examples. This high-quality sample sieve allows us to treat clean examples and the corrupted ones separately in training a DNN solution, and such a separation is shown to be advantageous in the instance-dependent noise setting. We demonstrate the performance of CORES$^{2}$ on CIFAR10 and CIFAR100 datasets with synthetic instance-dependent label noise and Clothing1M with real-world human noise. As of independent interests, our sample sieve provides a generic machinery for anatomizing noisy datasets and provides a flexible interface for various robust training techniques to further improve the performance. Code is available at https://github.com/UCSC-REAL/cores.", "keywords": "Learning with noisy labels;instance-based label noise;deep neural networks.", "primary_area": "", "supplementary_material": "/attachment/9984bd08efee5085e36b3b47bfbd6b13767247d8.zip", "author": "Hao Cheng;Zhaowei Zhu;Xingyu Li;Yifei Gong;Xing Sun;Yang Liu", "authorids": "~Hao_Cheng5;~Zhaowei_Zhu1;~Xingyu_Li2;~Yifei_Gong1;~Xing_Sun1;~Yang_Liu3", "gender": "M;M;M;;M;M", "homepage": "https://haochenglouis.github.io;https://www.zzw.ai;https://users.soe.ucsc.edu/~xli279/;https://www.yifeigong.me;https://www.sunxing.org;http://www.yliuu.com", "dblp": ";202/1712;45/2385;;;51/3710-18", "google_scholar": "ftlVqVIAAAAJ;YS8pSQoAAAAJ;;;IUtix9IAAAAJ;jKrIVCIAAAAJ", "orcid": "0000-0001-8864-7818;0000-0003-3894-5862;0000-0002-0043-316X;;0000-0001-8132-9083;0000-0001-8420-6011", "linkedin": ";;xingyu-li-588814164/;;sunxings/;", "or_profile": "~Hao_Cheng5;~Zhaowei_Zhu1;~Xingyu_Li2;~Yifei_Gong1;~Xing_Sun1;~Yang_Liu3", "aff": "Tencent Youtu Lab;University of California, Santa Cruz;Shanghai Center for Brain Science and Brain-Inspired Technology;;Tencent YouTu Lab;University of California, Santa Cruz", "aff_domain": "tencent.com;ucsc.edu;bsbii.cn;;tencent.com;ucsc.edu", "position": "Researcher;PhD student;Postdoc;;Principal Researcher;Assistant Professor", "bibtex": "@inproceedings{\ncheng2021learning,\ntitle={Learning with Instance-Dependent Label Noise: A Sample Sieve Approach},\nauthor={Hao Cheng and Zhaowei Zhu and Xingyu Li and Yifei Gong and Xing Sun and Yang Liu},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=2VXyy9mIyU3}\n}", "github": "[![github](/images/github_icon.svg) UCSC-REAL/cores](https://github.com/UCSC-REAL/cores)", "project": "", "reviewers": "AnonReviewer4;AnonReviewer3;AnonReviewer2", "pdf_size": 0, "rating": "6;6;8", "confidence": "4;4;5", "wc_review": "786;448;378", "wc_reply_reviewers": "0;259;0", "wc_reply_authors": "1174;2603;388", "reply_reviewers": "0;1;0", "reply_authors": "2;5;1", "rating_avg": [ 6.666666666666667, 0.9428090415820634 ], "confidence_avg": [ 4.333333333333333, 0.4714045207910317 ], "wc_review_avg": [ 537.3333333333334, 178.14101779832242 ], "wc_reply_reviewers_avg": [ 86.33333333333333, 122.09377088487722 ], "wc_reply_authors_avg": [ 1388.3333333333333, 916.8825200403569 ], "reply_reviewers_avg": [ 0.3333333333333333, 0.4714045207910317 ], "reply_authors_avg": [ 2.6666666666666665, 1.699673171197595 ], "replies_avg": [ 14, 0 ], "authors#_avg": [ 6, 0 ], "corr_rating_confidence": 0.9999999999999997, "gs_citation": 255, 
"gs_cited_by_link": "https://scholar.google.com/scholar?cites=1816427362683189606&as_sdt=400005&sciodt=0,14&hl=en", "gs_version_total": 13, "pdf": "https://openreview.net/pdf?id=2VXyy9mIyU3", "email": "tencent.com;ucsc.edu;bsbii.cn;;tencent.com;ucsc.edu", "author_num": 6, "aff_unique_index": "0;1;2;0;1", "aff_unique_norm": "Tencent;University of California, Santa Cruz;Shanghai Center for Brain Science and Brain-Inspired Technology", "aff_unique_dep": "Youtu Lab;;", "aff_unique_url": "https://www.tencent.com;https://www.ucsc.edu;", "aff_unique_abbr": "Tencent;UCSC;", "aff_campus_unique_index": "1;1", "aff_campus_unique": ";Santa Cruz", "aff_country_unique_index": "0;1;0;0;1", "aff_country_unique": "China;United States" }, { "id": "2_Z6MECjPEa", "title": "Emergent Properties of Foveated Perceptual Systems", "track": "main", "status": "Reject", "tldr": "", "abstract": "We introduce foveated perceptual systems -- a hybrid architecture inspired by human vision, to explore the role of a \\textit{texture-based} foveation stage on the nature and robustness of subsequently learned visual representation in machines. Specifically, these two-stage perceptual systems first foveate an image, inducing a texture-like encoding of peripheral information -- mimicking the effects of \\textit{visual crowding} -- which is then relayed through a convolutional neural network (CNN) trained to perform scene categorization. We find that these foveated perceptual systems learn a visual representation that is \\textit{distinct} from their non-foveated counterpart through experiments that probe: 1) i.i.d and o.o.d generalization; 2) robustness to occlusion; 3) a center image bias; and 4) high spatial frequency sensitivity. In addition, we examined the impact of this foveation transform with respect to two additional models derived with a rate-distortion optimization procedure to compute matched-resource systems: a lower resolution non-foveated system, and a foveated system with adaptive Gaussian blurring. The properties of greater i.i.d generalization, high spatial frequency sensitivity, and robustness to occlusion emerged exclusively in our foveated texture-based models, independent of network architecture and learning dynamics. 
Altogether, these results demonstrate that foveation -- via peripheral texture-based computations -- yields a distinct and robust representational format of scene information relative to standard machine vision approaches, and also provides symbiotic computational support that texture-based peripheral encoding has important representational consequences for processing in the human visual system.\n", "keywords": "Hybrid Perceptual Systems;Foveation;Visual Crowding;Texture;Two-stage models", "primary_area": "", "supplementary_material": "", "author": "Arturo Deza;Talia Konkle", "authorids": "~Arturo_Deza1;talia_konkle@harvard.edu", "gender": "M;", "homepage": "http://arturodeza.wikidot.com/;", "dblp": "160/8606;", "google_scholar": "KZLsTmQAAAAJ;", "orcid": ";", "linkedin": ";", "or_profile": "~Arturo_Deza1;talia_konkle@harvard.edu", "aff": "Massachusetts Institute of Technology;", "aff_domain": "mit.edu;", "position": "Postdoc;", "bibtex": "@misc{\ndeza2021emergent,\ntitle={Emergent Properties of Foveated Perceptual Systems},\nauthor={Arturo Deza and Talia Konkle},\nyear={2021},\nurl={https://openreview.net/forum?id=2_Z6MECjPEa}\n}", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer5;AnonReviewer3;AnonReviewer2;AnonReviewer4", "site": "https://openreview.net/forum?id=2_Z6MECjPEa", "pdf_size": 0, "rating": "3;5;7;7;7", "confidence": "4;4;3;3;4", "wc_review": "697;406;249;116;703", "wc_reply_reviewers": "328;0;0;0;0", "wc_reply_authors": "3960;0;1292;187;1408", "reply_reviewers": "3;0;0;0;0", "reply_authors": "7;0;2;1;3", "rating_avg": [ 5.8, 1.6 ], "confidence_avg": [ 3.6, 0.4898979485566356 ], "wc_review_avg": [ 434.2, 235.65347440680776 ], "wc_reply_reviewers_avg": [ 65.6, 131.2 ], "wc_reply_authors_avg": [ 1369.4, 1413.6488389978608 ], "reply_reviewers_avg": [ 0.6, 1.2000000000000002 ], "reply_authors_avg": [ 2.6, 2.4166091947189146 ], "replies_avg": [ 23, 0 ], "authors#_avg": [ 2, 0 ], "corr_rating_confidence": -0.6123724356957946, "gs_citation": 55, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=7829917199110081247&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 6, "aff_unique_index": "0", "aff_unique_norm": "Massachusetts Institute of Technology", "aff_unique_dep": "", "aff_unique_url": "https://web.mit.edu", "aff_unique_abbr": "MIT", "aff_country_unique_index": "0", "aff_country_unique": "United States" }, { "id": "2aw7TEq5jo", "title": "Width transfer: on the (in)variance of width optimization", "track": "main", "status": "Withdraw", "tldr": "", "abstract": "Optimizing the channel counts for different layers of a convolutional neural net- work (CNN) to achieve better accuracy without increasing the number of floating- point operations (FLOPs) required during the forward pass at test time is known as CNN width optimization. Prior work on width optimization has cast it as a hyperparameter optimization problem, which introduces large computational overhead (e.g., an additional 2\u00d7 FLOPs of standard training). Minimizing this overhead could therefore significantly speed up training. With that in mind, this paper sets out to empirically understand width optimization by sensitivity analysis. Specifically, we consider the following research question: \u201cDo similar training configurations for a width optimization algorithm also share similar optimized widths?\u201d If this in fact is the case, it suggests that one can find a proxy training configuration requiring fewer FLOPs to reduce the width optimization overhead. 
Scientifically, it also suggests that similar training configurations share common architectural structure, which may be harnessed to build better methods. To this end, we control the training configurations, i.e., network architectures and training data, for three existing width optimization algorithms and find that the optimized widths are largely transferable across settings. Per our analysis, we can achieve up to 320\u00d7 reduction in width optimization overhead without compromising the top-1 accuracy on ImageNet. Our findings not only suggest an efficient way to conduct width optimization, but also highlight that the widths that lead to better accuracy are invariant to various aspects of network architectures and training data.", "keywords": "Channel Optimization;Channel Pruning;Neural Architecture Search;Convolutional Neural Network;Image Classification", "primary_area": "", "supplementary_material": "", "author": "Rudy Chin;Diana Marculescu;Ari S. Morcos", "authorids": "~Rudy_Chin2;~Diana_Marculescu4;~Ari_S._Morcos1", "gender": ";;", "homepage": ";;", "dblp": ";;", "google_scholar": ";;", "orcid": ";;", "linkedin": ";;", "or_profile": "~Rudy_Chin2;~Diana_Marculescu4;~Ari_S._Morcos1", "aff": ";;", "aff_domain": ";;", "position": ";;", "bibtex": "", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer2;AnonReviewer4;AnonReviewer3", "site": "https://openreview.net/forum?id=2aw7TEq5jo", "pdf_size": 0, "rating": "3;4;4;5", "confidence": "4;4;4;5", "wc_review": "230;280;458;208", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "0;0;0;0", "reply_reviewers": "0;0;0;0", "reply_authors": "0;0;0;0", "rating_avg": [ 4.0, 0.7071067811865476 ], "confidence_avg": [ 4.25, 0.4330127018922193 ], "wc_review_avg": [ 294.0, 98.21405194777374 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 0, 0 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 0, 0 ], "replies_avg": [ 5, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": 0.816496580927726, "gs_citation": 4, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=7916982184897229925&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 10 }, { "id": "2bw8QFtPAZD", "title": "IF-Defense: 3D Adversarial Point Cloud Defense via Implicit Function based Restoration", "track": "main", "status": "Withdraw", "tldr": "", "abstract": "Point cloud is an important 3D data representation widely used in many essential applications. Leveraging deep neural networks, recent works have shown great success in processing 3D point clouds. However, those deep neural networks are vulnerable to various 3D adversarial attacks, which can be summarized as two primary types: point perturbation that affects local point distribution, and surface distortion that causes dramatic changes in geometry. In this paper, we propose a novel 3D adversarial point cloud defense method leveraging implicit function based restoration (IF-Defense) to address both the aforementioned attacks. It is composed of two steps: 1) it predicts an implicit function that captures the clean shape through a surface recovery module, and 2) restores a clean and complete point cloud via minimizing the difference between the attacked point cloud and the predicted implicit function under geometry- and distribution- aware constraints. Our experimental results show that IF-Defense achieves the state-of-the-art defense performance against all existing adversarial attacks on PointNet, PointNet++, DGCNN and PointConv. 
Comparing with previous methods, IF-Defense presents 20.02% improvement in classification accuracy against salient point dropping attack and 16.29% against LG-GAN attack on PointNet.", "keywords": "Point cloud;adversarial defense;implicit function", "primary_area": "", "supplementary_material": "/attachment/8d985f0d9069922b6106e677047db7d6045bfb9d.zip", "author": "Ziyi Wu;Yueqi Duan;He Wang;Qingnan Fan;Leonidas Guibas", "authorids": "~Ziyi_Wu1;~Yueqi_Duan1;~He_Wang5;~Qingnan_Fan2;~Leonidas_Guibas1", "gender": "M;M;M;M;M", "homepage": "https://wuziyi616.github.io/;https://duanyueqi.github.io/;https://hughw19.github.io;https://fqnchina.github.io/;http://geometry.stanford.edu/", "dblp": "217/8678;168/8373;01/6368-10;;g/LeonidasJGuibas", "google_scholar": "iopH6wIAAAAJ;qDseo3cAAAAJ;roCAWkoAAAAJ;;https://scholar.google.com.tw/citations?user=5JlEyTAAAAAJ", "orcid": "0000-0002-8247-5872;;;;", "linkedin": ";;;;", "or_profile": "~Ziyi_Wu1;~Yueqi_Duan1;~He_Wang5;~Qingnan_Fan2;~Leonidas_Guibas1", "aff": "Tsinghua University;Stanford University;Stanford University;Stanford University;Stanford University", "aff_domain": "tsinghua.edu.cn;stanford.edu;stanford.edu;stanford.edu;stanford.edu", "position": "Undergrad student;Postdoc;PhD student;Postdoc;Full Professor", "bibtex": "", "github": "", "project": "", "reviewers": "AnonReviewer2;AnonReviewer1;AnonReviewer3;AnonReviewer4", "site": "https://openreview.net/forum?id=2bw8QFtPAZD", "pdf_size": 0, "rating": "4;5;6;6", "confidence": "3;5;4;3", "wc_review": "496;498;274;279", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "0;0;0;0", "reply_reviewers": "0;0;0;0", "reply_authors": "0;0;0;0", "rating_avg": [ 5.25, 0.82915619758885 ], "confidence_avg": [ 3.75, 0.82915619758885 ], "wc_review_avg": [ 386.75, 110.2664386837627 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 0, 0 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 0, 0 ], "replies_avg": [ 5, 0 ], "authors#_avg": [ 5, 0 ], "corr_rating_confidence": 0.0909090909090909, "gs_citation": 76, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=15266753842120083687&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 4, "aff_unique_index": "0;1;1;1;1", "aff_unique_norm": "Tsinghua University;Stanford University", "aff_unique_dep": ";", "aff_unique_url": "https://www.tsinghua.edu.cn;https://www.stanford.edu", "aff_unique_abbr": "THU;Stanford", "aff_campus_unique_index": "1;1;1;1", "aff_campus_unique": ";Stanford", "aff_country_unique_index": "0;1;1;1;1", "aff_country_unique": "China;United States" }, { "id": "2d34y5bRWxB", "title": "Regularization Cocktails for Tabular Datasets", "track": "main", "status": "Reject", "tldr": "", "abstract": "The regularization of prediction models is arguably the most crucial ingredient that allows Machine Learning solutions to generalize well on unseen data. Several types of regularization are popular in the Deep Learning community (e.g., weight decay, drop-out, early stopping, etc.), but so far these are selected on an ad-hoc basis, and there is no systematic study as to how different regularizers should be combined into the best \u201ccocktail\u201d. In this paper, we fill this gap, by considering the cocktails of 13 different regularization methods and framing the question of how to best combine them as a standard hyperparameter optimization problem. 
We perform a large-scale empirical study on 40 tabular datasets, concluding that, firstly, regularization cocktails substantially outperform individual regularization methods, even if the hyperparameters of the latter are carefully tuned; secondly, the optimal regularization cocktail depends on the dataset; and thirdly, regularization cocktails yield the state-of-the-art in classifying tabular datasets by outperforming Gradient-Boosted Decision Trees.", "keywords": "deep learning;regularization;hyperparameter optimization;benchmarks.", "primary_area": "", "supplementary_material": "/attachment/0d8d077b5f848be4ed2569148097971c9400b54e.zip", "author": "Arlind Kadra;Marius Lindauer;Frank Hutter;Josif Grabocka", "authorids": "~Arlind_Kadra1;~Marius_Lindauer1;~Frank_Hutter1;~Josif_Grabocka1", "gender": "M;M;M;M", "homepage": ";https://www.ai.uni-hannover.de/de/institut/team/lindauer;http://ml.informatik.uni-freiburg.de/~hutter/;https://www.utn.de/departments/department-engineering/machine-learning-lab/", "dblp": "252/5295;28/9142;89/5383;117/4936", "google_scholar": "bMa0KUcAAAAJ;https://scholar.google.de/citations?user=0Sxx7DUAAAAJ;https://scholar.google.de/citations?user=YUrxwrkAAAAJ;KRy27XcAAAAJ", "orcid": "0000-0001-9308-6576;;0000-0002-2037-3694;", "linkedin": ";;frank-hutter-9190b24b/;", "or_profile": "~Arlind_Kadra1;~Marius_Lindauer1;~Frank_Hutter1;~Josif_Grabocka1", "aff": "Universit\u00e4t Freiburg;Leibniz Universit\u00e4t Hannover;Albert-Ludwigs-Universit\u00e4t Freiburg;Universit\u00e4t Freiburg", "aff_domain": "uni-freiburg.de;uni-hannover.de;uni-freiburg.de;uni-freiburg.de", "position": "PhD student;Associate Professor;Full Professor;Assistant Professor", "bibtex": "@misc{\nkadra2021regularization,\ntitle={Regularization Cocktails for Tabular Datasets},\nauthor={Arlind Kadra and Marius Lindauer and Frank Hutter and Josif Grabocka},\nyear={2021},\nurl={https://openreview.net/forum?id=2d34y5bRWxB}\n}", "github": "", "project": "", "reviewers": "AnonReviewer2;AnonReviewer3;AnonReviewer1;AnonReviewer4", "site": "https://openreview.net/forum?id=2d34y5bRWxB", "pdf_size": 0, "rating": "6;6;6;6", "confidence": "4;5;4;4", "wc_review": "1071;457;329;711", "wc_reply_reviewers": "144;63;0;0", "wc_reply_authors": "969;845;804;786", "reply_reviewers": "1;1;0;0", "reply_authors": "2;2;1;1", "rating_avg": [ 6.0, 0.0 ], "confidence_avg": [ 4.25, 0.4330127018922193 ], "wc_review_avg": [ 642.0, 283.28254446753334 ], "wc_reply_reviewers_avg": [ 51.75, 59.14547742642712 ], "wc_reply_authors_avg": [ 851.0, 71.40378141247143 ], "reply_reviewers_avg": [ 0.5, 0.5 ], "reply_authors_avg": [ 1.5, 0.5 ], "replies_avg": [ 16, 0 ], "authors#_avg": [ 4, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:wgE9hwLzDZ4J:scholar.google.com/&scioq=Regularization+Cocktails+for+Tabular+Datasets&hl=en&as_sdt=0,5", "gs_version_total": 0, "aff_unique_index": "0;1;2;0", "aff_unique_norm": "University of Freiburg;Leibniz Universit\u00e4t Hannover;Albert-Ludwigs-Universit\u00e4t Freiburg", "aff_unique_dep": ";;", "aff_unique_url": "https://www.uni-freiburg.de;https://www.leibniz.uni-hannover.de/;https://www.uni-freiburg.de", "aff_unique_abbr": "Uni Freiburg;LUH;Albert-Ludwigs-Universit\u00e4t", "aff_campus_unique_index": "1", "aff_campus_unique": ";Freiburg", "aff_country_unique_index": "0;0;0;0", "aff_country_unique": "Germany" }, { "id": "2fadDWoYCUy", "title": "Enabling Efficient On-Device Self-supervised Contrastive Learning by Data Selection", "track": 
"main", "status": "Withdraw", "tldr": "", "abstract": "This work aims to enable efficient on-device contrastive learning from input streaming data after a model is deployed on edge devices such as robots or unmanned aerial vehicles (UAVs) so that they can adapt to a dynamic new environment for higher accuracy. On the other hand, such data usually does not have any labels, calling for unsupervised learning. Most recently, contrastive learning has demonstrated its great potential in learning visual representation from unlabeled data. However, directly applying it to streaming data requires storing a large dataset on-the-fly, which will quickly drain edge devices\u2019 storage resources. In this paper, we propose a framework to automatically select the most representative data from unlabeled input stream on-the-fly, which only requires the use of a small data buffer for dynamic learning. What is more, considering the fact that the data are not independent and identically distributed (iid) as in the traditional training process, we score new data as they come in by measuring the quality of their representations without requiring any label information, based on which the data in the buffer will be updated. Extensive experiments show that the learning speed and accuracy are greatly improved compared with approaches without data selection.", "keywords": "Contrastive Learning;On-device Training;Data Selection", "primary_area": "", "supplementary_material": "", "author": "Yawen Wu;Zhepeng Wang;Dewen Zeng;Yiyu Shi;Jingtong Hu", "authorids": "~Yawen_Wu1;zhw82@pitt.edu;~Dewen_Zeng1;~Yiyu_Shi1;~Jingtong_Hu1", "gender": "M;;M;M;M", "homepage": "https://sites.google.com/view/yawenwu;;https://scholar.google.com/citations?user=RpJ5nSsAAAAJ&hl=en&authuser=1;;http://www.pitt.edu/~jthu/index.html", "dblp": "230/8649;;;94/5536;37/3401", "google_scholar": "73k09jEAAAAJ;;RpJ5nSsAAAAJ;;OcWo8CYAAAAJ", "orcid": ";;;;0000-0003-4029-4034", "linkedin": "yawenwu06/;;;;", "or_profile": "~Yawen_Wu1;zhw82@pitt.edu;~Dewen_Zeng1;~Yiyu_Shi1;~Jingtong_Hu1", "aff": "University of Pittsburgh;;University of Notre Dame;University of Notre Dame;University of Pittsburgh", "aff_domain": "pitt.edu;;nd.edu;nd.edu;pitt.edu", "position": "PhD student;;PhD student;Full Professor;Associate Professor", "bibtex": "", "github": "", "project": "", "reviewers": "AnonReviewer4;AnonReviewer2;AnonReviewer3", "site": "https://openreview.net/forum?id=2fadDWoYCUy", "pdf_size": 0, "rating": "4;4;5", "confidence": "4;3;3", "wc_review": "567;387;369", "wc_reply_reviewers": "0;0;0", "wc_reply_authors": "0;0;0", "reply_reviewers": "0;0;0", "reply_authors": "0;0;0", "rating_avg": [ 4.333333333333333, 0.4714045207910317 ], "confidence_avg": [ 3.3333333333333335, 0.4714045207910317 ], "wc_review_avg": [ 441.0, 89.39798655450804 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 0, 0 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 0, 0 ], "replies_avg": [ 4, 0 ], "authors#_avg": [ 5, 0 ], "corr_rating_confidence": -0.4999999999999999, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:8xJgmGuqiZkJ:scholar.google.com/&scioq=Enabling+Efficient+On-Device+Self-supervised+Contrastive+Learning+by+Data+Selection&hl=en&as_sdt=0,33", "gs_version_total": 0, "aff_unique_index": "0;1;1;0", "aff_unique_norm": "University of Pittsburgh;University of Notre Dame", "aff_unique_dep": ";", "aff_unique_url": "https://www.pitt.edu;https://www.nd.edu", "aff_unique_abbr": "Pitt;Notre Dame", "aff_campus_unique_index": "", 
"aff_campus_unique": "", "aff_country_unique_index": "0;0;0;0", "aff_country_unique": "United States" }, { "id": "2hT6Fbbwh6", "title": "Deep Positive Unlabeled Learning with a Sequential Bias", "track": "main", "status": "Reject", "tldr": "", "abstract": "For many domains, from video stream analytics to human activity recognition, only weakly-labeled datasets are available.\nWorse yet, the given labels are often assigned sequentially, resulting in sequential bias. Current Positive Unlabeled (PU) classifiers, a state-of-the-art family of robust semi-supervised methods, are ineffective under sequential bias. In this work, we propose DeepSPU, the first method to address this sequential bias problem. DeepSPU tackles the two interdependent subproblems of learning both the latent labeling process and the true class likelihoods within one architecture. We achieve this by developing a novel iterative learning strategy aided by theoretically-justified cost terms to avoid collapsing into a naive classifier. Our experimental studies demonstrate that DeepSPU outperforms state-of-the-art methods by over 10% on diverse real-world datasets.", "keywords": "", "primary_area": "", "supplementary_material": "", "author": "Walter Gerych;Thomas Hartvigsen;Luke Buquicchio;Kavin Chandrasekaran;Hamid Mansoor;Abdulaziz alajaji", "authorids": "~Walter_Gerych2;~Thomas_Hartvigsen1;ljbuquicchio@wpi.edu;kchandrasekaran@wpi.edu;hmansoor@wpi.edu;asalajaji@wpi.edu", "gender": "M;M;;;;", "homepage": "https://waltergerych.github.io/;https://www.tomhartvigsen.com;;;;", "dblp": "237/9060;211/5752;;;;", "google_scholar": "https://scholar.google.com/citations?hl=en;rIjeeRsAAAAJ;;;;", "orcid": ";;;;;", "linkedin": "walter-gerych-84165112b/;;;;;", "or_profile": "~Walter_Gerych2;~Thomas_Hartvigsen1;ljbuquicchio@wpi.edu;kchandrasekaran@wpi.edu;hmansoor@wpi.edu;asalajaji@wpi.edu", "aff": "Worcester Polytechnic Institute;Worcester Polytechnic Institute;;;;", "aff_domain": "wpi.edu;wpi.edu;;;;", "position": "PhD student;PhD student;;;;", "bibtex": "@misc{\ngerych2021deep,\ntitle={Deep Positive Unlabeled Learning with a Sequential Bias},\nauthor={Walter Gerych and Thomas Hartvigsen and Luke Buquicchio and Kavin Chandrasekaran and Hamid Mansoor and Abdulaziz alajaji},\nyear={2021},\nurl={https://openreview.net/forum?id=2hT6Fbbwh6}\n}", "github": "", "project": "", "reviewers": "AnonReviewer6;AnonReviewer3;AnonReviewer1", "site": "https://openreview.net/forum?id=2hT6Fbbwh6", "pdf_size": 0, "rating": "5;5;6", "confidence": "3;5;4", "wc_review": "691;495;587", "wc_reply_reviewers": "204;251;0", "wc_reply_authors": "1008;833;123", "reply_reviewers": "1;1;0", "reply_authors": "2;2;1", "rating_avg": [ 5.333333333333333, 0.4714045207910317 ], "confidence_avg": [ 4.0, 0.816496580927726 ], "wc_review_avg": [ 591.0, 80.06663891201295 ], "wc_reply_reviewers_avg": [ 151.66666666666666, 108.9474899002063 ], "wc_reply_authors_avg": [ 654.6666666666666, 382.67334488946165 ], "reply_reviewers_avg": [ 0.6666666666666666, 0.4714045207910317 ], "reply_authors_avg": [ 1.6666666666666667, 0.4714045207910317 ], "replies_avg": [ 11, 0 ], "authors#_avg": [ 6, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:K5-rq47LEi4J:scholar.google.com/&scioq=Deep+Positive+Unlabeled+Learning+with+a+Sequential+Bias&hl=en&as_sdt=0,5", "gs_version_total": 0, "aff_unique_index": "0;0", "aff_unique_norm": "Worcester Polytechnic Institute", "aff_unique_dep": "", "aff_unique_url": "https://www.wpi.edu", 
"aff_unique_abbr": "WPI", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0", "aff_country_unique": "United States" }, { "id": "2ioNazs6lvw", "title": "Learning to generate Wasserstein barycenters", "track": "main", "status": "Reject", "tldr": "", "abstract": "Optimal transport is a notoriously difficult problem to solve numerically, with current approaches often remaining intractable for very large scale applications such as those encountered in machine learning. Wasserstein barycenters -- the problem of finding measures in-between given input measures in the optimal transport sense -- is even more computationally demanding. \nBy training a deep convolutional neural network, we improve by a factor of 60 the computational speed of Wasserstein barycenters over the fastest state-of-the-art approach on the GPU, resulting in milliseconds computational times on $512\\times512$ regular grids.\nWe show that our network, trained on Wasserstein barycenters of pairs of measures, generalizes well to the problem of finding Wasserstein barycenters of more than two measures. We validate our approach on synthetic shapes generated via Constructive Solid Geometry as well as on the ``Quick, Draw'' sketches dataset.", "keywords": "Wasserstein barycenters;Optimal Transport", "primary_area": "", "supplementary_material": "", "author": "Julien Lacombe;Julie Digne;Nicolas Courty;Nicolas Bonneel", "authorids": "~Julien_Lacombe1;~Julie_Digne1;~Nicolas_Courty1;~Nicolas_Bonneel1", "gender": ";F;M;M", "homepage": "https://liris.cnrs.fr/en/member-page/julien-lacombe;https://perso.liris.cnrs.fr/julie.digne/;http://people.irisa.fr/Nicolas.Courty/;https://perso.liris.cnrs.fr/nbonneel/", "dblp": ";11/8698;74/4219;95/2472.html", "google_scholar": ";https://scholar.google.fr/citations?user=EOBpDNQAAAAJ;https://scholar.google.fr/citations?user=ibEREjcAAAAJ;https://scholar.google.fr/citations?user=-NXsLG4AAAAJ", "orcid": ";0000-0003-0905-0840;0000-0003-1353-0126;0000-0001-5243-4810", "linkedin": ";;;nicolasbonneel/?originalSubdomain=fr", "or_profile": "~Julien_Lacombe1;~Julie_Digne1;~Nicolas_Courty1;~Nicolas_Bonneel1", "aff": "INSA de Lyon;LIRIS, CNRS;IRISA;CNRS", "aff_domain": "insa-lyon.fr;liris.cnrs.fr;irisa.fr;cnrs.fr", "position": "PhD student;Researcher;Full Professor;Associate Professor", "bibtex": "@misc{\nlacombe2021learning,\ntitle={Learning to generate Wasserstein barycenters},\nauthor={Julien Lacombe and Julie Digne and Nicolas Courty and Nicolas Bonneel},\nyear={2021},\nurl={https://openreview.net/forum?id=2ioNazs6lvw}\n}", "github": "", "project": "", "reviewers": "AnonReviewer3;AnonReviewer1;AnonReviewer4", "site": "https://openreview.net/forum?id=2ioNazs6lvw", "pdf_size": 0, "rating": "3;6;7", "confidence": "4;4;3", "wc_review": "282;306;540", "wc_reply_reviewers": "0;0;0", "wc_reply_authors": "334;458;410", "reply_reviewers": "0;0;0", "reply_authors": "1;1;1", "rating_avg": [ 5.333333333333333, 1.699673171197595 ], "confidence_avg": [ 3.6666666666666665, 0.4714045207910317 ], "wc_review_avg": [ 376.0, 116.37869220780924 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 400.6666666666667, 51.051172584204394 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 7, 0 ], "authors#_avg": [ 4, 0 ], "corr_rating_confidence": -0.6933752452815364, "gs_citation": 11, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=560354860689387048&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 7, "aff_unique_index": "0;1;2;3", 
"aff_unique_norm": "INSA de Lyon;CNRS;Institut de Recherche en Informatique et Automatique;Centre National de la Recherche Scientifique", "aff_unique_dep": ";LIRIS;;", "aff_unique_url": "https://www.insa-lyon.fr;https://www.cnrs.fr;https://www.irisa.fr;https://www.cnrs.fr", "aff_unique_abbr": "INSA Lyon;CNRS;IRISA;CNRS", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0;0;0", "aff_country_unique": "France" }, { "id": "2isb_482lP", "title": "A Spectral Perspective on Deep Supervised Community Detection", "track": "main", "status": "Reject", "tldr": "", "abstract": "In this work, we study the behavior of standard models for community detection under spectral manipulations. Through various ablation experiments, we evaluate the impact of bandpass filtering on the numerical performances of a GCN: we empirically show that most of the necessary and used information for nodes classification is contained in the low-frequency domain, and thus contrary to Euclidean graph (e.g., images), high-frequencies are less crucial to community detection. In particular, it is possible to obtain accuracies at a state-of-the-art level with simple classifiers that rely only on a few low frequencies: this is surprising because contrary to GCNs, no cascade of filtering along the graph structure is involved and it indicates that the important spectral components for the supervised community detection task are essentially in the low-frequency domain.", "keywords": "GCN;graph spectrum;stability;graph Laplacian", "primary_area": "", "supplementary_material": "/attachment/c7122851b8d12dfd41eb9de43244d37ceed886de.zip", "author": "Nathan Grinsztajn;Philippe Preux;Edouard Oyallon", "authorids": "~Nathan_Grinsztajn1;~Philippe_Preux1;~Edouard_Oyallon1", "gender": "M;M;", "homepage": "https://nathangrinsztajn.github.io/;https://philippe-preux.codeberg.page;", "dblp": ";16/4835;", "google_scholar": "yVHIYEYAAAAJ;JTXxmeAAAAAJ;", "orcid": "0000-0001-6817-5972;0000-0002-2067-2838;", "linkedin": "nathan-grinsztajn-960379139/?locale=en_US;;", "or_profile": "~Nathan_Grinsztajn1;~Philippe_Preux1;~Edouard_Oyallon1", "aff": "INRIA;Universit\u00e9 de Lille;", "aff_domain": "inria.fr;univ-lille.fr;", "position": "PhD student;Full Professor;", "bibtex": "@misc{\ngrinsztajn2021a,\ntitle={A Spectral Perspective on Deep Supervised Community Detection},\nauthor={Nathan Grinsztajn and Philippe Preux and Edouard Oyallon},\nyear={2021},\nurl={https://openreview.net/forum?id=2isb_482lP}\n}", "github": "", "project": "", "reviewers": "AnonReviewer4;AnonReviewer3;AnonReviewer2;AnonReviewer1", "site": "https://openreview.net/forum?id=2isb_482lP", "pdf_size": 0, "rating": "3;4;4;6", "confidence": "4;4;5;2", "wc_review": "523;417;377;213", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "799;538;698;241", "reply_reviewers": "0;0;0;0", "reply_authors": "1;1;1;1", "rating_avg": [ 4.25, 1.0897247358851685 ], "confidence_avg": [ 3.75, 1.0897247358851685 ], "wc_review_avg": [ 382.5, 111.45739096174825 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 569.0, 211.00118483079663 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 10, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": -0.7894736842105263, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:WmTQ5bz25BoJ:scholar.google.com/&scioq=A+Spectral+Perspective+on+Deep+Supervised+Community+Detection&hl=en&as_sdt=0,33", "gs_version_total": 0, "aff_unique_index": "0;1", 
"aff_unique_norm": "INRIA;Universit\u00e9 de Lille", "aff_unique_dep": ";", "aff_unique_url": "https://www.inria.fr;https://www.univ-lille.fr", "aff_unique_abbr": "INRIA;UdeL", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0", "aff_country_unique": "France" }, { "id": "2kImxCmYBic", "title": "Numeric Encoding Options with Automunge", "track": "main", "status": "Reject", "tldr": "", "abstract": "Mainstream practice in machine learning with tabular data may take for granted that any feature engineering beyond scaling for numeric sets is superfluous in context of deep neural networks. This paper will offer arguments for potential benefits of extended encodings of numeric streams in deep learning by way of a survey of options for numeric transformations as available in the Automunge open source python library platform for tabular data pipelines, where transformations may be applied to distinct columns in \u201cfamily tree\u201d sets with generations and branches of derivations. Automunge transformation options include normalization, binning, noise injection, derivatives, and more. The aggregation of these methods into family tree sets of transformations are demonstrated for use to present numeric features to machine learning in multiple configurations of varying information content, as may be applied to encode numeric sets of unknown interpretation. Experiments demonstrate the realization of a novel generalized solution to data augmentation by noise injection for tabular learning, as may materially benefit model performance in applications with underserved training data.", "keywords": "tabular;feature engineering;preprocessing", "primary_area": "", "supplementary_material": "/attachment/7990b907292d8471e20a67f1b2d787b71cfafd91.zip", "author": "Nicholas Teague", "authorids": "~Nicholas_Teague1", "gender": "M", "homepage": "https://www.automunge.com", "dblp": "314/5998", "google_scholar": "ioqgQwQAAAAJ", "orcid": "0000-0001-6071-5065", "linkedin": "nicholaste/", "or_profile": "~Nicholas_Teague1", "aff": "Automunge", "aff_domain": "automunge.com", "position": "Founder", "bibtex": "@misc{\nteague2021numeric,\ntitle={Numeric Encoding Options with Automunge},\nauthor={Nicholas Teague},\nyear={2021},\nurl={https://openreview.net/forum?id=2kImxCmYBic}\n}", "github": "", "project": "", "reviewers": "AnonReviewer3;AnonReviewer1;AnonReviewer2;AnonReviewer4", "site": "https://openreview.net/forum?id=2kImxCmYBic", "pdf_size": 0, "rating": "2;2;3;3", "confidence": "4;4;4;4", "wc_review": "493;639;239;531", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "811;2415;474;1472", "reply_reviewers": "0;0;0;0", "reply_authors": "1;3;1;2", "rating_avg": [ 2.5, 0.5 ], "confidence_avg": [ 4.0, 0.0 ], "wc_review_avg": [ 475.5, 146.67225368146492 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 1293.0, 740.6095462522745 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.75, 0.82915619758885 ], "replies_avg": [ 21, 0 ], "authors#_avg": [ 1, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 3, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=8191001905951680710&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 4, "aff_unique_index": "0", "aff_unique_norm": "Automunge", "aff_unique_dep": "", "aff_unique_url": "https://www.automunge.com", "aff_unique_abbr": "", "aff_country_unique_index": "0", "aff_country_unique": "United States" }, { "title": "Benefit of deep learning with non-convex noisy gradient descent: Provable excess risk bound 
and superiority to kernel methods", "status": "Spotlight", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/2725", "id": "2m0g1wEafh", "poster": "", "openreview": "https://openreview.net/forum?id=2m0g1wEafh", "slides": "https://iclr.cc/virtual/2021/poster/2725", "video": "https://iclr.cc/virtual/2021/poster/2725", "author_site": "Taiji Suzuki, Akiyama Shunta", "tldr": "", "abstract": "Establishing a theoretical analysis that explains why deep learning can outperform shallow learning such as kernel methods is one of the biggest issues in the deep learning literature. Towards answering this question, we evaluate excess risk of a deep learning estimator trained by a noisy gradient descent with ridge regularization on a mildly overparameterized neural network, \nand discuss its superiority to a class of linear estimators that includes neural tangent kernel approach, random feature model, other kernel methods, $k$-NN estimator and so on. We consider a teacher-student regression model, and eventually show that {\\it any} linear estimator can be outperformed by deep learning in a sense of the minimax optimal rate especially for a high dimension setting. The obtained excess bounds are so-called fast learning rate which is faster than $O(1/\\sqrt{n})$ that is obtained by usual Rademacher complexity analysis. This discrepancy is induced by the non-convex geometry of the model and the noisy gradient descent used for neural network training provably reaches a near global optimal solution even though the loss landscape is highly non-convex. Although the noisy gradient descent does not employ any explicit or implicit sparsity inducing regularization, it shows a preferable generalization performance that dominates linear estimators.", "keywords": "Excess risk;minimax optimal rate;local Rademacher complexity;fast learning rate;kernel method;linear estimator", "primary_area": "", "supplementary_material": "", "author": "Taiji Suzuki;Shunta Akiyama", "authorids": "~Taiji_Suzuki1;shunta_akiyama@mist.i.u-tokyo.ac.jp", "gender": "M;", "homepage": "http://ibis.t.u-tokyo.ac.jp/suzuki/;", "dblp": "08/312;", "google_scholar": "x8osrBsAAAAJ;", "orcid": ";", "linkedin": ";", "or_profile": "~Taiji_Suzuki1;shunta_akiyama@mist.i.u-tokyo.ac.jp", "aff": "The University of Tokyo;", "aff_domain": "tokyo.ac.jp;", "position": "Associate Professor;", "bibtex": "@inproceedings{\nsuzuki2021benefit,\ntitle={Benefit of deep learning with non-convex noisy gradient descent: Provable excess risk bound and superiority to kernel methods},\nauthor={Taiji Suzuki and Shunta Akiyama},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=2m0g1wEafh}\n}", "github": "", "project": "", "reviewers": "AnonReviewer3;AnonReviewer4;AnonReviewer1;AnonReviewer2", "pdf_size": 0, "rating": "6;7;8;8", "confidence": "2;4;3;4", "wc_review": "231;534;475;239", "wc_reply_reviewers": "0;0;42;0", "wc_reply_authors": "480;1240;911;261", "reply_reviewers": "0;0;1;0", "reply_authors": "1;2;2;1", "rating_avg": [ 7.25, 0.82915619758885 ], "confidence_avg": [ 3.25, 0.82915619758885 ], "wc_review_avg": [ 369.75, 136.38433744385753 ], "wc_reply_reviewers_avg": [ 10.5, 18.186533479473212 ], "wc_reply_authors_avg": [ 723.0, 379.18531089692806 ], "reply_reviewers_avg": [ 0.25, 0.4330127018922193 ], "reply_authors_avg": [ 1.5, 0.5 ], "replies_avg": [ 13, 0 ], "authors#_avg": [ 2, 0 ], "corr_rating_confidence": 0.6363636363636364, "gs_citation": 22, "gs_cited_by_link": 
"https://scholar.google.com/scholar?cites=9409955607225871639&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 4, "pdf": "https://openreview.net/pdf?id=2m0g1wEafh", "email": "tokyo.ac.jp;", "author_num": 2, "aff_unique_index": "0", "aff_unique_norm": "University of Tokyo", "aff_unique_dep": "", "aff_unique_url": "https://www.u-tokyo.ac.jp", "aff_unique_abbr": "UTokyo", "aff_country_unique_index": "0", "aff_country_unique": "Japan" }, { "id": "2nm0fGwWBMr", "title": "PanRep: Universal node embeddings for heterogeneous graphs", "track": "main", "status": "Reject", "tldr": "", "abstract": "Learning unsupervised node embeddings facilitates several downstream tasks such as node classification and link prediction. A node embedding is universal if it is designed to be used by and benefit various downstream tasks. This work introduces PanRep, a graph neural network (GNN) model, for unsupervised learning of universal node representations for heterogenous graphs. PanRep consists of a GNN encoder that obtains node embeddings and four decoders, each capturing different topological and node feature properties. Abiding to these properties the novel unsupervised framework learns universal embeddings applicable to different downstream tasks. PanRep can be furthered fine-tuned to account for possible limited labels. In this operational setting PanRep is considered as a pretrained model for extracting node embeddings of heterogenous graph data. PanRep outperforms all unsupervised and certain supervised methods in node classification and link prediction, especially when the labeled data for the supervised methods is small. PanRep-FT (with fine-tuning) outperforms all other supervised approaches, which corroborates the merits of pretraining models. Finally, we apply PanRep-FT for discovering novel drugs for Covid-19. We showcase the advantage of universal embeddings in drug repurposing and identify several drugs used in clinical trials as possible drug candidates.", "keywords": "Graph neural networks;universal node embeddings;node classification;link prediction;unsupervised learning", "primary_area": "", "supplementary_material": "", "author": "Vassilis N. Ioannidis;Da Zheng;George Karypis", "authorids": "~Vassilis_N._Ioannidis1;dzzhen@amazon.com;~George_Karypis1", "gender": ";;M", "homepage": "https://scholar.google.com/citations?hl=en&user=mjmiI4sAAAAJ&view_op=list_works&authuser=1;;", "dblp": ";;", "google_scholar": ";;ElqwScwAAAAJ", "orcid": "0000-0002-8367-0733;;", "linkedin": ";;", "or_profile": "~Vassilis_N._Ioannidis1;dzzhen@amazon.com;~George_Karypis1", "aff": "Amazon Web Services;;University of Minnesota, Minneapolis", "aff_domain": "amazon.com;;umn.edu", "position": "Applied Scientist II;;Full Professor", "bibtex": "@misc{\nioannidis2021panrep,\ntitle={PanRep: Universal node embeddings for heterogeneous graphs},\nauthor={Vassilis N. 
Ioannidis and Da Zheng and George Karypis},\nyear={2021},\nurl={https://openreview.net/forum?id=2nm0fGwWBMr}\n}", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer3;AnonReviewer2;AnonReviewer4", "site": "https://openreview.net/forum?id=2nm0fGwWBMr", "pdf_size": 0, "rating": "4;5;5;6", "confidence": "5;4;4;3", "wc_review": "212;287;483;433", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "0;0;0;0", "reply_reviewers": "0;0;0;0", "reply_authors": "0;0;0;0", "rating_avg": [ 5.0, 0.7071067811865476 ], "confidence_avg": [ 4.0, 0.7071067811865476 ], "wc_review_avg": [ 353.75, 109.01232728457823 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 0, 0 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 0, 0 ], "replies_avg": [ 5, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": -1.0, "gs_citation": 4, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=7452301386704979434&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 4, "aff_unique_index": "0;1", "aff_unique_norm": "Amazon;University of Minnesota", "aff_unique_dep": "Amazon Web Services;", "aff_unique_url": "https://aws.amazon.com;https://www.minnesota.edu", "aff_unique_abbr": "AWS;UMN", "aff_campus_unique_index": "1", "aff_campus_unique": ";Minneapolis", "aff_country_unique_index": "0;0", "aff_country_unique": "United States" }, { "id": "2oci5kFXE0o", "title": "The Scattering Compositional Learner: Discovering Objects, Attributes, Relationships in Analogical Reasoning", "track": "main", "status": "Withdraw", "tldr": "", "abstract": "In this work, we focus on an analogical reasoning task that contains rich compositional structures, Raven's Progressive Matrices (RPM). To discover compositional structures of the data, we propose the Scattering Compositional Learner (SCL), an architecture that composes neural networks in a sequence. Our SCL achieves state-of-the-art performance on two RPM datasets, with a 48.7% relative improvement on Balanced-RAVEN and 26.4% on PGM over the previous state-of-the-art. We additionally show that our model discovers compositional representations of objects' attributes (e.g., shape color, size), and their relationships (e.g., progression, union). 
We also find that the compositional representation makes the SCL significantly more robust to test-time domain shifts and greatly improves zero-shot generalization to previously unseen analogies.", "keywords": "Raven's Progressive Matrices;visual analogical reasoning.", "primary_area": "", "supplementary_material": "/attachment/460bd311ab6ff35d922ab2d0598721af498b6a61.zip", "author": "Yuhuai Wu;Honghua Dong;Roger Baker Grosse;Jimmy Ba", "authorids": "~Yuhuai_Wu1;~Honghua_Dong1;~Roger_Baker_Grosse1;~Jimmy_Ba1", "gender": "M;M;M;M", "homepage": "http://www.cs.toronto.edu/~ywu/;https://dhh1995.github.io/;http://www.cs.toronto.edu/~rgrosse/;http://jimmylba.github.io", "dblp": ";238/2646;26/7058;https://dblp.org/pers/b/Ba:Jimmy.html", "google_scholar": "https://scholar.google.ca/citations?user=bOQGfFIAAAAJ;MrGN4oMAAAAJ;xgQd1qgAAAAJ;https://scholar.google.ca/citations?user=ymzxRhAAAAAJ", "orcid": ";;;", "linkedin": ";;;", "or_profile": "~Yuhuai_Wu1;~Honghua_Dong1;~Roger_Baker_Grosse1;~Jimmy_Ba1", "aff": "Department of Computer Science, University of Toronto;Tencent Inc.;Department of Computer Science, University of Toronto;Department of Computer Science, University of Toronto", "aff_domain": "cs.toronto.edu;tencent.com;cs.toronto.edu;cs.toronto.edu", "position": "PhD student;Intern;Assistant Professor;Assistant Professor", "bibtex": "", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer2;AnonReviewer3", "site": "https://openreview.net/forum?id=2oci5kFXE0o", "pdf_size": 0, "rating": "4;5;5", "confidence": "5;4;5", "wc_review": "579;383;410", "wc_reply_reviewers": "0;0;0", "wc_reply_authors": "0;0;0", "reply_reviewers": "0;0;0", "reply_authors": "0;0;0", "rating_avg": [ 4.666666666666667, 0.4714045207910317 ], "confidence_avg": [ 4.666666666666667, 0.4714045207910317 ], "wc_review_avg": [ 457.3333333333333, 86.73458876877719 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 0, 0 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 0, 0 ], "replies_avg": [ 5, 0 ], "authors#_avg": [ 4, 0 ], "corr_rating_confidence": -0.4999999999999999, "gs_citation": 69, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=3288415279290154305&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 3, "aff_unique_index": "0;1;0;0", "aff_unique_norm": "University of Toronto;Tencent", "aff_unique_dep": "Department of Computer Science;Tencent", "aff_unique_url": "https://www.utoronto.ca;https://www.tencent.com", "aff_unique_abbr": "U of T;Tencent", "aff_campus_unique_index": "0;0;0", "aff_campus_unique": "Toronto;", "aff_country_unique_index": "0;1;0;0", "aff_country_unique": "Canada;China" }, { "id": "2pYMlvmsNaK", "title": "Dual Averaging is Surprisingly Effective for Deep Learning Optimization", "track": "main", "status": "Withdraw", "tldr": "", "abstract": "First-order stochastic optimization methods are currently the most widely used class of methods for training deep neural networks. However, the choice of the optimizer has become an ad-hoc rule that can significantly affect the performance. For instance, SGD with momentum (SGD+M) is typically used in computer vision (CV) and Adam is used for training transformer models for Natural Language Processing (NLP). Using the wrong method can lead to significant performance degradation. Inspired by the dual averaging algorithm, we propose Modernized Dual Averaging (MDA), an optimizer that is able to perform as well as SGD+M in CV and as Adam in NLP. Our method is not adaptive and is significantly simpler than Adam. 
We show that MDA induces a decaying uncentered $L_2$-regularization compared to vanilla SGD+M and hypothesize that this may explain why it works on NLP problems where SGD+M fails.", "keywords": "", "primary_area": "", "supplementary_material": "/attachment/208607bd8a4001707216d82435b6b61a57aa0d5b.zip", "author": "Samy Jelassi;Aaron Defazio", "authorids": "~Samy_Jelassi1;~Aaron_Defazio1", "gender": "M;M", "homepage": "https://sjelassi.github.io/;https://www.aarondefazio.com/", "dblp": "222/3149;116/2969", "google_scholar": ";KEzJsdkAAAAJ", "orcid": ";", "linkedin": ";", "or_profile": "~Samy_Jelassi1;~Aaron_Defazio1", "aff": "Princeton University;Meta", "aff_domain": "princeton.edu;meta.com", "position": "PhD student;Research Scientist", "bibtex": "", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer2;AnonReviewer4;AnonReviewer3", "site": "https://openreview.net/forum?id=2pYMlvmsNaK", "pdf_size": 0, "rating": "3;4;4;6", "confidence": "5;3;4;4", "wc_review": "192;331;355;212", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "0;0;0;0", "reply_reviewers": "0;0;0;0", "reply_authors": "0;0;0;0", "rating_avg": [ 4.25, 1.0897247358851685 ], "confidence_avg": [ 4.0, 0.7071067811865476 ], "wc_review_avg": [ 272.5, 71.36000280269053 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 0, 0 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 0, 0 ], "replies_avg": [ 5, 0 ], "authors#_avg": [ 2, 0 ], "corr_rating_confidence": -0.3244428422615251, "gs_citation": 8, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=9033346556991220551&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 3, "aff_unique_index": "0;1", "aff_unique_norm": "Princeton University;Meta", "aff_unique_dep": ";Meta Platforms, Inc.", "aff_unique_url": "https://www.princeton.edu;https://meta.com", "aff_unique_abbr": "Princeton;Meta", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0", "aff_country_unique": "United States" }, { "id": "2rcgRSAa1A3", "title": "Fighting Filterbubbles with Adversarial BERT-Training for News-Recommendation", "track": "main", "status": "Reject", "tldr": "", "abstract": "Recommender engines play a role in the emergence and reinforcement of filter bubbles. When these systems learn that a user prefers content from a particular site, the user will be less likely to be exposed to different sources or opinions and, ultimately, is more likely to develop extremist tendencies.\nWe trace the roots of this phenomenon to the way the recommender engine represents news articles. The vectorial features modern systems extract from the plain text of news articles are already highly predictive of the associated news outlet. We propose a new training scheme based on adversarial machine learning to tackle this issue. Our experiments show that the features we can extract this way are significantly less predictive of the news outlet and thus offer the possibility to reduce the risk of manifestation of new filter bubbles.
We validate our intuitions in a news recommendation task using a recent attention-based recommendation system.", "keywords": "Adversarial Learning;Natural Language Processing;BERT;News Recommendation;Attention", "primary_area": "", "supplementary_material": "", "author": "Lukas Pfahler;Katharina Morik", "authorids": "~Lukas_Pfahler1;~Katharina_Morik1", "gender": ";F", "homepage": ";", "dblp": "213/1864.html;", "google_scholar": ";", "orcid": "0000-0003-4012-4502;", "linkedin": ";", "or_profile": "~Lukas_Pfahler1;~Katharina_Morik1", "aff": "TU Dortmund;TU Dortmund", "aff_domain": "tu-dortmund.de;tu-dortmund.de", "position": "PhD student;Full Professor", "bibtex": "@misc{\npfahler2021fighting,\ntitle={Fighting Filterbubbles with Adversarial {\\{}BERT{\\}}-Training for News-Recommendation},\nauthor={Lukas Pfahler and Katharina Morik},\nyear={2021},\nurl={https://openreview.net/forum?id=2rcgRSAa1A3}\n}", "github": "", "project": "", "reviewers": "AnonReviewer2;AnonReviewer3;AnonReviewer1;AnonReviewer4", "site": "https://openreview.net/forum?id=2rcgRSAa1A3", "pdf_size": 0, "rating": "3;3;4;5", "confidence": "2;4;4;4", "wc_review": "204;362;253;216", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "0;0;0;0", "reply_reviewers": "0;0;0;0", "reply_authors": "0;0;0;0", "rating_avg": [ 3.75, 0.82915619758885 ], "confidence_avg": [ 3.5, 0.8660254037844386 ], "wc_review_avg": [ 258.75, 62.28713751650496 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 0, 0 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 0, 0 ], "replies_avg": [ 5, 0 ], "authors#_avg": [ 2, 0 ], "corr_rating_confidence": 0.5222329678670935, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:s6kK-bYVPlAJ:scholar.google.com/&scioq=Fighting+Filterbubbles+with+Adversarial+BERT-Training+for+News-Recommendation&hl=en&as_sdt=0,5", "gs_version_total": 0, "aff_unique_index": "0;0", "aff_unique_norm": "Technische Universit\u00e4t Dortmund", "aff_unique_dep": "", "aff_unique_url": "https://www.tu-dortmund.de", "aff_unique_abbr": "TU Dortmund", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0", "aff_country_unique": "Germany" }, { "id": "2wjKRmraNan", "title": "Non-Inherent Feature Compatible Learning", "track": "main", "status": "Reject", "tldr": "", "abstract": "The need for Feature Compatible Learning (FCL) arises from many large scale retrieval-based applications, where updating the entire library of embedding vectors is expensive. When an upgraded embedding model shows potential, it is desired to transform the benefit of the new model without refreshing the library. While progress has been made along this new direction, existing approaches for feature compatible learning mostly rely on old training data and classifiers, which are not available in many industry settings. In this work, we introduce an approach for feature compatible learning without inheriting the old classifier and training data, i.e., Non-Inherent Feature Compatible Learning. Our approach requires only features extracted by the \\emph{old} model's backbone and \\emph{new} training data, and makes no assumption about the overlap between old and new training data. We propose a unified framework for FCL, and extend it to handle the case where the old model is a black-box. Specifically, we learn a simple pseudo classifier in lieu of the old model, and further enhance it with a random walk algorithm.
As a result, the embedding features produced by the new model can be matched with those from the old model without sacrificing performance. Experiments on ImageNet ILSVRC 2012 and Places365 data proved the efficacy of the proposed approach.", "keywords": "Deep Learning;Feature Learning;Compatible Learning", "primary_area": "", "supplementary_material": "", "author": "Yantao Shen;Fanzi Wu;Ying Shan", "authorids": "~Yantao_Shen2;~Fanzi_Wu1;~Ying_Shan2", "gender": "M;;M", "homepage": "https://scholar.google.com.hk/citations?user=bEctTN0AAAAJ&hl=zh-CN;;", "dblp": "86/3372;193/6532;68/5910", "google_scholar": "https://scholar.google.com.hk/citations?user=bEctTN0AAAAJ;;4oXBp9UAAAAJ", "orcid": ";;0000-0001-7673-8325", "linkedin": ";;YingShanProfile/", "or_profile": "~Yantao_Shen2;~Fanzi_Wu1;~Ying_Shan2", "aff": "Tencent ARC;;Tencent PCG ARC Lab", "aff_domain": "tencent.com;;arc.tencent.com", "position": "Researcher;;Director", "bibtex": "@misc{\nshen2021noninherent,\ntitle={Non-Inherent Feature Compatible Learning},\nauthor={Yantao Shen and Fanzi Wu and Ying Shan},\nyear={2021},\nurl={https://openreview.net/forum?id=2wjKRmraNan}\n}", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer4;AnonReviewer2;AnonReviewer3", "site": "https://openreview.net/forum?id=2wjKRmraNan", "pdf_size": 0, "rating": "2;5;5;6", "confidence": "4;3;4;3", "wc_review": "240;291;192;260", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "476;600;566;400", "reply_reviewers": "0;0;0;0", "reply_authors": "1;1;1;1", "rating_avg": [ 4.5, 1.5 ], "confidence_avg": [ 3.5, 0.5 ], "wc_review_avg": [ 245.75, 35.960916284210555 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 510.5, 78.24800316941001 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 9, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": -0.6666666666666667, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:2dTcF6NxNcAJ:scholar.google.com/&scioq=Non-Inherent+Feature+Compatible+Learning&hl=en&as_sdt=0,33", "gs_version_total": 0, "aff_unique_index": "0;0", "aff_unique_norm": "Tencent", "aff_unique_dep": "Tencent AI Research Center", "aff_unique_url": "https://www.tencent.com", "aff_unique_abbr": "Tencent ARC", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0", "aff_country_unique": "China" }, { "id": "3-a23gHXQmr", "title": "Parametric Density Estimation with Uncertainty using Deep Ensembles", "track": "main", "status": "Reject", "tldr": "", "abstract": "In parametric density estimation, the parameters of a known probability density are typically recovered from measurements by maximizing the log-likelihood. Prior knowledge of measurement uncertainties is not included in this method -- potentially producing degraded or even biased parameter estimates.\nWe propose an efficient two-step, general-purpose approach for parametric density estimation using deep ensembles. \nFeature predictions and their uncertainties are returned by a deep ensemble and then combined in an importance weighted maximum likelihood estimation to recover parameters representing a known density along with their respective errors. 
To compare the bias-variance tradeoff of different approaches, we define an appropriate figure of merit.\nWe illustrate a number of use cases for our method in the physical sciences and demonstrate state-of-the-art results for X-ray polarimetry that outperform current classical and deep learning methods.", "keywords": "Deep ensembles;deep learning;computer vision;density estimation;uncertainty", "primary_area": "", "supplementary_material": "", "author": "Abel Peirson;Taylor Howell;Marius Aurel Tirlea", "authorids": "~Abel_Peirson1;thowell@stanford.edu;mtirlea@stanford.edu", "gender": ";;", "homepage": "https://www.alpeirson.com;;", "dblp": ";;", "google_scholar": ";;", "orcid": "0000-0001-6292-1911;;", "linkedin": ";;", "or_profile": "~Abel_Peirson1;thowell@stanford.edu;mtirlea@stanford.edu", "aff": "Stanford University;;", "aff_domain": "stanford.edu;;", "position": "PhD student;;", "bibtex": "@misc{\npeirson2021parametric,\ntitle={Parametric Density Estimation with Uncertainty using Deep Ensembles},\nauthor={Abel Peirson and Taylor Howell and Marius Aurel Tirlea},\nyear={2021},\nurl={https://openreview.net/forum?id=3-a23gHXQmr}\n}", "github": "", "project": "", "reviewers": "AnonReviewer3;AnonReviewer4;AnonReviewer2;AnonReviewer1", "site": "https://openreview.net/forum?id=3-a23gHXQmr", "pdf_size": 0, "rating": "4;5;5;5", "confidence": "3;3;3;3", "wc_review": "1095;366;289;670", "wc_reply_reviewers": "0;0;0;156", "wc_reply_authors": "942;543;335;675", "reply_reviewers": "0;0;0;1", "reply_authors": "2;1;1;2", "rating_avg": [ 4.75, 0.4330127018922193 ], "confidence_avg": [ 3.0, 0.0 ], "wc_review_avg": [ 605.0, 316.74200858111635 ], "wc_reply_reviewers_avg": [ 39.0, 67.54998149518622 ], "wc_reply_authors_avg": [ 623.75, 220.1174402449747 ], "reply_reviewers_avg": [ 0.25, 0.4330127018922193 ], "reply_authors_avg": [ 1.5, 0.5 ], "replies_avg": [ 13, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:E586eM9pI_MJ:scholar.google.com/&scioq=Parametric+Density+Estimation+with+Uncertainty+using+Deep+Ensembles&hl=en&as_sdt=0,33", "gs_version_total": 0, "aff_unique_index": "0", "aff_unique_norm": "Stanford University", "aff_unique_dep": "", "aff_unique_url": "https://www.stanford.edu", "aff_unique_abbr": "Stanford", "aff_campus_unique_index": "0", "aff_campus_unique": "Stanford", "aff_country_unique_index": "0", "aff_country_unique": "United States" }, { "title": "What are the Statistical Limits of Offline RL with Linear Function Approximation?", "status": "Spotlight", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/2830", "id": "30EvkP2aQLD", "poster": "", "openreview": "https://openreview.net/forum?id=30EvkP2aQLD", "slides": "https://iclr.cc/virtual/2021/poster/2830", "video": "https://iclr.cc/virtual/2021/poster/2830", "author_site": "Ruosong Wang, Dean Foster, Sham M Kakade", "tldr": "", "abstract": "Offline reinforcement learning seeks to utilize offline (observational) data to guide the learning of (causal) sequential decision making strategies. The hope is that offline reinforcement learning coupled with function approximation methods (to deal with the curse of dimensionality) can provide a means to help alleviate the excessive sample complexity burden in modern sequential decision making problems. 
However, the extent to which this broader approach can be effective is not well understood, where the literature largely consists of sufficient conditions.\n\nThis work focuses on the basic question of what are necessary representational and distributional conditions that permit provable sample-efficient offline reinforcement learning. Perhaps surprisingly, our main result shows that even if: i) we have realizability in that the true value function of \emph{every} policy is linear in a given set of features and ii) our off-policy data has good coverage over all features (under a strong spectral condition), any algorithm still (information-theoretically) requires a number of offline samples that is exponential in the problem horizon to non-trivially estimate the value of \emph{any} given policy. Our results highlight that sample-efficient offline policy evaluation is not possible unless significantly stronger conditions hold; such conditions include either having low distribution shift (where the offline data distribution is close to the distribution of the policy to be evaluated) or significantly stronger representational conditions (beyond realizability).", "keywords": "batch reinforcement learning;function approximation;lower bound;representation", "primary_area": "", "supplementary_material": "", "author": "Ruosong Wang;Dean Foster;Sham M. Kakade", "authorids": "~Ruosong_Wang1;~Dean_Foster1;~Sham_M._Kakade1", "gender": "M;M;M", "homepage": "http://www.cs.cmu.edu/~ruosongw/;http://deanfoster.net;https://shamulent.github.io", "dblp": "183/6164;241/9885;s/SMKakade", "google_scholar": "n8ZpnWMAAAAJ;HDzOsYAAAAAJ;https://scholar.google.com.tw/citations?user=wb-DKCIAAAAJ", "orcid": ";;", "linkedin": ";deanfoster/;", "or_profile": "~Ruosong_Wang1;~Dean_Foster1;~Sham_M._Kakade1", "aff": "Carnegie Mellon University;Amazon;", "aff_domain": "cmu.edu;amazon.com;", "position": "PhD student;scientist;", "bibtex": "@inproceedings{\nwang2021what,\ntitle={What are the Statistical Limits of Offline {RL} with Linear Function Approximation?},\nauthor={Ruosong Wang and Dean Foster and Sham M.
Kakade},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=30EvkP2aQLD}\n}", "github": "", "project": "", "reviewers": "AnonReviewer4;AnonReviewer3;AnonReviewer1;AnonReviewer2", "pdf_size": 0, "rating": "7;7;8;8", "confidence": "4;3;3;3", "wc_review": "286;293;307;302", "wc_reply_reviewers": "18;0;26;0", "wc_reply_authors": "721;103;584;32", "reply_reviewers": "1;0;1;0", "reply_authors": "1;1;1;1", "rating_avg": [ 7.5, 0.5 ], "confidence_avg": [ 3.25, 0.4330127018922193 ], "wc_review_avg": [ 297.0, 8.093207028119323 ], "wc_reply_reviewers_avg": [ 11.0, 11.357816691600547 ], "wc_reply_authors_avg": [ 360.0, 297.54411437633917 ], "reply_reviewers_avg": [ 0.5, 0.5 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 12, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": -0.5773502691896257, "gs_citation": 198, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=8485667239546352240&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 7, "pdf": "https://openreview.net/pdf?id=30EvkP2aQLD", "email": "cmu.edu;amazon.com;", "author_num": 3, "aff_unique_index": "0;1", "aff_unique_norm": "Carnegie Mellon University;Amazon", "aff_unique_dep": ";Amazon.com, Inc.", "aff_unique_url": "https://www.cmu.edu;https://www.amazon.com", "aff_unique_abbr": "CMU;Amazon", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0", "aff_country_unique": "United States" }, { "id": "30I4Azqc_oP", "title": "Deep Reinforcement Learning with Causality-based Intrinsic Reward", "track": "main", "status": "Reject", "tldr": "", "abstract": "Reinforcement Learning (RL) has shown great potential to deal with sequential decision-making problems. However, most RL algorithms do not explicitly consider the relations between entities in the environment. This makes the policy learning suffer from the problems of efficiency, effectivity and interpretability. In this paper, we propose a novel deep reinforcement learning algorithm, which firstly learns the causal structure of the environment and then leverages the learned causal information to assist policy learning. The proposed algorithm learns a graph to encode the environmental structure by calculating Average Causal Effect (ACE) between different categories of entities, and an intrinsic reward is given to encourage the agent to interact more with entities belonging to top-ranked categories, which significantly boosts policy learning. 
Several experiments are conducted on a number of simulation environments to demonstrate the effectiveness and better interpretability of our proposed method.", "keywords": "Reinforcement Learning;Causal Relation", "primary_area": "", "supplementary_material": "", "author": "Peng Zhang;Furui Liu;Zhitang Chen;Jianye HAO;Jun Wang", "authorids": "~Peng_Zhang20;liufurui2@huawei.com;~Zhitang_Chen1;~Jianye_HAO1;~Jun_Wang2", "gender": "M;;M;M;M", "homepage": ";;;http://www.icdai.org/jianye.html;http://www0.cs.ucl.ac.uk/staff/jun.wang/", "dblp": ";;06/10875;21/7664.html;w/JunWang12", "google_scholar": ";;;;https://scholar.google.co.uk/citations?user=wIE1tY4AAAAJ", "orcid": ";;;0000-0002-0422-8235;", "linkedin": "Https://www.linkedin.com/in/pengzhangidal;;;;", "or_profile": "~Peng_Zhang20;liufurui2@huawei.com;~Zhitang_Chen1;~Jianye_HAO1;~Jun_Wang2", "aff": "Tianjin University;;Huawei Technologies Ltd.;Tianjin University;University College London", "aff_domain": "tju.edu.cn;;huawei.com;tju.edu.cn;ucl.ac.uk", "position": "MS student;;Researcher;Associate Professor;Professor", "bibtex": "@misc{\nzhang2021deep,\ntitle={Deep Reinforcement Learning with Causality-based Intrinsic Reward},\nauthor={Peng Zhang and Furui Liu and Zhitang Chen and Jianye HAO and Jun Wang},\nyear={2021},\nurl={https://openreview.net/forum?id=30I4Azqc_oP}\n}", "github": "", "project": "", "reviewers": "AnonReviewer3;AnonReviewer4;AnonReviewer1", "site": "https://openreview.net/forum?id=30I4Azqc_oP", "pdf_size": 0, "rating": "5;6;6", "confidence": "3;3;4", "wc_review": "433;486;382", "wc_reply_reviewers": "109;0;0", "wc_reply_authors": "839;645;438", "reply_reviewers": "1;0;0", "reply_authors": "2;1;1", "rating_avg": [ 5.666666666666667, 0.4714045207910317 ], "confidence_avg": [ 3.3333333333333335, 0.4714045207910317 ], "wc_review_avg": [ 433.6666666666667, 42.460439103816256 ], "wc_reply_reviewers_avg": [ 36.333333333333336, 51.383092766222454 ], "wc_reply_authors_avg": [ 640.6666666666666, 163.73623775925583 ], "reply_reviewers_avg": [ 0.3333333333333333, 0.4714045207910317 ], "reply_authors_avg": [ 1.3333333333333333, 0.4714045207910317 ], "replies_avg": [ 10, 0 ], "authors#_avg": [ 5, 0 ], "corr_rating_confidence": 0.4999999999999999, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:XQ-6yi2dnJ0J:scholar.google.com/&scioq=Deep+Reinforcement+Learning+with+Causality-based+Intrinsic+Reward&hl=en&as_sdt=0,33", "gs_version_total": 0, "aff_unique_index": "0;1;0;2", "aff_unique_norm": "Tianjin University;Huawei;University College London", "aff_unique_dep": ";Huawei Technologies;", "aff_unique_url": "http://www.tju.edu.cn;https://www.huawei.com;https://www.ucl.ac.uk", "aff_unique_abbr": "TJU;Huawei;UCL", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0;0;1", "aff_country_unique": "China;United Kingdom" }, { "id": "30SS5VjvhrZ", "title": "Bayesian Neural Networks with Variance Propagation for Uncertainty Evaluation", "track": "main", "status": "Reject", "tldr": "", "abstract": "Uncertainty evaluation is a core technique when deep neural networks (DNNs) are used in real-world problems. In practical applications, we often encounter unexpected samples that have not been seen in the training process. Not only achieving high prediction accuracy but also detecting uncertain data is significant for safety-critical systems. In statistics and machine learning, Bayesian inference has been exploited for uncertainty evaluation.
The Bayesian neural networks (BNNs) have recently attracted considerable attention in this context, as the DNN trained using dropout is interpreted as a Bayesian method. Based on this interpretation, several methods to calculate the Bayes predictive distribution for DNNs have been developed. Though the Monte-Carlo method called MC dropout is a popular method for uncertainty evaluation, it requires a number of repeated feed-forward calculations of DNNs with randomly sampled weight parameters. To overcome the computational issue, we propose a sampling-free method to evaluate uncertainty. Our method converts a neural network trained using the dropout to the corresponding Bayesian neural network with variance propagation. Our method is available not only to feed-forward NNs but also to recurrent NNs including LSTM. We report the computational efficiency and statistical reliability of our method in numerical experiments of the language modeling using RNNs, and the out-of-distribution detection with DNNs. ", "keywords": "uncertainty evaluation;sampling-free method;variance propagation;LSTM;out-of-distribution", "primary_area": "", "supplementary_material": "/attachment/03df10d83605ca9bd6586a8e0ac493338d013bcd.zip", "author": "Yuki Mae;Wataru Kumagai;Takafumi Kanamori", "authorids": "~Yuki_Mae1;~Wataru_Kumagai2;~Takafumi_Kanamori1", "gender": ";M;M", "homepage": ";https://sites.google.com/site/watarukumagaiswebpage/;", "dblp": ";;76/6882", "google_scholar": ";https://scholar.google.co.jp/citations?user=rd5MEO8AAAAJ;", "orcid": "0000-0002-8150-8660;;", "linkedin": ";;", "or_profile": "~Yuki_Mae1;~Wataru_Kumagai2;~Takafumi_Kanamori1", "aff": "DENSO CORPORATION;Omron Sinic X;Tokyo Institute of Technology", "aff_domain": "denso.com;sinicx.com;titech.ac.jp", "position": "Engineer;Researcher;Full Professor", "bibtex": "@misc{\nmae2021bayesian,\ntitle={Bayesian Neural Networks with Variance Propagation for Uncertainty Evaluation},\nauthor={Yuki Mae and Wataru Kumagai and Takafumi Kanamori},\nyear={2021},\nurl={https://openreview.net/forum?id=30SS5VjvhrZ}\n}", "github": "", "project": "", "reviewers": "AnonReviewer3;AnonReviewer2;AnonReviewer1;AnonReviewer4", "site": "https://openreview.net/forum?id=30SS5VjvhrZ", "pdf_size": 0, "rating": "3;4;4;4", "confidence": "3;4;4;4", "wc_review": "684;340;553;1018", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "294;435;264;388", "reply_reviewers": "0;0;0;0", "reply_authors": "1;1;1;1", "rating_avg": [ 3.75, 0.4330127018922193 ], "confidence_avg": [ 3.75, 0.4330127018922193 ], "wc_review_avg": [ 648.75, 246.0095272545354 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 345.25, 69.12081813751918 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 10, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": 1.0, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:GCqytt9Yn-gJ:scholar.google.com/&scioq=Bayesian+Neural+Networks+with+Variance+Propagation+for+Uncertainty+Evaluation&hl=en&as_sdt=0,5", "gs_version_total": 0, "aff_unique_index": "0;1;2", "aff_unique_norm": "DENSO Corporation;OMRON Corporation;Tokyo Institute of Technology", "aff_unique_dep": ";Sinic X Division;", "aff_unique_url": "https://www.denso.com;https://www.omron.com;https://www.titech.ac.jp", "aff_unique_abbr": "DENSO;Omron;Titech", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0;0", "aff_country_unique": "Japan" }, { "id": "32B5lOqZUiO", "title": 
"Pareto-Frontier-aware Neural Architecture Search", "track": "main", "status": "Withdraw", "tldr": "", "abstract": "Designing feasible and effective architectures is essential for deploying deep models to real-world scenarios. In practice, one has to consider multiple objectives (e.g., model performance and computational cost) and diverse constraints incurred by different computation resources. To address this, most methods seek to find promising architectures via optimizing a well pre-defined utility function. However, it is often non-trivial to design an ideal function that could well trade-off different objectives. More critically, in many real scenarios, even for the same platform, we may have different applications with various latency budgets. To find promising architectures under different budgets, existing methods may have to perform an independent search for each budget, which is very inefficient and unnecessary. Nevertheless, it would be fantastic if we can produce multiple promising architectures to fulfill each budget in the same search process. In this paper, we propose a Pareto-Frontier-aware Neural Architecture Search (PFNAS) method which seeks to learn the Pareto frontier (i.e., the set of Pareto optimal architectures) w.r.t. multiple objectives. Here, we formulate the Pareto frontier learning problem as a Markov decision process (MDP). Relied on the MDP, we transform and absorb the objectives other than model performance into the constraints. To learn the whole Pareto frontier, we propose to find a set of Pareto optimal architectures which are uniformly distributed on the range of budget to form a frontier. Based on the learned frontier, we are able to easily find multiple promising architectures to fulfill all considered constraints in the same search process. 
Extensive experiments on three hardware platforms (i.e., mobile, CPU, and GPU) show that the searched architectures by our PFNAS outperform the ones obtained by existing methods under different budgets.", "keywords": "Neural Architecture Search;Pareto Frontier Learning;Resource Constraint", "primary_area": "", "supplementary_material": "/attachment/0336c49ef1ae5dcf1ad512cea24a27c5589e1040.zip", "author": "Yong Guo;Yaofo Chen;Yin Zheng;Peilin Zhao;Jian Chen;Junzhou Huang;Mingkui Tan", "authorids": "~Yong_Guo1;chenyaofo@gmail.com;~Yin_Zheng1;~Peilin_Zhao2;ellachen@scut.edu.cn;~Junzhou_Huang2;~Mingkui_Tan2", "gender": "M;;M;;;M;", "homepage": "http://www.guoyongcs.com/;;;;;http://ranger.uta.edu/~huang/;", "dblp": ";;120/7090;84/8411;;22/1170.html;", "google_scholar": "https://scholar.google.com/citations?hl=en;;cibWNZIAAAAJ;https://scholar.google.com.hk/citations?user=HPeX_YcAAAAJ;;https://scholar.google.com.tw/citations?user=X7KrguAAAAAJ;", "orcid": "0000-0002-3444-4588;;;0000-0001-8543-3953;;0000-0002-9548-1227;", "linkedin": ";;;;;;", "or_profile": "~Yong_Guo1;chenyaofo@gmail.com;~Yin_Zheng1;~Peilin_Zhao2;ellachen@scut.edu.cn;~Junzhou_Huang2;~Mingkui_Tan2", "aff": "South China University of Technology;;Weixin Group, Tencent;Tencent;;University of Texas at Arlington;", "aff_domain": "scut.edu.cn;;tencent.com;tencent.com;;uta.edu;", "position": "PhD student;;Researcher;Researcher;;Associate Professor;", "bibtex": "", "github": "", "project": "", "reviewers": "AnonReviewer4;AnonReviewer1;AnonReviewer3;AnonReviewer2", "site": "https://openreview.net/forum?id=32B5lOqZUiO", "pdf_size": 0, "rating": "4;5;5;6", "confidence": "3;5;5;5", "wc_review": "572;375;412;268", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "0;0;0;0", "reply_reviewers": "0;0;0;0", "reply_authors": "0;0;0;0", "rating_avg": [ 5.0, 0.7071067811865476 ], "confidence_avg": [ 4.5, 0.8660254037844386 ], "wc_review_avg": [ 406.75, 109.08110514658348 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 0, 0 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 0, 0 ], "replies_avg": [ 5, 0 ], "authors#_avg": [ 7, 0 ], "corr_rating_confidence": 0.816496580927726, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:8VGnndLgBW0J:scholar.google.com/&scioq=Pareto-Frontier-aware+Neural+Architecture+Search&hl=en&as_sdt=0,5", "gs_version_total": 0, "aff_unique_index": "0;1;1;2", "aff_unique_norm": "South China University of Technology;Tencent;University of Texas at Arlington", "aff_unique_dep": ";Weixin Group;", "aff_unique_url": "https://www.scut.edu.cn;https://www.tencent.com;https://www.uta.edu", "aff_unique_abbr": "SCUT;Tencent;UTA", "aff_campus_unique_index": "1", "aff_campus_unique": ";Arlington", "aff_country_unique_index": "0;0;0;1", "aff_country_unique": "China;United States" }, { "id": "33TBJachvOX", "title": "How to compare adversarial robustness of classifiers from a global perspective", "track": "main", "status": "Reject", "tldr": "", "abstract": "Adversarial robustness of machine learning models has attracted considerable attention over recent years. Adversarial attacks undermine the reliability of and trust in machine learning models, but the construction of more robust models hinges on a rigorous understanding of adversarial robustness as a property of a given model. Point-wise measures for specific threat models are currently the most popular tool for comparing the robustness of classifiers and are used in most recent publications on adversarial robustness. 
In this work, we use robustness curves to show that point-wise measures fail to capture important global properties that are essential to reliably compare the robustness of different classifiers. We introduce new ways in which robustness curves can be used to systematically uncover these properties and provide concrete recommendations for researchers and practitioners when assessing and comparing the robustness of trained models. Furthermore, we characterize scale as a way to distinguish small and large perturbations, and relate it to inherent properties of data sets, demonstrating that robustness thresholds must be chosen accordingly. We hope that our work contributes to a shift of focus away from point-wise measures of robustness and towards a discussion of the question what kind of robustness could and should reasonably be expected. We release code to reproduce all experiments presented in this paper, which includes a Python module to calculate robustness curves for arbitrary data sets and classifiers, supporting a number of frameworks, including TensorFlow, PyTorch and JAX.", "keywords": "adversarial robustness;robustness;adversarial defense;adversarial example", "primary_area": "", "supplementary_material": "", "author": "Niklas Risse;Jan Philip G\u00f6pfert;Christina G\u00f6pfert", "authorids": "~Niklas_Risse1;jgoepfert@techfak.uni-bielefeld.de;~Christina_G\u00f6pfert1", "gender": "M;;F", "homepage": ";;", "dblp": ";;https://dblp.org/pers/g/G=ouml=pfert:Christina.html", "google_scholar": "Y2AGtmYAAAAJ;;S6jFnW8AAAAJ", "orcid": ";;0000-0003-2517-4907", "linkedin": ";;", "or_profile": "~Niklas_Risse1;jgoepfert@techfak.uni-bielefeld.de;~Christina_G\u00f6pfert1", "aff": "Bielefeld University;;", "aff_domain": "uni-bielefeld.de;;", "position": "MS student;;", "bibtex": "@misc{\nrisse2021how,\ntitle={How to compare adversarial robustness of classifiers from a global perspective},\nauthor={Niklas Risse and Jan Philip G{\\\"o}pfert and Christina G{\\\"o}pfert},\nyear={2021},\nurl={https://openreview.net/forum?id=33TBJachvOX}\n}", "github": "", "project": "", "reviewers": "AnonReviewer3;AnonReviewer1;AnonReviewer4;AnonReviewer2", "site": "https://openreview.net/forum?id=33TBJachvOX", "pdf_size": 0, "rating": "5;5;6;6", "confidence": "4;4;3;3", "wc_review": "389;803;195;279", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "641;1103;246;492", "reply_reviewers": "0;0;0;0", "reply_authors": "1;2;1;1", "rating_avg": [ 5.5, 0.5 ], "confidence_avg": [ 3.5, 0.5 ], "wc_review_avg": [ 416.5, 233.50963577548572 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 620.5, 312.24549636463934 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.25, 0.4330127018922193 ], "replies_avg": [ 10, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": -1.0, "gs_citation": 4, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=12542213818116075161&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 6, "aff_unique_index": "0", "aff_unique_norm": "Bielefeld University", "aff_unique_dep": "", "aff_unique_url": "https://www.uni-bielefeld.de/", "aff_unique_abbr": "Uni Bielefeld", "aff_country_unique_index": "0", "aff_country_unique": "Germany" }, { "title": "Effective and Efficient Vote Attack on Capsule Networks", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/2711", "id": "33rtZ4Sjwjn", "poster": "", "openreview": "https://openreview.net/forum?id=33rtZ4Sjwjn", "slides": "https://iclr.cc/virtual/2021/poster/2711", "video": 
"https://iclr.cc/virtual/2021/poster/2711", "author_site": "Jindong Gu, Baoyuan Wu, Volker Tresp", "tldr": "", "abstract": "Standard Convolutional Neural Networks (CNNs) can be easily fooled by images with small quasi-imperceptible artificial perturbations. As alternatives to CNNs, the recently proposed Capsule Networks (CapsNets) are shown to be more robust to white-box attack than CNNs under popular attack protocols. Besides, the class-conditional reconstruction part of CapsNets is also used to detect adversarial examples. In this work, we investigate the adversarial robustness of CapsNets, especially how the inner workings of CapsNets change when the output capsules are attacked. The first observation is that adversarial examples misled CapsNets by manipulating the votes from primary capsules. Another observation is the high computational cost, when we directly apply multi-step attack methods designed for CNNs to attack CapsNets, due to the computationally expensive routing mechanism. Motivated by these two observations, we propose a novel vote attack where we attack votes of CapsNets directly. Our vote attack is not only effective, but also efficient by circumventing the routing process. Furthermore, we integrate our vote attack into the detection-aware attack paradigm, which can successfully bypass the class-conditional reconstruction based detection method. Extensive experiments demonstrate the superior attack performance of our vote attack on CapsNets.", "keywords": "Capsule Networks;Adversarial Attacks;Adversarial Example Detection", "primary_area": "", "supplementary_material": "", "author": "Jindong Gu;Baoyuan Wu;Volker Tresp", "authorids": "~Jindong_Gu1;~Baoyuan_Wu1;~Volker_Tresp1", "gender": ";M;M", "homepage": ";https://sites.google.com/site/baoyuanwu2015/;https://www.dbs.ifi.lmu.de/~tresp/", "dblp": ";73/7781;t/VolkerTresp", "google_scholar": ";JNTG1KoAAAAJ;xIJHTUwAAAAJ", "orcid": ";0000-0003-2183-5990;0000-0001-9428-3686", "linkedin": ";;volker-tresp-8110a118/", "or_profile": "~Jindong_Gu1;~Baoyuan_Wu1;~Volker_Tresp1", "aff": ";The Chinese University of Hong Kong, Shenzhen;Siemens Corporate Research", "aff_domain": ";cuhk.edu.cn;siemens.com", "position": ";Associate Professor;Principal Researcher", "bibtex": "@inproceedings{\ngu2021effective,\ntitle={Effective and Efficient Vote Attack on Capsule Networks},\nauthor={Jindong Gu and Baoyuan Wu and Volker Tresp},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=33rtZ4Sjwjn}\n}", "github": "[![github](/images/github_icon.svg) JindongGu/VoteAttack](https://github.com/JindongGu/VoteAttack)", "project": "", "reviewers": "AnonReviewer2;AnonReviewer3;AnonReviewer1;AnonReviewer4", "pdf_size": 0, "rating": "5;6;6;8", "confidence": "3;3;4;2", "wc_review": "850;530;259;231", "wc_reply_reviewers": "0;0;42;0", "wc_reply_authors": "471;613;523;73", "reply_reviewers": "0;0;1;0", "reply_authors": "1;1;2;1", "rating_avg": [ 6.25, 1.0897247358851685 ], "confidence_avg": [ 3.0, 0.7071067811865476 ], "wc_review_avg": [ 467.5, 249.80842659926427 ], "wc_reply_reviewers_avg": [ 10.5, 18.186533479473212 ], "wc_reply_authors_avg": [ 420.0, 206.68091348743357 ], "reply_reviewers_avg": [ 0.25, 0.4330127018922193 ], "reply_authors_avg": [ 1.25, 0.4330127018922193 ], "replies_avg": [ 13, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": -0.6488856845230502, "gs_citation": 37, "gs_cited_by_link": 
"https://scholar.google.com/scholar?cites=17735896064607887754&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 5, "pdf": "https://openreview.net/pdf?id=33rtZ4Sjwjn", "email": ";cuhk.edu.cn;siemens.com", "author_num": 3, "aff_unique_index": "0;1", "aff_unique_norm": "Chinese University of Hong Kong;Siemens AG", "aff_unique_dep": ";Corporate Research", "aff_unique_url": "https://www.cuhk.edu.cn;https://www.siemens.com/research", "aff_unique_abbr": "CUHK;Siemens", "aff_campus_unique_index": "0", "aff_campus_unique": "Shenzhen;", "aff_country_unique_index": "0;1", "aff_country_unique": "China;Germany" }, { "id": "34KAZ9HbJco", "title": "Adapt-and-Adjust: Overcoming the Long-tail Problem of Multilingual Speech Recognition", "track": "main", "status": "Reject", "tldr": "", "abstract": "One crucial challenge of real-world multilingual speech recognition is the long-tailed distribution problem, where some resource-rich languages like English have abundant training data, but a long tail of low-resource languages have varying amounts of limited training data. To overcome the long-tail problem, in this paper, we propose Adapt-and-Adjust (A2), a transformer-based multi-task learning framework for end-to-end multilingual speech recognition. The A2 framework overcomes the long-tail problem via three techniques: (1) exploiting a pretrained multilingual language model (mBERT) to improve the performance of low-resource languages; (2) proposing dual adapters consisting of both language-specific and language-agnostic adaptation with minimal additional parameters; and (3) overcoming the class imbalance, either by imposing class priors in the loss during training or adjusting the logits of the softmax output during inference. Extensive experiments on the CommonVoice corpus show that A2 significantly outperforms conventional approaches.", "keywords": "speech recognition;multilingual;long-tail;adapter;logit adjustments", "primary_area": "", "supplementary_material": "", "author": "Genta Indra Winata;Guangsen Wang;Caiming Xiong;Steven Hoi", "authorids": "~Genta_Indra_Winata1;~Guangsen_Wang2;~Caiming_Xiong1;~Steven_Hoi2", "gender": "M;M;M;M", "homepage": "https://gentawinata.com/;;http://cmxiong.com/;http://stevenhoi.com", "dblp": "https://dblp.uni-trier.de/pers/hd/w/Winata:Genta_Indra;;80/7282;", "google_scholar": "7QxkToIAAAAJ;;vaSdahkAAAAJ;JoLjflYAAAAJ", "orcid": ";;;", "linkedin": "gentaiscool/;guangsen-wang-4071003b/?originalSubdomain=sg;caiming-xiong-150a1417;", "or_profile": "~Genta_Indra_Winata1;~Guangsen_Wang2;~Caiming_Xiong1;~Steven_Hoi2", "aff": "Hong Kong University of Science and Technology;SalesForce.com;Salesforce Research;Singapore Management University", "aff_domain": "ust.hk;salesforce.com;salesforce.com;smu.edu.sg", "position": "PhD student;Senior Applied Scientist;Research Scientist;Associate Professor", "bibtex": "@misc{\nwinata2021adaptandadjust,\ntitle={Adapt-and-Adjust: Overcoming the Long-tail Problem of Multilingual Speech Recognition},\nauthor={Genta Indra Winata and Guangsen Wang and Caiming Xiong and Steven Hoi},\nyear={2021},\nurl={https://openreview.net/forum?id=34KAZ9HbJco}\n}", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer2;AnonReviewer4;AnonReviewer3;AnonReviewer5", "site": "https://openreview.net/forum?id=34KAZ9HbJco", "pdf_size": 0, "rating": "4;5;5;5;6", "confidence": "4;4;4;4;3", "wc_review": "751;281;539;636;202", "wc_reply_reviewers": "0;0;799;0;0", "wc_reply_authors": "1229;785;2225;930;653", "reply_reviewers": "0;0;1;0;0", "reply_authors": 
"2;1;4;2;1", "rating_avg": [ 5.0, 0.6324555320336759 ], "confidence_avg": [ 3.8, 0.39999999999999997 ], "wc_review_avg": [ 481.8, 208.86684753689372 ], "wc_reply_reviewers_avg": [ 159.8, 319.6 ], "wc_reply_authors_avg": [ 1164.4, 563.8196874888283 ], "reply_reviewers_avg": [ 0.2, 0.4000000000000001 ], "reply_authors_avg": [ 2.0, 1.0954451150103321 ], "replies_avg": [ 18, 0 ], "authors#_avg": [ 4, 0 ], "corr_rating_confidence": -0.7905694150420948, "gs_citation": 63, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=18404415562820206616&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 6, "aff_unique_index": "0;1;1;2", "aff_unique_norm": "Hong Kong University of Science and Technology;Salesforce;Singapore Management University", "aff_unique_dep": ";;", "aff_unique_url": "https://www.ust.hk;https://www.salesforce.com;https://www.smu.edu.sg", "aff_unique_abbr": "HKUST;Salesforce;SMU", "aff_campus_unique_index": "0", "aff_campus_unique": "Hong Kong SAR;", "aff_country_unique_index": "0;1;1;2", "aff_country_unique": "China;United States;Singapore" }, { "id": "36G2rwDbk1k", "title": "On the Discovery of Feature Importance Distribution: An Overlooked Area", "track": "main", "status": "Withdraw", "tldr": "", "abstract": "Detecting feature's predictive power is a key problem in Machine Learning. Previous methods have been focusing on providing a single value, usually named feature importance, as a point estimate of the power. However, it is difficult to interpret the predictive power using feature importance. Moreover, in reality feature's predictive power may vary dramatically across feature values. Feature importance, as a point estimate, cannot capture such variance. To address the two problems, we first propose a new definition of feature importance to directly measure feature's predictive power. We then propose a feature importance model to capture a high-resolution distribution of feature importance across feature values. Last we propose a binarized logistic regression model and its learning algorithm to train the feature importance models jointly. We theoretically proved that our approach has the same time complexity as Logistic Regression. Empirical results on three real-world biomedical datasets show that, our approach can detect meaningful feature importance distributions, which could have profound sociological implications. Code, data and full results are publicly available in paper github repository. 
All the results are reproducible by simply using one command.", "keywords": "", "primary_area": "", "supplementary_material": "", "author": "Yuxiao Huang", "authorids": "~Yuxiao_Huang1", "gender": "M", "homepage": "", "dblp": "16/10148", "google_scholar": "", "orcid": "", "linkedin": "", "or_profile": "~Yuxiao_Huang1", "aff": "George Washington University", "aff_domain": "gwu.edu", "position": "Assistant Professor", "bibtex": "", "github": "", "project": "", "reviewers": "AnonReviewer2;AnonReviewer4;AnonReviewer1", "site": "https://openreview.net/forum?id=36G2rwDbk1k", "pdf_size": 0, "rating": "3;4;5", "confidence": "3;5;3", "wc_review": "201;750;238", "wc_reply_reviewers": "0;0;0", "wc_reply_authors": "0;0;0", "reply_reviewers": "0;0;0", "reply_authors": "0;0;0", "rating_avg": [ 4.0, 0.816496580927726 ], "confidence_avg": [ 3.6666666666666665, 0.9428090415820634 ], "wc_review_avg": [ 396.3333333333333, 250.53587013085016 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 0, 0 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 0, 0 ], "replies_avg": [ 4, 0 ], "authors#_avg": [ 1, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:xCrWs8hcdawJ:scholar.google.com/&scioq=On+the+Discovery+of+Feature+Importance+Distribution:+An+Overlooked+Area&hl=en&as_sdt=0,33", "gs_version_total": 0, "aff_unique_index": "0", "aff_unique_norm": "George Washington University", "aff_unique_dep": "", "aff_unique_url": "https://www.gwu.edu", "aff_unique_abbr": "GWU", "aff_country_unique_index": "0", "aff_country_unique": "United States" }, { "id": "37Fh1MiR5Ze", "title": "A Chaos Theory Approach to Understand Neural Network Optimization", "track": "main", "status": "Reject", "tldr": "", "abstract": "Despite the complicated structure of modern deep neural network architectures, they are still optimized with algorithms based on Stochastic Gradient Descent (SGD). However, the reason behind the effectiveness of SGD is not well understood, making its study an active research area. In this paper, we formulate deep neural network optimization as a dynamical system and show that the rigorous theory developed to study chaotic systems can be useful to understand SGD and its variants. In particular, we first observe that the inverse of the instability timescale of SGD optimization, represented by the largest Lyapunov exponent, corresponds to the most negative eigenvalue of the Hessian of the loss. This observation enables the introduction of an efficient method to estimate the largest eigenvalue of the Hessian. Then, we empirically show that for a large range of learning rates, SGD traverses the loss landscape across regions with largest eigenvalue of the Hessian similar to the inverse of the learning rate. This explains why effective learning rates can be found to be within a large range of values and shows that SGD implicitly uses the largest eigenvalue of the Hessian while traversing the loss landscape. This sheds some light on the effectiveness of SGD over more sophisticated second-order methods. We also propose a quasi-Newton method that dynamically estimates an optimal learning rate for the optimization of deep learning models. 
We demonstrate that our observations and methods are robust across different architectures and loss functions on CIFAR-10 dataset.", "keywords": "learning theory;stochastic gradient descent;deep learning;neural networks;dynamical systems;chaos theory;Lyapunov exponents", "primary_area": "", "supplementary_material": "", "author": "Michele Sasdelli;Thalaiyasingam Ajanthan;Tat-Jun Chin;Gustavo Carneiro", "authorids": "~Michele_Sasdelli1;~Thalaiyasingam_Ajanthan1;~Tat-Jun_Chin2;~Gustavo_Carneiro1", "gender": "M;M;M;M", "homepage": "http://qmlresearch.com/;https://tajanthan.github.io/;https://cs.adelaide.edu.au/~carneiro/;https://www.ai4space.group/", "dblp": "198/1378;154/6629;53/3609;95/2036", "google_scholar": "https://scholar.google.co.uk/citations?user=-WEzhyEAAAAJ;https://scholar.google.com.au/citations?user=Rza8c10AAAAJ;https://scholar.google.com.au/citations?user=E0TtOWAAAAAJ;https://scholar.google.com.au/citations?user=WyqGF10AAAAJ", "orcid": ";;0000-0002-5571-6220;0000-0003-2423-9342", "linkedin": ";;gustavo-carneiro-3578812/;", "or_profile": "~Michele_Sasdelli1;~Thalaiyasingam_Ajanthan1;~Gustavo_Carneiro1;~Tat-Jun_Chin1", "aff": "The University of Adelaide;Amazon;The University of Adelaide;The University of Adelaide", "aff_domain": "adelaide.edu.au;amazon.com;adelaide.edu.au;adelaide.edu.au", "position": "Postdoc;Researcher;Professor;Full Professor", "bibtex": "@misc{\nsasdelli2021a,\ntitle={A Chaos Theory Approach to Understand Neural Network Optimization},\nauthor={Michele Sasdelli and Thalaiyasingam Ajanthan and Tat-Jun Chin and Gustavo Carneiro},\nyear={2021},\nurl={https://openreview.net/forum?id=37Fh1MiR5Ze}\n}", "github": "", "project": "", "reviewers": "AnonReviewer3;AnonReviewer2;AnonReviewer4", "site": "https://openreview.net/forum?id=37Fh1MiR5Ze", "pdf_size": 0, "rating": "4;4;5", "confidence": "3;5;3", "wc_review": "1157;296;957", "wc_reply_reviewers": "0;0;0", "wc_reply_authors": "259;217;219", "reply_reviewers": "0;0;0", "reply_authors": "1;1;1", "rating_avg": [ 4.333333333333333, 0.4714045207910317 ], "confidence_avg": [ 3.6666666666666665, 0.9428090415820634 ], "wc_review_avg": [ 803.3333333333334, 367.91333520575495 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 231.66666666666666, 19.344824171395878 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 7, 0 ], "authors#_avg": [ 4, 0 ], "corr_rating_confidence": -0.4999999999999999, "gs_citation": 2, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=5155511038028384341&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 4, "aff_unique_index": "0;1;0;0", "aff_unique_norm": "University of Adelaide;Amazon", "aff_unique_dep": ";Amazon.com, Inc.", "aff_unique_url": "https://www.adelaide.edu.au;https://www.amazon.com", "aff_unique_abbr": "Adelaide;Amazon", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;1;0;0", "aff_country_unique": "Australia;United States" }, { "title": "Long-tail learning via logit adjustment", "status": "Spotlight", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/2675", "id": "37nvvqkCo5", "poster": "", "openreview": "https://openreview.net/forum?id=37nvvqkCo5", "slides": "https://iclr.cc/virtual/2021/poster/2675", "video": "https://iclr.cc/virtual/2021/poster/2675", "author_site": "Aditya Krishna Menon, Sadeep Jayasumana, Ankit Singh Rawat, Himanshu Jain, Andreas Veit, Sanjiv Kumar", "tldr": "", "abstract": "Real-world classification problems typically exhibit an imbalanced or 
long-tailed label distribution, wherein many labels have only a few associated samples. This poses a challenge for generalisation on such labels, and also makes naive learning biased towards dominant labels. In this paper, we present a statistical framework that unifies and generalises several recent proposals to cope with these challenges. Our framework revisits the classic idea of logit adjustment based on the label frequencies, which encourages a large relative margin between logits of rare positive versus dominant negative labels. This yields two techniques for long-tail learning, where such adjustment is either applied post-hoc to a trained model, or enforced in the loss during training. These techniques are statistically grounded, and practically effective on four real-world datasets with long-tailed label distributions. ", "keywords": "long-tail learning;class imbalance", "primary_area": "", "supplementary_material": "/attachment/2bf79f920e77c628ab2c083a15bfd7137aa270c8.zip", "author": "Aditya Krishna Menon;Sadeep Jayasumana;Ankit Singh Rawat;Himanshu Jain;Andreas Veit;Sanjiv Kumar", "authorids": "~Aditya_Krishna_Menon1;~Sadeep_Jayasumana1;~Ankit_Singh_Rawat1;himj@google.com;~Andreas_Veit1;~Sanjiv_Kumar1", "gender": ";;M;;;", "homepage": ";;https://ankitsrawat.github.io/home/;;http://andreasveit.eu/;http://www.sanjivk.com/", "dblp": ";;https://dblp.org/pers/hd/r/Rawat:Ankit_Singh;;133/1801;", "google_scholar": ";;http://scholar.google.com/citations?user=U0_ab4cAAAAJ;;UA9Hb2EAAAAJ;https://scholar.google.com/citations?hl=en", "orcid": ";;;;;", "linkedin": ";;;;;", "or_profile": "~Aditya_Krishna_Menon1;~Sadeep_Jayasumana1;~Ankit_Singh_Rawat1;himj@google.com;~Andreas_Veit1;~Sanjiv_Kumar1", "aff": ";;Google;;Google;Google", "aff_domain": ";;google.com;;google.com;google.com", "position": ";;Research Scientist;;Senior Research Scientist;Research Scientist", "bibtex": "@inproceedings{\nmenon2021longtail,\ntitle={Long-tail learning via logit adjustment},\nauthor={Aditya Krishna Menon and Sadeep Jayasumana and Ankit Singh Rawat and Himanshu Jain and Andreas Veit and Sanjiv Kumar},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=37nvvqkCo5}\n}", "github": "[![github](/images/github_icon.svg) google-research/google-research](https://github.com/google-research/google-research/tree/master/logit_adjustment) + [![Papers with Code](/images/pwc_icon.svg) 2 community implementations](https://paperswithcode.com/paper/?openreview=37nvvqkCo5)", "project": "", "reviewers": "AnonReviewer4;AnonReviewer1;AnonReviewer3;AnonReviewer2", "pdf_size": 0, "rating": "6;7;8;8", "confidence": "4;3;4;4", "wc_review": "253;396;289;403", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "558;228;175;347", "reply_reviewers": "0;0;0;0", "reply_authors": "1;1;1;1", "rating_avg": [ 7.25, 0.82915619758885 ], "confidence_avg": [ 3.75, 0.4330127018922193 ], "wc_review_avg": [ 335.25, 65.54530875661507 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 327.0, 147.19544829919164 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 10, 0 ], "authors#_avg": [ 6, 0 ], "corr_rating_confidence": 0.17407765595569782, "gs_citation": 907, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=16016994999710620758&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 11, "pdf": "https://openreview.net/pdf?id=37nvvqkCo5", "email": ";;google.com;;google.com;google.com", "author_num": 6, "aff_unique_index": 
"0;0;0", "aff_unique_norm": "Google", "aff_unique_dep": "Google", "aff_unique_url": "https://www.google.com", "aff_unique_abbr": "Google", "aff_campus_unique_index": "0;0;0", "aff_campus_unique": "Mountain View", "aff_country_unique_index": "0;0;0", "aff_country_unique": "United States" }, { "id": "389rLpWoOlG", "title": "Machine Learning Algorithms for Data Labeling: An Empirical Evaluation", "track": "main", "status": "Reject", "tldr": "", "abstract": "The lack of labeled data is a major problem in both research and industrial settings since obtaining labels is often an expensive and time-consuming activity. In the past years, several machine learning algorithms were developed to assist and perform automated labeling in partially labeled datasets. While many of these algorithms are available in open-source packages, there is no research that investigates how these algorithms compare to each other in different types of datasets and with different percentages of available labels. To address this problem, this paper empirically evaluates and compares seven algorithms for automated labeling in terms of accuracy. We investigate how these algorithms perform in six different and well-known datasets with three different types of data, images, texts, and numerical values. We evaluate these algorithms under two different experimental conditions, with 10\\% and 50\\% labels of available labels in the dataset. Each algorithm, in each dataset for each experimental condition, is evaluated independently ten times with different random seeds. The results are analyzed and the algorithms are compared utilizing a Bayesian Bradley-Terry model. The results indicate that while the algorithms label spreading with K-nearest neighbors perform better in the aggregated results, the active learning algorithms query by instance QBC and query instance uncertainty sample perform better when there is only 10\\% of labels available. 
These results can help machine learning practitioners in choosing optimal machine learning algorithms to label their data.", "keywords": "Data Labeling;Empirical Evaluation;Active Machine Learning;Semi-Supervised Learning", "primary_area": "", "supplementary_material": "", "author": "Teodor Anders Fredriksson;David Issa Mattos;Jan Bosch;Helena Holmstr\u00f6m Olsson", "authorids": "~Teodor_Anders_Fredriksson1;davidis@chalmers.se;jan.bosch@chalmers.se;helena.holmstrom.olsson@mau.se", "gender": "M;;;", "homepage": ";;;", "dblp": ";;;", "google_scholar": ";;;", "orcid": ";;;", "linkedin": "https://se.linkedin.com/in/teodor-fredriksson-datascience-ai?challengeId=AQELwKRfr5r8QgAAAXSm-O9g1g0vxVZkEWXH14WiqguJ24RPV5sVXJ1jyApDNu-zrUgm9-tVANWI9tDDppQnZ0uavs3QQP1kGQ&submissionId=3235e4e7-cb38-3616-3611-ef9915434210;;;", "or_profile": "~Teodor_Anders_Fredriksson1;davidis@chalmers.se;jan.bosch@chalmers.se;helena.holmstrom.olsson@mau.se", "aff": "Chalmers University;;;", "aff_domain": "chalmers.se;;;", "position": "PhD student;;;", "bibtex": "@misc{\nfredriksson2021machine,\ntitle={Machine Learning Algorithms for Data Labeling: An Empirical Evaluation},\nauthor={Teodor Anders Fredriksson and David Issa Mattos and Jan Bosch and Helena Holmstr{\\\"o}m Olsson},\nyear={2021},\nurl={https://openreview.net/forum?id=389rLpWoOlG}\n}", "github": "", "project": "", "reviewers": "AnonReviewer2;AnonReviewer4;AnonReviewer3;AnonReviewer1", "site": "https://openreview.net/forum?id=389rLpWoOlG", "pdf_size": 0, "rating": "3;3;4;4", "confidence": "4;5;4;4", "wc_review": "119;204;371;343", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "0;0;0;0", "reply_reviewers": "0;0;0;0", "reply_authors": "0;0;0;0", "rating_avg": [ 3.5, 0.5 ], "confidence_avg": [ 4.25, 0.4330127018922193 ], "wc_review_avg": [ 259.25, 102.7433087845627 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 0, 0 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 0, 0 ], "replies_avg": [ 5, 0 ], "authors#_avg": [ 4, 0 ], "corr_rating_confidence": -0.5773502691896257, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:yLlHgu0tXH4J:scholar.google.com/&scioq=Machine+Learning+Algorithms+for+Data+Labeling:+An+Empirical+Evaluation&hl=en&as_sdt=0,5", "gs_version_total": 0, "aff_unique_index": "0", "aff_unique_norm": "Chalmers University of Technology", "aff_unique_dep": "", "aff_unique_url": "https://www.chalmers.se", "aff_unique_abbr": "Chalmers", "aff_country_unique_index": "0", "aff_country_unique": "Sweden" }, { "title": "Gradient Projection Memory for Continual Learning", "status": "Oral", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/3289", "id": "3AOj0RCNC2", "poster": "", "openreview": "https://openreview.net/forum?id=3AOj0RCNC2", "slides": "https://iclr.cc/virtual/2021/poster/3289", "video": "https://iclr.cc/virtual/2021/poster/3289", "author_site": "Gobinda Saha, Isha Garg, Kaushik Roy", "tldr": "", "abstract": "The ability to learn continually without forgetting the past tasks is a desired attribute for artificial learning systems. Existing approaches to enable such learning in artificial neural networks usually rely on network growth, importance based weight update or replay of old data from the memory. In contrast, we propose a novel approach where a neural network learns new tasks by taking gradient steps in the orthogonal direction to the gradient subspaces deemed important for the past tasks. 
We find the bases of these subspaces by analyzing network representations (activations) after learning each task with Singular Value Decomposition (SVD) in a single shot manner and store them in the memory as Gradient Projection Memory (GPM). With qualitative and quantitative analyses, we show that such orthogonal gradient descent induces minimum to no interference with the past tasks, thereby mitigates forgetting. We evaluate our algorithm on diverse image classification datasets with short and long sequences of tasks and report better or on-par performance compared to the state-of-the-art approaches. ", "keywords": "Continual Learning;Representation Learning;Computer Vision;Deep learning", "primary_area": "", "supplementary_material": "", "author": "Gobinda Saha;Isha Garg;Kaushik Roy", "authorids": "~Gobinda_Saha1;~Isha_Garg1;~Kaushik_Roy1", "gender": "M;F;M", "homepage": "https://sahagobinda.github.io/portfolio/;;https://engineering.purdue.edu/NRL/Group", "dblp": "218/5562;;r/KaushikRoy", "google_scholar": "https://scholar.google.com/citations?hl=en;;to4P8KgAAAAJ", "orcid": ";;", "linkedin": "gobinda-saha/;;", "or_profile": "~Gobinda_Saha1;~Isha_Garg1;~Kaushik_Roy1", "aff": "Purdue University;Purdue University;Purdue University", "aff_domain": "purdue.edu;purdue.edu;purdue.edu", "position": "PhD student;PhD student;Full Professor", "bibtex": "@inproceedings{\nsaha2021gradient,\ntitle={Gradient Projection Memory for Continual Learning},\nauthor={Gobinda Saha and Isha Garg and Kaushik Roy},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=3AOj0RCNC2}\n}", "github": "[![github](/images/github_icon.svg) sahagobinda/GPM](https://github.com/sahagobinda/GPM)", "project": "", "reviewers": "AnonReviewer1;AnonReviewer2;AnonReviewer4;AnonReviewer3", "pdf_size": 0, "rating": "6;8;8;8", "confidence": "5;5;5;5", "wc_review": "332;309;596;132", "wc_reply_reviewers": "0;0;170;0", "wc_reply_authors": "1191;908;1112;270", "reply_reviewers": "0;0;1;0", "reply_authors": "2;2;3;1", "rating_avg": [ 7.5, 0.8660254037844386 ], "confidence_avg": [ 5.0, 0.0 ], "wc_review_avg": [ 342.25, 165.68399892566572 ], "wc_reply_reviewers_avg": [ 42.5, 73.61215932167728 ], "wc_reply_authors_avg": [ 870.25, 361.61054672119286 ], "reply_reviewers_avg": [ 0.25, 0.4330127018922193 ], "reply_authors_avg": [ 2.0, 0.7071067811865476 ], "replies_avg": [ 16, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 368, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=17694030675794523744&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 5, "pdf": "https://openreview.net/pdf?id=3AOj0RCNC2", "email": "purdue.edu;purdue.edu;purdue.edu", "author_num": 3, "aff_unique_index": "0;0;0", "aff_unique_norm": "Purdue University", "aff_unique_dep": "", "aff_unique_url": "https://www.purdue.edu", "aff_unique_abbr": "Purdue", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0;0", "aff_country_unique": "United States" }, { "title": "PMI-Masking: Principled masking of correlated spans", "status": "Spotlight", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/2783", "id": "3Aoft6NWFej", "poster": "", "openreview": "https://openreview.net/forum?id=3Aoft6NWFej", "slides": "https://iclr.cc/virtual/2021/poster/2783", "video": "https://iclr.cc/virtual/2021/poster/2783", "author_site": "Yoav Levine, Barak Lenz, Opher Lieber, Omri Abend, Kevin Leyton-Brown, Moshe Tennenholtz, Yoav Shoham", "tldr": "", 
"abstract": "Masking tokens uniformly at random constitutes a common flaw in the pretraining of Masked Language Models (MLMs) such as BERT. We show that such uniform masking allows an MLM to minimize its training objective by latching onto shallow local signals, leading to pretraining inefficiency and suboptimal downstream performance. To address this flaw, we propose PMI-Masking, a principled masking strategy based on the concept of Pointwise Mutual Information (PMI), which jointly masks a token n-gram if it exhibits high collocation over the corpus. PMI-Masking motivates, unifies, and improves upon prior more heuristic approaches that attempt to address the drawback of random uniform token masking, such as whole-word masking, entity/phrase masking, and random-span masking. Specifically, we show experimentally that PMI-Masking reaches the performance of prior masking approaches in half the training time, and consistently improves performance at the end of pretraining.", "keywords": "Language modeling;BERT;pointwise mutual information", "primary_area": "", "supplementary_material": "/attachment/18a05922f6b3ef04f925500ac385d3ceb92ed507.zip", "author": "Yoav Levine;Barak Lenz;Opher Lieber;Omri Abend;Kevin Leyton-Brown;Moshe Tennenholtz;Yoav Shoham", "authorids": "~Yoav_Levine1;barakl@ai21.com;opherl@ai21.com;~Omri_Abend1;~Kevin_Leyton-Brown1;~Moshe_Tennenholtz1;~Yoav_Shoham1", "gender": "M;;;M;Not Specified;;M", "homepage": ";;;http://www.cs.huji.ac.il/~oabend/;http://cs.ubc.ca/~kevinlb;http://moshet.net.technion.ac.il;", "dblp": "199/1895;;;30/8159;81/1149;;", "google_scholar": ";;;https://scholar.google.com.tw/citations?user=BD_hRzYAAAAJ;_4dnp0IAAAAJ;;", "orcid": ";;;;0000-0002-7644-5327;;", "linkedin": ";;;;kevinleytonbrown/;;", "or_profile": "~Yoav_Levine1;barakl@ai21.com;opherl@ai21.com;~Omri_Abend1;~Kevin_Leyton-Brown1;~Moshe_Tennenholtz1;~Yoav_Shoham1", "aff": "Hebrew University;;;Hebrew University of Jerusalem;University of British Columbia;Technion - Israel Institute of Technology, Technion;Computer Science Department, Stanford University", "aff_domain": "huji.ac.il;;;huji.ac.il;ubc.ca;technion.ac.il;cs.stanford.edu", "position": "PhD student;;;Associate Professor;Full Professor;Full Professor;Emeritus", "bibtex": "@inproceedings{\nlevine2021pmimasking,\ntitle={{\\{}PMI{\\}}-Masking: Principled masking of correlated spans},\nauthor={Yoav Levine and Barak Lenz and Opher Lieber and Omri Abend and Kevin Leyton-Brown and Moshe Tennenholtz and Yoav Shoham},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=3Aoft6NWFej}\n}", "github": "", "project": "", "reviewers": "AnonReviewer3;AnonReviewer4;AnonReviewer1;AnonReviewer2", "pdf_size": 0, "rating": "6;7;8;8", "confidence": "4;4;4;4", "wc_review": "557;365;695;291", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "935;66;800;324", "reply_reviewers": "0;0;0;0", "reply_authors": "4;1;1;1", "rating_avg": [ 7.25, 0.82915619758885 ], "confidence_avg": [ 4.0, 0.0 ], "wc_review_avg": [ 477.0, 158.9528231897754 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 531.25, 351.6570595054221 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.75, 1.299038105676658 ], "replies_avg": [ 12, 0 ], "authors#_avg": [ 7, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 81, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=9387318742507969173&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 5, "pdf": "https://openreview.net/pdf?id=3Aoft6NWFej", 
"email": "huji.ac.il;;;huji.ac.il;ubc.ca;technion.ac.il;cs.stanford.edu", "author_num": 7, "aff_unique_index": "0;0;1;2;3", "aff_unique_norm": "Hebrew University of Jerusalem;University of British Columbia;Technion - Israel Institute of Technology;Stanford University", "aff_unique_dep": ";;;Computer Science Department", "aff_unique_url": "https://www.huji.ac.il;https://www.ubc.ca;https://www.technion.ac.il;https://www.stanford.edu", "aff_unique_abbr": "HUJI;UBC;Technion;Stanford", "aff_campus_unique_index": "1;2", "aff_campus_unique": ";Jerusalem;Stanford", "aff_country_unique_index": "0;0;1;0;2", "aff_country_unique": "Israel;Canada;United States" }, { "id": "3EM0a2wC-jo", "title": "Learning Online Data Association", "track": "main", "status": "Reject", "tldr": "", "abstract": "When an agent interacts with a complex environment, it receives a stream of percepts in which it may detect entities, such as objects or people. To build up a coherent, low-variance estimate of the underlying state, it is necessary to fuse information from multiple detections over time. To do this fusion, the agent must decide which detections to associate with one another. We address this data-association problem in the setting of an online filter, in which each observation is processed by aggregating into an existing object hypothesis. Classic methods with strong probabilistic foundations exist, but they are computationally expensive and require models that can be difficult to acquire. In this work, we use the deep-learning tools of sparse attention and representation learning to learn a machine that processes a stream of detections and outputs a set of hypotheses about objects in the world. We evaluate this approach on simple clustering problems, problems with dynamics, and a complex image-based domain. We find that it generalizes well from short to long observation sequences and from a few to many hypotheses, outperforming other learning approaches and classical non-learning methods.", "keywords": "Data Association", "primary_area": "", "supplementary_material": "/attachment/7e8b21975aefa5dd3ed43bfa10dc1a33c27a0c7e.zip", "author": "Yilun Du;Joshua B. Tenenbaum;Tomas Perez;Leslie Pack Kaelbling", "authorids": "~Yilun_Du1;~Joshua_B._Tenenbaum1;~Tomas_Perez1;~Leslie_Pack_Kaelbling1", "gender": ";;F;M", "homepage": "https://yilundu.github.io;;http://people.csail.mit.edu/lpk/;http://people.csail.mit.edu/tlp/", "dblp": "204/4379;t/JoshuaBTenenbaum;k/LesliePackKaelbling;90/752", "google_scholar": ";;IcasIiwAAAAJ;gQOKAggAAAAJ", "orcid": ";;0000-0001-6054-7145;", "linkedin": ";;;", "or_profile": "~Yilun_Du1;~Joshua_B._Tenenbaum1;~Leslie_Pack_Kaelbling1;~Tom\u00e1s_Lozano-P\u00e9rez1", "aff": "Massachusetts Institute of Technology;Massachusetts Institute of Technology;Massachusetts Institute of Technology;Massachusetts Institute of Technology", "aff_domain": "mit.edu;mit.edu;mit.edu;mit.edu", "position": "PhD student;Professor;Full Professor;Full Professor", "bibtex": "@misc{\ndu2021learning,\ntitle={Learning Online Data Association},\nauthor={Yilun Du and Joshua B. 
Tenenbaum and Tomas Perez and Leslie Pack Kaelbling},\nyear={2021},\nurl={https://openreview.net/forum?id=3EM0a2wC-jo}\n}", "github": "", "project": "", "reviewers": "AnonReviewer2;AnonReviewer3;AnonReviewer4;AnonReviewer1", "site": "https://openreview.net/forum?id=3EM0a2wC-jo", "pdf_size": 0, "rating": "4;6;6;7", "confidence": "4;2;3;4", "wc_review": "662;392;556;600", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "1171;888;790;864", "reply_reviewers": "0;0;0;0", "reply_authors": "2;2;1;2", "rating_avg": [ 5.75, 1.0897247358851685 ], "confidence_avg": [ 3.25, 0.82915619758885 ], "wc_review_avg": [ 552.5, 100.02374718035712 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 928.25, 144.7314323151678 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.75, 0.4330127018922193 ], "replies_avg": [ 12, 0 ], "authors#_avg": [ 4, 0 ], "corr_rating_confidence": -0.20751433915982243, "gs_citation": -1, "gs_cited_by_link": "", "gs_version_total": -1, "aff_unique_index": "0;0;0;0", "aff_unique_norm": "Massachusetts Institute of Technology", "aff_unique_dep": "", "aff_unique_url": "https://web.mit.edu", "aff_unique_abbr": "MIT", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0;0;0", "aff_country_unique": "United States" }, { "id": "3F0Qm7TzNDM", "title": "Variance Based Sample Weighting for Supervised Deep Learning", "track": "main", "status": "Reject", "tldr": "", "abstract": "In the context of supervised learning of a function by a Neural Network (NN), we claim and empirically justify that a NN yields better results when the distribution of the data set focuses on regions where the function to learn is steeper. We first translate this assumption into a mathematically workable form using Taylor expansion. Then, theoretical derivations allow us to construct a methodology that we call Variance Based Sample Weighting (VBSW). VBSW uses the local variance of the labels to weight the training points. This methodology is general, scalable, cost-effective, and significantly increases the performance of a large class of models for various classification and regression tasks on image, text and multivariate data.
We highlight its benefits with experiments involving NNs from shallow linear NN to ResNet or Bert.", "keywords": "supervised learning;sample distribution;statistical methods;sample weighting;approximation theory;Taylor expansion", "primary_area": "", "supplementary_material": "/attachment/c6f364c64b6e8845fea0421b223e726743c35687.zip", "author": "Paul Novello;Ga\u00ebl Po\u00ebtte;David Lugato;Pietro Congedo", "authorids": "~Paul_Novello1;gael.poette@cea.fr;david.lugato@cea.fr;pietro.congedo@inria.fr", "gender": "M;;;", "homepage": ";;;", "dblp": "283/7771;;;", "google_scholar": "https://scholar.google.fr/citations?user=uaJK95oAAAAJ;;;", "orcid": "0000-0002-1053-8694;;;", "linkedin": "paul-novello-a036b1a1/;;;", "or_profile": "~Paul_Novello1;gael.poette@cea.fr;david.lugato@cea.fr;pietro.congedo@inria.fr", "aff": "\u00c9cole Polytechnique;;;", "aff_domain": "polytechnique.edu;;;", "position": "PhD student;;;", "bibtex": "@misc{\nnovello2021variance,\ntitle={Variance Based Sample Weighting for Supervised Deep Learning},\nauthor={Paul Novello and Ga{\\\"e}l Po{\\\"e}tte and David Lugato and Pietro Congedo},\nyear={2021},\nurl={https://openreview.net/forum?id=3F0Qm7TzNDM}\n}", "github": "", "project": "", "reviewers": "AnonReviewer4;AnonReviewer3;AnonReviewer1;AnonReviewer2", "site": "https://openreview.net/forum?id=3F0Qm7TzNDM", "pdf_size": 0, "rating": "3;6;6;7", "confidence": "4;3;2;4", "wc_review": "309;348;268;216", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "984;745;642;317", "reply_reviewers": "0;0;0;0", "reply_authors": "2;1;1;1", "rating_avg": [ 5.5, 1.5 ], "confidence_avg": [ 3.25, 0.82915619758885 ], "wc_review_avg": [ 285.25, 48.976397376695644 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 672.0, 239.58192753210747 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.25, 0.4330127018922193 ], "replies_avg": [ 12, 0 ], "authors#_avg": [ 4, 0 ], "corr_rating_confidence": -0.3015113445777637, "gs_citation": 1, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=18051945973266530183&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 4, "aff_unique_index": "0", "aff_unique_norm": "Ecole Polytechnique", "aff_unique_dep": "", "aff_unique_url": "https://www.polytechnique.edu", "aff_unique_abbr": "X", "aff_country_unique_index": "0", "aff_country_unique": "France" }, { "id": "3FAl0W6gZ_e", "title": "Three Dimensional Reconstruction of Botanical Trees with Simulatable Geometry", "track": "main", "status": "Reject", "tldr": "", "abstract": "We tackle the challenging problem of creating full and accurate three dimensional reconstructions of botanical trees with the topological and geometric accuracy required for subsequent physical simulation, e.g. in response to wind forces. Although certain aspects of our approach would benefit from various improvements, our results exceed the state of the art especially in geometric and topological complexity and accuracy. Starting with two dimensional RGB image data acquired from cameras attached to drones, we create point clouds, textured triangle meshes, and a simulatable and skinned cylindrical articulated rigid body model. 
We discuss the pros and cons of each step of our pipeline, and in order to stimulate future research we make the raw and processed data from every step of the pipeline as well as the final geometric reconstructions publicly available.", "keywords": "", "primary_area": "", "supplementary_material": "/attachment/fa9af33510cff87b4cf9775e2ae1e484b5aa0c5e.zip", "author": "Ed Quigley;Winnie Lin;Yilin Zhu;Ronald Fedkiw", "authorids": "~Ed_Quigley1;~Winnie_Lin1;~Yilin_Zhu1;~Ronald_Fedkiw1", "gender": ";F;M;", "homepage": "http://physbam.stanford.edu/~equigley/;https://physbam.stanford.edu/~wl1915;https://yilinzhu.me/;", "dblp": ";;;", "google_scholar": ";;neS95GEAAAAJ;", "orcid": ";;;", "linkedin": ";;;", "or_profile": "~Ed_Quigley1;~Winnie_Lin1;~Yilin_Zhu1;~Ronald_Fedkiw1", "aff": "Industrial Light & Magic;Stanford University;Stanford University;", "aff_domain": "ilm.com;stanford.edu;stanford.edu;", "position": "R&D Engineer;PhD student;PhD student;", "bibtex": "@misc{\nquigley2021three,\ntitle={Three Dimensional Reconstruction of Botanical Trees with Simulatable Geometry},\nauthor={Ed Quigley and Winnie Lin and Yilin Zhu and Ronald Fedkiw},\nyear={2021},\nurl={https://openreview.net/forum?id=3FAl0W6gZ_e}\n}", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer4;AnonReviewer3;AnonReviewer2", "site": "https://openreview.net/forum?id=3FAl0W6gZ_e", "pdf_size": 0, "rating": "3;4;4;6", "confidence": "4;4;3;3", "wc_review": "263;1077;418;747", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "146;121;87;368", "reply_reviewers": "0;0;0;0", "reply_authors": "1;1;1;1", "rating_avg": [ 4.25, 1.0897247358851685 ], "confidence_avg": [ 3.5, 0.5 ], "wc_review_avg": [ 626.25, 313.47836847221214 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 180.5, 110.25992018861614 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 9, 0 ], "authors#_avg": [ 4, 0 ], "corr_rating_confidence": -0.6882472016116854, "gs_citation": 3, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=12146224675680835678&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 5, "aff_unique_index": "0;1;1", "aff_unique_norm": "Industrial Light & Magic;Stanford University", "aff_unique_dep": ";", "aff_unique_url": "https://www.ilm.com;https://www.stanford.edu", "aff_unique_abbr": "ILM;Stanford", "aff_campus_unique_index": "1;1", "aff_campus_unique": ";Stanford", "aff_country_unique_index": "0;0;0", "aff_country_unique": "United States" }, { "id": "3FK30d5BZdu", "title": "Hidden Incentives for Auto-Induced Distributional Shift", "track": "main", "status": "Reject", "tldr": "", "abstract": "Decisions made by machine learning systems have increasing influence on the world, yet it is common for machine learning algorithms to assume that no such influence exists. An example is the use of the i.i.d. assumption in content recommendation. In fact, the (choice of) content displayed can change users' perceptions and preferences, or even drive them away, causing a shift in the distribution of users. We introduce the term auto-induced distributional shift (ADS) to describe the phenomenon of an algorithm causing a change in the distribution of its own inputs. Our goal is to ensure that machine learning systems do not leverage ADS to increase performance when doing so could be undesirable. We demonstrate that changes to the learning algorithm, such as the introduction of meta-learning, can cause hidden incentives for auto-induced distributional shift (HI-ADS) to be revealed. 
To address this issue, we introduce `unit tests' and a mitigation strategy for HI-ADS, as well as a toy environment for modelling real-world issues with HI-ADS in content recommendation, where we demonstrate that strong meta-learners achieve gains in performance via ADS. We show meta-learning and Q-learning both sometimes fail unit tests, but pass when using our mitigation strategy.", "keywords": "distributional shift;social impact of AI;content recommendation;incentives;meta-learning", "primary_area": "", "supplementary_material": "", "author": "David Krueger;Tegan Maharaj;Jan Leike", "authorids": "~David_Krueger1;~Tegan_Maharaj1;~Jan_Leike1", "gender": "M;F;M", "homepage": "https://mila.umontreal.ca/en/person/david-scott-krueger/;http://teganmaharaj.com;https://jan.leike.name", "dblp": "142/2741.html;;https://dblp.uni-trier.de/pers/hd/l/Leike:Jan", "google_scholar": "https://scholar.google.ca/citations?user=5Uz70IoAAAAJ;https://scholar.google.ca/citations?user=XpscC-EAAAAJ;beiWcokAAAAJ", "orcid": ";;", "linkedin": ";;", "or_profile": "~David_Krueger1;~Tegan_Maharaj1;~Jan_Leike1", "aff": "University of Montreal;Ecole Polytechnique de Montreal;OpenAI", "aff_domain": "umontreal.ca;polymtl.ca;openai.com", "position": "PhD student;PhD student;Alignment Team Lead", "bibtex": "@misc{\nkrueger2021hidden,\ntitle={Hidden Incentives for Auto-Induced Distributional Shift},\nauthor={David Krueger and Tegan Maharaj and Jan Leike},\nyear={2021},\nurl={https://openreview.net/forum?id=3FK30d5BZdu}\n}", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer3;AnonReviewer4;AnonReviewer2", "site": "https://openreview.net/forum?id=3FK30d5BZdu", "pdf_size": 0, "rating": "4;4;5;6", "confidence": "3;3;2;3", "wc_review": "515;387;469;216", "wc_reply_reviewers": "266;0;0;0", "wc_reply_authors": "1338;1005;866;445", "reply_reviewers": "1;0;0;0", "reply_authors": "5;4;2;2", "rating_avg": [ 4.75, 0.82915619758885 ], "confidence_avg": [ 2.75, 0.4330127018922193 ], "wc_review_avg": [ 396.75, 113.98327728223995 ], "wc_reply_reviewers_avg": [ 66.5, 115.18137870333034 ], "wc_reply_authors_avg": [ 913.5, 320.2815167942103 ], "reply_reviewers_avg": [ 0.25, 0.4330127018922193 ], "reply_authors_avg": [ 3.25, 1.299038105676658 ], "replies_avg": [ 22, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": -0.17407765595569782, "gs_citation": 67, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=5235071820381314988&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 3, "aff_unique_index": "0;1;2", "aff_unique_norm": "University of Montreal;Ecole Polytechnique de Montreal;OpenAI", "aff_unique_dep": ";;", "aff_unique_url": "https://wwwumontreal.ca;https://www.polymtl.ca;https://openai.com", "aff_unique_abbr": "UM;Polytechnique Montreal;OpenAI", "aff_campus_unique_index": "1", "aff_campus_unique": ";Montreal", "aff_country_unique_index": "0;0;1", "aff_country_unique": "Canada;United States" }, { "id": "3FkrodAXdk", "title": "Deep Ensembles with Hierarchical Diversity Pruning", "track": "main", "status": "Reject", "tldr": "", "abstract": "Diverse deep ensembles hold the potential for improving accuracy and robustness of deep learning models. Both pairwise and non-pairwise ensemble diversity metrics have been proposed over the past two decades. However, it is also challenging to find the right metrics that can effectively prune those deep ensembles with insufficient ensemble diversity, thus failing to deliver effective ensemble accuracy. 
In this paper, we first compare six popular diversity metrics in the literature, coined as Q metrics, including both pairwise and non-pairwise representatives. We analyze their inherent limitations in capturing the negative correlation of ensemble member models, and thus inefficient in identifying and pruning low quality ensembles. We next present six HQ ensemble diversity metrics by extending the existing Q-metrics with three novel optimizations: (1) We introduce the concept of focal model and separately measure the ensemble diversity among the deep ensembles of the same team size with the concept of focal model, aiming to better capture the negative correlations of member models of an ensemble. (2) We introduce six HQ-diversity metrics to optimize the corresponding Q-metrics respectively in terms of measuring negative correlation among member models of an ensemble using its ensemble diversity score. (3) We introduce a two phase hierarchical pruning method to effectively identify and prune those deep ensembles with high HQ diversity scores, aiming to increase the lower and upper bounds on ensemble accuracy for the selected ensembles. By combining these three optimizations, deep ensembles selected based on our hierarchical diversity pruning approach significantly outperforms those selected by the corresponding Q-metrics. Comprehensive experimental evaluation over several benchmark datasets shows that our HQ-metrics can effectively select high diversity deep ensembles by pruning out those ensembles with insufficient diversity, and successfully increase the lower bound (worst case) accuracy of the selected deep ensembles, compared to those selected using the state-of-the-art Q-metrics.", "keywords": "Ensemble;Diversity Metrics;Hierarchical Pruning;Ensemble Accuracy;Deep Neural Networks", "primary_area": "", "supplementary_material": "", "author": "Yanzhao Wu;Ling Liu", "authorids": "~Yanzhao_Wu1;~Ling_Liu3", "gender": "M;", "homepage": "http://yanzhaowu.me/;", "dblp": "61/9620-1;", "google_scholar": "ZANfMywAAAAJ;", "orcid": "0000-0001-8761-5486;", "linkedin": "yanzhao-wu;", "or_profile": "~Yanzhao_Wu1;~Ling_Liu3", "aff": "Georgia Institute of Technology;", "aff_domain": "gatech.edu;", "position": "PhD student;", "bibtex": "@misc{\nwu2021deep,\ntitle={Deep Ensembles with Hierarchical Diversity Pruning},\nauthor={Yanzhao Wu and Ling Liu},\nyear={2021},\nurl={https://openreview.net/forum?id=3FkrodAXdk}\n}", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer3;AnonReviewer2;AnonReviewer4", "site": "https://openreview.net/forum?id=3FkrodAXdk", "pdf_size": 0, "rating": "3;3;4;4", "confidence": "4;5;4;4", "wc_review": "222;483;186;864", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "0;0;0;0", "reply_reviewers": "0;0;0;0", "reply_authors": "0;0;0;0", "rating_avg": [ 3.5, 0.5 ], "confidence_avg": [ 4.25, 0.4330127018922193 ], "wc_review_avg": [ 438.75, 270.9514486028816 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 0, 0 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 0, 0 ], "replies_avg": [ 5, 0 ], "authors#_avg": [ 2, 0 ], "corr_rating_confidence": -0.5773502691896257, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:IvZ2U5lMrt8J:scholar.google.com/&scioq=Deep+Ensembles+with+Hierarchical+Diversity+Pruning&hl=en&as_sdt=0,5", "gs_version_total": 0, "aff_unique_index": "0", "aff_unique_norm": "Georgia Institute of Technology", "aff_unique_dep": "", "aff_unique_url": "https://www.gatech.edu", "aff_unique_abbr": 
"Georgia Tech", "aff_country_unique_index": "0", "aff_country_unique": "United States" }, { "id": "3GYfIYvNNhL", "title": "Characterizing Structural Regularities of Labeled Data in Overparameterized Models", "track": "main", "status": "Reject", "tldr": "", "abstract": "Humans are accustomed to environments that contain both regularities and exceptions. For example, at most gas stations, one pays prior to pumping, but the occasional rural station does not accept payment in advance.\nLikewise, deep neural networks can generalize across instances that share common patterns or structures yet have the capacity to memorize rare or irregular forms. We analyze how individual instances are treated by a model via a consistency score. The score characterizes the expected accuracy for a held-out instance given training sets of varying size sampled from the data distribution. We obtain empirical estimates of this score for individual instances in multiple data-sets, and we show that the score identifies out-of-distribution and mislabeled examples at one end of the continuum and strongly regular examples at the other end. We identify computationally inexpensive proxies to the consistency score using statistics collected during training. We apply the score toward understanding the dynamics of representation learning and to filter outliers during training, and we discuss other potential applications including curriculum learning and active data collection.", "keywords": "", "primary_area": "", "supplementary_material": "", "author": "Ziheng Jiang;Chiyuan Zhang;Kunal Talwar;Michael Curtis Mozer", "authorids": "~Ziheng_Jiang1;~Chiyuan_Zhang1;~Kunal_Talwar1;~Michael_Curtis_Mozer1", "gender": ";M;M;M", "homepage": "http://www.ziheng.org/;http://pluskid.org;http://www.kunaltalwar.org;https://www.cs.colorado.edu/~mozer", "dblp": "14/8980;21/8315;06/3696;m/MichaelCMozer", "google_scholar": "tuRCeekAAAAJ;l_G2vr0AAAAJ;XD_01h8AAAAJ;lmjR_qMAAAAJ", "orcid": ";;;", "linkedin": ";;kunal-talwar-128a6159;", "or_profile": "~Ziheng_Jiang1;~Chiyuan_Zhang1;~Kunal_Talwar1;~Michael_Curtis_Mozer1", "aff": "University of Washington, Seattle;Google;Apple;Google DeepMind", "aff_domain": "uw.edu;google.com;apple.com;google.com", "position": "PhD student;Research Scientist;Research Scientist;Research Scientist", "bibtex": "@misc{\njiang2021characterizing,\ntitle={Characterizing Structural Regularities of Labeled Data in Overparameterized Models},\nauthor={Ziheng Jiang and Chiyuan Zhang and Kunal Talwar and Michael Curtis Mozer},\nyear={2021},\nurl={https://openreview.net/forum?id=3GYfIYvNNhL}\n}", "github": "", "project": "", "reviewers": "AnonReviewer4;AnonReviewer2;AnonReviewer1", "site": "https://openreview.net/forum?id=3GYfIYvNNhL", "pdf_size": 0, "rating": "4;5;5", "confidence": "4;4;4", "wc_review": "549;437;631", "wc_reply_reviewers": "0;78;0", "wc_reply_authors": "636;716;480", "reply_reviewers": "0;1;0", "reply_authors": "1;3;1", "rating_avg": [ 4.666666666666667, 0.4714045207910317 ], "confidence_avg": [ 4.0, 0.0 ], "wc_review_avg": [ 539.0, 79.51519770878186 ], "wc_reply_reviewers_avg": [ 26.0, 36.76955262170047 ], "wc_reply_authors_avg": [ 610.6666666666666, 97.99773240006911 ], "reply_reviewers_avg": [ 0.3333333333333333, 0.4714045207910317 ], "reply_authors_avg": [ 1.6666666666666667, 0.9428090415820634 ], "replies_avg": [ 10, 0 ], "authors#_avg": [ 4, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 117, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=8265006175243503341&as_sdt=2005&sciodt=0,5&hl=en", 
"gs_version_total": 5, "aff_unique_index": "0;1;2;1", "aff_unique_norm": "University of Washington;Google;Apple", "aff_unique_dep": ";Google;Apple Inc.", "aff_unique_url": "https://www.washington.edu;https://www.google.com;https://www.apple.com", "aff_unique_abbr": "UW;Google;Apple", "aff_campus_unique_index": "0;1", "aff_campus_unique": "Seattle;Mountain View;", "aff_country_unique_index": "0;0;0;1", "aff_country_unique": "United States;United Kingdom" }, { "id": "3InxcRQsYLf", "title": "VideoGen: Generative Modeling of Videos using VQ-VAE and Transformers", "track": "main", "status": "Reject", "tldr": "", "abstract": "We present VideoGen: a conceptually simple architecture for scaling likelihood based generative modeling to natural videos. VideoGen uses VQ-VAE that learns learns downsampled discrete latent representations of a video by employing 3D convolutions and axial self-attention. A simple GPT-like architecture is then used to autoregressively model the discrete latents using spatio-temporal position encodings. Despite the simplicity in formulation, ease of training and a light compute requirement, our architecture is able to generate samples competitive with state-of-the-art GAN models for video generation on the BAIR Robot dataset, and generate coherent action-conditioned samples based on experiences gathered from the VizDoom simulator. We hope our proposed architecture serves as a reproducible reference for a minimalistic implementation of transformer based video generation models without requiring industry scale compute resources. Samples are available at https://sites.google.com/view/videogen", "keywords": "video generation;vqvae;transformers;gpt", "primary_area": "", "supplementary_material": "/attachment/f79eb7538f908394e27046c171b8817e3c2af1ee.zip", "author": "Yunzhi Zhang;Wilson Yan;Pieter Abbeel;Aravind Srinivas", "authorids": "yunzhi@berkeley.edu;~Wilson_Yan1;~Pieter_Abbeel2;~Aravind_Srinivas1", "gender": ";M;M;", "homepage": ";https://wilson1yan.github.io/;https://people.eecs.berkeley.edu/~pabbeel/;https://people.eecs.berkeley.edu/~aravind/", "dblp": ";;;218/5157", "google_scholar": ";tR2Qw0YAAAAJ;https://scholar.google.com.tw/citations?user=vtwH6GkAAAAJ;GhrKC1gAAAAJ", "orcid": ";;;", "linkedin": ";;;", "or_profile": "yunzhi@berkeley.edu;~Wilson_Yan1;~Pieter_Abbeel2;~Aravind_Srinivas1", "aff": ";University of California, Berkeley;Covariant;University of California, Berkeley", "aff_domain": ";berkeley.edu;covariant.ai;berkeley.edu", "position": ";PhD student;Founder;PhD student", "bibtex": "@misc{\nzhang2021videogen,\ntitle={VideoGen: Generative Modeling of Videos using {\\{}VQ{\\}}-{\\{}VAE{\\}} and Transformers},\nauthor={Yunzhi Zhang and Wilson Yan and Pieter Abbeel and Aravind Srinivas},\nyear={2021},\nurl={https://openreview.net/forum?id=3InxcRQsYLf}\n}", "github": "", "project": "", "reviewers": "AnonReviewer4;AnonReviewer1;AnonReviewer3;AnonReviewer2", "site": "https://openreview.net/forum?id=3InxcRQsYLf", "pdf_size": 0, "rating": "4;4;4;4", "confidence": "4;4;5;5", "wc_review": "1195;554;885;316", "wc_reply_reviewers": "0;0;0;37", "wc_reply_authors": "0;0;0;0", "reply_reviewers": "0;0;0;1", "reply_authors": "0;0;0;0", "rating_avg": [ 4.0, 0.0 ], "confidence_avg": [ 4.5, 0.5 ], "wc_review_avg": [ 737.5, 332.56465536794497 ], "wc_reply_reviewers_avg": [ 9.25, 16.021469970012117 ], "wc_reply_authors_avg": [ 0, 0 ], "reply_reviewers_avg": [ 0.25, 0.4330127018922193 ], "reply_authors_avg": [ 0, 0 ], "replies_avg": [ 12, 0 ], "authors#_avg": [ 4, 0 ], 
"corr_rating_confidence": 0.0, "gs_citation": 3, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=12550420986096797991&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 0, "aff_unique_index": "0;1;0", "aff_unique_norm": "University of California, Berkeley;Covariant", "aff_unique_dep": ";", "aff_unique_url": "https://www.berkeley.edu;", "aff_unique_abbr": "UC Berkeley;", "aff_campus_unique_index": "0;0", "aff_campus_unique": "Berkeley;", "aff_country_unique_index": "0;0", "aff_country_unique": "United States;" }, { "id": "3JI45wPuReY", "title": "Neural Network Surgery: Combining Training with Topology Optimization", "track": "main", "status": "Reject", "tldr": "", "abstract": "With ever increasing computational capacities, neural networks become more and more proficient at solving complex tasks. However, picking a sufficiently good network topology usually relies on expert human knowledge. Neural architecture search aims to reduce the extent of expertise that is needed. Modern architecture search techniques often rely on immense computational power, or apply trained meta controllers for decision making. We develop a framework for a genetic algorithm that is both computationally cheap and makes decisions based on mathematical criteria rather than trained parameters. It is a hybrid approach that fuses training and topology optimization together into one process. Structural modifications that are performed include adding or removing layers of neurons, with some re-training applied to make up for incurred change in input-output behaviour. Our ansatz is tested on both the SVHN and (augmented) CIFAR-10 datasets with limited computational overhead compared to training only the baseline. This algorithm can achieve a significant increase in accuracy (as compared to a fully trained baseline), rescue insufficient topologies that in their current state are only able to learn to a limited extent, and dynamically reduce network size without loss in achieved accuracy.", "keywords": "Neural Architecture Search;Genetic Algorithm;SVD", "primary_area": "", "supplementary_material": "", "author": "Elisabeth Schiessler;Roland Aydin;Kevin Linka;Christian Cyron", "authorids": "~Elisabeth_Schiessler1;~Roland_Aydin1;kevin.linka@tuhh.de;christian.cyron@hzg.de", "gender": "F;M;;", "homepage": ";https://www.hereon.de/institutes/material_systems_modeling/machine_learning_and_data/team/098987/index.php.de;;", "dblp": "308/1906;;;", "google_scholar": ";https://scholar.google.de/citations?user=S9dHiKkAAAAJ;;", "orcid": "0000-0001-5520-8325;;;", "linkedin": "http://www.linkedin.com/in/elisabeth-schiessler-947b1314b;;;", "or_profile": "~Elisabeth_Schiessler1;~Roland_Aydin1;kevin.linka@tuhh.de;christian.cyron@hzg.de", "aff": "Helmholtz-Zentrum Hereon;Helmholtz-Zentrum Hereon;;", "aff_domain": "hereon.de;hereon.de;;", "position": "PhD student;Postdoc;;", "bibtex": "@misc{\nschiessler2021neural,\ntitle={Neural Network Surgery: Combining Training with Topology Optimization},\nauthor={Elisabeth Schiessler and Roland Aydin and Kevin Linka and Christian Cyron},\nyear={2021},\nurl={https://openreview.net/forum?id=3JI45wPuReY}\n}", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer2;AnonReviewer4;AnonReviewer3", "site": "https://openreview.net/forum?id=3JI45wPuReY", "pdf_size": 0, "rating": "4;4;4;5", "confidence": "4;4;4;2", "wc_review": "216;220;104;238", "wc_reply_reviewers": "0;94;0;0", "wc_reply_authors": "0;0;0;0", "reply_reviewers": "0;1;0;0", "reply_authors": "0;0;0;0", "rating_avg": [ 4.25, 
0.4330127018922193 ], "confidence_avg": [ 3.5, 0.8660254037844386 ], "wc_review_avg": [ 194.5, 52.90321351298048 ], "wc_reply_reviewers_avg": [ 23.5, 40.703193977868615 ], "wc_reply_authors_avg": [ 0, 0 ], "reply_reviewers_avg": [ 0.25, 0.4330127018922193 ], "reply_authors_avg": [ 0, 0 ], "replies_avg": [ 7, 0 ], "authors#_avg": [ 4, 0 ], "corr_rating_confidence": -1.0, "gs_citation": 10, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=14700923848738878903&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 10, "aff_unique_index": "0;0", "aff_unique_norm": "Helmholtz-Zentrum Hereon", "aff_unique_dep": "", "aff_unique_url": "https://www.hereon.de", "aff_unique_abbr": "", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0", "aff_country_unique": "Germany" }, { "id": "3Jf4Fr2I4T2", "title": "Uncertainty Quantification for Bayesian Optimization", "track": "main", "status": "Withdraw", "tldr": "", "abstract": "Bayesian optimization is a class of global optimization techniques. In Bayesian optimization, the underlying objective function is modeled as a realization of a Gaussian process. Although the Gaussian process assumption implies a random distribution of the Bayesian optimization outputs, quantification of this uncertainty is rarely studied in the literature. In this work, we propose a novel approach to assess the output uncertainty of Bayesian optimization algorithms, which proceeds by constructing confidence regions of the maximum point (or value) of the objective function. These regions can be computed efficiently, and their confidence levels are guaranteed by the uniform error bounds for sequential Gaussian process regression newly developed in the present work. Our theory provides a unified uncertainty quantification framework for all existing sequential sampling policies and stopping criteria.", "keywords": "Bayesian optimization;uncertainty quantification;Gaussian process", "primary_area": "", "supplementary_material": "/attachment/19f6085fbee0ed67727198390ef89810319a5e68.zip", "author": "Rui Tuo;Wenjia Wang", "authorids": "~Rui_Tuo1;~Wenjia_Wang2", "gender": "M;M", "homepage": "https://sites.google.com/site/ruituo2017/home?authuser=0;https://www.wenjia-w.com/", "dblp": "184/0554;", "google_scholar": "J_D0pSUAAAAJ;EKS1sO0AAAAJ", "orcid": ";", "linkedin": ";", "or_profile": "~Rui_Tuo1;~Wenjia_Wang2", "aff": "Texas A&M University - College Station;Hong Kong University of Science and Technology", "aff_domain": "tamu.edu;ust.hk", "position": "Assistant Professor;Assistant Professor", "bibtex": "", "github": "", "project": "", "reviewers": "AnonReviewer3;AnonReviewer1;AnonReviewer4;AnonReviewer2", "site": "https://openreview.net/forum?id=3Jf4Fr2I4T2", "pdf_size": 0, "rating": "4;5;5;5", "confidence": "4;4;4;3", "wc_review": "211;462;542;192", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "0;0;0;0", "reply_reviewers": "0;0;0;0", "reply_authors": "0;0;0;0", "rating_avg": [ 4.75, 0.4330127018922193 ], "confidence_avg": [ 3.75, 0.4330127018922193 ], "wc_review_avg": [ 351.75, 153.03655609036684 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 0, 0 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 0, 0 ], "replies_avg": [ 5, 0 ], "authors#_avg": [ 2, 0 ], "corr_rating_confidence": -0.3333333333333333, "gs_citation": 5, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=16142594250580628100&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 5, "aff_unique_index": "0;1", "aff_unique_norm": "Texas A&M 
University;Hong Kong University of Science and Technology", "aff_unique_dep": ";", "aff_unique_url": "https://www.tamu.edu;https://www.ust.hk", "aff_unique_abbr": "TAMU;HKUST", "aff_campus_unique_index": "0;1", "aff_campus_unique": "College Station;Hong Kong SAR", "aff_country_unique_index": "0;1", "aff_country_unique": "United States;China" }, { "id": "3Jldbtfqfa", "title": "Structural Knowledge Distillation", "track": "main", "status": "Withdraw", "tldr": "", "abstract": "Knowledge distillation is a critical technique to transfer knowledge between models, typically from a large model (the teacher) to a smaller one (the student). The objective function of knowledge distillation is typically the cross-entropy between the teacher and the student's output distributions. However, for structured prediction problems, the output space is exponential in size; therefore, the cross-entropy objective becomes intractable to compute and optimize directly. In this paper, we derive a factorized form of the knowledge distillation objective for structured prediction, which is tractable for many typical choices of the teacher and student models. In particular, we show the tractability and empirical effectiveness of structural knowledge distillation between sequence labeling and dependency parsing models under four different scenarios: 1) the teacher and student share the same factorization form of the output structure scoring function; 2) the student factorization produces smaller substructures than the teacher factorization; 3) the teacher factorization produces smaller substructures than the student factorization; 4) the factorization forms from the teacher and the student are incompatible.", "keywords": "", "primary_area": "", "supplementary_material": "", "author": "Xinyu Wang;Yong Jiang;Zhaohui Yan;Zixia Jia;Nguyen Bach;Tao Wang;Zhongqiang Huang;Fei Huang;Kewei Tu", "authorids": "~Xinyu_Wang3;~Yong_Jiang1;~Zhaohui_Yan1;~Zixia_Jia1;~Nguyen_Bach1;~Tao_Wang4;~Zhongqiang_Huang1;~Fei_Huang2;~Kewei_Tu1", "gender": "M;M;M;F;;M;M;M;M", "homepage": "https://wangxinyu0922.github.io;http://jiangyong.site/;;;http://nguyenbh.github.io/;;;https://sites.google.com/view/fei-huang;https://faculty.sist.shanghaitech.edu.cn/faculty/tukw/", "dblp": "68/1277-13;;50/1907;257/1724.html;52/951;12/5838-20;10/3565;h/FeiHuang.html;22/918", "google_scholar": "G33Cf7gAAAAJ;sxXZWQQAAAAJ;R5bvjGMAAAAJ;FdwGDyoAAAAJ;nfHNK9YAAAAJ;;;9r98PpoAAAAJ;5gi3Pm0AAAAJ", "orcid": ";;;;;;;;", "linkedin": ";;;;nguyen-bach-4b37191b/;;;fei-huang-cas-cmu;", "or_profile": "~Xinyu_Wang3;~Yong_Jiang1;~Zhaohui_Yan1;~Zixia_Jia1;~Nguyen_Bach1;~Tao_Wang4;~Zhongqiang_Huang1;~Fei_Huang2;~Kewei_Tu1", "aff": "ShanghaiTech University;Tongyi Lab;Shanghaitech University;ShanghaiTech University;Microsoft;;Alibaba Group;Alibaba Group US;ShanghaiTech University", "aff_domain": "shanghaitech.edu.cn;alibaba-inc.com;shanghaitech.edu.cn;shanghaitech.edu.cn;microsoft.com;;alibaba-inc.com;alibaba-inc.com;shanghaitech.edu.cn", "position": "PhD student;Researcher;PhD student;PhD student;Principal engineer;;Senior Staff Engineer;Principal Researcher;Associate Professor", "bibtex": "", "github": "", "project": "", "reviewers": "AnonReviewer2;AnonReviewer1;AnonReviewer3;AnonReviewer4", "site": "https://openreview.net/forum?id=3Jldbtfqfa", "pdf_size": 0, "rating": "4;4;5;5", "confidence": "3;4;4;4", "wc_review": "534;102;436;443", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "0;0;0;0", "reply_reviewers": "0;0;0;0", "reply_authors": "0;0;0;0", "rating_avg": [ 4.5, 0.5 ], 
"confidence_avg": [ 3.75, 0.4330127018922193 ], "wc_review_avg": [ 378.75, 164.39187175769976 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 0, 0 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 0, 0 ], "replies_avg": [ 5, 0 ], "authors#_avg": [ 9, 0 ], "corr_rating_confidence": 0.5773502691896257, "gs_citation": -1, "gs_cited_by_link": "", "gs_version_total": -1, "aff_unique_index": "0;1;0;0;2;3;3;0", "aff_unique_norm": "ShanghaiTech University;Tongyi Lab;Microsoft;Alibaba Group", "aff_unique_dep": ";;Microsoft Corporation;", "aff_unique_url": "https://www.shanghaitech.edu.cn;;https://www.microsoft.com;https://www.alibaba.com", "aff_unique_abbr": "ShanghaiTech;;Microsoft;Alibaba", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0;0;2;0;2;0", "aff_country_unique": "China;;United States" }, { "id": "3LujMJM9EMp", "title": "DEMI: Discriminative Estimator of Mutual Information", "track": "main", "status": "Reject", "tldr": "", "abstract": "Estimating mutual information between continuous random variables is often intractable and extremely challenging for high-dimensional data. Recent progress has leveraged neural networks to optimize variational lower bounds on mutual information. Although showing promise for this difficult problem, the variational methods have been theoretically and empirically proven to have serious statistical limitations: 1) many methods struggle to produce accurate estimates when the underlying mutual information is either low or high; 2) the resulting estimators may suffer from high variance. Our approach is based on training a classifier that provides the probability that a data sample pair is drawn from the joint distribution rather than from the product of its marginal distributions. Moreover, we establish a direct connection between mutual information and the average log odds estimate produced by the classifier on a test set, leading to a simple and accurate estimator of mutual information. We show theoretically that our method and other variational approaches are equivalent when they achieve their optimum, while our method sidesteps the variational bound. 
Empirical results demonstrate high accuracy of our approach and the advantages of our estimator in the context of representation learning.\n", "keywords": "Mutual information estimation;discriminative classification", "primary_area": "", "supplementary_material": "", "author": "Ruizhi Liao;Daniel Moyer;Polina Golland;William M Wells", "authorids": "~Ruizhi_Liao3;~Daniel_Moyer3;~Polina_Golland1;~William_M_Wells1", "gender": "M;;M;M", "homepage": "http://people.csail.mit.edu/ruizhi/;https://people.csail.mit.edu/polina;https://people.csail.mit.edu/sw/;https://dcmoyer.github.io", "dblp": "40/8436;g/PolinaGolland;w/WilliamMWellsIII;187/6201", "google_scholar": "h1zSzecAAAAJ;;DwXLsT8AAAAJ;sKmoxSMAAAAJ", "orcid": ";;;", "linkedin": ";;;", "or_profile": "~Ruizhi_Liao3;~Polina_Golland1;~William_M_Wells1;~Daniel_Moyer2", "aff": "Massachusetts Institute of Technology;Massachusetts Institute of Technology;Computer Science and Artificial Intelligence Laboratory, Electrical Engineering & Computer Science;Massachusetts Institute of Technology", "aff_domain": "mit.edu;mit.edu;csail.mit.edu;csail.mit.edu", "position": "PhD student;Full Professor;Research Scientist;Postdoc", "bibtex": "@misc{\nliao2021demi,\ntitle={{\\{}DEMI{\\}}: Discriminative Estimator of Mutual Information },\nauthor={Ruizhi Liao and Daniel Moyer and Polina Golland and William M Wells},\nyear={2021},\nurl={https://openreview.net/forum?id=3LujMJM9EMp}\n}", "github": "", "project": "", "reviewers": "AnonReviewer5;AnonReviewer2;AnonReviewer3;AnonReviewer4", "site": "https://openreview.net/forum?id=3LujMJM9EMp", "pdf_size": 0, "rating": "4;5;6;7", "confidence": "4;5;3;2", "wc_review": "355;271;476;38", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "742;539;548;10", "reply_reviewers": "0;0;0;0", "reply_authors": "1;1;1;1", "rating_avg": [ 5.5, 1.118033988749895 ], "confidence_avg": [ 3.5, 1.118033988749895 ], "wc_review_avg": [ 285.0, 160.145246573228 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 459.75, 272.03343084996004 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 10, 0 ], "authors#_avg": [ 4, 0 ], "corr_rating_confidence": -0.7999999999999999, "gs_citation": 7, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=1696316220124011139&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 4, "aff_unique_index": "0;0;0;0", "aff_unique_norm": "Massachusetts Institute of Technology", "aff_unique_dep": "", "aff_unique_url": "https://web.mit.edu", "aff_unique_abbr": "MIT", "aff_campus_unique_index": "1", "aff_campus_unique": ";Cambridge", "aff_country_unique_index": "0;0;0;0", "aff_country_unique": "United States" }, { "id": "3NG1WgOn0y2", "title": "AETree: Areal Spatial Data Generation", "track": "main", "status": "Withdraw", "tldr": "", "abstract": "Areal spatial data represent not only geographical locations but also sizes and shapes of physical objects such as buildings in a city. Data-driven generation of such vector-format data requires an effective representation. Inspired by the hierarchical nature of such spatial data, we propose AETree, a tree-based deep auto-encoder network. Unlike common strategies that either treat the data as an unordered set or sort them into a sequence, we preprocess the data into a binary tree via hierarchical clustering. Then a tree encoder learns to extract and merge spatial information from bottom-up iteratively. The resulting global representation is reversely decoded for reconstruction or generation. 
Experiments on large scale 2D/3D building datasets of both New York and Zurich showed the superior performance of AETree over both set-based and sequential auto-regressive deep models.", "keywords": "content generation;spatial data representation;tree-based network", "primary_area": "", "supplementary_material": "/attachment/3f56aa9292c4f4b52f140c0bb2630f4c9c8d2437.zip", "author": "Congcong Wen;Wenyu Han;Hang Zhao;Chen Feng", "authorids": "cw3437@nyu.edu;wenyuhan@nyu.edu;~Hang_Zhao1;~Chen_Feng2", "gender": ";;M;M", "homepage": ";;http://www.mit.edu/~hangzhao/;https://ai4ce.github.io/", "dblp": ";;;01/161-2", "google_scholar": ";;DmahiOYAAAAJ;YeG8ZM0AAAAJ", "orcid": ";;;0000-0003-3211-1576", "linkedin": ";;;simbaforrest/", "or_profile": "cw3437@nyu.edu;wenyuhan@nyu.edu;~Hang_Zhao1;~Chen_Feng2", "aff": ";;Tsinghua University;New York University", "aff_domain": ";;tsinghua.edu.cn;nyu.edu", "position": ";;Assistant Professor;Assistant Professor", "bibtex": "", "github": "", "project": "", "reviewers": "AnonReviewer4;AnonReviewer1;AnonReviewer2;AnonReviewer3", "site": "https://openreview.net/forum?id=3NG1WgOn0y2", "pdf_size": 0, "rating": "2;3;5;5", "confidence": "5;5;4;3", "wc_review": "307;695;91;136", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "0;0;0;0", "reply_reviewers": "0;0;0;0", "reply_authors": "0;0;0;0", "rating_avg": [ 3.75, 1.299038105676658 ], "confidence_avg": [ 4.25, 0.82915619758885 ], "wc_review_avg": [ 307.25, 237.92895473228978 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 0, 0 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 0, 0 ], "replies_avg": [ 5, 0 ], "authors#_avg": [ 4, 0 ], "corr_rating_confidence": -0.8703882797784892, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:I-7v9-bod0AJ:scholar.google.com/&scioq=AETree:+Areal+Spatial+Data+Generation&hl=en&as_sdt=0,5", "gs_version_total": 0, "aff_unique_index": "0;1", "aff_unique_norm": "Tsinghua University;New York University", "aff_unique_dep": ";", "aff_unique_url": "https://www.tsinghua.edu.cn;https://www.nyu.edu", "aff_unique_abbr": "THU;NYU", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;1", "aff_country_unique": "China;United States" }, { "id": "3NemFmEq9jA", "title": "Temporal Attention Modules for Memory-Augmented Neural Networks", "track": "main", "status": "Withdraw", "tldr": "", "abstract": "We introduce two temporal attention modules which can be plugged into traditional memory augmented recurrent neural networks to improve their performance in natural language processing tasks.\nThe temporal attention modules provide new inductive biases allowing the models to compute attention distributions over the different time steps of input sequences.\nThe values of these attention distributions can be inspected to identify the sequence's elements that the model considered relevant during the inference.\nUsing the Entity Network (Henaff et al., 2016) as the model backbone, experiments were conducted on the bAbI dataset, a set of QA tasks.\nDue to the addition of the temporal attention modules, the performance metric increased 26% when the temporal attention was supervised, and 13.5% when it wasn't.\nMoreover, the usage of temporal attention modules proved useful for resolving reasoning tasks that the original model was unable to solve.", "keywords": "multitasking;attention;deep learning;natural language processing", "primary_area": "", "supplementary_material": 
"/attachment/5dd780fe72b8b5bde7487e060a7699b3da03a3f0.zip", "author": "Rodolfo Palma;Alvaro Soto;Luis Mart\u00ed;Nayat Sanchez-pi", "authorids": "~Rodolfo_Palma1;~Alvaro_Soto1;~Luis_Mart\u00ed1;~Nayat_Sanchez-pi1", "gender": "M;M;;M", "homepage": ";http://asoto.ing.puc.cl;;http://lmarti.com", "dblp": ";25/3682;;46/4895", "google_scholar": ";https://scholar.google.com/citations?hl=en;;https://scholar.google.com.br/citations?user=rT0Cow8AAAAJ", "orcid": ";;;", "linkedin": "rpalmaotero/;;;", "or_profile": "~Rodolfo_Palma1;~Alvaro_Soto1;~Nayat_Sanchez-pi1;~Luis_Marti1", "aff": ";Universidad Cat\u00f3lica de Chile;;Inria Chile Research Center", "aff_domain": ";uc.cl;;inria.cl", "position": ";Associate Professor;;Research Director", "bibtex": "", "github": "", "project": "", "reviewers": "AnonReviewer3;AnonReviewer1;AnonReviewer2;AnonReviewer4", "site": "https://openreview.net/forum?id=3NemFmEq9jA", "pdf_size": 0, "rating": "3;3;4;5", "confidence": "4;4;3;3", "wc_review": "548;335;171;276", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "0;0;0;0", "reply_reviewers": "0;0;0;0", "reply_authors": "0;0;0;0", "rating_avg": [ 3.75, 0.82915619758885 ], "confidence_avg": [ 3.5, 0.5 ], "wc_review_avg": [ 332.5, 137.58724504836923 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 0, 0 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 0, 0 ], "replies_avg": [ 5, 0 ], "authors#_avg": [ 4, 0 ], "corr_rating_confidence": -0.9045340337332909, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:URpMReWldqYJ:scholar.google.com/&scioq=Temporal+Attention+Modules+for+Memory-Augmented+Neural+Networks&hl=en&as_sdt=0,5", "gs_version_total": 0, "aff_unique_index": "0;1", "aff_unique_norm": "Universidad Cat\u00f3lica de Chile;INRIA", "aff_unique_dep": ";Chile Research Center", "aff_unique_url": "https://www.uc.cl;https://www.inria.cl", "aff_unique_abbr": "PUC;Inria", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0", "aff_country_unique": "Chile" }, { "id": "3R--2TdxMps", "title": "Defuse: Debugging Classifiers Through Distilling Unrestricted Adversarial Examples", "track": "main", "status": "Reject", "tldr": "", "abstract": "With the greater proliferation of machine learning models, the imperative of diagnosing and correcting bugs in models has become increasingly clear. As a route to better discover and fix model bugs, we propose failure scenarios: regions on the data manifold that are incorrectly classified by a model. We propose an end-to-end debugging framework called Defuse to use these regions for fixing faulty classifier predictions. The Defuse framework works in three steps. First, Defuse identifies many unrestricted adversarial examples--naturally occurring instances that are misclassified--using a generative model. Next, the procedure distills the misclassified data using clustering into failure scenarios. Last, the method corrects model behavior on the distilled scenarios through an optimization based approach. We illustrate the utility of our framework on a variety of image data sets. 
We find that Defuse identifies and resolves concerning predictions while maintaining model generalization.", "keywords": "debugging;interpretability;explainability", "primary_area": "", "supplementary_material": "", "author": "Dylan Z Slack;Nathalie Rauschmayr;Krishnaram Kenthapadi", "authorids": "~Dylan_Z_Slack1;rauscn@amazon.com;~Krishnaram_Kenthapadi1", "gender": "M;;M", "homepage": "https://dylanslacks.website;;https://cs.stanford.edu/people/kngk/", "dblp": "https://dblp.org/pers/s/Slack:Dylan.html;;29/4781", "google_scholar": "pyhz-gUAAAAJ;;av5rGaEAAAAJ", "orcid": ";;0000-0003-1237-087X", "linkedin": ";;krishnaramkenthapadi/", "or_profile": "~Dylan_Z_Slack1;rauscn@amazon.com;~Krishnaram_Kenthapadi1", "aff": "University of California, Irvine;;Amazon AWS AI", "aff_domain": "uci.edu;;amazon.com", "position": "PhD student;;Principal Scientist", "bibtex": "@misc{\nslack2021defuse,\ntitle={Defuse: Debugging Classifiers Through Distilling Unrestricted Adversarial Examples},\nauthor={Dylan Z Slack and Nathalie Rauschmayr and Krishnaram Kenthapadi},\nyear={2021},\nurl={https://openreview.net/forum?id=3R--2TdxMps}\n}", "github": "", "project": "", "reviewers": "AnonReviewer3;AnonReviewer1;AnonReviewer4", "site": "https://openreview.net/forum?id=3R--2TdxMps", "pdf_size": 0, "rating": "4;4;6", "confidence": "4;4;3", "wc_review": "549;258;265", "wc_reply_reviewers": "0;0;0", "wc_reply_authors": "454;513;70", "reply_reviewers": "0;0;0", "reply_authors": "1;1;1", "rating_avg": [ 4.666666666666667, 0.9428090415820634 ], "confidence_avg": [ 3.6666666666666665, 0.4714045207910317 ], "wc_review_avg": [ 357.3333333333333, 135.5589252768781 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 345.6666666666667, 196.40830487080282 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 8, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": -0.9999999999999998, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:PqlV7OxifBAJ:scholar.google.com/&scioq=Defuse:+Debugging+Classifiers+Through+Distilling+Unrestricted+Adversarial+Examples&hl=en&as_sdt=0,33", "gs_version_total": 0, "aff_unique_index": "0;1", "aff_unique_norm": "University of California, Irvine;Amazon", "aff_unique_dep": ";Amazon Web Services AI", "aff_unique_url": "https://www.uci.edu;https://aws.amazon.com", "aff_unique_abbr": "UCI;AWS", "aff_campus_unique_index": "0", "aff_campus_unique": "Irvine;", "aff_country_unique_index": "0;0", "aff_country_unique": "United States" }, { "title": "Revisiting Hierarchical Approach for Persistent Long-Term Video Prediction", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/2718", "id": "3RLN4EPMdYd", "poster": "", "openreview": "https://openreview.net/forum?id=3RLN4EPMdYd", "slides": "https://iclr.cc/virtual/2021/poster/2718", "video": "https://iclr.cc/virtual/2021/poster/2718", "author_site": "Wonkwang Lee, Whie Jung, Han Zhang, Ting Chen, Jing Yu Koh, Thomas E Huang, Hyungsuk Yoon, Honglak Lee, Seunghoon Hong", "tldr": "", "abstract": "Learning to predict the long-term future of video frames is notoriously challenging due to the inherent ambiguities in a distant future and dramatic amplification of prediction error over time. Despite the recent advances in the literature, existing approaches are limited to moderately short-term prediction (less than a few seconds), while extrapolating it to a longer future quickly leads to destruction in structure and content. 
In this work, we revisit the hierarchical models in video prediction. Our method generates future frames by first estimating a sequence of dense semantic structures and subsequently translating the estimated structures to pixels by video-to-video translation model. Despite the simplicity, we show that modeling structures and their dynamics in categorical structure space with stochastic sequential estimator leads to surprisingly successful long-term prediction. We evaluate our method on two challenging video prediction scenarios, \\emph{car driving} and \\emph{human dancing}, and demonstrate that it can generate complicated scene structures and motions over a very long time horizon (\\ie~thousands frames), setting a new standard of video prediction with orders of magnitude longer prediction time than existing approaches. Video results are available at https://1konny.github.io/HVP/.", "keywords": "Video prediction;generative model;long-term prediction", "primary_area": "", "supplementary_material": "", "author": "Wonkwang Lee;Whie Jung;Han Zhang;Ting Chen;Jing Yu Koh;Thomas Huang;Hyungsuk Yoon;Honglak Lee;Seunghoon Hong", "authorids": "~Wonkwang_Lee2;~Whie_Jung1;~Han_Zhang1;~Ting_Chen1;~Jing_Yu_Koh2;thomaseh@umich.edu;~Hyungsuk_Yoon2;~Honglak_Lee2;~Seunghoon_Hong2", "gender": "M;M;M;M;;;M;;", "homepage": "https://www.github.com/1Konny;;https://sites.google.com/corp/view/hanzhang;;;;;;", "dblp": "256/4988;203/5794;;19/1766;;;;;", "google_scholar": "y2p6gTEAAAAJ;https://scholar.google.co.kr/citations?user=hB5GMiIAAAAJ;cxEoVL4AAAAJ;KoXUMbsAAAAJ;;;;;", "orcid": ";;;;;;;;", "linkedin": ";;;;;;hyungsuk-yoon-86711673;;", "or_profile": "~Wonkwang_Lee2;~Whie_Jung1;~Han_Zhang1;~Ting_Chen1;~Jing_Yu_Koh2;thomaseh@umich.edu;~Hyungsuk_Yoon2;~Honglak_Lee2;~Seunghoon_Hong2", "aff": "Korea Advanced Institute of Science & Technology;Korea Advanced Institute of Science & Technology;Google;Google;;;MOLOCO;;", "aff_domain": "kaist.ac.kr;kaist.ac.kr;google.com;google.com;;;molocoads.com;;", "position": "MS student;PhD student;Researcher;Research Scientist;;;Machine Learning Engineer;;", "bibtex": "@inproceedings{\nlee2021revisiting,\ntitle={Revisiting Hierarchical Approach for Persistent Long-Term Video Prediction},\nauthor={Wonkwang Lee and Whie Jung and Han Zhang and Ting Chen and Jing Yu Koh and Thomas Huang and Hyungsuk Yoon and Honglak Lee and Seunghoon Hong},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=3RLN4EPMdYd}\n}", "github": "[![github](/images/github_icon.svg) 1Konny/HierarchicalVideoPrediction](https://github.com/1Konny/HierarchicalVideoPrediction)", "project": "", "reviewers": "AnonReviewer3;AnonReviewer1;AnonReviewer2;AnonReviewer4", "pdf_size": 0, "rating": "5;6;6;7", "confidence": "5;4;4;5", "wc_review": "336;511;174;298", "wc_reply_reviewers": "0;181;0;0", "wc_reply_authors": "736;518;562;524", "reply_reviewers": "0;1;0;0", "reply_authors": "1;1;1;1", "rating_avg": [ 6.0, 0.7071067811865476 ], "confidence_avg": [ 4.5, 0.5 ], "wc_review_avg": [ 329.75, 120.57855323398104 ], "wc_reply_reviewers_avg": [ 45.25, 78.3752990424917 ], "wc_reply_authors_avg": [ 585.0, 88.79752248796134 ], "reply_reviewers_avg": [ 0.25, 0.4330127018922193 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 10, 0 ], "authors#_avg": [ 9, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 32, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=3252280345602395682&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 9, "pdf": 
"https://openreview.net/pdf?id=3RLN4EPMdYd", "email": "kaist.ac.kr;kaist.ac.kr;google.com;google.com;;;molocoads.com;;", "author_num": 9, "aff_unique_index": "0;0;1;1;2", "aff_unique_norm": "Korea Advanced Institute of Science and Technology;Google;MOLOCO", "aff_unique_dep": ";Google;", "aff_unique_url": "https://www.kaist.ac.kr;https://www.google.com;https://www.moloco.com", "aff_unique_abbr": "KAIST;Google;MOLOCO", "aff_campus_unique_index": "1;1", "aff_campus_unique": ";Mountain View", "aff_country_unique_index": "0;0;1;1;0", "aff_country_unique": "South Korea;United States" }, { "title": "Incremental few-shot learning via vector quantization in deep embedded space", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/2948", "id": "3SV-ZePhnZM", "poster": "", "openreview": "https://openreview.net/forum?id=3SV-ZePhnZM", "slides": "https://iclr.cc/virtual/2021/poster/2948", "video": "https://iclr.cc/virtual/2021/poster/2948", "author_site": "Kuilin Chen, Chi-Guhn Lee", "tldr": "", "abstract": "The capability of incrementally learning new tasks without forgetting old ones is a challenging problem due to catastrophic forgetting. This challenge becomes greater when novel tasks contain very few labelled training samples. Currently, most methods are dedicated to class-incremental learning and rely on sufficient training data to learn additional weights for newly added classes. Those methods cannot be easily extended to incremental regression tasks and could suffer from severe overfitting when learning few-shot novel tasks. In this study, we propose a nonparametric method in deep embedded space to tackle incremental few-shot learning problems. The knowledge about the learned tasks are compressed into a small number of quantized reference vectors. The proposed method learns new tasks sequentially by adding more reference vectors to the model using few-shot samples in each novel task. For classification problems, we employ the nearest neighbor scheme to make classification on sparsely available data and incorporate intra-class variation, less forgetting regularization and calibration of reference vectors to mitigate catastrophic forgetting. In addition, the proposed learning vector quantization (LVQ) in deep embedded space can be customized as a kernel smoother to handle incremental few-shot regression tasks. 
Experimental results demonstrate that the proposed method outperforms other state-of-the-art methods in incremental learning.", "keywords": "incremental learning;few-shot;vector quantization", "primary_area": "", "supplementary_material": "", "author": "Kuilin Chen;Chi-Guhn Lee", "authorids": "~Kuilin_Chen1;~Chi-Guhn_Lee1", "gender": "M;M", "homepage": ";http://cglee.mie.utoronto.ca", "dblp": ";62/4690", "google_scholar": "Q7eOfgoAAAAJ;https://scholar.google.ca/citations?user=ZpALG2AAAAAJ", "orcid": ";0000-0002-0916-0241", "linkedin": ";", "or_profile": "~Kuilin_Chen1;~Chi-Guhn_Lee1", "aff": "University of Toronto;University of Toronto", "aff_domain": "toronto.ca;mie.utoronto.ca", "position": "PhD student;Full Professor", "bibtex": "@inproceedings{\nchen2021incremental,\ntitle={Incremental few-shot learning via vector quantization in deep embedded space},\nauthor={Kuilin Chen and Chi-Guhn Lee},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=3SV-ZePhnZM}\n}", "github": "", "project": "", "reviewers": "AnonReviewer4;AnonReviewer1;AnonReviewer2;AnonReviewer3", "pdf_size": 0, "rating": "5;5;6;6", "confidence": "3;4;3;4", "wc_review": "666;573;244;339", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "504;513;441;416", "reply_reviewers": "0;0;0;0", "reply_authors": "1;1;1;1", "rating_avg": [ 5.5, 0.5 ], "confidence_avg": [ 3.5, 0.5 ], "wc_review_avg": [ 455.5, 170.60260842085623 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 468.5, 41.08831950810352 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 10, 0 ], "authors#_avg": [ 2, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 118, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=10481024282156272929&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 5, "pdf": "https://openreview.net/pdf?id=3SV-ZePhnZM", "email": "toronto.ca;mie.utoronto.ca", "author_num": 2, "aff_unique_index": "0;0", "aff_unique_norm": "University of Toronto", "aff_unique_dep": "", "aff_unique_url": "https://www.utoronto.ca", "aff_unique_abbr": "U of T", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0", "aff_country_unique": "Canada" }, { "title": "WrapNet: Neural Net Inference with Ultra-Low-Precision Arithmetic", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/2533", "id": "3SqrRe8FWQ-", "poster": "", "openreview": "https://openreview.net/forum?id=3SqrRe8FWQ-", "slides": "https://iclr.cc/virtual/2021/poster/2533", "video": "https://iclr.cc/virtual/2021/poster/2533", "author_site": "Renkun Ni, Hong-Min Chu, Oscar Castaneda, Ping-yeh Chiang, Christoph Studer, Tom Goldstein", "tldr": "", "abstract": "Low-precision neural networks represent both weights and activations with few bits, drastically reducing the cost of multiplications. Meanwhile, these products are accumulated using high-precision (typically 32-bit) additions. Additions dominate the arithmetic complexity of inference in quantized (e.g., binary) nets, and high precision is needed to avoid overflow. To further optimize inference, we propose WrapNet, an architecture that adapts neural networks to use low-precision (8-bit) additions while achieving classification accuracy comparable to their 32-bit counterparts. We achieve resilience to low-precision accumulation by inserting a cyclic activation layer that makes results invariant to overflow. 
We demonstrate the efficacy of our approach using both software and hardware platforms.", "keywords": "quantization;efficient inference", "primary_area": "", "supplementary_material": "/attachment/f76fe0d2bf9e814b5b180eb3a2cbb5b026d4f7ff.zip", "author": "Renkun Ni;Hong-min Chu;Oscar Castaneda;Ping-yeh Chiang;Christoph Studer;Tom Goldstein", "authorids": "~Renkun_Ni1;hmchu@cs.umd.edu;~Oscar_Castaneda1;~Ping-yeh_Chiang1;~Christoph_Studer1;~Tom_Goldstein1", "gender": "M;;;;M;M", "homepage": "https://www.cs.umd.edu/~rn9zm/;;;;http://iis.ee.ethz.ch;https://www.cs.umd.edu/~tomg/", "dblp": "183/7067;;;236/4288;51/3407;25/8184", "google_scholar": ";;qkHVrDgAAAAJ;WUoMq1IAAAAJ;Jco5C7sAAAAJ;KmSuVtgAAAAJ", "orcid": ";;;;0000-0001-8950-6267;", "linkedin": ";;;;christoph-studer-6153a336/;", "or_profile": "~Renkun_Ni1;hmchu@cs.umd.edu;~Oscar_Castaneda1;~Ping-yeh_Chiang1;~Christoph_Studer1;~Tom_Goldstein1", "aff": "Department of Computer Science, University of Maryland, College Park;;Swiss Federal Institute of Technology;University of Maryland, College Park;Swiss Federal Institute of Technology;University of Maryland, College Park", "aff_domain": "cs.umd.edu;;ethz.ch;umd.edu;ethz.ch;umd.edu", "position": "PhD student;;PhD student;PhD student;Associate professor;Associate Professor", "bibtex": "@inproceedings{\nni2021wrapnet,\ntitle={WrapNet: Neural Net Inference with Ultra-Low-Precision Arithmetic},\nauthor={Renkun Ni and Hong-min Chu and Oscar Castaneda and Ping-yeh Chiang and Christoph Studer and Tom Goldstein},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=3SqrRe8FWQ-}\n}", "github": "", "project": "", "reviewers": "AnonReviewer2;AnonReviewer1;AnonReviewer4;AnonReviewer3", "pdf_size": 0, "rating": "5;7;7;7", "confidence": "4;3;5;4", "wc_review": "319;310;296;294", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "509;343;764;676", "reply_reviewers": "0;0;0;0", "reply_authors": "1;1;1;1", "rating_avg": [ 6.5, 0.8660254037844386 ], "confidence_avg": [ 4.0, 0.7071067811865476 ], "wc_review_avg": [ 304.75, 10.280442597476044 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 573.0, 161.31180985904288 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 10, 0 ], "authors#_avg": [ 6, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 19, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=13089581612501531671&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 2, "pdf": "https://openreview.net/pdf?id=3SqrRe8FWQ-", "email": "cs.umd.edu;;ethz.ch;umd.edu;ethz.ch;umd.edu", "author_num": 6, "aff_unique_index": "0;1;2;1;2", "aff_unique_norm": "University of Maryland, College Park;Swiss Federal Institute of Technology;University of Maryland", "aff_unique_dep": "Department of Computer Science;;", "aff_unique_url": "https://www/umd.edu;https://www.ethz.ch;https://www/umd.edu", "aff_unique_abbr": "UMD;ETH Zurich;UMD", "aff_campus_unique_index": "0;0;0", "aff_campus_unique": "College Park;", "aff_country_unique_index": "0;1;0;1;0", "aff_country_unique": "United States;Switzerland" }, { "title": "The Recurrent Neural Tangent Kernel", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/2590", "id": "3T9iFICe0Y9", "poster": "", "openreview": "https://openreview.net/forum?id=3T9iFICe0Y9", "slides": "https://iclr.cc/virtual/2021/poster/2590", "video": "https://iclr.cc/virtual/2021/poster/2590", "author_site": "Sina Alemohammad, Jack Wang, Randall Balestriero, 
Richard Baraniuk", "tldr": "", "abstract": "The study of deep neural networks (DNNs) in the infinite-width limit, via the so-called neural tangent kernel (NTK) approach, has provided new insights into the dynamics of learning, generalization, and the impact of initialization. One key DNN architecture remains to be kernelized, namely, the recurrent neural network (RNN). In this paper we introduce and study the Recurrent Neural Tangent Kernel (RNTK), which provides new insights into the behavior of overparametrized RNNs. A key property of the RNTK should greatly benefit practitioners is its ability to compare inputs of different length. To this end, we characterize how the RNTK weights different time steps to form its output under different initialization parameters and nonlinearity choices. A synthetic and 56 real-world data experiments demonstrate that the RNTK offers significant performance gains over other kernels, including standard NTKs, across a wide array of data sets. ", "keywords": "Neural Tangent Kernel;Recurrent Neural Network;Gaussian Process;Overparameterization", "primary_area": "", "supplementary_material": "/attachment/dd17167fccf9dfa0b2f8f6ce87b96205c1f2b33b.zip", "author": "Sina Alemohammad;Zichao Wang;Randall Balestriero;Richard Baraniuk", "authorids": "~Sina_Alemohammad1;~Zichao_Wang1;~Randall_Balestriero1;~Richard_Baraniuk1", "gender": "M;Not Specified;M;", "homepage": ";https://zichaow.github.io;https://randallbalestriero.github.io/;http://richb.rice.edu/", "dblp": "267/9746;188/0340.html;175/5364;32/2804", "google_scholar": "https://scholar.google.co.il/citations?user=ATjmZVsAAAAJ;IbCALKcAAAAJ;S1x_xqcAAAAJ;https://scholar.google.com.tw/citations?user=N-BBA20AAAAJ", "orcid": ";;;", "linkedin": ";;randallbalestriero/;richard-baraniuk", "or_profile": "~Sina_Alemohammad1;~Zichao_Wang1;~Randall_Balestriero1;~Richard_Baraniuk1", "aff": "Rice University;Rice University;Rice University;William Marsh Rice University", "aff_domain": "rice.edu;rice.edu;rice.edu;rice.edu", "position": "PhD student;PhD student;PhD student;C. 
Sidney Burrus Professor", "bibtex": "@inproceedings{\nalemohammad2021the,\ntitle={The Recurrent Neural Tangent Kernel},\nauthor={Sina Alemohammad and Zichao Wang and Randall Balestriero and Richard Baraniuk},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=3T9iFICe0Y9}\n}", "github": "", "project": "", "reviewers": "AnonReviewer2;AnonReviewer1;AnonReviewer3", "pdf_size": 0, "rating": "6;6;7", "confidence": "4;3;4", "wc_review": "130;323;141", "wc_reply_reviewers": "0;0;0", "wc_reply_authors": "431;401;49", "reply_reviewers": "0;0;0", "reply_authors": "1;1;1", "rating_avg": [ 6.333333333333333, 0.4714045207910317 ], "confidence_avg": [ 3.6666666666666665, 0.4714045207910317 ], "wc_review_avg": [ 198.0, 88.50235401765688 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 293.6666666666667, 173.438429677188 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 7, 0 ], "authors#_avg": [ 4, 0 ], "corr_rating_confidence": 0.4999999999999999, "gs_citation": 97, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=2317330622518273962&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 9, "pdf": "https://openreview.net/pdf?id=3T9iFICe0Y9", "email": "rice.edu;rice.edu;rice.edu;rice.edu", "author_num": 4, "aff_unique_index": "0;0;0;0", "aff_unique_norm": "Rice University", "aff_unique_dep": "", "aff_unique_url": "https://www.rice.edu", "aff_unique_abbr": "Rice", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0;0;0", "aff_country_unique": "United States" }, { "title": "RMSprop converges with proper hyper-parameter", "status": "Spotlight", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/3374", "id": "3UDSdyIcBDA", "poster": "", "openreview": "https://openreview.net/forum?id=3UDSdyIcBDA", "slides": "https://iclr.cc/virtual/2021/poster/3374", "video": "https://iclr.cc/virtual/2021/poster/3374", "author_site": "Naichen Shi, Dawei Li, Mingyi Hong, Ruoyu Sun", "tldr": "", "abstract": "Despite the existence of divergence examples, RMSprop remains \none of the most popular algorithms in machine learning. Towards closing the gap between theory and practice, we prove that RMSprop converges with proper choice of hyper-parameters under certain conditions. More specifically, we prove that when the hyper-parameter $\\beta_2$ is close enough to $1$, RMSprop and its random shuffling version converge to a bounded region in general, and to critical points in the interpolation regime. It is worth mentioning that our results do not depend on ``bounded gradient\" assumption, which is often the key assumption utilized by existing theoretical work for Adam-type adaptive gradient method. Removing this assumption allows us to establish a phase transition from divergence to non-divergence for RMSprop. \n\nFinally, based on our theory, we conjecture that in practice there is a critical threshold $\\sf{\\beta_2^*}$, such that RMSprop generates reasonably good results only if $1>\\beta_2\\ge \\sf{\\beta_2^*}$. 
We provide empirical evidence for such a phase transition in our numerical experiments.", "keywords": "RMSprop;convergence;hyperparameter", "primary_area": "", "supplementary_material": "/attachment/21858fdc311c3fd76f74c5ec972560ad374b55c9.zip", "author": "Naichen Shi;Dawei Li;Mingyi Hong;Ruoyu Sun", "authorids": "~Naichen_Shi1;~Dawei_Li3;~Mingyi_Hong1;~Ruoyu_Sun1", "gender": ";M;M;", "homepage": ";;http://people.ece.umn.edu/~mhong/mingyi.html;https://ruoyus.github.io/", "dblp": ";;57/8053;30/9879-1", "google_scholar": ";;qRnP-p0AAAAJ;PsfzbCMAAAAJ", "orcid": ";0000-0003-0374-3101;;", "linkedin": ";;;", "or_profile": "~Naichen_Shi1;~Dawei_Li3;~Mingyi_Hong1;~Ruoyu_Sun1", "aff": ";University of Illinois, Urbana Champaign;University of Minnesota, Minneapolis;University of Illinois, Urbana-Champaign", "aff_domain": ";illinois.edu;umn.edu;uiuc.edu", "position": ";PhD student;Associate Professor;Assistant Professor", "bibtex": "@inproceedings{\nshi2021rmsprop,\ntitle={{RMS}prop converges with proper hyper-parameter},\nauthor={Naichen Shi and Dawei Li and Mingyi Hong and Ruoyu Sun},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=3UDSdyIcBDA}\n}", "github": "", "project": "", "reviewers": "AnonReviewer4;AnonReviewer1;AnonReviewer3", "pdf_size": 0, "rating": "6;8;8", "confidence": "3;3;3", "wc_review": "265;410;288", "wc_reply_reviewers": "0;0;0", "wc_reply_authors": "140;660;128", "reply_reviewers": "0;0;0", "reply_authors": "1;1;1", "rating_avg": [ 7.333333333333333, 0.9428090415820634 ], "confidence_avg": [ 3.0, 0.0 ], "wc_review_avg": [ 321.0, 63.62913378843583 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 309.3333333333333, 248.00716835518196 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 19, 0 ], "authors#_avg": [ 4, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 98, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=9047534316058211829&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 5, "pdf": "https://openreview.net/pdf?id=3UDSdyIcBDA", "email": ";illinois.edu;umn.edu;uiuc.edu", "author_num": 4, "aff_unique_index": "0;1;2", "aff_unique_norm": "University of Illinois Urbana-Champaign;University of Minnesota;University of Illinois", "aff_unique_dep": ";;", "aff_unique_url": "https://illinois.edu;https://www.minnesota.edu;https://illinois.edu", "aff_unique_abbr": "UIUC;UMN;UIUC", "aff_campus_unique_index": "0;1;0", "aff_campus_unique": "Urbana-Champaign;Minneapolis", "aff_country_unique_index": "0;0;0", "aff_country_unique": "United States" }, { "id": "3UTezOEABr", "title": "TimeAutoML: Autonomous Representation Learning for Multivariate Irregularly Sampled Time Series", "track": "main", "status": "Reject", "tldr": "", "abstract": "Multivariate time series (MTS) data are becoming increasingly ubiquitous in diverse domains, e.g., IoT systems, health informatics, and 5G networks. To obtain an effective representation of MTS data, it is not only essential to consider unpredictable dynamics and highly variable lengths of these data but also important to address the irregularities in the sampling rates of MTS. Existing parametric approaches rely on manual hyperparameter tuning and may cost a huge amount of labor effort. Therefore, it is desirable to learn the representation automatically and efficiently. 
To this end, we propose an autonomous representation learning approach for multivariate time series (TimeAutoML) with irregular sampling rates and variable lengths. As opposed to previous works, we first present a representation learning pipeline in which the configuration and hyperparameter optimization\nare fully automatic and can be tailored for various tasks, e.g., anomaly detection, clustering, etc. Next, a negative sample generation approach and an auxiliary classification task are developed and integrated within TimeAutoML to enhance\nits representation capability. Extensive empirical studies on real-world datasets demonstrate that the proposed TimeAutoML outperforms competing approaches on various tasks by a large margin. In fact, it achieves the best anomaly detection\nperformance among all comparison algorithms on 78 out of all 85 UCR datasets, acquiring up to 20% performance improvement in terms of AUC score.", "keywords": "representation learning;AutoML;irregularly sampled time series;anomaly detection;clustering", "primary_area": "", "supplementary_material": "", "author": "Yang Jiao;Kai Yang;shaoyu dou;pan luo;Sijia Liu;Dongjin Song", "authorids": "yangjiao@tongji.edu.cn;~Kai_Yang3;shaoyu@tongji.edu.cn;lp@tongji.edu.cn;~Sijia_Liu1;~Dongjin_Song2", "gender": ";;;;M;M", "homepage": ";;;;https://lsjxjtu.github.io/;https://songdj.github.io/", "dblp": ";;;;128/6972-1;41/3281", "google_scholar": ";;;;C7dO_UgAAAAJ;BJdHw6AAAAAJ", "orcid": ";;;;;", "linkedin": ";;;;;", "or_profile": "yangjiao@tongji.edu.cn;~Kai_Yang3;shaoyu@tongji.edu.cn;lp@tongji.edu.cn;~Sijia_Liu1;~Dongjin_Song2", "aff": ";;;;Michigan State University;University of Connecticut", "aff_domain": ";;;;msu.edu;uconn.edu", "position": ";;;;Assistant Professor;Assistant Professor", "bibtex": "@misc{\njiao2021timeautoml,\ntitle={TimeAuto{\\{}ML{\\}}: Autonomous Representation Learning for Multivariate Irregularly Sampled Time Series},\nauthor={Yang Jiao and Kai Yang and shaoyu dou and pan luo and Sijia Liu and Dongjin Song},\nyear={2021},\nurl={https://openreview.net/forum?id=3UTezOEABr}\n}", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer2;AnonReviewer4", "site": "https://openreview.net/forum?id=3UTezOEABr", "pdf_size": 0, "rating": "3;4;4", "confidence": "5;4;4", "wc_review": "342;789;319", "wc_reply_reviewers": "0;0;0", "wc_reply_authors": "145;577;362", "reply_reviewers": "0;0;0", "reply_authors": "1;1;1", "rating_avg": [ 3.6666666666666665, 0.4714045207910317 ], "confidence_avg": [ 4.333333333333333, 0.4714045207910317 ], "wc_review_avg": [ 483.3333333333333, 216.3428349223108 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 361.3333333333333, 176.36389149205746 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 8, 0 ], "authors#_avg": [ 6, 0 ], "corr_rating_confidence": -0.9999999999999997, "gs_citation": 10, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=12130192855692221748&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 3, "aff_unique_index": "0;1", "aff_unique_norm": "Michigan State University;University of Connecticut", "aff_unique_dep": ";", "aff_unique_url": "https://www.msu.edu;https://www.uconn.edu", "aff_unique_abbr": "MSU;UConn", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0", "aff_country_unique": "United States" }, { "id": "3Wp8HM2CNdR", "title": "Whitening for Self-Supervised Representation Learning", "track": "main", "status": "Reject", "tldr": "", "abstract": "Most of 
the self-supervised representation learning methods are based on the contrastive loss and the instance-discrimination task, where augmented versions of the same image instance (\"positives\") are contrasted with instances extracted from other images (\"negatives\"). For the learning to be effective, a lot of negatives should be compared with a positive pair, which is computationally demanding. In this paper, we propose a different direction and a new loss function for self-supervised representation learning which is based on the whitening of the latent-space features. The whitening operation has a \"scattering\" effect on the batch samples, which compensates the use of negatives, avoiding degenerate solutions where all the sample representations collapse to a single point. Our Whitening MSE (W-MSE) loss does not require special heuristics (e.g. additional networks) and it is conceptually simple. Since negatives are not needed, we can extract multiple positive pairs from the same image instance. We empirically show that W-MSE is competitive with respect to popular, more complex self-supervised methods. The source code of the method and all the experiments is included in the Supplementary Material.", "keywords": "self-supervised learning;unsupervised learning;contrastive loss;triplet loss;whitening", "primary_area": "", "supplementary_material": "/attachment/2790adbd11faa4324af8cd2b36ea6cbd0941b94f.zip", "author": "Aleksandr Ermolov;Aliaksandr Siarohin;Enver Sangineto;Nicu Sebe", "authorids": "~Aleksandr_Ermolov1;~Aliaksandr_Siarohin1;~Enver_Sangineto1;~Nicu_Sebe1", "gender": "M;M;;M", "homepage": ";;;http://disi.unitn.it/~sebe/", "dblp": ";199/1971;http://dblp.uni-trier.de/pers/hd/s/Sangineto:Enver;20/3519", "google_scholar": ";https://scholar.google.it/citations?user=uMl5-k4AAAAJ;https://scholar.google.it/citations?user=eJZlvlAAAAAJ;https://scholar.google.it/citations?user=stFCYOAAAAAJ", "orcid": ";;;0000-0002-6597-7248", "linkedin": "alexander-ermolov/;;;", "or_profile": "~Aleksandr_Ermolov1;~Aliaksandr_Siarohin1;~Enver_Sangineto1;~Nicu_Sebe1", "aff": "University of Trento;University of Trento;University of Trento;University of Trento", "aff_domain": "unitn.it;unitn.it;unitn.it;unitn.it", "position": "PhD student;PhD student;Postdoc;Full Professor", "bibtex": "@misc{\nermolov2021whitening,\ntitle={Whitening for Self-Supervised Representation Learning},\nauthor={Aleksandr Ermolov and Aliaksandr Siarohin and Enver Sangineto and Nicu Sebe},\nyear={2021},\nurl={https://openreview.net/forum?id=3Wp8HM2CNdR}\n}", "github": "", "project": "", "reviewers": "AnonReviewer4;AnonReviewer3;AnonReviewer2;AnonReviewer1", "site": "https://openreview.net/forum?id=3Wp8HM2CNdR", "pdf_size": 0, "rating": "5;5;6;7", "confidence": "4;4;3;4", "wc_review": "548;428;459;431", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "129;193;330;267", "reply_reviewers": "0;0;0;0", "reply_authors": "1;1;1;1", "rating_avg": [ 5.75, 0.82915619758885 ], "confidence_avg": [ 3.75, 0.4330127018922193 ], "wc_review_avg": [ 466.5, 48.58240422210494 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 229.75, 75.72771949557176 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 10, 0 ], "authors#_avg": [ 4, 0 ], "corr_rating_confidence": -0.17407765595569782, "gs_citation": 377, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=14222215050873553089&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 8, "aff_unique_index": "0;0;0;0", "aff_unique_norm": "University of 
Trento", "aff_unique_dep": "", "aff_unique_url": "https://www.unitn.it", "aff_unique_abbr": "UniTN", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0;0;0", "aff_country_unique": "Italy" }, { "id": "3X4JzHq5fU5", "title": "Optimal Designs of Gaussian Processes with Budgets for Hyperparameter Optimization", "track": "main", "status": "Withdraw", "tldr": "", "abstract": "The remarkable performance of modern deep learning methods depends critically on the optimization of their hyperparameters. One major challenge is that evaluating a single hyperparameter configuration on large datasets could nowadays easily exceed hours or days. For efficient sampling and fast evaluation, some previous works presented effective computing resource allocation schemes and built a Bayesian surrogate model to sample candidate hyperparameters. However, the model itself is not related to budgets which are set manually. To deal with this problem, a new Gaussian Process model involved in budgets is proposed. Further, for this model, an optimal design is constructed by the equivalence theorem to replace random search as an initial sampling strategy in the search space. Experiments demonstrate that the new model has the best performance among competing methods. Moreover, comparisons between different initial designs with the same model show the advantage of the proposed optimal design.", "keywords": "Automated Hyperparameter Optimization;Budgets;Efficiency;Optimal Initial Design;Robustness", "primary_area": "", "supplementary_material": "/attachment/1c0662b04be5c69835c8dee6aa37f3315c244ff3.zip", "author": "Yimin Huang;Yujun Li;Zhenguo Li;Zhihua Zhang", "authorids": "~Yimin_Huang2;~Yujun_Li1;~Zhenguo_Li1;~Zhihua_Zhang1", "gender": "M;M;M;M", "homepage": ";;http://www.ee.columbia.edu/~zgli/;http://www.math.pku.edu.cn/teachers/zhzhang/", "dblp": "https://dblp.uni-trier.de/pers/hd/h/Huang:Yimin;37/6489;23/6479;52/5331", "google_scholar": ";;XboZC1AAAAAJ;", "orcid": ";;;", "linkedin": ";;;", "or_profile": "~Yimin_Huang2;~Yujun_Li1;~Zhenguo_Li1;~Zhihua_Zhang1", "aff": "Huawei Technologies Ltd.;Huawei Technologies Ltd.;Huawei Noah's Ark Lab;Peking University", "aff_domain": "huawei.com;huawei.com;huawei.com;pku.edu.cn", "position": "Researcher;Researcher;Principal Researcher;Full Professor", "bibtex": "", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer2;AnonReviewer4", "site": "https://openreview.net/forum?id=3X4JzHq5fU5", "pdf_size": 0, "rating": "3;4;4", "confidence": "4;4;3", "wc_review": "619;907;654", "wc_reply_reviewers": "0;0;0", "wc_reply_authors": "0;0;0", "reply_reviewers": "0;0;0", "reply_authors": "0;0;0", "rating_avg": [ 3.6666666666666665, 0.4714045207910317 ], "confidence_avg": [ 3.6666666666666665, 0.4714045207910317 ], "wc_review_avg": [ 726.6666666666666, 128.3129853998504 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 0, 0 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 0, 0 ], "replies_avg": [ 4, 0 ], "authors#_avg": [ 4, 0 ], "corr_rating_confidence": -0.49999999999999983, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:SE3i5lQnoQwJ:scholar.google.com/&scioq=Optimal+Designs+of+Gaussian+Processes+with+Budgets+for+Hyperparameter+Optimization&hl=en&as_sdt=0,5", "gs_version_total": 0, "aff_unique_index": "0;0;0;1", "aff_unique_norm": "Huawei;Peking University", "aff_unique_dep": "Huawei Technologies;", "aff_unique_url": "https://www.huawei.com;http://www.pku.edu.cn", "aff_unique_abbr": 
"Huawei;Peking U", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0;0;0", "aff_country_unique": "China" }, { "title": "Direction Matters: On the Implicit Bias of Stochastic Gradient Descent with Moderate Learning Rate", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/2821", "id": "3X64RLgzY6O", "poster": "", "openreview": "https://openreview.net/forum?id=3X64RLgzY6O", "slides": "https://iclr.cc/virtual/2021/poster/2821", "video": "https://iclr.cc/virtual/2021/poster/2821", "author_site": "Jingfeng Wu, Difan Zou, vladimir braverman, Quanquan Gu", "tldr": "", "abstract": "Understanding the algorithmic bias of stochastic gradient descent (SGD) is one of the key challenges in modern machine learning and deep learning theory. Most of the existing works, however, focus on very small or even infinitesimal learning rate regime, and fail to cover practical scenarios where the learning rate is moderate and annealing. In this paper, we make an initial attempt to characterize the particular regularization effect of SGD in the moderate learning rate regime by studying its behavior for optimizing an overparameterized linear regression problem. In this case, SGD and GD are known to converge to the unique minimum-norm solution; however, with the moderate and annealing learning rate, we show that they exhibit different directional bias: SGD converges along the large eigenvalue directions of the data matrix, while GD goes after the small eigenvalue directions. Furthermore, we show that such directional bias does matter when early stopping is adopted, where the SGD output is nearly optimal but the GD output is suboptimal. Finally, our theory explains several folk arts in practice used for SGD hyperparameter tuning, such as (1) linearly scaling the initial learning rate with batch size; and (2) overrunning SGD with high learning rate even when the loss stops decreasing.", "keywords": "SGD;regularization;implicit bias", "primary_area": "", "supplementary_material": "/attachment/8d9e52b742e40baed8fb3c33f76999c8e4b1e8bb.zip", "author": "Jingfeng Wu;Difan Zou;Vladimir Braverman;Quanquan Gu", "authorids": "~Jingfeng_Wu1;~Difan_Zou1;~Vladimir_Braverman1;~Quanquan_Gu1", "gender": "M;M;Unspecified;M", "homepage": "https://uuujf.github.io;https://difanzou.github.io/;http://www.cs.jhu.edu/~vova/;http://web.cs.ucla.edu/~qgu/", "dblp": ";161/8923;14/4758;50/4597", "google_scholar": "z-KILD8AAAAJ;Cp4fcTQAAAAJ;https://scholar.google.com.tw/citations?user=DTthB48AAAAJ;GU9HgNAAAAAJ", "orcid": "0009-0009-3414-4487;;;", "linkedin": "jingfeng-wu-79205b184/;;;", "or_profile": "~Jingfeng_Wu1;~Difan_Zou1;~Vladimir_Braverman1;~Quanquan_Gu1", "aff": "Johns Hopkins University;University of California, Los Angeles;Department of Computer Science, Whiting School of Engineering;University of California, Los Angeles", "aff_domain": "jhu.edu;ucla.edu;cs.jhu.edu;cs.ucla.edu", "position": "PhD student;PhD student;Full Professor;Assistant Professor", "bibtex": "@inproceedings{\nwu2021direction,\ntitle={Direction Matters: On the Implicit Bias of Stochastic Gradient Descent with Moderate Learning Rate},\nauthor={Jingfeng Wu and Difan Zou and Vladimir Braverman and Quanquan Gu},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=3X64RLgzY6O}\n}", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer4;AnonReviewer3", "pdf_size": 0, "rating": "6;6;7", "confidence": "3;3;3", "wc_review": 
"291;416;440", "wc_reply_reviewers": "136;216;112", "wc_reply_authors": "868;1284;978", "reply_reviewers": "1;1;2", "reply_authors": "2;2;2", "rating_avg": [ 6.333333333333333, 0.4714045207910317 ], "confidence_avg": [ 3.0, 0.0 ], "wc_review_avg": [ 382.3333333333333, 65.32142748661337 ], "wc_reply_reviewers_avg": [ 154.66666666666666, 44.46221866808818 ], "wc_reply_authors_avg": [ 1043.3333333333333, 176.00252523440932 ], "reply_reviewers_avg": [ 1.3333333333333333, 0.4714045207910317 ], "reply_authors_avg": [ 2.0, 0.0 ], "replies_avg": [ 15, 0 ], "authors#_avg": [ 4, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 47, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=8396522963722221510&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 10, "pdf": "https://openreview.net/pdf?id=3X64RLgzY6O", "email": "jhu.edu;ucla.edu;cs.jhu.edu;cs.ucla.edu", "author_num": 4, "aff_unique_index": "0;1;0;1", "aff_unique_norm": "Johns Hopkins University;University of California, Los Angeles", "aff_unique_dep": ";", "aff_unique_url": "https://www.jhu.edu;https://www.ucla.edu", "aff_unique_abbr": "JHU;UCLA", "aff_campus_unique_index": "1;2;1", "aff_campus_unique": ";Los Angeles;Baltimore", "aff_country_unique_index": "0;0;0;0", "aff_country_unique": "United States" }, { "id": "3YQAVD9_Dz3", "title": "NOSE Augment: Fast and Effective Data Augmentation Without Searching", "track": "main", "status": "Reject", "tldr": "", "abstract": "Data augmentation has been widely used for enhancing the diversity of training data and model generalization. Different from traditional handcrafted methods, recent research introduced automated search for optimal data augmentation policies and achieved state-of-the-art results on image classification tasks. However, these search-based implementations typically incur high computation cost and long search time because of large search spaces and complex searching algorithms. We revisited automated augmentation from alternate perspectives, such as increasing diversity and manipulating the overall usage of augmented data. In this paper, we present an augmentation method without policy searching called NOSE Augment (NO SEarch Augment). Our method completely skips policy searching; instead, it jointly applies multi-stage augmentation strategy and introduces more augmentation operations on top of a simple stochastic augmentation mechanism. With more augmentation operations, we boost the data diversity of stochastic augmentation; and with the phased complexity driven strategy, we ensure the whole training process converged smoothly to a good quality model. We conducted extensive experiments and showed that our method could match or surpass state-of-the-art results provided by search-based methods in terms of accuracies. Without the need for policy search, our method is much more efficient than the existing AutoAugment series of methods. Besides image classification, we also examine the general validity of our proposed method by applying our method to Face Recognition and Text Detection of the Optical Character Recognition (OCR) problems. 
The results establish our proposed method as a fast and competitive data augmentation strategy that can be used across various CV tasks.", "keywords": "data augmentation;stochastic policy;multi-stage augmentation", "primary_area": "", "supplementary_material": "", "author": "Qingrui Li;Song Xie;An\u0131l Oymagil;Mustafa Furkan Eseoglu;Ziyin Zhang;CM Lee", "authorids": "~Qingrui_Li1;xiesong@mail.nwpu.edu.cn;~An\u0131l_Oymagil1;~Mustafa_Furkan_Eseoglu1;zhangziyin1@huawei.com;~CM_Lee1", "gender": ";;M;;;", "homepage": ";;;;;", "dblp": ";;;;;https://dblp.uni-trier.de/pid/135/6208.html", "google_scholar": ";;;;;", "orcid": ";;;;;", "linkedin": "richardqingruili;;aniloymagil/;;;", "or_profile": "~Qingrui_Li1;xiesong@mail.nwpu.edu.cn;~An\u0131l_Oymagil1;~Mustafa_Furkan_Eseoglu1;zhangziyin1@huawei.com;~CM_Lee1", "aff": ";;Huawei Technologies Ltd.;;;", "aff_domain": ";;huawei.com;;;", "position": ";;Research Engineer;;;", "bibtex": "@misc{\nli2021nose,\ntitle={{\\{}NOSE{\\}} Augment: Fast and Effective Data Augmentation Without Searching},\nauthor={Qingrui Li and Song Xie and An{\\i}l Oymagil and Mustafa Furkan Eseoglu and Ziyin Zhang and CM Lee},\nyear={2021},\nurl={https://openreview.net/forum?id=3YQAVD9_Dz3}\n}", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer2;AnonReviewer3", "site": "https://openreview.net/forum?id=3YQAVD9_Dz3", "pdf_size": 0, "rating": "3;4;5", "confidence": "5;3;4", "wc_review": "758;279;191", "wc_reply_reviewers": "153;0;0", "wc_reply_authors": "1047;838;1026", "reply_reviewers": "1;0;0", "reply_authors": "3;2;3", "rating_avg": [ 4.0, 0.816496580927726 ], "confidence_avg": [ 4.0, 0.816496580927726 ], "wc_review_avg": [ 409.3333333333333, 249.14832708426698 ], "wc_reply_reviewers_avg": [ 51.0, 72.12489168102785 ], "wc_reply_authors_avg": [ 970.3333333333334, 93.96571478765834 ], "reply_reviewers_avg": [ 0.3333333333333333, 0.4714045207910317 ], "reply_authors_avg": [ 2.6666666666666665, 0.4714045207910317 ], "replies_avg": [ 13, 0 ], "authors#_avg": [ 6, 0 ], "corr_rating_confidence": -0.5, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:enzyZf2qAvkJ:scholar.google.com/&scioq=NOSE+Augment:+Fast+and+Effective+Data+Augmentation+Without+Searching&hl=en&as_sdt=0,33", "gs_version_total": 0, "aff_unique_index": "0", "aff_unique_norm": "Huawei", "aff_unique_dep": "Huawei Technologies", "aff_unique_url": "https://www.huawei.com", "aff_unique_abbr": "Huawei", "aff_country_unique_index": "0", "aff_country_unique": "China" }, { "id": "3YdNZD5dMxI", "title": "Unconditional Synthesis of Complex Scenes Using a Semantic Bottleneck", "track": "main", "status": "Reject", "tldr": "", "abstract": "Coupling the high-fidelity generation capabilities of label-conditional image synthesis methods with the flexibility of unconditional generative models, we propose a semantic bottleneck GAN model for unconditional synthesis of complex scenes. We assume pixel-wise segmentation labels are available during training and use them to learn the scene structure through an unconditional progressive segmentation generation network. During inference, our model first synthesizes a realistic segmentation layout from scratch, then synthesizes a realistic scene conditioned on that layout through a conditional segmentation-to-image synthesis network. 
When trained end-to-end, the resulting model outperforms state-of-the-art generative models in unsupervised image synthesis on two challenging domains in terms of the Frechet Inception Distance and perceptual evaluations. Moreover, we demonstrate that the end-to-end training significantly improves the segmentation-to-image synthesis sub-network, which results in superior performance over the state-of-the-art when conditioning on real segmentation layouts.", "keywords": "Unconditional Image Synthesis;Complex Scene;GAN;Semantic Bottleneck", "primary_area": "", "supplementary_material": "", "author": "Samaneh Azadi;Michael Tschannen;Eric Tzeng;Sylvain Gelly;Trevor Darrell;Mario Lucic", "authorids": "~Samaneh_Azadi1;~Michael_Tschannen1;~Eric_Tzeng1;~Sylvain_Gelly1;~Trevor_Darrell2;~Mario_Lucic1", "gender": ";;M;M;;M", "homepage": ";https://mitscha.github.io/;;;;http://lucic.ai", "dblp": ";134/9824;136/5767;;;155/1945", "google_scholar": ";https://scholar.google.ch/citations?user=TSj_8nYAAAAJ;;https://scholar.google.ch/citations?user=m7LvuTkAAAAJ;;SzZRlcMAAAAJ", "orcid": ";;;;;", "linkedin": ";;;;;", "or_profile": "~Samaneh_Azadi1;~Michael_Tschannen1;~Eric_Tzeng1;~Sylvain_Gelly1;~Trevor_Darrell2;~Mario_Lucic1", "aff": ";Apple;University of California, Berkeley;Google Brain;;Google", "aff_domain": ";apple.com;berkeley.edu;google.com;;deepmind.com", "position": ";Researcher;PhD student;Software Engineer;;Senior Staff Research Scientist", "bibtex": "@misc{\nazadi2021unconditional,\ntitle={Unconditional Synthesis of Complex Scenes Using a Semantic Bottleneck},\nauthor={Samaneh Azadi and Michael Tschannen and Eric Tzeng and Sylvain Gelly and Trevor Darrell and Mario Lucic},\nyear={2021},\nurl={https://openreview.net/forum?id=3YdNZD5dMxI}\n}", "github": "", "project": "", "reviewers": "AnonReviewer4;AnonReviewer2;AnonReviewer1;AnonReviewer3", "site": "https://openreview.net/forum?id=3YdNZD5dMxI", "pdf_size": 0, "rating": "4;6;6;8", "confidence": "4;4;5;4", "wc_review": "553;286;196;295", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "766;307;367;177", "reply_reviewers": "0;0;0;0", "reply_authors": "1;1;1;1", "rating_avg": [ 6.0, 1.4142135623730951 ], "confidence_avg": [ 4.25, 0.4330127018922193 ], "wc_review_avg": [ 332.5, 133.06107620187055 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 404.25, 219.8583350705631 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 9, 0 ], "authors#_avg": [ 6, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:tlRbsGgZyvkJ:scholar.google.com/&scioq=Unconditional+Synthesis+of+Complex+Scenes+Using+a+Semantic+Bottleneck&hl=en&as_sdt=0,33", "gs_version_total": 0, "aff_unique_index": "0;1;2;2", "aff_unique_norm": "Apple;University of California, Berkeley;Google", "aff_unique_dep": "Apple Inc.;;Google Brain", "aff_unique_url": "https://www.apple.com;https://www.berkeley.edu;https://brain.google.com", "aff_unique_abbr": "Apple;UC Berkeley;Google Brain", "aff_campus_unique_index": "1;2;2", "aff_campus_unique": ";Berkeley;Mountain View", "aff_country_unique_index": "0;0;0;0", "aff_country_unique": "United States" }, { "id": "3ZeGLibhFo0", "title": "Enabling counterfactual survival analysis with balanced representations", "track": "main", "status": "Reject", "tldr": "", "abstract": "Balanced representation learning methods have been applied successfully to counterfactual\ninference from observational data. 
However, approaches that account for survival outcomes are relatively limited. Survival data are frequently encountered across diverse medical applications, e.g., drug development, risk profiling, and clinical trials, and such data are also relevant in fields like manufacturing (for equipment monitoring). When the outcome of interest is time-to-event, special precautions for handling censored events need to be taken, as ignoring censored outcomes may lead to biased estimates. We propose a theoretically grounded unified framework for counterfactual inference applicable to survival outcomes. Further, we formulate a nonparametric hazard ratio metric for evaluating average and individualized treatment effects. Experimental results on real-world and semi-synthetic datasets, the latter of which we introduce, demonstrate that the proposed approach significantly outperforms competitive alternatives in both survival-outcome predictions and treatment-effect estimation.", "keywords": "survival analysis;time-to-event;counterfactual inference;causal survival analysis", "primary_area": "", "supplementary_material": "/attachment/63da9893304f49327a863dd9820567cab9495bfe.zip", "author": "Paidamoyo Chapfuwa;Serge Assaad;Shuxi Zeng;Michael Pencina;Lawrence Carin;Ricardo Henao", "authorids": "~Paidamoyo_Chapfuwa1;~Serge_Assaad1;zengshx777@gmail.com;michal.pencina@duke.edu;~Lawrence_Carin2;~Ricardo_Henao1", "gender": ";M;;;M;M", "homepage": "https://paidamoyo.github.io/;https://sergea.net;;;https://people.ee.duke.edu/~lcarin/;http://rhenaog.github.io", "dblp": "218/6496;267/5522;;;;27/3207", "google_scholar": "aPFR6-AAAAAJ;n-A4-zIAAAAJ;;;yuxwFscAAAAJ;p_mm4-YAAAAJ", "orcid": "0000-0003-0518-565X;;;;;0000-0003-4980-845X", "linkedin": ";sergeassaad/;;;;", "or_profile": "~Paidamoyo_Chapfuwa1;~Serge_Assaad1;zengshx777@gmail.com;michal.pencina@duke.edu;~Lawrence_Carin2;~Ricardo_Henao1", "aff": "Duke University;Duke University;;;Duke University;Duke University", "aff_domain": "duke.edu;duke.edu;;;duke.edu;duke.edu", "position": "PhD;PhD student;;;Full Professor;Assistant Professor", "bibtex": "@misc{\nchapfuwa2021enabling,\ntitle={Enabling counterfactual survival analysis with balanced representations},\nauthor={Paidamoyo Chapfuwa and Serge Assaad and Shuxi Zeng and Michael Pencina and Lawrence Carin and Ricardo Henao},\nyear={2021},\nurl={https://openreview.net/forum?id=3ZeGLibhFo0}\n}", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer3;AnonReviewer4;AnonReviewer2", "site": "https://openreview.net/forum?id=3ZeGLibhFo0", "pdf_size": 0, "rating": "4;5;7;7", "confidence": "5;4;4;4", "wc_review": "452;434;258;282", "wc_reply_reviewers": "207;0;0;0", "wc_reply_authors": "792;135;332;171", "reply_reviewers": "1;0;0;0", "reply_authors": "2;1;1;1", "rating_avg": [ 5.75, 1.299038105676658 ], "confidence_avg": [ 4.25, 0.4330127018922193 ], "wc_review_avg": [ 356.5, 87.14786285388759 ], "wc_reply_reviewers_avg": [ 51.75, 89.6336292916894 ], "wc_reply_authors_avg": [ 357.5, 261.5955848251266 ], "reply_reviewers_avg": [ 0.25, 0.4330127018922193 ], "reply_authors_avg": [ 1.25, 0.4330127018922193 ], "replies_avg": [ 12, 0 ], "authors#_avg": [ 6, 0 ], "corr_rating_confidence": -0.7777777777777777, "gs_citation": 28, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=10527634708750269367&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 9, "aff_unique_index": "0;0;0;0", "aff_unique_norm": "Duke University", "aff_unique_dep": "", "aff_unique_url": "https://www.duke.edu", "aff_unique_abbr":
"Duke", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0;0;0", "aff_country_unique": "United States" }, { "id": "3b76QBOlYW", "title": "Learned residual Gerchberg-Saxton network for computer generated holography", "track": "main", "status": "Withdraw", "tldr": "", "abstract": "Computer generated holography (CGH) aims to generate phase plates that create an intensity pattern at a certain distance behind the holography plate when illuminated. Since only the intensity and not the phase of the wave is of interest, this is an ill-defined inverse problem. Usually these problems are tackled by iterative optimization algorithms which are part of the convex optimization framework. These algorithms essentially minimize a loss using a forward model. Even though many of the tackled inverse problems are non-convex, these algorithms reach acceptable solutions by finding a local minimum. The ability of Deep Neural Networks to estimate a large range of functions has made a different approach to these problems possible. Instead of an iterative optimization algorithm that converges to a (sub-)optimal solution, the inverse problem can be solved by training a neural network to directly estimate the inverse operator. However simple convolutional neural networks tend to overfit when learning the inverse operator and do not generalize well outside the training distribution. Therefore this paper introduces a hybrid approach that can be interpreted as an unrolled Gerchberg-Saxton algorithm, which we term Learned Residual Gerchberg-Saxton (LRGS) network. We train this network for the generation of multi-focus computer generated holograms, and beat state-of-the-art existing methods.", "keywords": "computer generated holography;inverse problems;deep learning", "primary_area": "", "supplementary_material": "", "author": "Lennart Schlieder;Heiner Kremer;Valentin Volchkov;Kai Melde;Peer Fischer;Bernhard Sch\u00f6lkopf", "authorids": "~Lennart_Schlieder1;hkremer@tuebingen.mpg.de;valentin.volchkov@tuebinge.mpg.de;melde@is.mpg.de;fischer@is.mpg.de;~Bernhard_Sch\u00f6lkopf1", "gender": "M;;;;;", "homepage": "https://is.mpg.de/person/lschlieder;;;;;", "dblp": ";;;;;", "google_scholar": "0LlIFCwAAAAJ;;;;;", "orcid": ";;;;;", "linkedin": ";;;;;", "or_profile": "~Lennart_Schlieder1;hkremer@tuebingen.mpg.de;valentin.volchkov@tuebinge.mpg.de;melde@is.mpg.de;fischer@is.mpg.de;~Bernhard_Sch\u00f6lkopf1", "aff": "Max-Planck-Institute for Intelligent Systems, Max-Planck Institute;;;;;", "aff_domain": "is.mpg.de;;;;;", "position": "PhD student;;;;;", "bibtex": "", "github": "", "project": "", "reviewers": "AnonReviewer3;AnonReviewer1;AnonReviewer2;AnonReviewer4", "site": "https://openreview.net/forum?id=3b76QBOlYW", "pdf_size": 0, "rating": "3;3;4;5", "confidence": "5;4;4;2", "wc_review": "324;420;1257;579", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "0;0;0;0", "reply_reviewers": "0;0;0;0", "reply_authors": "0;0;0;0", "rating_avg": [ 3.75, 0.82915619758885 ], "confidence_avg": [ 3.75, 1.0897247358851685 ], "wc_review_avg": [ 645.0, 364.8855985099988 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 0, 0 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 0, 0 ], "replies_avg": [ 5, 0 ], "authors#_avg": [ 6, 0 ], "corr_rating_confidence": -0.899228803025897, "gs_citation": 4, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=1650425541064142877&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 0, "aff_unique_index": "0", "aff_unique_norm": 
"Max-Planck-Institute for Intelligent Systems", "aff_unique_dep": "Intelligent Systems", "aff_unique_url": "https://www.mpi-is.mpg.de", "aff_unique_abbr": "MPI-IS", "aff_country_unique_index": "0", "aff_country_unique": "Germany" }, { "id": "3c3EhwbKoXw", "title": "Spectral Synthesis for Satellite-to-Satellite Translation", "track": "main", "status": "Reject", "tldr": "", "abstract": "Earth observing satellites carrying multi-spectral sensors are widely used to monitor the physical and biological states of the atmosphere, land, and oceans. These satellites have different vantage points above the earth and different spectral imaging bands resulting in inconsistent imagery from one to another. This presents challenges in building downstream applications. What if we could generate synthetic bands for existing satellites from the union of all domains? We tackle the problem of generating synthetic spectral imagery for multispectral sensors as an unsupervised image-to-image translation problem with partial labels and introduce a novel shared spectral reconstruction loss. Simulated experiments performed by dropping one or more spectral bands show that cross-domain reconstruction outperforms measurements obtained from a second vantage point. On a downstream cloud detection task, we show that generating synthetic bands with our model improves segmentation performance beyond our baseline. Our proposed approach enables synchronization of multispectral data and provides a basis for more homogeneous remote sensing datasets.", "keywords": "", "primary_area": "", "supplementary_material": "", "author": "Thomas Vandal;Daniel McDuff;Weile Wang;Andrew Michaelis;Ramakrishna Nemani", "authorids": "~Thomas_Vandal1;~Daniel_McDuff1;weile.wang@gmail.com;michaelis@hyperplane.org;rama.nemani@nasa.gov", "gender": "M;M;;;", "homepage": "https://thomasvandal.com/;http://alumni.media.mit.edu/~djmcduff/;;;", "dblp": "165/8172;63/9606;;;", "google_scholar": "8cLfchMAAAAJ;m7Jr-b4AAAAJ;;;", "orcid": ";;;;", "linkedin": "tjvandal/;;;;", "or_profile": "~Thomas_Vandal1;~Daniel_McDuff1;weile.wang@gmail.com;michaelis@hyperplane.org;rama.nemani@nasa.gov", "aff": "NASA Ames Research Center;Microsoft;;;", "aff_domain": "nasa.gov;microsoft.com;;;", "position": "Researcher;Principal Researcer;;;", "bibtex": "@misc{\nvandal2021spectral,\ntitle={Spectral Synthesis for Satellite-to-Satellite Translation},\nauthor={Thomas Vandal and Daniel McDuff and Weile Wang and Andrew Michaelis and Ramakrishna Nemani},\nyear={2021},\nurl={https://openreview.net/forum?id=3c3EhwbKoXw}\n}", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer3;AnonReviewer2", "site": "https://openreview.net/forum?id=3c3EhwbKoXw", "pdf_size": 0, "rating": "5;5;6", "confidence": "4;5;4", "wc_review": "346;771;283", "wc_reply_reviewers": "0;0;0", "wc_reply_authors": "316;528;267", "reply_reviewers": "0;0;0", "reply_authors": "1;1;1", "rating_avg": [ 5.333333333333333, 0.4714045207910317 ], "confidence_avg": [ 4.333333333333333, 0.4714045207910317 ], "wc_review_avg": [ 466.6666666666667, 216.72768371596854 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 370.3333333333333, 113.26762801240059 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 7, 0 ], "authors#_avg": [ 5, 0 ], "corr_rating_confidence": -0.4999999999999999, "gs_citation": 0, "gs_cited_by_link": 
"https://scholar.google.com/scholar?q=related:sjjs8B2WLeUJ:scholar.google.com/&scioq=Spectral+Synthesis+for+Satellite-to-Satellite+Translation&hl=en&as_sdt=0,33", "gs_version_total": 3, "aff_unique_index": "0;1", "aff_unique_norm": "NASA Ames Research Center;Microsoft", "aff_unique_dep": ";Microsoft Corporation", "aff_unique_url": "https://ames.nasa.gov;https://www.microsoft.com", "aff_unique_abbr": "NASA Ames;Microsoft", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0", "aff_country_unique": "United States" }, { "id": "3cCWBFRuZBI", "title": "Training-Free Uncertainty Estimation for Dense Regression: Sensitivity as a Surrogate", "track": "main", "status": "Withdraw", "tldr": "", "abstract": "Uncertainty estimation is an essential step in the evaluation of the robustness for deep learning models in computer vision, especially when applied in risk-sensitive areas. However, most state-of-the-art deep learning models either fail to obtain uncertainty estimation or need significant modification (e.g., formulating a proper Bayesian treatment) to obtain it. Most previous methods are not able to take an arbitrary model off the shelf and generate uncertainty estimation without retraining or redesigning it. To address this gap, we perform a systematic exploration into training-free uncertainty estimation for dense regression, an unrecognized yet important problem, and provide a theoretical construction justifying such estimations. We propose three simple and scalable methods to analyze the variance of outputs from a trained network under tolerable perturbations: infer-transformation, infer-noise, and infer-dropout. They operate solely during inference, without the need to re-train, re-design, or fine-tune the model, as typically required by state-of-the-art uncertainty estimation methods. Surprisingly, even without involving such perturbations in training, our methods produce comparable or even better uncertainty estimation when compared to training-required state-of-the-art methods. 
", "keywords": "training-free;uncertainty estimation;dense regression;super resolution;depth estimation;deep learning", "primary_area": "", "supplementary_material": "", "author": "Lu Mi;Hao Wang;Yonglong Tian;Nir Shavit", "authorids": "~Lu_Mi1;~Hao_Wang3;~Yonglong_Tian1;~Nir_Shavit1", "gender": "F;;M;M", "homepage": "https://lumimim.github.io;http://people.csail.mit.edu/yonglong/;http://people.csail.mit.edu/shanir/;http://www.wanghao.in", "dblp": "185/3258;151/6328;s/NirShavit;w/HaoWang-14", "google_scholar": "vokCG-MAAAAJ;https://scholar.google.com.hk/citations?user=OsP7JHAAAAAJ;;NrOA9QoAAAAJ", "orcid": ";;;", "linkedin": "lu-mi-698899172/;;;", "or_profile": "~Lu_Mi1;~Yonglong_Tian1;~Nir_Shavit1;~Hao_Wang4", "aff": "Massachusetts Institute of Technology;Massachusetts Institute of Technology;Massachusetts Institute of Technology;Rutgers University", "aff_domain": "mit.edu;mit.edu;mit.edu;cs.rutgers.edu", "position": "PhD student;PhD student;;Assistant Professor", "bibtex": "", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer3;AnonReviewer2", "site": "https://openreview.net/forum?id=3cCWBFRuZBI", "pdf_size": 0, "rating": "3;4;6", "confidence": "5;3;4", "wc_review": "555;308;289", "wc_reply_reviewers": "0;0;0", "wc_reply_authors": "0;0;0", "reply_reviewers": "0;0;0", "reply_authors": "0;0;0", "rating_avg": [ 4.333333333333333, 1.247219128924647 ], "confidence_avg": [ 4.0, 0.816496580927726 ], "wc_review_avg": [ 384.0, 121.1638009748236 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 0, 0 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 0, 0 ], "replies_avg": [ 4, 0 ], "authors#_avg": [ 4, 0 ], "corr_rating_confidence": -0.3273268353539886, "gs_citation": 28, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=14944580226621072242&as_sdt=400005&sciodt=0,14&hl=en", "gs_version_total": 12, "aff_unique_index": "0;0;0;1", "aff_unique_norm": "Massachusetts Institute of Technology;Rutgers University", "aff_unique_dep": ";", "aff_unique_url": "https://web.mit.edu;https://www.rutgers.edu", "aff_unique_abbr": "MIT;Rutgers", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0;0;0", "aff_country_unique": "United States" }, { "id": "3eNrIs9I78x", "title": "SALR: Sharpness-aware Learning Rates for Improved Generalization", "track": "main", "status": "Reject", "tldr": "", "abstract": "In an effort to improve generalization in deep learning, we propose SALR: a sharpness-aware learning rate update technique designed to recover flat minimizers. Our method dynamically updates the learning rate of gradient-based optimizers based on the local sharpness of the loss function. This allows optimizers to automatically increase learning rates at sharp valleys to increase the chance of escaping them. We demonstrate the effectiveness of SALR when adopted by various algorithms over a broad range of networks. Our experiments indicate that SALR improves generalization, converges faster, and drives solutions to significantly flatter regions. 
", "keywords": "Loss-surface;sharpness;learning rate;generalization", "primary_area": "", "supplementary_material": "", "author": "Xubo Yue;Maher Nouiehed;Raed Al Kontar", "authorids": "~Xubo_Yue1;~Maher_Nouiehed1;~Raed_Al_Kontar1", "gender": ";M;M", "homepage": "https://sites.google.com/a/umich.edu/maxyxb/;;https://alkontar.engin.umich.edu/", "dblp": ";;216/2976", "google_scholar": ";ANPFix4AAAAJ;x0ZxAl4AAAAJ", "orcid": ";;0000-0002-4546-324X", "linkedin": ";;raed-kontar/", "or_profile": "~Xubo_Yue1;~Maher_Nouiehed1;~Raed_Al_Kontar1", "aff": "University of Michigan;American University of Beirut;University of Michigan - Ann Arbor", "aff_domain": "umich.edu;aub.edu.lb;umich.edu", "position": "PhD student;Assistant Professor;Assistant Professor", "bibtex": "@misc{\nyue2021salr,\ntitle={{\\{}SALR{\\}}: Sharpness-aware Learning Rates for Improved Generalization},\nauthor={Xubo Yue and Maher Nouiehed and Raed Al Kontar},\nyear={2021},\nurl={https://openreview.net/forum?id=3eNrIs9I78x}\n}", "github": "", "project": "", "reviewers": "AnonReviewer3;AnonReviewer2;AnonReviewer1;AnonReviewer4", "site": "https://openreview.net/forum?id=3eNrIs9I78x", "pdf_size": 0, "rating": "4;5;6;6", "confidence": "4;5;3;3", "wc_review": "301;332;410;381", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "774;764;634;299", "reply_reviewers": "0;0;0;0", "reply_authors": "1;1;1;2", "rating_avg": [ 5.25, 0.82915619758885 ], "confidence_avg": [ 3.75, 0.82915619758885 ], "wc_review_avg": [ 356.0, 42.2551771975932 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 617.75, 192.13845919024124 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.25, 0.4330127018922193 ], "replies_avg": [ 11, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": -0.6363636363636364, "gs_citation": 14, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=9090653375523810322&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 0, "aff_unique_index": "0;1;0", "aff_unique_norm": "University of Michigan;American University of Beirut", "aff_unique_dep": ";", "aff_unique_url": "https://www.umich.edu;https://www.aub.edu.lb", "aff_unique_abbr": "UM;AUB", "aff_campus_unique_index": "1", "aff_campus_unique": ";Ann Arbor", "aff_country_unique_index": "0;1;0", "aff_country_unique": "United States;Lebanon" }, { "title": "Deployment-Efficient Reinforcement Learning via Model-Based Offline Optimization", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/3081", "id": "3hGNqpI4WS", "poster": "", "openreview": "https://openreview.net/forum?id=3hGNqpI4WS", "slides": "https://iclr.cc/virtual/2021/poster/3081", "video": "https://iclr.cc/virtual/2021/poster/3081", "author_site": "Tatsuya Matsushima, Hiroki Furuta, Yutaka Matsuo, Ofir Nachum, Shixiang Gu", "tldr": "", "abstract": "Most reinforcement learning (RL) algorithms assume online access to the environment, in which one may readily interleave updates to the policy with experience collection using that policy. However, in many real-world applications such as health, education, dialogue agents, and robotics, the cost or potential risk of deploying a new data-collection policy is high, to the point that it can become prohibitive to update the data-collection policy more than a few times during learning. With this view, we propose a novel concept of deployment efficiency, measuring the number of distinct data-collection policies that are used during policy learning. 
We observe that na\u00efvely applying existing model-free offline RL algorithms recursively does not lead to a practical deployment-efficient and sample-efficient algorithm. We propose a novel model-based algorithm, Behavior-Regularized Model-ENsemble (BREMEN), that not only performs better than or comparably as the state-of-the-art dynamic-programming-based and concurrently-proposed model-based offline approaches on existing benchmarks, but can also effectively optimize a policy offline using 10-20 times fewer data than prior works. Furthermore, the recursive application of BREMEN achieves impressive deployment efficiency while maintaining the same or better sample efficiency, learning successful policies from scratch on simulated robotic environments with only 5-10 deployments, compared to typical values of hundreds to millions in standard RL baselines.", "keywords": "Reinforcement Learning;deployment-efficiency;offline RL;Model-based RL", "primary_area": "", "supplementary_material": "/attachment/1cf3a632bc4e5b3ebcab2b64fee70a104d3f92ed.zip", "author": "Tatsuya Matsushima;Hiroki Furuta;Yutaka Matsuo;Ofir Nachum;Shixiang Gu", "authorids": "~Tatsuya_Matsushima1;~Hiroki_Furuta1;~Yutaka_Matsuo1;~Ofir_Nachum1;~Shixiang_Gu1", "gender": "M;M;M;M;M", "homepage": "http://t-matsushima.com/;https://github.com/frt03;http://ymatsuo.com;https://scholar.google.com/citations?user=C-ZlBWMAAAAJ&hl=en;https://sites.google.com/view/gugurus/home", "dblp": "238/2850;267/2065;m/YMatsuo.html;;121/0550", "google_scholar": "Wyn6BtsAAAAJ;M0OhM1UAAAAJ;Dy8iau4AAAAJ;C-ZlBWMAAAAJ;B8wslVsAAAAJ", "orcid": "0000-0002-1537-7770;;;;", "linkedin": "tatsuya-matsushima/;;;;", "or_profile": "~Tatsuya_Matsushima1;~Hiroki_Furuta1;~Yutaka_Matsuo1;~Ofir_Nachum1;~Shixiang_Gu1", "aff": "The University of Tokyo;The University of Tokyo;The University of Tokyo;OpenAI;Google", "aff_domain": "u-tokyo.ac.jp;u-tokyo.ac.jp;u-tokyo.ac.jp;openai.com;google.com", "position": "PhD student;MS student;Associate Professor;Researcher;Senior Research Scientist", "bibtex": "@inproceedings{\nmatsushima2021deploymentefficient,\ntitle={Deployment-Efficient Reinforcement Learning via Model-Based Offline Optimization},\nauthor={Tatsuya Matsushima and Hiroki Furuta and Yutaka Matsuo and Ofir Nachum and Shixiang Gu},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=3hGNqpI4WS}\n}", "github": "[![github](/images/github_icon.svg) matsuolab/BREMEN](https://github.com/matsuolab/BREMEN)", "project": "", "reviewers": "AnonReviewer4;AnonReviewer1;AnonReviewer2;AnonReviewer3", "pdf_size": 0, "rating": "5;7;7;8", "confidence": "4;3;4;4", "wc_review": "278;492;380;416", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "1140;234;391;310", "reply_reviewers": "0;0;0;0", "reply_authors": "2;1;1;1", "rating_avg": [ 6.75, 1.0897247358851685 ], "confidence_avg": [ 3.75, 0.4330127018922193 ], "wc_review_avg": [ 391.5, 76.99837660626359 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 518.75, 362.9499793359961 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.25, 0.4330127018922193 ], "replies_avg": [ 11, 0 ], "authors#_avg": [ 5, 0 ], "corr_rating_confidence": -0.13245323570650439, "gs_citation": 179, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=6135669671400204615&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 9, "pdf": "https://openreview.net/pdf?id=3hGNqpI4WS", "email": "u-tokyo.ac.jp;u-tokyo.ac.jp;u-tokyo.ac.jp;openai.com;google.com", "author_num": 
5, "aff_unique_index": "0;0;0;1;2", "aff_unique_norm": "University of Tokyo;OpenAI;Google", "aff_unique_dep": ";;Google", "aff_unique_url": "https://www.u-tokyo.ac.jp;https://openai.com;https://www.google.com", "aff_unique_abbr": "UTokyo;OpenAI;Google", "aff_campus_unique_index": "1", "aff_campus_unique": ";Mountain View", "aff_country_unique_index": "0;0;0;1;1", "aff_country_unique": "Japan;United States" }, { "id": "3jJKpFbLkU2", "title": "Amortized Conditional Normalized Maximum Likelihood", "track": "main", "status": "Reject", "tldr": "", "abstract": "While deep neural networks provide good performance for a range of challenging tasks, calibration and uncertainty estimation remain major challenges. In this paper, we propose the amortized conditional normalized maximum likelihood (ACNML)\nmethod as a scalable general-purpose approach for uncertainty estimation, calibration, and out-of-distribution robustness with deep networks. Our algorithm builds on the conditional normalized maximum likelihood (CNML) coding scheme, which has minimax optimal properties according to the minimum description length principle, but is computationally intractable to evaluate exactly for all but the simplest of model classes. We propose to use approximate Bayesian inference technqiues to produce a tractable approximation to the CNML distribution. Our approach can be combined with any approximate inference algorithm that provides tractable posterior densities over model parameters. We demonstrate that ACNML compares favorably to a number of prior techniques for uncertainty estimation in terms of and calibration on out-of-distribution inputs.", "keywords": "Uncertainty Estimation;Calibration", "primary_area": "", "supplementary_material": "/attachment/d4861e2ad51b6442f96a78d594630f2844f5bef7.zip", "author": "Aurick Zhou;Sergey Levine", "authorids": "~Aurick_Zhou1;~Sergey_Levine1", "gender": ";M", "homepage": ";https://people.eecs.berkeley.edu/~svlevine/", "dblp": "213/7312;80/7594", "google_scholar": "1O83J5MAAAAJ;8R35rCwAAAAJ", "orcid": ";", "linkedin": ";", "or_profile": "~Aurick_Zhou1;~Sergey_Levine1", "aff": "University of California, Berkeley;Google", "aff_domain": "berkeley.edu;google.com", "position": "PhD student;Research Scientist", "bibtex": "@misc{\nzhou2021amortized,\ntitle={Amortized Conditional Normalized Maximum Likelihood},\nauthor={Aurick Zhou and Sergey Levine},\nyear={2021},\nurl={https://openreview.net/forum?id=3jJKpFbLkU2}\n}", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer2;AnonReviewer4;AnonReviewer3", "site": "https://openreview.net/forum?id=3jJKpFbLkU2", "pdf_size": 0, "rating": "5;5;6;6", "confidence": "3;3;3;3", "wc_review": "208;195;351;440", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "428;336;744;397", "reply_reviewers": "0;0;0;0", "reply_authors": "2;2;1;1", "rating_avg": [ 5.5, 0.5 ], "confidence_avg": [ 3.0, 0.0 ], "wc_review_avg": [ 298.5, 102.07962578301313 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 476.25, 158.08917578379615 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.5, 0.5 ], "replies_avg": [ 13, 0 ], "authors#_avg": [ 2, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 4, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=1781025156733644722&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 2, "aff_unique_index": "0;1", "aff_unique_norm": "University of California, Berkeley;Google", "aff_unique_dep": ";Google", "aff_unique_url": "https://www.berkeley.edu;https://www.google.com", 
"aff_unique_abbr": "UC Berkeley;Google", "aff_campus_unique_index": "0;1", "aff_campus_unique": "Berkeley;Mountain View", "aff_country_unique_index": "0;0", "aff_country_unique": "United States" }, { "title": "Meta Back-Translation", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/2541", "id": "3jjmdp7Hha", "poster": "", "openreview": "https://openreview.net/forum?id=3jjmdp7Hha", "slides": "https://iclr.cc/virtual/2021/poster/2541", "video": "https://iclr.cc/virtual/2021/poster/2541", "author_site": "Hieu Pham, Xinyi Wang, Yiming Yang, Graham Neubig", "tldr": "", "abstract": "Back-translation is an effective strategy to improve the performance of Neural Machine Translation~(NMT) by generating pseudo-parallel data. However, several recent works have found that better translation quality in the pseudo-parallel data does not necessarily lead to a better final translation model, while lower-quality but diverse data often yields stronger results instead.\nIn this paper we propose a new way to generate pseudo-parallel data for back-translation that directly optimizes the final model performance. Specifically, we propose a meta-learning framework where the back-translation model learns to match the forward-translation model's gradients on the development data with those on the pseudo-parallel data. In our evaluations in both the standard datasets WMT En-De'14 and WMT En-Fr'14, as well as a multilingual translation setting, our method leads to significant improvements over strong baselines. ", "keywords": "meta learning;machine translation;back translation", "primary_area": "", "supplementary_material": "/attachment/632521a4b5d41b4e85249ad11abe1e0472165652.zip", "author": "Hieu Pham;Xinyi Wang;Yiming Yang;Graham Neubig", "authorids": "~Hieu_Pham1;~Xinyi_Wang1;~Yiming_Yang1;~Graham_Neubig1", "gender": "M;F;F;M", "homepage": ";;http://www.cs.cmu.edu/~yiming/;http://phontron.com", "dblp": ";;25/1666;03/8155", "google_scholar": "GpcGdRkAAAAJ;https://scholar.google.com/citations?view_op=list_works;MlZq4XwAAAAJ;wlosgkoAAAAJ", "orcid": ";;0000-0001-8322-607X;", "linkedin": ";;yiming-yang-24100924/;", "or_profile": "~Hieu_Pham1;~Xinyi_Wang1;~Yiming_Yang1;~Graham_Neubig1", "aff": "Carnegie Mellon University;School of Computer Science, Carnegie Mellon University;School of Computer Science, Carnegie Mellon University;Carnegie Mellon University", "aff_domain": "cmu.edu;cs.cmu.edu;cs.cmu.edu;cmu.edu", "position": "PhD student;PhD student;Full Professor;Associate Professor", "bibtex": "@inproceedings{\npham2021meta,\ntitle={Meta Back-Translation},\nauthor={Hieu Pham and Xinyi Wang and Yiming Yang and Graham Neubig},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=3jjmdp7Hha}\n}", "github": "[![github](/images/github_icon.svg) google-research/google-research](https://github.com/google-research/google-research)", "project": "", "reviewers": "AnonReviewer4;AnonReviewer1;AnonReviewer3;AnonReviewer2", "pdf_size": 0, "rating": "6;6;7;7", "confidence": "4;4;4;5", "wc_review": "683;335;405;297", "wc_reply_reviewers": "0;0;295;0", "wc_reply_authors": "799;274;939;90", "reply_reviewers": "0;0;3;0", "reply_authors": "1;1;4;1", "rating_avg": [ 6.5, 0.5 ], "confidence_avg": [ 4.25, 0.4330127018922193 ], "wc_review_avg": [ 430.0, 151.11915828246265 ], "wc_reply_reviewers_avg": [ 73.75, 127.7387470582047 ], "wc_reply_authors_avg": [ 525.5, 353.09241000055493 ], "reply_reviewers_avg": [ 0.75, 1.299038105676658 ], 
"reply_authors_avg": [ 1.75, 1.299038105676658 ], "replies_avg": [ 16, 0 ], "authors#_avg": [ 4, 0 ], "corr_rating_confidence": 0.5773502691896257, "gs_citation": 33, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=8104983143273406902&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 3, "pdf": "https://openreview.net/pdf?id=3jjmdp7Hha", "email": "cmu.edu;cs.cmu.edu;cs.cmu.edu;cmu.edu", "author_num": 4, "aff_unique_index": "0;0;0;0", "aff_unique_norm": "Carnegie Mellon University", "aff_unique_dep": "", "aff_unique_url": "https://www.cmu.edu", "aff_unique_abbr": "CMU", "aff_campus_unique_index": "1;1", "aff_campus_unique": ";Pittsburgh", "aff_country_unique_index": "0;0;0;0", "aff_country_unique": "United States" }, { "title": "Pre-training Text-to-Text Transformers for Concept-centric Common Sense", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/3272", "id": "3k20LAiHYL2", "poster": "", "openreview": "https://openreview.net/forum?id=3k20LAiHYL2", "slides": "https://iclr.cc/virtual/2021/poster/3272", "video": "https://iclr.cc/virtual/2021/poster/3272", "author_site": "Wangchunshu Zhou, Dong-Ho Lee, Ravi Kiran Selvam, Seyeon Lee, Xiang Ren", "tldr": "", "abstract": "Pretrained language models (PTLM) have achieved impressive results in a range of natural language understanding (NLU) and generation (NLG) tasks that require a syntactic and semantic understanding of the text. However, current pre-training objectives such as masked token prediction (for BERT-style PTLMs) and masked span infilling (for T5-style PTLMs) do not explicitly model the relational and compositional commonsense knowledge about everyday concepts, which is crucial to many downstream tasks requiring commonsense reasoning. To augment PTLMs with common sense, we propose generative and contrastive objectives as intermediate self-supervised pre-training tasks between general pre-training and downstream task-specific fine-tuning. We also propose a joint training framework to unify generative and contrastive objectives so that these objectives can be more effective.\nOur proposed objectives can pack more commonsense knowledge into the parameters of a pre-trained text-to-text transformer without relying on external knowledge bases, yielding better performance on both NLU and NLG tasks. We apply our method on a pre-trained T5 model in an intermediate task transfer learning fashion to train a concept-aware language model (CALM) and experiment with five commonsense benchmarks (four NLU tasks and one NLG task). 
Experimental results show that CALM outperforms baseline methods by a consistent margin.", "keywords": "Language Model Pre-training;Commonsense Reasoning;Self-supervised Learning", "primary_area": "", "supplementary_material": "/attachment/95c40476c9644ca56b6dc6cbf56ee5c2a64eb8fd.zip", "author": "Wangchunshu Zhou;Dong-Ho Lee;Ravi Kiran Selvam;Seyeon Lee;Xiang Ren", "authorids": "~Wangchunshu_Zhou1;~Dong-Ho_Lee1;~Ravi_Kiran_Selvam1;~Seyeon_Lee1;~Xiang_Ren1", "gender": "M;M;M;F;M", "homepage": "https://michaelzhouwang.github.io;https://danny-lee.info;https://www.sravikiran.com/;;https://shanzhenren.github.io/", "dblp": "245/8640.html;;;131/8900;36/360-1", "google_scholar": "UebIjuQAAAAJ;oei2TXwAAAAJ;;;_moJlrIAAAAJ", "orcid": ";;;;", "linkedin": ";;ravikiran0606/;seyeon-lee-5882b1192/;xren7", "or_profile": "~Wangchunshu_Zhou1;~Dong-Ho_Lee1;~Ravi_Kiran_Selvam1;~Seyeon_Lee1;~Xiang_Ren1", "aff": "Microsoft Research Aisa;University of Southern California;University of Southern California;University of Southern California;University of Southern California", "aff_domain": "microsoft.com;usc.edu;usc.edu;usc.edu;usc.edu", "position": "Intern;PhD student;MS student;MS student;Associate Professor", "bibtex": "@inproceedings{\nzhou2021pretraining,\ntitle={Pre-training Text-to-Text Transformers for Concept-centric Common Sense},\nauthor={Wangchunshu Zhou and Dong-Ho Lee and Ravi Kiran Selvam and Seyeon Lee and Xiang Ren},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=3k20LAiHYL2}\n}", "github": "", "project": "", "reviewers": "AnonReviewer4;AnonReviewer3;AnonReviewer1;AnonReviewer2", "pdf_size": 0, "rating": "4;7;8;8", "confidence": "4;4;4;4", "wc_review": "415;772;547;577", "wc_reply_reviewers": "0;388;0;0", "wc_reply_authors": "1790;1266;1033;1354", "reply_reviewers": "0;1;0;0", "reply_authors": "5;3;2;3", "rating_avg": [ 6.75, 1.6393596310755 ], "confidence_avg": [ 4.0, 0.0 ], "wc_review_avg": [ 577.75, 127.63889493410697 ], "wc_reply_reviewers_avg": [ 97.0, 168.0089283341811 ], "wc_reply_authors_avg": [ 1360.75, 274.18002753665337 ], "reply_reviewers_avg": [ 0.25, 0.4330127018922193 ], "reply_authors_avg": [ 3.25, 1.0897247358851685 ], "replies_avg": [ 21, 0 ], "authors#_avg": [ 5, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 75, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=8101587242954788676&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 3, "pdf": "https://openreview.net/pdf?id=3k20LAiHYL2", "email": "microsoft.com;usc.edu;usc.edu;usc.edu;usc.edu", "author_num": 5, "aff_unique_index": "0;1;1;1;1", "aff_unique_norm": "Microsoft;University of Southern California", "aff_unique_dep": "Microsoft Research;", "aff_unique_url": "https://www.microsoft.com/en-us/research/group/asia;https://www.usc.edu", "aff_unique_abbr": "MSR Asia;USC", "aff_campus_unique_index": "0;1;1;1;1", "aff_campus_unique": "Beijing;Los Angeles", "aff_country_unique_index": "0;1;1;1;1", "aff_country_unique": "China;United States" }, { "id": "3l4Dlrgm92Q", "title": "Don't Trigger Me! A Triggerless Backdoor Attack Against Deep Neural Networks", "track": "main", "status": "Withdraw", "tldr": "", "abstract": "Backdoor attack against deep neural networks is currently being profoundly investigated due to its severe security consequences. Current state-of-the-art backdoor attacks require the adversary to modify the input, usually by adding a trigger to it, for the target model to activate the backdoor. 
This added trigger not only increases the difficulty of launching the backdoor attack in the physical world, but also can be easily detected by multiple defense mechanisms. In this paper, we present the first triggerless backdoor attack against deep neural networks, where the adversary does not need to modify the input for triggering the backdoor. Our attack is based on the dropout technique. Concretely, we associate a set of target neurons that are dropped out during model training with the target label. In the prediction phase, the model will output the target label when the target neurons are dropped again, i.e., the backdoor attack is launched. This triggerless feature of our attack makes it practical in the physical world. Extensive experiments show that our triggerless backdoor attack achieves a perfect attack success rate with a negligible damage to the model's utility.", "keywords": "Backdoor attack;Machine learning security", "primary_area": "", "supplementary_material": "", "author": "Ahmed Salem;Michael Backes;Yang Zhang", "authorids": "~Ahmed_Salem2;~Michael_Backes1;~Yang_Zhang15", "gender": ";M;M", "homepage": ";https://yangzhangalmo.github.io/;https://ahmedsalem2.github.io/", "dblp": ";06/6785-16;41/506-1", "google_scholar": ";Xeb2888AAAAJ;https://scholar.google.com/citations?hl=en", "orcid": ";0000-0003-3612-7348;", "linkedin": ";;", "or_profile": "~Michael_Backes1;~Yang_Zhang15;~ahmed_salem1", "aff": ";CISPA Helmholtz Center for Information Security;The CISPA Helmholtz Center for Information Security", "aff_domain": ";cispa.de;cispa.saarland", "position": ";Assistant Professor;PhD student", "bibtex": "", "github": "", "project": "", "reviewers": "AnonReviewer4;AnonReviewer5;AnonReviewer2", "site": "https://openreview.net/forum?id=3l4Dlrgm92Q", "pdf_size": 0, "rating": "3;3;5", "confidence": "4;4;4", "wc_review": "525;1456;263", "wc_reply_reviewers": "0;0;0", "wc_reply_authors": "0;0;0", "reply_reviewers": "0;0;0", "reply_authors": "0;0;0", "rating_avg": [ 3.6666666666666665, 0.9428090415820634 ], "confidence_avg": [ 4.0, 0.0 ], "wc_review_avg": [ 748.0, 511.9303338020386 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 0, 0 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 0, 0 ], "replies_avg": [ 5, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 41, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=10846625911396342704&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 3, "aff_unique_index": "0;0", "aff_unique_norm": "CISPA Helmholtz Center for Information Security", "aff_unique_dep": "", "aff_unique_url": "https://www.cispa.de/", "aff_unique_abbr": "CISPA", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0", "aff_country_unique": "Germany" }, { "id": "3nSU-sDEOG9", "title": "Empirical Sufficiency Featuring Reward Delay Calibration", "track": "main", "status": "Reject", "tldr": "", "abstract": "Appropriate credit assignment for delay rewards is a fundamental challenge in various deep reinforcement learning tasks. To tackle this problem, we introduce a delay reward calibration paradigm inspired from a classification perspective. We hypothesize that when an agent's behavior satisfies an equivalent sufficient condition to be awarded, well-represented state vectors should share similarities. To this end, we define an empirical sufficient distribution, where the state vectors within the distribution will lead agents to environmental reward signals in consequent steps. 
Therefore, an overfitting classifier is established to handle the distribution and generate calibrated rewards. We examine the correctness of sufficient state extraction by tracking the real-time extraction and building hybrid different reward functions in environments with different levels of awarding latency. The results demonstrate that the classifier could generate timely and accurate calibrated rewards, and the rewards could make the training more efficient. Finally, we find that the sufficient states extracted by our model resonate with observations of human cognition.", "keywords": "Deep Reinforcement Learning;Reward Calibration;Empirical Sufficiency;Overfitting.", "primary_area": "", "supplementary_material": "", "author": "Yixuan Liu;Hu Wang;Xiaowei Wang;Xiaoyue Sun;Liuyue Jiang;Minhui Xue", "authorids": "~Yixuan_Liu1;~Hu_Wang1;xiaowei.wang01@student.adelaide.edu.au;a1782027@student.adelaide.edu.au;liuyue.jiang@adelaide.edu.au;jason.xue@adelaide.edu.au", "gender": "M;M;;;;", "homepage": ";https://huwang01.github.io/;;;;", "dblp": ";62/2712-5.html;;;;", "google_scholar": ";https://scholar.google.com.au/citations?user=K_6dgCgAAAAJ;;;;", "orcid": ";0000-0003-1725-873X;;;;", "linkedin": "yixuan-liu-0720a9135;;;;;", "or_profile": "~Yixuan_Liu1;~Hu_Wang1;xiaowei.wang01@student.adelaide.edu.au;a1782027@student.adelaide.edu.au;liuyue.jiang@adelaide.edu.au;jason.xue@adelaide.edu.au", "aff": "The University of Adelaide;The University of Adelaide;;;;", "aff_domain": "adelaide.edu.au;adelaide.edu.au;;;;", "position": "PhD student;PhD student;;;;", "bibtex": "@misc{\nliu2021empirical,\ntitle={Empirical Sufficiency Featuring Reward Delay Calibration},\nauthor={Yixuan Liu and Hu Wang and Xiaowei Wang and Xiaoyue Sun and Liuyue Jiang and Minhui Xue},\nyear={2021},\nurl={https://openreview.net/forum?id=3nSU-sDEOG9}\n}", "github": "", "project": "", "reviewers": "AnonReviewer3;AnonReviewer1;AnonReviewer4;AnonReviewer2", "site": "https://openreview.net/forum?id=3nSU-sDEOG9", "pdf_size": 0, "rating": "4;4;4;5", "confidence": "3;4;3;3", "wc_review": "308;252;197;585", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "86;295;242;694", "reply_reviewers": "0;0;0;0", "reply_authors": "1;1;1;1", "rating_avg": [ 4.25, 0.4330127018922193 ], "confidence_avg": [ 3.25, 0.4330127018922193 ], "wc_review_avg": [ 335.5, 149.29919624699926 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 329.25, 224.1644206826766 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 9, 0 ], "authors#_avg": [ 6, 0 ], "corr_rating_confidence": -0.3333333333333333, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:lGheJ3-L7mMJ:scholar.google.com/&scioq=Empirical+Sufficiency+Featuring+Reward+Delay+Calibration&hl=en&as_sdt=0,33", "gs_version_total": 0, "aff_unique_index": "0;0", "aff_unique_norm": "University of Adelaide", "aff_unique_dep": "", "aff_unique_url": "https://www.adelaide.edu.au", "aff_unique_abbr": "Adelaide", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0", "aff_country_unique": "Australia" }, { "title": "Implicit Gradient Regularization", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/3150", "id": "3q5IqUrkcF", "poster": "", "openreview": "https://openreview.net/forum?id=3q5IqUrkcF", "slides": "https://iclr.cc/virtual/2021/poster/3150", "video": "https://iclr.cc/virtual/2021/poster/3150", "author_site": "David Barrett, Benoit Dherin", "tldr": "", "abstract": 
"Gradient descent can be surprisingly good at optimizing deep neural networks without overfitting and without explicit regularization. We find that the discrete steps of gradient descent implicitly regularize models by penalizing gradient descent trajectories that have large loss gradients. We call this Implicit Gradient Regularization (IGR) and we use backward error analysis to calculate the size of this regularization. We confirm empirically that implicit gradient regularization biases gradient descent toward flat minima, where test errors are small and solutions are robust to noisy parameter perturbations. Furthermore, we demonstrate that the implicit gradient regularization term can be used as an explicit regularizer, allowing us to control this gradient regularization directly. More broadly, our work indicates that backward error analysis is a useful theoretical approach to the perennial question of how learning rate, model size, and parameter regularization interact to determine the properties of overparameterized models optimized with gradient descent.", "keywords": "implicit regularization;deep learning;deep learning theory;theoretical issues in deep learning;theory;regularization", "primary_area": "", "supplementary_material": "", "author": "David Barrett;Benoit Dherin", "authorids": "~David_Barrett1;dherin@google.com", "gender": ";", "homepage": ";", "dblp": ";", "google_scholar": "Whh_d2EAAAAJ;", "orcid": ";", "linkedin": ";", "or_profile": "~David_Barrett1;dherin@google.com", "aff": "Google;", "aff_domain": "google.com;", "position": "Research Scientist;", "bibtex": "@inproceedings{\nbarrett2021implicit,\ntitle={Implicit Gradient Regularization},\nauthor={David Barrett and Benoit Dherin},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=3q5IqUrkcF}\n}", "github": "", "project": "", "reviewers": "AnonReviewer4;AnonReviewer3;AnonReviewer1", "pdf_size": 0, "rating": "6;6;7", "confidence": "4;4;5", "wc_review": "358;2909;953", "wc_reply_reviewers": "53;679;0", "wc_reply_authors": "1505;2322;1202", "reply_reviewers": "1;2;0", "reply_authors": "4;5;3", "rating_avg": [ 6.333333333333333, 0.4714045207910317 ], "confidence_avg": [ 4.333333333333333, 0.4714045207910317 ], "wc_review_avg": [ 1406.6666666666667, 1089.7278967195841 ], "wc_reply_reviewers_avg": [ 244.0, 308.3515309945236 ], "wc_reply_authors_avg": [ 1676.3333333333333, 473.01609087030243 ], "reply_reviewers_avg": [ 1.0, 0.816496580927726 ], "reply_authors_avg": [ 4.0, 0.816496580927726 ], "replies_avg": [ 20, 0 ], "authors#_avg": [ 2, 0 ], "corr_rating_confidence": 0.9999999999999998, "gs_citation": 189, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=5741489315826027816&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 5, "pdf": "https://openreview.net/pdf?id=3q5IqUrkcF", "email": "google.com;", "author_num": 2, "aff_unique_index": "0", "aff_unique_norm": "Google", "aff_unique_dep": "Google", "aff_unique_url": "https://www.google.com", "aff_unique_abbr": "Google", "aff_campus_unique_index": "0", "aff_campus_unique": "Mountain View", "aff_country_unique_index": "0", "aff_country_unique": "United States" }, { "id": "3rRgu7OGgBI", "title": "Bi-tuning of Pre-trained Representations", "track": "main", "status": "Reject", "tldr": "", "abstract": "It is common within the deep learning community to first pre-train a deep neural network from a large-scale dataset and then fine-tune the pre-trained model to a specific downstream task. 
Recently, both supervised and unsupervised pre-training approaches to learning representations have achieved remarkable advances, which exploit the discriminative knowledge of labels and the intrinsic structure of data, respectively. It follows natural intuition that both discriminative knowledge and intrinsic structure of the downstream task can be useful for fine-tuning, however, existing fine-tuning methods mainly leverage the former and discard the latter. A question arises: How to fully explore the intrinsic structure of data for boosting fine-tuning? In this paper, we propose Bi-tuning, a general learning framework to fine-tuning both supervised and unsupervised pre-trained representations to downstream tasks. Bi-tuning generalizes the vanilla fine-tuning by integrating two heads upon the backbone of pre-trained representations: a classifier head with an improved contrastive cross-entropy loss to better leverage the label information in an instance-contrast way, and a projector head with a newly-designed categorical contrastive learning loss to fully exploit the intrinsic structure of data in a category-consistent way. Comprehensive experiments confirm that Bi-tuning achieves state-of-the-art results for fine-tuning tasks of both supervised and unsupervised pre-trained models by large margins (e.g.~10.7\\% absolute rise in accuracy on CUB in low-data regime).", "keywords": "Deep learning;fine-tuning;pre-training", "primary_area": "", "supplementary_material": "", "author": "Jincheng Zhong;Ximei Wang;Zhi Kou;Jianmin Wang;Mingsheng Long", "authorids": "zhongjinchengwork@gmail.com;~Ximei_Wang1;kz19@mails.tsinghua.edu.cn;~Jianmin_Wang1;~Mingsheng_Long5", "gender": ";M;;M;", "homepage": ";https://wxm17.github.io/;;https://www.thss.tsinghua.edu.cn/en/faculty/jianminwang.htm;", "dblp": ";89/8876;;06/3456-1.html;", "google_scholar": ";WmOCCVgAAAAJ;;https://scholar.google.com.tw/citations?user=MiovcboAAAAJ;", "orcid": ";;;0000-0001-6841-7943;", "linkedin": ";;;;", "or_profile": "zhongjinchengwork@gmail.com;~Ximei_Wang1;kz19@mails.tsinghua.edu.cn;~Jianmin_Wang1;~Mingsheng_Long5", "aff": ";Tsinghua University;;Tsinghua University;", "aff_domain": ";tsinghua.edu.cn;;tsinghua.edu.cn;", "position": ";PhD student;;Full Professor;", "bibtex": "@misc{\nzhong2021bituning,\ntitle={Bi-tuning of Pre-trained Representations},\nauthor={Jincheng Zhong and Ximei Wang and Zhi Kou and Jianmin Wang and Mingsheng Long},\nyear={2021},\nurl={https://openreview.net/forum?id=3rRgu7OGgBI}\n}", "github": "", "project": "", "reviewers": "AnonReviewer2;AnonReviewer3;AnonReviewer4;AnonReviewer1", "site": "https://openreview.net/forum?id=3rRgu7OGgBI", "pdf_size": 0, "rating": "4;4;5;8", "confidence": "5;4;3;4", "wc_review": "577;336;791;737", "wc_reply_reviewers": "0;0;26;0", "wc_reply_authors": "542;586;1434;165", "reply_reviewers": "0;0;1;0", "reply_authors": "1;1;2;1", "rating_avg": [ 5.25, 1.6393596310755 ], "confidence_avg": [ 4.0, 0.7071067811865476 ], "wc_review_avg": [ 610.25, 176.81540515464144 ], "wc_reply_reviewers_avg": [ 6.5, 11.258330249197702 ], "wc_reply_authors_avg": [ 681.75, 464.1144120796078 ], "reply_reviewers_avg": [ 0.25, 0.4330127018922193 ], "reply_authors_avg": [ 1.25, 0.4330127018922193 ], "replies_avg": [ 11, 0 ], "authors#_avg": [ 5, 0 ], "corr_rating_confidence": -0.2156655464068768, "gs_citation": 27, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=9754497455638478514&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 3, "aff_unique_index": "0;0", "aff_unique_norm": "Tsinghua 
University", "aff_unique_dep": "", "aff_unique_url": "https://www.tsinghua.edu.cn", "aff_unique_abbr": "THU", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0", "aff_country_unique": "China" }, { "title": "Continuous Wasserstein-2 Barycenter Estimation without Minimax Optimization", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/2713", "id": "3tFAs5E-Pe", "poster": "", "openreview": "https://openreview.net/forum?id=3tFAs5E-Pe", "slides": "https://iclr.cc/virtual/2021/poster/2713", "video": "https://iclr.cc/virtual/2021/poster/2713", "author_site": "Alexander Korotin, Lingxiao Li, Justin Solomon, Evgeny Burnaev", "tldr": "", "abstract": "Wasserstein barycenters provide a geometric notion of the weighted average of probability measures based on optimal transport. In this paper, we present a scalable algorithm to compute Wasserstein-2 barycenters given sample access to the input measures, which are not restricted to being discrete. While past approaches rely on entropic or quadratic regularization, we employ input convex neural networks and cycle-consistency regularization to avoid introducing bias. As a result, our approach does not resort to minimax optimization. We provide theoretical analysis on error bounds as well as empirical evidence of the effectiveness of the proposed approach in low-dimensional qualitative scenarios and high-dimensional quantitative experiments.", "keywords": "wasserstein-2 barycenters;non-minimax optimization;cycle-consistency regularizer;input convex neural networks;continuous case", "primary_area": "", "supplementary_material": "/attachment/8635c88d2a2e34d1c1873ee4a0244e5463958e96.zip", "author": "Alexander Korotin;Lingxiao Li;Justin Solomon;Evgeny Burnaev", "authorids": "~Alexander_Korotin2;lingxiao@mit.edu;~Justin_Solomon1;~Evgeny_Burnaev1", "gender": ";;M;M", "homepage": ";;http://people.csail.mit.edu/jsolomon/;http://faculty.skoltech.ru/people/evgenyburnaev", "dblp": ";;80/5094;144/7845", "google_scholar": ";;pImSVwoAAAAJ;https://scholar.google.ru/citations?user=pCRdcOwAAAAJ", "orcid": ";;0000-0002-7701-7586;0000-0001-8424-0690", "linkedin": ";;justin-solomon-8a587914/;", "or_profile": "~Alexander_Korotin2;lingxiao@mit.edu;~Justin_Solomon1;~Evgeny_Burnaev1", "aff": ";;Massachusetts Institute of Technology;Skolkovo Institute of Science and Technology", "aff_domain": ";;mit.edu;skoltech.ru", "position": ";;Associate Professor;Associate Professor", "bibtex": "@inproceedings{\nkorotin2021continuous,\ntitle={Continuous Wasserstein-2 Barycenter Estimation without Minimax Optimization},\nauthor={Alexander Korotin and Lingxiao Li and Justin Solomon and Evgeny Burnaev},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=3tFAs5E-Pe}\n}", "github": "[![Papers with Code](/images/pwc_icon.svg) 2 community implementations](https://paperswithcode.com/paper/?openreview=3tFAs5E-Pe)", "project": "", "reviewers": "AnonReviewer4;AnonReviewer1;AnonReviewer2;AnonReviewer3", "pdf_size": 0, "rating": "6;6;7;7", "confidence": "4;4;3;5", "wc_review": "267;139;391;206", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "495;336;167;326", "reply_reviewers": "0;0;0;0", "reply_authors": "1;1;1;1", "rating_avg": [ 6.5, 0.5 ], "confidence_avg": [ 4.0, 0.7071067811865476 ], "wc_review_avg": [ 250.75, 92.76953972075101 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 331.0, 116.0193949303305 ], "reply_reviewers_avg": [ 0, 0 ], 
"reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 13, 0 ], "authors#_avg": [ 4, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 57, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=3747206149818093427&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 5, "pdf": "https://openreview.net/pdf?id=3tFAs5E-Pe", "email": ";;mit.edu;skoltech.ru", "author_num": 4, "aff_unique_index": "0;1", "aff_unique_norm": "Massachusetts Institute of Technology;Skolkovo Institute of Science and Technology", "aff_unique_dep": ";", "aff_unique_url": "https://web.mit.edu;https://www.skoltech.ru", "aff_unique_abbr": "MIT;Skoltech", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;1", "aff_country_unique": "United States;Russian Federation" }, { "id": "3teh9zI0j4L", "title": "Quantifying Exposure Bias for Open-ended Language Generation", "track": "main", "status": "Reject", "tldr": "", "abstract": "The exposure bias problem refers to the incrementally distorted generation induced by the training-generation discrepancy, in teacher-forcing training for auto-regressive neural network language models (LM). It has been regarded as a central problem for LMs trained for open-ended language generation. Although a lot of algorithms have been proposed to avoid teacher forcing and therefore alleviate exposure bias, there is little work showing how serious the exposure bias problem actually is. In this work, we propose novel metrics to quantify the impact of exposure bias in the generation of MLE-trained LMs. Our key intuition is that if we feed ground-truth data prefixes (instead of prefixes generated by the model itself) into the model and ask it to continue the generation, the performance should become much better because the training-generation discrepancy in the prefix is removed. We conduct both automatic and human evaluation in our experiments, and our observations are two-fold: (1) We confirm that the prefix discrepancy indeed induces some level of performance loss. (2) However, the induced distortion seems to be limited, and is not incremental during the generation, which contradicts the claim of exposure bias.", "keywords": "exposure bias;natural language generation;autoregressive", "primary_area": "", "supplementary_material": "", "author": "Tianxing He;Jingzhao Zhang;Zhiming Zhou;James R. Glass", "authorids": "~Tianxing_He1;~Jingzhao_Zhang2;~Zhiming_Zhou2;~James_R._Glass1", "gender": "M;M;M;", "homepage": "https://cloudygoose.github.io/;https://sites.google.com/view/jingzhao/home;https://zhimingzhou.github.io/;", "dblp": "149/0111;220/5559;56/321-2.html;", "google_scholar": "egmfjjwAAAAJ;8NudxYsAAAAJ;b8YJ1EMAAAAJ;", "orcid": ";;0000-0002-2407-961X;", "linkedin": ";;;", "or_profile": "~Tianxing_He1;~Jingzhao_Zhang2;~Zhiming_Zhou2;~James_R._Glass1", "aff": "Massachusetts Institute of Technology;Massachusetts Institute of Technology;Shanghai University of Finance and Economics;", "aff_domain": "mit.edu;mit.edu;shufe.edu.cn;", "position": "PhD student;PhD student;Assistant Professor;", "bibtex": "@misc{\nhe2021quantifying,\ntitle={Quantifying Exposure Bias for Open-ended Language Generation},\nauthor={Tianxing He and Jingzhao Zhang and Zhiming Zhou and James R. 
Glass},\nyear={2021},\nurl={https://openreview.net/forum?id=3teh9zI0j4L}\n}", "github": "", "project": "", "reviewers": "AnonReviewer2;AnonReviewer3;AnonReviewer4;AnonReviewer1", "site": "https://openreview.net/forum?id=3teh9zI0j4L", "pdf_size": 0, "rating": "3;3;6;6", "confidence": "4;4;4;4", "wc_review": "925;1200;751;287", "wc_reply_reviewers": "1048;0;16;109", "wc_reply_authors": "681;696;290;354", "reply_reviewers": "2;0;1;1", "reply_authors": "1;1;1;2", "rating_avg": [ 4.5, 1.5 ], "confidence_avg": [ 4.0, 0.0 ], "wc_review_avg": [ 790.75, 331.98371571509347 ], "wc_reply_reviewers_avg": [ 293.25, 437.73814946837797 ], "wc_reply_authors_avg": [ 505.25, 184.71785917988547 ], "reply_reviewers_avg": [ 1.0, 0.7071067811865476 ], "reply_authors_avg": [ 1.25, 0.4330127018922193 ], "replies_avg": [ 14, 0 ], "authors#_avg": [ 4, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 5, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=17631346829930178668&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 0, "aff_unique_index": "0;0;1", "aff_unique_norm": "Massachusetts Institute of Technology;Shanghai University of Finance and Economics", "aff_unique_dep": ";", "aff_unique_url": "https://web.mit.edu;http://www.sufe.edu.cn", "aff_unique_abbr": "MIT;SUFE", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0;1", "aff_country_unique": "United States;China" }, { "id": "3u3ny6UYmjy", "title": "RetCL: A Selection-based Approach for Retrosynthesis via Contrastive Learning", "track": "main", "status": "Reject", "tldr": "", "abstract": "Retrosynthesis, of which the goal is to find a set of reactants for synthesizing a target product, is an emerging research area of deep learning. While the existing approaches have shown promising results, they currently lack the ability to consider availability (e.g., stability or purchasability) of the reactants or generalize to unseen reaction templates (i.e., chemical reaction rules).\nIn this paper, we propose a new approach that mitigates the issues by reformulating retrosynthesis into a selection problem of reactants from a candidate set of commercially available molecules. To this end, we design an efficient reactant selection framework, named RetCL (retrosynthesis via contrastive learning), for enumerating all of the candidate molecules based on selection scores computed by graph neural networks. For learning the score functions, we also propose a novel contrastive training scheme with hard negative mining. Extensive experiments demonstrate the benefits of the proposed selection-based approach. For example, when all 671k reactants in the USPTO database are given as candidates, our RetCL achieves top-1 exact match accuracy of 71.3% for the USPTO-50k benchmark, while a recent transformer-based approach achieves 59.6%. We also demonstrate that RetCL generalizes well to unseen templates in various settings in contrast to template-based approaches. 
The code will be released.", "keywords": "molecule;retrosynthesis;contrastive learning;graph representation learning", "primary_area": "", "supplementary_material": "", "author": "Hankook Lee;Sungsoo Ahn;Seung-Woo Seo;You Young Song;Eunho Yang;Sung Ju Hwang;Jinwoo Shin", "authorids": "~Hankook_Lee1;~Sungsoo_Ahn1;~Seung-Woo_Seo3;~You_Young_Song1;~Eunho_Yang1;~Sung_Ju_Hwang1;~Jinwoo_Shin1", "gender": "M;M;M;F;M;;M", "homepage": "https://hankook.github.io;https://sungsooahn.super.site/;https://scholar.google.co.kr/citations?user=ZPT4bv0AAAAJ&hl=en;;https://sites.google.com/site/hleehome2/;;https://sites.google.com/site/mijirim/", "dblp": "223/4393;90/5164;;;96/2621;;31/7062", "google_scholar": "CgqswXUAAAAJ;XTenHs0AAAAJ;https://scholar.google.co.kr/citations?user=ZPT4bv0AAAAJ;LMdfID0AAAAJ;;;https://scholar.google.com.tw/citations?user=m3eDp7kAAAAJ", "orcid": ";;;;;;", "linkedin": ";;;;;;", "or_profile": "~Hankook_Lee1;~Sungsoo_Ahn1;~Seung-Woo_Seo3;~You_Young_Song1;~Eunho_Yang1;~Sung_Ju_Hwang1;~Jinwoo_Shin1", "aff": "Korea Advanced Institute of Science & Technology;Korea Advanced Institute of Science & Technology;;;Korea Advanced Institute of Science & Technology;;Korea Advanced Institute of Science & Technology", "aff_domain": "kaist.ac.kr;kaist.ac.kr;;;kaist.ac.kr;;kaist.ac.kr", "position": "PhD student;PhD student;;;Associate Professor;;Associate Professor", "bibtex": "@misc{\nlee2021retcl,\ntitle={Ret{\\{}CL{\\}}: A Selection-based Approach for Retrosynthesis via Contrastive Learning},\nauthor={Hankook Lee and Sungsoo Ahn and Seung-Woo Seo and You Young Song and Eunho Yang and Sung Ju Hwang and Jinwoo Shin},\nyear={2021},\nurl={https://openreview.net/forum?id=3u3ny6UYmjy}\n}", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer2;AnonReviewer4;AnonReviewer3", "site": "https://openreview.net/forum?id=3u3ny6UYmjy", "pdf_size": 0, "rating": "4;4;4;5", "confidence": "5;5;5;4", "wc_review": "403;505;204;371", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "341;478;388;375", "reply_reviewers": "0;0;0;0", "reply_authors": "1;1;1;1", "rating_avg": [ 4.25, 0.4330127018922193 ], "confidence_avg": [ 4.75, 0.4330127018922193 ], "wc_review_avg": [ 370.75, 108.24595835411131 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 395.5, 50.62854925829892 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 10, 0 ], "authors#_avg": [ 7, 0 ], "corr_rating_confidence": -1.0, "gs_citation": 23, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=17394420502338883096&as_sdt=400005&sciodt=0,14&hl=en", "gs_version_total": 8, "aff_unique_index": "0;0;0;0", "aff_unique_norm": "Korea Advanced Institute of Science and Technology", "aff_unique_dep": "", "aff_unique_url": "https://www.kaist.ac.kr", "aff_unique_abbr": "KAIST", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0;0;0", "aff_country_unique": "South Korea" }, { "id": "3uiR9bkbDjL", "title": "ColdExpand: Semi-Supervised Graph Learning in Cold Start", "track": "main", "status": "Reject", "tldr": "", "abstract": "Most real-world graphs are dynamic and eventually face the cold start problem. A fundamental question is how the new cold nodes acquire initial information in order to be adapted into the existing graph. 
Here we postulate the cold start problem as a fundamental issue in graph learning and propose a new learning setting, \"Expanded Semi-supervised Learning.\" In expanded semi-supervised learning, we extend the original semi-supervised learning setting even to new cold nodes that are disconnected from the graph. To this end, we propose the ColdExpand model, which classifies the cold nodes based on link prediction with multiple objectives. We experimentally show that, by adding an additional objective to an existing link prediction method, our method outperforms the baseline in both expanded semi-supervised link prediction (by up to 24%) and node classification (by up to 15%). To the best of our knowledge, this is the first study to address the expansion of semi-supervised learning to unseen nodes.", "keywords": "Graph Neural Networks;Cold Start;Semi-supervised Learning", "primary_area": "", "supplementary_material": "", "author": "Il-Jae Kwon;Kyoung-Woon On;Dong-Geon Lee;Byoung-Tak Zhang", "authorids": "~Il-Jae_Kwon1;~Kyoung-Woon_On1;~Dong-Geon_Lee1;~Byoung-Tak_Zhang1", "gender": "M;M;M;", "homepage": ";https://bi.snu.ac.kr/~btzhang/;;", "dblp": "175/0873;09/5682;;", "google_scholar": ";sYTUOu8AAAAJ;;dUL2BPUAAAAJ", "orcid": ";;;", "linkedin": ";;donggeon/;", "or_profile": "~Kyoung-Woon_On1;~Byoung-Tak_Zhang1;~Dong-Geon_Lee2;~IL_JAE_KWON1", "aff": "Kakaobrain;Seoul National University;Yonsei University;Seoul National University", "aff_domain": "kakaobrain.com;snu.ac.kr;yonsei.ac.kr;snu.ac.kr", "position": "Researcher;Full Professor;Undergrad student;MS student", "bibtex": "@misc{\nkwon2021coldexpand,\ntitle={ColdExpand: Semi-Supervised Graph Learning in Cold Start},\nauthor={Il-Jae Kwon and Kyoung-Woon On and Dong-Geon Lee and Byoung-Tak Zhang},\nyear={2021},\nurl={https://openreview.net/forum?id=3uiR9bkbDjL}\n}", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer2;AnonReviewer3;AnonReviewer4", "site": "https://openreview.net/forum?id=3uiR9bkbDjL", "pdf_size": 0, "rating": "5;6;6;9", "confidence": "3;4;5;5", "wc_review": "171;404;782;532", "wc_reply_reviewers": "0;249;0;0", "wc_reply_authors": "272;981;524;497", "reply_reviewers": "0;3;0;0", "reply_authors": "1;3;1;1", "rating_avg": [ 6.5, 1.5 ], "confidence_avg": [ 4.25, 0.82915619758885 ], "wc_review_avg": [ 472.25, 220.75141562400003 ], "wc_reply_reviewers_avg": [ 62.25, 107.82016277116261 ], "wc_reply_authors_avg": [ 568.5, 257.46893016439867 ], "reply_reviewers_avg": [ 0.75, 1.299038105676658 ], "reply_authors_avg": [ 1.5, 0.8660254037844386 ], "replies_avg": [ 16, 0 ], "authors#_avg": [ 4, 0 ], "corr_rating_confidence": 0.7035264706814485, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:Hr690UES1cMJ:scholar.google.com/&scioq=ColdExpand:+Semi-Supervised+Graph+Learning+in+Cold+Start&hl=en&as_sdt=0,5", "gs_version_total": 0, "aff_unique_index": "0;1;2;1", "aff_unique_norm": "Kakao Brain;Seoul National University;Yonsei University", "aff_unique_dep": ";;", "aff_unique_url": "https://brain.kakao.com;https://www.snu.ac.kr;https://www.yonsei.ac.kr", "aff_unique_abbr": "Kakao Brain;SNU;Yonsei", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0;0;0", "aff_country_unique": "South Korea" }, { "id": "3xUBgZQ04X", "title": "The Bures Metric for Taming Mode Collapse in Generative Adversarial Networks", "track": "main", "status": "Reject", "tldr": "", "abstract": "Generative Adversarial Networks (GANs) are performant generative methods yielding high-quality samples.
However, under certain circumstances, the training of GANs can lead to mode collapse or mode dropping, i.e. the generative models not being able to sample from the entire probability distribution. To address this problem, we use the last layer of the discriminator as a feature map to study the distribution of the real and the fake data. During training, we propose to match the real batch diversity to the fake batch diversity by using the Bures distance between covariance matrices in feature space. The computation of the Bures distance can be conveniently done in either feature space or kernel space in terms of the covariance and kernel matrix respectively. We observe that diversity matching reduces mode collapse substantially and has a positive effect on the sample quality. On the practical side, a very simple training procedure, that does not require additional hyperparameter tuning, is proposed and assessed on several datasets. ", "keywords": "Generative Adversarial Networks;Deep Learning;Neural Networks", "primary_area": "", "supplementary_material": "/attachment/9791d963cd9ef45a7292afaf1b4e9c6203278972.zip", "author": "Hannes De Meulemeester;Joachim Schreurs;Micha\u00ebl Fanuel;Bart De Moor;Johan Suykens", "authorids": "~Hannes_De_Meulemeester1;joachim.schreurs@esat.kuleuven.be;michael.fanuel@kuleuven.be;bart.demoor@esat.kuleuven.be;~Johan_Suykens1", "gender": ";;;;M", "homepage": ";;;;https://www.kuleuven.be/wieiswie/nl/person/00015385", "dblp": ";;;;61/3224", "google_scholar": "https://scholar.google.be/citations?user=hoTz9VsAAAAJ;;;;https://scholar.google.be/citations?user=WtBmh0UAAAAJ", "orcid": "0000-0002-5938-2387;;;;0000-0002-8846-6352", "linkedin": ";;;;", "or_profile": "~Hannes_De_Meulemeester1;joachim.schreurs@esat.kuleuven.be;michael.fanuel@kuleuven.be;bart.demoor@esat.kuleuven.be;~Johan_Suykens1", "aff": "KU Leuven;;;;KU Leuven", "aff_domain": "kuleuven.be;;;;kuleuven.be", "position": "PhD student;;;;Full Professor", "bibtex": "@misc{\nmeulemeester2021the,\ntitle={The Bures Metric for Taming Mode Collapse in Generative Adversarial Networks},\nauthor={Hannes De Meulemeester and Joachim Schreurs and Micha{\\\"e}l Fanuel and Bart De Moor and Johan Suykens},\nyear={2021},\nurl={https://openreview.net/forum?id=3xUBgZQ04X}\n}", "github": "", "project": "", "reviewers": "AnonReviewer2;AnonReviewer3;AnonReviewer4;AnonReviewer1", "site": "https://openreview.net/forum?id=3xUBgZQ04X", "pdf_size": 0, "rating": "3;5;6;6", "confidence": "4;4;4;4", "wc_review": "526;606;356;742", "wc_reply_reviewers": "0;0;0;546", "wc_reply_authors": "204;200;330;834", "reply_reviewers": "0;0;0;3", "reply_authors": "1;1;1;2", "rating_avg": [ 5.0, 1.224744871391589 ], "confidence_avg": [ 4.0, 0.0 ], "wc_review_avg": [ 557.5, 139.63076308607643 ], "wc_reply_reviewers_avg": [ 136.5, 236.42493523315176 ], "wc_reply_authors_avg": [ 392.0, 260.4880035625441 ], "reply_reviewers_avg": [ 0.75, 1.299038105676658 ], "reply_authors_avg": [ 1.25, 0.4330127018922193 ], "replies_avg": [ 14, 0 ], "authors#_avg": [ 5, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 3, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=2253556393712747130&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 0, "aff_unique_index": "0;0", "aff_unique_norm": "Katholieke Universiteit Leuven", "aff_unique_dep": "", "aff_unique_url": "https://www.kuleuven.be", "aff_unique_abbr": "KU Leuven", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0", "aff_country_unique": "Belgium" }, { "id": 
"3zaVN0M0BIb", "title": "Learning and Generalization in Univariate Overparameterized Normalizing Flows", "track": "main", "status": "Reject", "tldr": "", "abstract": "In supervised learning, it is known that overparameterized neural networks with one hidden layer provably and efficiently learn and generalize, when trained using Stochastic Gradient Descent (SGD). In contrast, the benefit of overparameterization in unsupervised learning is not well understood. Normalizing flows (NFs) learn to map complex real-world distributions into simple base distributions, and constitute an important class of models in unsupervised learning for sampling and density estimation. In this paper, we theoretically and empirically analyze these models when the underlying neural network is one hidden layer overparametrized network. On the one hand we provide evidence that for a class of NFs, overparametrization hurts training. On the other, we prove that another class of NFs, with similar underlying networks can efficiently learn any reasonable data distribution under minimal assumptions. We extend theoretical ideas on learning and generalization from overparameterized neural networks in supervised learning to overparameterized normalizing flows in unsupervised learning. We also provide experimental validation to support our theoretical analysis in practice.", "keywords": "", "primary_area": "", "supplementary_material": "/attachment/011444e6383112b1e26c2e123c6e9361068941c5.zip", "author": "Kulin Shah;Amit Deshpande;Navin Goyal", "authorids": "~Kulin_Shah1;~Amit_Deshpande1;~Navin_Goyal1", "gender": "M;M;", "homepage": "https://kulinshah98.github.io/;;", "dblp": "215/3581;28/6953-1;20/6275", "google_scholar": "https://scholar.google.co.in/citations?user=67OmLg4AAAAJ;;", "orcid": ";;", "linkedin": ";;", "or_profile": "~Kulin_Shah1;~Amit_Deshpande1;~Navin_Goyal1", "aff": "Microsoft;Microsoft Research;Microsoft", "aff_domain": "microsoft.com;microsoft.com;microsoft.com", "position": "Research Fellow;Researcher;Researcher", "bibtex": "@misc{\nshah2021learning,\ntitle={Learning and Generalization in Univariate Overparameterized Normalizing Flows},\nauthor={Kulin Shah and Amit Deshpande and Navin Goyal},\nyear={2021},\nurl={https://openreview.net/forum?id=3zaVN0M0BIb}\n}", "github": "", "project": "", "reviewers": "AnonReviewer3;AnonReviewer4;AnonReviewer2;AnonReviewer1", "site": "https://openreview.net/forum?id=3zaVN0M0BIb", "pdf_size": 0, "rating": "4;4;5;6", "confidence": "4;4;3;3", "wc_review": "766;539;584;245", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "1062;868;792;496", "reply_reviewers": "0;0;0;0", "reply_authors": "2;2;2;1", "rating_avg": [ 4.75, 0.82915619758885 ], "confidence_avg": [ 3.5, 0.5 ], "wc_review_avg": [ 533.5, 186.99532079707237 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 804.5, 203.51105621071304 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.75, 0.4330127018922193 ], "replies_avg": [ 13, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": -0.9045340337332909, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:72-N8Yw_KjIJ:scholar.google.com/&scioq=Learning+and+Generalization+in+Univariate+Overparameterized+Normalizing+Flows&hl=en&as_sdt=0,33", "gs_version_total": 0, "aff_unique_index": "0;0;0", "aff_unique_norm": "Microsoft", "aff_unique_dep": "Microsoft Corporation", "aff_unique_url": "https://www.microsoft.com", "aff_unique_abbr": "Microsoft", "aff_campus_unique_index": "", "aff_campus_unique": "", 
"aff_country_unique_index": "0;0;0", "aff_country_unique": "United States" }, { "id": "412_KkkGjJ4", "title": "Weakly Supervised Scene Graph Grounding", "track": "main", "status": "Reject", "tldr": "", "abstract": " Recent researches have achieved substantial advances in learning structured representations from images. However, current methods rely heavily on the annotated mapping between the nodes of scene graphs and object bounding boxes inside images. Here, we explore the problem of learning the mapping between scene graph nodes and visual objects under weak supervision. Our proposed method learns a metric among visual objects and scene graph nodes by incorporating information from both object features and relational features. Extensive experiments on Visual Genome (VG) and Visual Relation Detection (VRD) datasets verify that our model post an improvement on scene graph grounding task over current state-of-the-art approaches. Further experiments on scene graph parsing task verify the grounding found by our model can reinforce the performance of the existing method. ", "keywords": "Weakly Supervised Learning;Scene Graph Grounding;Visual Relation;Computer Vision", "primary_area": "", "supplementary_material": "", "author": "Yizhou Zhang;Zhaoheng Zheng;Yan Liu", "authorids": "~Yizhou_Zhang3;~Zhaoheng_Zheng1;~Yan_Liu1", "gender": ";M;F", "homepage": "https://yizhouzhang1997.netlify.app/;;http://www-bcf.usc.edu/~liu32/", "dblp": ";277/9864.html;150/4295", "google_scholar": "k127fcwAAAAJ;36e4ADAAAAAJ;UUKLPMYAAAAJ", "orcid": ";;0000-0002-7055-9518", "linkedin": ";;", "or_profile": "~Yizhou_Zhang3;~Zhaoheng_Zheng1;~Yan_Liu1", "aff": "University of Southern California;Amazon Alexa AI;University of Southern California", "aff_domain": "usc.edu;amazon.com;usc.edu", "position": "PhD student;Intern;Professor", "bibtex": "@misc{\nzhang2021weakly,\ntitle={Weakly Supervised Scene Graph Grounding},\nauthor={Yizhou Zhang and Zhaoheng Zheng and Yan Liu},\nyear={2021},\nurl={https://openreview.net/forum?id=412_KkkGjJ4}\n}", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer3;AnonReviewer2;AnonReviewer4", "site": "https://openreview.net/forum?id=412_KkkGjJ4", "pdf_size": 0, "rating": "4;5;5;7", "confidence": "5;5;4;5", "wc_review": "318;316;316;886", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "661;580;308;635", "reply_reviewers": "0;0;0;0", "reply_authors": "1;1;1;1", "rating_avg": [ 5.25, 1.0897247358851685 ], "confidence_avg": [ 4.75, 0.4330127018922193 ], "wc_review_avg": [ 459.0, 246.5299170486211 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 546.0, 140.4866541704229 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 9, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": 0.13245323570650439, "gs_citation": -1, "gs_cited_by_link": "", "gs_version_total": -1, "aff_unique_index": "0;1;0", "aff_unique_norm": "University of Southern California;Amazon", "aff_unique_dep": ";Amazon Alexa AI", "aff_unique_url": "https://www.usc.edu;https://www.amazon.com", "aff_unique_abbr": "USC;Amazon", "aff_campus_unique_index": "0;0", "aff_campus_unique": "Los Angeles;", "aff_country_unique_index": "0;0;0", "aff_country_unique": "United States" }, { "title": "The geometry of integration in text classification RNNs", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/2576", "id": "42kiJ7n_8xO", "poster": "", "openreview": "https://openreview.net/forum?id=42kiJ7n_8xO", "slides": 
"https://iclr.cc/virtual/2021/poster/2576", "video": "https://iclr.cc/virtual/2021/poster/2576", "author_site": "Kyle Aitken, Vinay Ramasesh, Ankush Garg, Yuan Cao, David Sussillo, Niru Maheswaranathan", "tldr": "", "abstract": "Despite the widespread application of recurrent neural networks (RNNs), a unified understanding of how RNNs solve particular tasks remains elusive. In particular, it is unclear what dynamical patterns arise in trained RNNs, and how those pat-terns depend on the training dataset or task. This work addresses these questions in the context of text classification, building on earlier work studying the dynamics of binary sentiment-classification networks (Maheswaranathan et al., 2019). We study text-classification tasks beyond the binary case, exploring the dynamics ofRNNs trained on both natural and synthetic datasets. These dynamics, which we find to be both interpretable and low-dimensional, share a common mechanism across architectures and datasets: specifically, these text-classification networks use low-dimensional attractor manifolds to accumulate evidence for each class as they process the text. The dimensionality and geometry of the attractor manifold are determined by the structure of the training dataset, with the dimensionality reflecting the number of scalar quantities the network remembers in order to classify.In categorical classification, for example, we show that this dimensionality is one less than the number of classes. Correlations in the dataset, such as those induced by ordering, can further reduce the dimensionality of the attractor manifold; we show how to predict this reduction using simple word-count statistics computed on the training dataset. To the degree that integration of evidence towards a decision is a common computational primitive, this work continues to lay the foundation for using dynamical systems techniques to study the inner workings of RNNs.", "keywords": "Recurrent neural networks;dynamical systems;interpretability;document classification;reverse engineering", "primary_area": "", "supplementary_material": "/attachment/a604ec751d4048700db465dae41abbe04fab8e17.zip", "author": "Kyle Aitken;Vinay Venkatesh Ramasesh;Ankush Garg;Yuan Cao;David Sussillo;Niru Maheswaranathan", "authorids": "~Kyle_Aitken1;~Vinay_Venkatesh_Ramasesh1;~Ankush_Garg1;~Yuan_Cao2;~David_Sussillo1;~Niru_Maheswaranathan1", "gender": "M;M;M;M;;M", "homepage": ";http://ramasesh.github.io;;;;http://niru.dev/", "dblp": ";;86/7221;52/4472-7.html;56/9314;155/7407", "google_scholar": "VIlm2HwAAAAJ;;https://scholar.google.com/citations?hl=en;Q82vvqcAAAAJ;ebBgMSkAAAAJ;bEOT7ScAAAAJ", "orcid": ";;;0000-0002-1267-8930;;", "linkedin": ";;agbgarg/;;david-sussillo-736a1290;", "or_profile": "~Kyle_Aitken1;~Vinay_Venkatesh_Ramasesh1;~Ankush_Garg1;~Yuan_Cao2;~David_Sussillo1;~Niru_Maheswaranathan2", "aff": ";;Google;Google DeepMind;;Google", "aff_domain": ";;google.com;google.com;;google.com", "position": ";;research engineer;Research scientist;;Research Engineer", "bibtex": "@inproceedings{\naitken2021the,\ntitle={The geometry of integration in text classification {\\{}RNN{\\}}s},\nauthor={Kyle Aitken and Vinay Venkatesh Ramasesh and Ankush Garg and Yuan Cao and David Sussillo and Niru Maheswaranathan},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=42kiJ7n_8xO}\n}", "github": "", "project": "", "reviewers": "AnonReviewer3;AnonReviewer4;AnonReviewer1;AnonReviewer5;AnonReviewer2", "pdf_size": 0, "rating": 
"5;7;7;7;8", "confidence": "4;5;4;4;4", "wc_review": "264;904;754;316;357", "wc_reply_reviewers": "30;0;28;0;0", "wc_reply_authors": "560;1357;256;170;756", "reply_reviewers": "1;0;1;0;0", "reply_authors": "1;2;1;1;1", "rating_avg": [ 6.8, 0.9797958971132712 ], "confidence_avg": [ 4.2, 0.39999999999999997 ], "wc_review_avg": [ 519.0, 259.20185184523666 ], "wc_reply_reviewers_avg": [ 11.6, 14.22111106770494 ], "wc_reply_authors_avg": [ 619.8, 424.3255354088415 ], "reply_reviewers_avg": [ 0.4, 0.48989794855663565 ], "reply_authors_avg": [ 1.2, 0.4000000000000001 ], "replies_avg": [ 15, 0 ], "authors#_avg": [ 6, 0 ], "corr_rating_confidence": 0.10206207261596574, "gs_citation": 14, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=2065276965620091862&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 9, "pdf": "https://openreview.net/pdf?id=42kiJ7n_8xO", "email": ";;google.com;google.com;;google.com", "author_num": 6, "aff_unique_index": "0;0;0", "aff_unique_norm": "Google", "aff_unique_dep": "Google", "aff_unique_url": "https://www.google.com", "aff_unique_abbr": "Google", "aff_campus_unique_index": "0;0", "aff_campus_unique": "Mountain View;", "aff_country_unique_index": "0;1;0", "aff_country_unique": "United States;United Kingdom" }, { "title": "Unsupervised Audiovisual Synthesis via Exemplar Autoencoders", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/2705", "id": "43VKWxg_Sqr", "poster": "", "openreview": "https://openreview.net/forum?id=43VKWxg_Sqr", "slides": "https://iclr.cc/virtual/2021/poster/2705", "video": "https://iclr.cc/virtual/2021/poster/2705", "author_site": "Kangle Deng, Aayush Bansal, Deva Ramanan", "tldr": "", "abstract": "We present an unsupervised approach that converts the input speech of any individual into audiovisual streams of potentially-infinitely many output speakers. Our approach builds on simple autoencoders that project out-of-sample data onto the distribution of the training set. We use exemplar autoencoders to learn the voice, stylistic prosody, and visual appearance of a specific target exemplar speech. In contrast to existing methods, the proposed approach can be easily extended to an arbitrarily large number of speakers and styles using only 3 minutes of target audio-video data, without requiring any training data for the input speaker. To do so, we learn audiovisual bottleneck representations that capture the structured linguistic content of speech. 
We outperform prior approaches on both audio and video synthesis.\n", "keywords": "unsupervised learning;autoencoders;speech-impaired;assistive technology;audiovisual synthesis;voice conversion", "primary_area": "", "supplementary_material": "/attachment/bddf7672a4fee393766190ac41e16f21d674bf2b.zip", "author": "Kangle Deng;Aayush Bansal;Deva Ramanan", "authorids": "~Kangle_Deng1;~Aayush_Bansal1;~Deva_Ramanan1", "gender": "M;;M", "homepage": "https://dunbar12138.github.io;;https://www.cs.cmu.edu/~deva/", "dblp": "246/3131;;49/488", "google_scholar": ";;9B8PoXUAAAAJ", "orcid": ";;", "linkedin": ";;", "or_profile": "~Kangle_Deng1;~Aayush_Bansal1;~Deva_Ramanan1", "aff": "Carnegie Mellon University;;School of Computer Science, Carnegie Mellon University", "aff_domain": "cmu.edu;;cs.cmu.edu", "position": "PhD student;;Full Professor", "bibtex": "@inproceedings{\ndeng2021unsupervised,\ntitle={Unsupervised Audiovisual Synthesis via Exemplar Autoencoders},\nauthor={Kangle Deng and Aayush Bansal and Deva Ramanan},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=43VKWxg_Sqr}\n}", "github": "", "project": "", "reviewers": "AnonReviewer4;AnonReviewer1;AnonReviewer3", "pdf_size": 0, "rating": "6;6;9", "confidence": "3;3;4", "wc_review": "175;392;114", "wc_reply_reviewers": "0;0;0", "wc_reply_authors": "329;279;181", "reply_reviewers": "0;0;0", "reply_authors": "1;1;1", "rating_avg": [ 7.0, 1.4142135623730951 ], "confidence_avg": [ 3.3333333333333335, 0.4714045207910317 ], "wc_review_avg": [ 227.0, 119.30074042799008 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 263.0, 61.47086030524273 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 7, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": 1.0, "gs_citation": 14, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=16877203563908649712&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 5, "pdf": "https://openreview.net/pdf?id=43VKWxg_Sqr", "email": "cmu.edu;;cs.cmu.edu", "author_num": 3, "aff_unique_index": "0;0", "aff_unique_norm": "Carnegie Mellon University", "aff_unique_dep": "", "aff_unique_url": "https://www.cmu.edu", "aff_unique_abbr": "CMU", "aff_campus_unique_index": "1", "aff_campus_unique": ";Pittsburgh", "aff_country_unique_index": "0;0", "aff_country_unique": "United States" }, { "title": "Identifying Physical Law of Hamiltonian Systems via Meta-Learning", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/3130", "id": "45NZvF1UHam", "poster": "", "openreview": "https://openreview.net/forum?id=45NZvF1UHam", "slides": "https://iclr.cc/virtual/2021/poster/3130", "video": "https://iclr.cc/virtual/2021/poster/3130", "author_site": "Seungjun Lee, Haesang Yang, Woojae Seong", "tldr": "", "abstract": "Hamiltonian mechanics is an effective tool to represent many physical processes with concise yet well-generalized mathematical expressions. A well-modeled Hamiltonian makes it easy for researchers to analyze and forecast many related phenomena that are governed by the same physical law. However, in general, identifying a functional or shared expression of the Hamiltonian is very difficult. It requires carefully designed experiments and the researcher's insight that comes from years of experience. 
We propose that meta-learning algorithms can be potentially powerful data-driven tools for identifying the physical law governing Hamiltonian systems without any mathematical assumptions on the representation, but with observations from a set of systems governed by the same physical law. We show that a well meta-trained learner can identify the shared representation of the Hamiltonian by evaluating our method on several types of physical systems with various experimental settings.", "keywords": "Learning physical laws;meta-learning;Hamiltonian systems", "primary_area": "", "supplementary_material": "", "author": "Seungjun Lee;Haesang Yang;Woojae Seong", "authorids": "~Seungjun_Lee1;~Haesang_Yang1;~Woojae_Seong1", "gender": "M;M;", "homepage": "https://7tl7qns7ch.github.io/seungjunlee.github.io/;;", "dblp": ";;", "google_scholar": "https://scholar.google.com/citations?hl=ko;;", "orcid": "0009-0001-4314-0260;;", "linkedin": "seungjun-lee-656946213/;;", "or_profile": "~Seungjun_Lee1;~Haesang_Yang1;~Woojae_Seong1", "aff": "Seoul National University;Seoul National University;", "aff_domain": "snu.ac.kr;snu.ac.kr;", "position": "PhD student;Postdoc;", "bibtex": "@inproceedings{\nlee2021identifying,\ntitle={Identifying Physical Law of Hamiltonian Systems via Meta-Learning},\nauthor={Seungjun Lee and Haesang Yang and Woojae Seong},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=45NZvF1UHam}\n}", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer4;AnonReviewer2", "pdf_size": 0, "rating": "6;7;7", "confidence": "4;3;4", "wc_review": "758;734;280", "wc_reply_reviewers": "67;17;48", "wc_reply_authors": "1190;759;376", "reply_reviewers": "1;1;1", "reply_authors": "2;1;2", "rating_avg": [ 6.666666666666667, 0.4714045207910317 ], "confidence_avg": [ 3.6666666666666665, 0.4714045207910317 ], "wc_review_avg": [ 590.6666666666666, 219.8929032253858 ], "wc_reply_reviewers_avg": [ 44.0, 20.607442021431645 ], "wc_reply_authors_avg": [ 775.0, 332.5066415376792 ], "reply_reviewers_avg": [ 1.0, 0.0 ], "reply_authors_avg": [ 1.6666666666666667, 0.4714045207910317 ], "replies_avg": [ 13, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": -0.4999999999999999, "gs_citation": 22, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=16421601538365365056&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 5, "pdf": "https://openreview.net/pdf?id=45NZvF1UHam", "email": "snu.ac.kr;snu.ac.kr;", "author_num": 3, "aff_unique_index": "0;0", "aff_unique_norm": "Seoul National University", "aff_unique_dep": "", "aff_unique_url": "https://www.snu.ac.kr", "aff_unique_abbr": "SNU", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0", "aff_country_unique": "South Korea" }, { "title": "Generative Language-Grounded Policy in Vision-and-Language Navigation with Bayes' Rule", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/2660", "id": "45uOPa46Kh", "poster": "", "openreview": "https://openreview.net/forum?id=45uOPa46Kh", "slides": "https://iclr.cc/virtual/2021/poster/2660", "video": "https://iclr.cc/virtual/2021/poster/2660", "author_site": "Shuhei Kurita, Kyunghyun Cho", "tldr": "", "abstract": "Vision-and-language navigation (VLN) is a task in which an agent is embodied in a realistic 3D environment and follows an instruction to reach the goal node. 
While most of the previous studies have built and investigated a discriminative approach, we notice that there are in fact two possible approaches to building such a VLN agent: discriminative and generative. In this paper, we design and investigate a generative language-grounded policy which uses a language model to compute the distribution over all possible instructions, i.e., all possible sequences of vocabulary tokens given the action and the transition history. In experiments, we show that the proposed generative approach outperforms the discriminative approach in the Room-2-Room (R2R) and Room-4-Room (R4R) datasets, especially in the unseen environments. We further show that the combination of the generative and discriminative policies achieves close to state-of-the-art results in the R2R dataset, demonstrating that the generative and discriminative policies capture the different aspects of VLN.", "keywords": "vision-and-language-navigation", "primary_area": "", "supplementary_material": "", "author": "Shuhei Kurita;Kyunghyun Cho", "authorids": "~Shuhei_Kurita1;~Kyunghyun_Cho1", "gender": ";M", "homepage": ";http://kyunghyuncho.me", "dblp": ";41/9736", "google_scholar": ";https://scholar.google.fi/citations?user=0RAmmIAAAAAJ", "orcid": ";", "linkedin": ";", "or_profile": "~Shuhei_Kurita1;~Kyunghyun_Cho1", "aff": ";New York University", "aff_domain": ";nyu.edu", "position": ";Associate Professor", "bibtex": "@inproceedings{\nkurita2021generative,\ntitle={Generative Language-Grounded Policy in Vision-and-Language Navigation with Bayes' Rule},\nauthor={Shuhei Kurita and Kyunghyun Cho},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=45uOPa46Kh}\n}", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer4;AnonReviewer3;AnonReviewer2", "pdf_size": 0, "rating": "4;5;8;8", "confidence": "4;4;3;5", "wc_review": "431;451;301;370", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "628;707;381;540", "reply_reviewers": "0;0;0;0", "reply_authors": "1;1;1;1", "rating_avg": [ 6.25, 1.7853571071357126 ], "confidence_avg": [ 4.0, 0.7071067811865476 ], "wc_review_avg": [ 388.25, 58.546455913231846 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 564.0, 121.04751133336033 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 10, 0 ], "authors#_avg": [ 2, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 27, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=17750880421728483593&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 8, "pdf": "https://openreview.net/pdf?id=45uOPa46Kh", "email": ";nyu.edu", "author_num": 2, "aff_unique_index": "0", "aff_unique_norm": "New York University", "aff_unique_dep": "", "aff_unique_url": "https://www.nyu.edu", "aff_unique_abbr": "NYU", "aff_country_unique_index": "0", "aff_country_unique": "United States" }, { "id": "48goXfYCVFX", "title": "Interpretable Relational Representations for Food Ingredient Recommendation Systems", "track": "main", "status": "Reject", "tldr": "", "abstract": "Supporting chefs with ingredient recommender systems to create new recipes is challenging, as good ingredient combinations depend on many factors such as taste, smell, cuisine style, and texture, among others. There have been few attempts to address these issues using machine learning. Useful models obviously need to be accurate but, importantly -- especially for food professionals -- also interpretable.
In order to address these issues, we propose the Interpretable Relational Representation Model (IRRM). The main component of the model is a key-value memory network to represent relationships of ingredients. We propose and test two variants of the model.\nOne can learn latent relational representations over a trainable memory network (Implicit model), and the other can learn explainable relational representations over a pre-trained memory network that integrates an external knowledge base (Explicit model).\nThe relational representations resulting from the model are interpretable -- they allow to inspect why certain ingredient pairings have been suggested. The Explicit model additionally allows to integrate any number of manually specified constraints.\nWe conduct experiments on two recipe datasets, including CulinaryDB with 45,772 recipes and Flavornet with 55,001 recipes, respectively. The experimental results show that our models are both predictive and informative.", "keywords": "Metric Learning;Gastronomy;Memory Network;Knowledge Graph;Interpretable", "primary_area": "", "supplementary_material": "", "author": "Kana Maruyama;Michael Spranger", "authorids": "~Kana_Maruyama1;~Michael_Spranger2", "gender": "Unspecified;", "homepage": "https://github.com/maru8xx;", "dblp": ";", "google_scholar": ";", "orcid": ";", "linkedin": ";", "or_profile": "~Kana_Maruyama1;~Michael_Spranger2", "aff": "Sony AI;", "aff_domain": "sony.com;", "position": "Engineer;", "bibtex": "@misc{\nmaruyama2021interpretable,\ntitle={Interpretable Relational Representations for Food Ingredient Recommendation Systems},\nauthor={Kana Maruyama and Michael Spranger},\nyear={2021},\nurl={https://openreview.net/forum?id=48goXfYCVFX}\n}", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer4;AnonReviewer2;AnonReviewer3", "site": "https://openreview.net/forum?id=48goXfYCVFX", "pdf_size": 0, "rating": "3;5;5;7", "confidence": "4;5;4;3", "wc_review": "397;406;418;330", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "0;0;0;0", "reply_reviewers": "0;0;0;0", "reply_authors": "0;0;0;0", "rating_avg": [ 5.0, 1.4142135623730951 ], "confidence_avg": [ 4.0, 0.7071067811865476 ], "wc_review_avg": [ 387.75, 34.16412592179112 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 0, 0 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 0, 0 ], "replies_avg": [ 5, 0 ], "authors#_avg": [ 2, 0 ], "corr_rating_confidence": -0.5, "gs_citation": 1, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=12885475231241424918&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 3, "aff_unique_index": "0", "aff_unique_norm": "Sony", "aff_unique_dep": "Sony AI", "aff_unique_url": "https://www.sony.com", "aff_unique_abbr": "Sony AI", "aff_country_unique_index": "0", "aff_country_unique": "Japan" }, { "id": "49V11oUejQ", "title": "Efficient Robust Training via Backward Smoothing", "track": "main", "status": "Reject", "tldr": "", "abstract": "Adversarial training is so far the most effective strategy in defending against adversarial examples. However, it suffers from high computational cost due to the iterative adversarial attacks in each training step. Recent studies show that it is possible to achieve Fast Adversarial Training by performing a single-step attack with random initialization. Yet, it remains a mystery why random initialization helps. Besides, such an approach still lags behind state-of-the-art adversarial training algorithms on both stability and model robustness. 
In this work, we develop a new understanding towards Fast Adversarial Training, by viewing random initialization as performing randomized smoothing for better optimization of the inner maximization problem. From this perspective, we show that the smoothing effect by random initialization is not sufficient under the adversarial perturbation constraint. A new initialization strategy, \\emph{backward smoothing}, is proposed to address this issue and significantly improves both stability and model robustness over single-step robust training methods. Experiments on multiple benchmarks demonstrate that our method achieves similar model robustness as the original TRADES method, while using much less training time (~3x improvement with the same training schedule). ", "keywords": "Efficient Robust Training;Backward Smoothing;Robustness", "primary_area": "", "supplementary_material": "/attachment/f69958051f5c543363c667035bcf9f4de83ec9e2.zip", "author": "Jinghui Chen;Yu Cheng;Zhe Gan;Quanquan Gu;Jingjing Liu", "authorids": "~Jinghui_Chen1;~Yu_Cheng1;~Zhe_Gan1;~Quanquan_Gu1;~Jingjing_Liu2", "gender": "M;M;M;M;", "homepage": "https://jinghuichen.github.io/;https://ych133.github.io;http://zhegan27.github.io/;http://web.cs.ucla.edu/~qgu/;https://air.tsinghua.edu.cn/en/info/1046/1194.htm#:~:text=Jingjing%20Liu%20is%20Professor%2C%20Principal,CVPR%2C%20ACL%2C%20etc.)", "dblp": "67/5633;96/3060-1.html;41/7845;50/4597;30/3008-1", "google_scholar": "mKia7Y4AAAAJ;https://scholar.google.com/citations?hl=en;E64XWyMAAAAJ;GU9HgNAAAAAJ;BzJ_GboAAAAJ", "orcid": ";;;;", "linkedin": ";chengyu05/;zhe-gan-a2229a78/;;jingjing-liu-65703431/", "or_profile": "~Jinghui_Chen1;~Yu_Cheng1;~Zhe_Gan1;~Quanquan_Gu1;~Jingjing_Liu2", "aff": "University of California, Los Angeles;Microsoft Research;Microsoft;University of California, Los Angeles;Microsoft", "aff_domain": "ucla.edu;microsoft.com;microsoft.com;cs.ucla.edu;microsoft.com", "position": "PhD student;Principal Researcher;Principal Researcher;Assistant Professor;Sr Principal Research Manager", "bibtex": "@misc{\nchen2021efficient,\ntitle={Efficient Robust Training via Backward Smoothing},\nauthor={Jinghui Chen and Yu Cheng and Zhe Gan and Quanquan Gu and Jingjing Liu},\nyear={2021},\nurl={https://openreview.net/forum?id=49V11oUejQ}\n}", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer2;AnonReviewer4;AnonReviewer3", "site": "https://openreview.net/forum?id=49V11oUejQ", "pdf_size": 0, "rating": "5;5;5;6", "confidence": "5;5;4;4", "wc_review": "781;1090;436;559", "wc_reply_reviewers": "289;522;146;164", "wc_reply_authors": "811;1162;662;839", "reply_reviewers": "1;1;1;1", "reply_authors": "2;2;2;3", "rating_avg": [ 5.25, 0.4330127018922193 ], "confidence_avg": [ 4.5, 0.5 ], "wc_review_avg": [ 716.5, 248.57041255949994 ], "wc_reply_reviewers_avg": [ 280.25, 150.04728421401035 ], "wc_reply_authors_avg": [ 868.5, 182.31908841369298 ], "reply_reviewers_avg": [ 1.0, 0.0 ], "reply_authors_avg": [ 2.25, 0.4330127018922193 ], "replies_avg": [ 21, 0 ], "authors#_avg": [ 5, 0 ], "corr_rating_confidence": -0.5773502691896257, "gs_citation": 33, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=14385892815704849325&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 7, "aff_unique_index": "0;1;1;0;1", "aff_unique_norm": "University of California, Los Angeles;Microsoft", "aff_unique_dep": ";Microsoft Research", "aff_unique_url": "https://www.ucla.edu;https://www.microsoft.com/en-us/research", "aff_unique_abbr": "UCLA;MSR", "aff_campus_unique_index": "0;0", 
"aff_campus_unique": "Los Angeles;", "aff_country_unique_index": "0;0;0;0;0", "aff_country_unique": "United States" }, { "id": "49mMdsxkPlD", "title": "Iterative Amortized Policy Optimization", "track": "main", "status": "Reject", "tldr": "", "abstract": "Policy networks are a central feature of deep reinforcement learning (RL) algorithms for continuous control, enabling the estimation and sampling of high-value actions. From the variational inference perspective on RL, policy networks, when employed with entropy or KL regularization, are a form of amortized optimization, optimizing network parameters rather than the policy distributions directly. However, this direct amortized mapping can empirically yield suboptimal policy estimates and limited exploration. Given this perspective, we consider the more flexible class of iterative amortized optimizers. We demonstrate that the resulting technique, iterative amortized policy optimization, yields performance improvements over direct amortization methods on benchmark continuous control tasks.", "keywords": "Reinforcement Learning;Policy Optimization;Amortization;Variational Inference", "primary_area": "", "supplementary_material": "/attachment/3b20148ae761b21486b6a0a3c183c1564a185c45.zip", "author": "Joseph Marino;Alexandre Pich\u00e9;Alessandro Davide Ialongo;Yisong Yue", "authorids": "~Joseph_Marino1;~Alexandre_Pich\u00e91;~Alessandro_Davide_Ialongo1;~Yisong_Yue1", "gender": "M;M;M;M", "homepage": "http://joelouismarino.github.io;;http://www.yisongyue.com;https://github.com/AlexPiche", "dblp": "31/8756;https://dblp.uni-trier.de/pers/hd/i/Ialongo:Alessandro_Davide;28/1244;", "google_scholar": "LTprTF0AAAAJ;Z2tqKq4AAAAJ;tEk4qo8AAAAJ;", "orcid": "0000-0001-6387-8062;;0000-0001-9127-1989;", "linkedin": ";alessandro-ialongo/;yisongyue/;", "or_profile": "~Joseph_Marino1;~Alessandro_Davide_Ialongo1;~Yisong_Yue1;~Alexandre_Piche1", "aff": "California Institute of Technology;Max Planck Institute for Intelligent Systems, Max-Planck Institute;California Institute of Technology;University of Montreal", "aff_domain": "caltech.edu;tuebingen.mpg.de;caltech.edu;umontreal.ca", "position": "PhD student;PhD student;Full Professor;PhD student", "bibtex": "@misc{\nmarino2021iterative,\ntitle={Iterative Amortized Policy Optimization},\nauthor={Joseph Marino and Alexandre Pich{\\'e} and Alessandro Davide Ialongo and Yisong Yue},\nyear={2021},\nurl={https://openreview.net/forum?id=49mMdsxkPlD}\n}", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer2;AnonReviewer3;AnonReviewer4", "site": "https://openreview.net/forum?id=49mMdsxkPlD", "pdf_size": 0, "rating": "5;5;5;6", "confidence": "2;3;3;4", "wc_review": "328;423;417;580", "wc_reply_reviewers": "142;659;112;0", "wc_reply_authors": "629;1046;222;969", "reply_reviewers": "1;2;1;0", "reply_authors": "1;2;1;2", "rating_avg": [ 5.25, 0.4330127018922193 ], "confidence_avg": [ 3.0, 0.7071067811865476 ], "wc_review_avg": [ 437.0, 90.72761431890514 ], "wc_reply_reviewers_avg": [ 228.25, 254.26204494576064 ], "wc_reply_authors_avg": [ 716.5, 325.77331075457977 ], "reply_reviewers_avg": [ 1.0, 0.7071067811865476 ], "reply_authors_avg": [ 1.5, 0.5 ], "replies_avg": [ 16, 0 ], "authors#_avg": [ 4, 0 ], "corr_rating_confidence": 0.816496580927726, "gs_citation": 24, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=5877339606852616235&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 9, "aff_unique_index": "0;1;0;2", "aff_unique_norm": "California Institute of Technology;Max Planck Institute for 
Intelligent Systems;University of Montreal", "aff_unique_dep": ";Intelligent Systems;", "aff_unique_url": "https://www.caltech.edu;https://www.mpi-is.mpg.de;https://wwwumontreal.ca", "aff_unique_abbr": "Caltech;MPI-IS;UM", "aff_campus_unique_index": "0;0", "aff_campus_unique": "Pasadena;", "aff_country_unique_index": "0;1;0;2", "aff_country_unique": "United States;Germany;Canada" }, { "id": "4ADnf1HqIw", "title": "Recovering Geometric Information with Learned Texture Perturbations", "track": "main", "status": "Reject", "tldr": "", "abstract": "Regularization is used to avoid overfitting when training a neural network; unfortunately, this reduces the attainable level of detail hindering the ability to capture high-frequency information present in the training data. Even though various approaches may be used to re-introduce high-frequency detail, it typically does not match the training data and is often not time coherent. In the case of network inferred cloth, these sentiments manifest themselves via either a lack of detailed wrinkles or unnaturally appearing and/or time incoherent surrogate wrinkles. Thus, we propose a general strategy whereby high-frequency information is procedurally embedded into low-frequency data so that when the latter is smeared out by the network the former still retains its high-frequency detail. We illustrate this approach by learning texture coordinates which when smeared do not in turn smear out the high-frequency detail in the texture itself but merely smoothly distort it. Notably, we prescribe perturbed texture coordinates that are subsequently used to correct the over-smoothed appearance of inferred cloth, and correcting the appearance from multiple camera views naturally recovers lost geometric information.", "keywords": "", "primary_area": "", "supplementary_material": "/attachment/f34354159d131d48594b18bff8087fd22c2b6e61.zip", "author": "Jane Wu;Yongxu Jin;Zhenglin Geng;Hui Zhou;Ronald Fedkiw", "authorids": "~Jane_Wu2;yxjin@stanford.edu;zhenglin@stanford.edu;hui.zhou@jd.com;~Ronald_Fedkiw1", "gender": ";;;;", "homepage": ";;;;", "dblp": ";;;;", "google_scholar": ";;;;", "orcid": ";;;;", "linkedin": ";;;;", "or_profile": ";;;;", "aff": ";;;;", "aff_domain": ";;;;", "position": ";;;;", "bibtex": "@misc{\nwu2021recovering,\ntitle={Recovering Geometric Information with Learned Texture Perturbations},\nauthor={Jane Wu and Yongxu Jin and Zhenglin Geng and Hui Zhou and Ronald Fedkiw},\nyear={2021},\nurl={https://openreview.net/forum?id=4ADnf1HqIw}\n}", "github": "", "project": "", "reviewers": "AnonReviewer4;AnonReviewer1;AnonReviewer3;AnonReviewer2", "site": "https://openreview.net/forum?id=4ADnf1HqIw", "pdf_size": 0, "rating": "3;4;4;5", "confidence": "2;3;4;4", "wc_review": "250;528;350;399", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "502;575;476;424", "reply_reviewers": "0;0;0;0", "reply_authors": "1;1;1;1", "rating_avg": [ 4.0, 0.7071067811865476 ], "confidence_avg": [ 3.25, 0.82915619758885 ], "wc_review_avg": [ 381.75, 100.06591577555267 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 494.25, 54.42598184690838 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 10, 0 ], "authors#_avg": [ 5, 0 ], "corr_rating_confidence": 0.8528028654224418, "gs_citation": 4, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=1869554897949951332&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 3 }, { "id": "4AWko4A35ss", "title": "Self-Supervised Video Representation Learning with Constrained 
Spatiotemporal Jigsaw", "track": "main", "status": "Reject", "tldr": "", "abstract": "This paper proposes a novel pretext task for self-supervised video representation learning by exploiting spatiotemporal continuity in videos. It is motivated by the fact that videos are spatiotemporal by nature and a representation learned to detect spatiotemporal continuity/discontinuity is thus beneficial for downstream video content analysis tasks. A natural choice of such a pretext task is to construct spatiotemporal (3D) jigsaw puzzles and learn to solve them. However, this task turns out to be intractable. We thus propose Constrained Spatiotemporal Jigsaw (CSJ) whereby the 3D jigsaws are formed in a constrained manner to ensure that large continuous spatiotemporal cuboids exist in a shuffled clip to provide sufficient cues for the model to reason about the continuity. With the constrained jigsaw puzzles, instead of solving them directly, which could still be extremely hard, we carefully design four surrogate tasks that are more solvable but meanwhile still ensure that the learned representation is sensitive to spatiotemporal continuity at both the local and global levels. Extensive experiments show that our CSJ achieves state-of-the-art on two downstream tasks across various benchmarks.", "keywords": "self-supervised learning;video representation learning;spatiotemporal jigsaw", "primary_area": "", "supplementary_material": "", "author": "Yuqi Huo;Mingyu Ding;Haoyu Lu;Zhiwu Lu;Tao Xiang;Ji-Rong Wen;Ziyuan Huang;Jianwen Jiang;Shiwei Zhang;Mingqian Tang;Songfang Huang;Ping Luo", "authorids": "~Yuqi_Huo1;~Mingyu_Ding1;~Haoyu_Lu1;~Zhiwu_Lu1;~Tao_Xiang1;~Ji-Rong_Wen1;~Ziyuan_Huang1;~Jianwen_Jiang2;~Shiwei_Zhang2;~Mingqian_Tang1;~Songfang_Huang1;~Ping_Luo2", "gender": "M;M;;M;M;M;M;;M;F;;", "homepage": ";https://dingmyu.github.io/;https://haoyulu1998.github.io/;https://gsai.ruc.edu.cn/luzhiwu;https://www.surrey.ac.uk/people/tao-xiang;https://gsai.ruc.edu.cn/english/jrwen;https://huang-ziyuan.github.io/;;https://www.researchgate.net/profile/Shiwei_Zhang7/research;;https://www.coe.pku.edu.cn/teaching/all_time/13007.html;", "dblp": "219/6931.html;188/5243;240/2720;53/5234;22/4460-2.html;w/JRWen;;;;;05/4919;", "google_scholar": "3oryMg0AAAAJ;w4yTWwoAAAAJ;https://scholar.google.com.hk/citations?view_op=list_works;OUXS8doAAAAJ;MeS5d4gAAAAJ;tbxCHJgAAAAJ;A9D-disAAAAJ;;ZO3OQ-8AAAAJ;;3So9lV8AAAAJ;", "orcid": ";0000-0001-6556-8359;;;0000-0002-2530-1059;0000-0002-9777-9676;;;0000-0002-6929-5295;0000-0002-7117-6666;;", "linkedin": ";dingmyu/;%E6%B5%A9%E5%AE%87-%E5%8D%A2-4b42b7198/;;;;ziyuan-huang-731b78177/;;;;;", "or_profile": "~Yuqi_Huo1;~Mingyu_Ding1;~Haoyu_Lu1;~Zhiwu_Lu1;~Tao_Xiang1;~Ji-Rong_Wen1;~Ziyuan_Huang1;~Jianwen_Jiang2;~Shiwei_Zhang2;~Mingqian_Tang1;~Songfang_Huang1;~Ping_Luo2", "aff": "Renmin University of China;University of Hong Kong;Renmin University of China;Renmin University of China;University of Surrey;Renmin University of China;National University of Singapore;;Alibaba Group;Alibaba Group;Alibaba Group;", "aff_domain": "ruc.edu.cn;hku.hk;ruc.edu.cn;ruc.edu.cn;surrey.ac.uk;ruc.edu.cn;u.nus.edu;;alibaba-inc.com;alibaba-inc.com;alibaba-inc.com;", "position": "PhD student;PhD student;Undergrad student;Full Professor;Full Professor;Full Professor;PhD student;;Researcher;Staff Algorithm Engineer;Senior Staff Engineer;", "bibtex": "@misc{\nhuo2021selfsupervised,\ntitle={Self-Supervised Video Representation Learning with Constrained Spatiotemporal Jigsaw},\nauthor={Yuqi Huo and Mingyu Ding and Haoyu Lu 
and Zhiwu Lu and Tao Xiang and Ji-Rong Wen and Ziyuan Huang and Jianwen Jiang and Shiwei Zhang and Mingqian Tang and Songfang Huang and Ping Luo},\nyear={2021},\nurl={https://openreview.net/forum?id=4AWko4A35ss}\n}", "github": "", "project": "", "reviewers": "AnonReviewer4;AnonReviewer3;AnonReviewer1;AnonReviewer2", "site": "https://openreview.net/forum?id=4AWko4A35ss", "pdf_size": 0, "rating": "5;6;6;7", "confidence": "4;4;4;4", "wc_review": "332;778;329;537", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "475;799;483;460", "reply_reviewers": "0;0;0;0", "reply_authors": "1;1;1;1", "rating_avg": [ 6.0, 0.7071067811865476 ], "confidence_avg": [ 4.0, 0.0 ], "wc_review_avg": [ 494.0, 184.37326270367947 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 554.25, 141.54747436814264 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 10, 0 ], "authors#_avg": [ 12, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 24, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=16107310292622185664&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 6, "aff_unique_index": "0;1;0;0;2;0;3;4;4;4", "aff_unique_norm": "Renmin University of China;University of Hong Kong;University of Surrey;National University of Singapore;Alibaba Group", "aff_unique_dep": ";;;;", "aff_unique_url": "http://www.ruc.edu.cn;https://www.hku.hk;https://www.surrey.ac.uk;https://www.nus.edu.sg;https://www.alibaba.com", "aff_unique_abbr": "RUC;HKU;Surrey;NUS;Alibaba", "aff_campus_unique_index": "1", "aff_campus_unique": ";Hong Kong SAR", "aff_country_unique_index": "0;0;0;0;1;0;2;0;0;0", "aff_country_unique": "China;United Kingdom;Singapore" }, { "id": "4CqesJ7GO7Q", "title": "Intriguing class-wise properties of adversarial training", "track": "main", "status": "Reject", "tldr": "", "abstract": "Adversarial training is one of the most effective approaches to improve model robustness against adversarial examples. However, previous works mainly focus on the overall robustness of the model, and an in-depth analysis of the role of each class involved in adversarial training is still missing. In this paper, we provide the first detailed class-wise diagnosis of adversarial training on six widely used datasets, $\\textit{i.e.}$, MNIST, CIFAR-10, CIFAR-100, SVHN, STL-10 and ImageNet. Surprisingly, we find that there are $\\textit{remarkable robustness discrepancies among classes}$, demonstrating the following intriguing properties: 1) Many examples from a certain class can only be maliciously attacked into a few specific semantically similar classes, and these examples will not have adversarial counterparts within the bounded $\\epsilon$-ball if we re-train the model without those specific classes; 2) The robustness of each class is positively correlated with the norm of its classifier weight in deep neural networks; 3) Stronger attacks are usually more powerful for vulnerable classes. Finally, we propose an attack to better understand the defense mechanism of some state-of-the-art models from the class-wise perspective.
We believe these findings can contribute to a more comprehensive understanding of adversarial training as well as further improvement of adversarial robustness.", "keywords": "adversarial training;class-wise properties;robustness;adversarial example", "primary_area": "", "supplementary_material": "", "author": "Qi Tian;Kun Kuang;Fei Wu;Yisen Wang", "authorids": "~Qi_Tian6;~Kun_Kuang1;~Fei_Wu1;~Yisen_Wang1", "gender": "M;M;M;M", "homepage": "https://github.com/TianQi-777;http://kunkuang.github.io;https://person.zju.edu.cn/wufei;https://yisenwang.github.io/", "dblp": "78/1467-3;194/4245;84/3254-1;172/1346-1", "google_scholar": ";https://scholar.google.com.hk/citations?user=FOsNiMQAAAAJ;XJLn4MYAAAAJ;uMWPDboAAAAJ", "orcid": ";0009-0000-7528-8131;;", "linkedin": ";;;", "or_profile": "~Qi_Tian6;~Kun_Kuang1;~Fei_Wu1;~Yisen_Wang1", "aff": "Zhejiang University;Zhejiang University;Zhejiang University;Peking University", "aff_domain": "zju.edu.cn;zju.edu.cn;zju.edu.cn;pku.edu.cn", "position": "PhD student;Associate Professor;Full Professor;Assistant Professor", "bibtex": "@misc{\ntian2021intriguing,\ntitle={Intriguing class-wise properties of adversarial training},\nauthor={Qi Tian and Kun Kuang and Fei Wu and Yisen Wang},\nyear={2021},\nurl={https://openreview.net/forum?id=4CqesJ7GO7Q}\n}", "github": "", "project": "", "reviewers": "AnonReviewer3;AnonReviewer4;AnonReviewer1;AnonReviewer2", "site": "https://openreview.net/forum?id=4CqesJ7GO7Q", "pdf_size": 0, "rating": "4;4;4;6", "confidence": "5;4;4;5", "wc_review": "608;455;1047;499", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "760;691;883;756", "reply_reviewers": "0;0;0;0", "reply_authors": "1;1;2;1", "rating_avg": [ 4.5, 0.8660254037844386 ], "confidence_avg": [ 4.5, 0.5 ], "wc_review_avg": [ 652.25, 234.6160427166054 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 772.5, 69.42802027999934 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.25, 0.4330127018922193 ], "replies_avg": [ 11, 0 ], "authors#_avg": [ 4, 0 ], "corr_rating_confidence": 0.5773502691896257, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:QWg9HxA72QsJ:scholar.google.com/&scioq=Intriguing+class-wise+properties+of+adversarial+training&hl=en&as_sdt=0,33", "gs_version_total": 0, "aff_unique_index": "0;0;0;1", "aff_unique_norm": "Zhejiang University;Peking University", "aff_unique_dep": ";", "aff_unique_url": "https://www.zju.edu.cn;http://www.pku.edu.cn", "aff_unique_abbr": "ZJU;Peking U", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0;0;0", "aff_country_unique": "China" }, { "id": "4CxsUBDQJqv", "title": "Learning Intrinsic Symbolic Rewards in Reinforcement Learning", "track": "main", "status": "Reject", "tldr": "", "abstract": "Learning effective policies for sparse objectives is a key challenge in Deep Reinforcement Learning (RL). A common approach is to design task-related dense rewards to improve task learnability. While such rewards are easily interpreted, they rely on heuristics and domain expertise. Alternate approaches that train neural networks to discover dense surrogate rewards avoid heuristics, but are high-dimensional, black-box solutions offering little interpretability. In this paper, we present a method that discovers dense rewards in the form of low-dimensional symbolic trees - thus making them more tractable for analysis. 
The trees use simple functional operators to map an agent's observations to a scalar reward, which then supervises the policy gradient learning of a neural network policy. We test our method on continuous action spaces in Mujoco and discrete action spaces in Atari and Pygames environments. We show that the discovered dense rewards are an effective signal for an RL policy to solve the benchmark tasks. Notably, we significantly outperform a widely used, contemporary neural-network based reward-discovery algorithm in all environments considered.", "keywords": "Reinforcement Learning;Intrinsic Rewards;Symbolic Regression", "primary_area": "", "supplementary_material": "/attachment/1d017ee8b3c2a53706299618364b921e8dd3271d.zip", "author": "Hassam Sheikh;Shauharda Khadka;Santiago Miret;Somdeb Majumdar", "authorids": "~Hassam_Sheikh1;~Shauharda_Khadka1;~Santiago_Miret1;~Somdeb_Majumdar1", "gender": "M;M;M;M", "homepage": ";https://sites.google.com/oregonstate.edu/skhadka;https://www.intel.ai/bio/santiago-miret/;https://www.intel.ai/bio/somdeb-majumdar/", "dblp": ";183/9233;241/5030;63/8320", "google_scholar": "https://scholar.google.co.uk/citations?user=QTCAAGQAAAAJ;s-4Eoi8AAAAJ;HLQ_te4AAAAJ;", "orcid": ";;0000-0002-5121-3853;", "linkedin": ";;santiago-miret/;somdebmajumdar/", "or_profile": "~Hassam_Sheikh1;~Shauharda_Khadka1;~Santiago_Miret1;~Somdeb_Majumdar1", "aff": "Intel Labs;Microsoft;Intel;Intel", "aff_domain": "intel.com;microsoft.com;intel.com;intel.com", "position": "Research Scientist;Applied Scientist;Researcher;AI/ML Researcher", "bibtex": "@misc{\nsheikh2021learning,\ntitle={Learning Intrinsic Symbolic Rewards in Reinforcement Learning},\nauthor={Hassam Sheikh and Shauharda Khadka and Santiago Miret and Somdeb Majumdar},\nyear={2021},\nurl={https://openreview.net/forum?id=4CxsUBDQJqv}\n}", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer3;AnonReviewer2", "site": "https://openreview.net/forum?id=4CxsUBDQJqv", "pdf_size": 0, "rating": "4;5;5", "confidence": "4;4;4", "wc_review": "557;553;157", "wc_reply_reviewers": "0;0;0", "wc_reply_authors": "1045;734;365", "reply_reviewers": "0;0;0", "reply_authors": "2;1;1", "rating_avg": [ 4.666666666666667, 0.4714045207910317 ], "confidence_avg": [ 4.0, 0.0 ], "wc_review_avg": [ 422.3333333333333, 187.62610574106034 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 714.6666666666666, 277.9452384113741 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.3333333333333333, 0.4714045207910317 ], "replies_avg": [ 9, 0 ], "authors#_avg": [ 4, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 13, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=1535926653332478235&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 6, "aff_unique_index": "0;1;0;0", "aff_unique_norm": "Intel;Microsoft", "aff_unique_dep": "Intel Labs;Microsoft Corporation", "aff_unique_url": "https://www.intel.com;https://www.microsoft.com", "aff_unique_abbr": "Intel;Microsoft", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0;0;0", "aff_country_unique": "United States" }, { "id": "4D4Rjrwaw3q", "title": "Black-Box Optimization Revisited: Improving Algorithm Selection Wizards through Massive Benchmarking", "track": "main", "status": "Reject", "tldr": "", "abstract": "Existing studies in black-box optimization for machine learning suffer from low\ngeneralizability, caused by a typically selective choice of problem instances used\nfor training and testing different optimization algorithms. 
Among other issues,\nthis practice promotes overfitting and poor-performing user guidelines. To address\nthis shortcoming, we propose in this work a benchmark suite, OptimSuite,\nwhich covers a broad range of black-box optimization problems, ranging from\nacademic benchmarks to real-world applications, from discrete over numerical\nto mixed-integer problems, from small to very large-scale problems, from noisy\nover dynamic to static problems, etc. We demonstrate the advantages of such a\nbroad collection by deriving from it Automated Black Box Optimizer (ABBO), a\ngeneral-purpose algorithm selection wizard. Using three different types of algorithm\nselection techniques, ABBO achieves competitive performance on all\nbenchmark suites. It significantly outperforms previous state of the art on some of\nthem, including YABBOB and LSGO. ABBO relies on many high-quality base\ncomponents. Its excellent performance is obtained without any task-specific\nparametrization. The benchmark collection, the ABBO wizard, its base solvers,\nas well as all experimental data are reproducible and open source in OptimSuite.", "keywords": "black-box optimization;mujoco;wizard;benchmarking;BBOB;LSGO", "primary_area": "", "supplementary_material": "", "author": "Laurent Meunier;Herilalaina Rakotoarison;Jeremy Rapin;Paco Wong;Baptiste Roziere;Olivier Teytaud;Antoine Moreau;Carola Doerr", "authorids": "~Laurent_Meunier1;~Herilalaina_Rakotoarison1;jrapin@fb.com;paco.pkwong@gmail.com;broz@fb.com;~Olivier_Teytaud2;antoine.moreau@uca.fr;~Carola_Doerr1", "gender": "M;M;;;;;;F", "homepage": ";https://scholar.google.fr/citations?user=pyws4AQAAAAJ&hl=en;;;;;;http://www-ia.lip6.fr/~doerr/", "dblp": "15/4624;242/7961;;;;;;https://dblp.uni-trier.de/pid/62/8086", "google_scholar": ";https://scholar.google.fr/citations?user=pyws4AQAAAAJ;;;;;;CU-V1sEAAAAJ", "orcid": ";;;;;;;0000-0002-4981-3227", "linkedin": ";;;;;;;", "or_profile": "~Laurent_Meunier1;~Herilalaina_Rakotoarison1;jrapin@fb.com;paco.pkwong@gmail.com;broz@fb.com;~Olivier_Teytaud2;antoine.moreau@uca.fr;~Carola_Doerr1", "aff": "Univerist\u00e9 Paris-Dauphine;INRIA;;;;;;LIP6, CNRS, Sorbonne Universit\u00e9", "aff_domain": "dauphine.fr;inria.fr;;;;;;lip6.fr", "position": "PhD student;PhD student;;;;;;CNRS researcher at Sorbonne Universit\u00e9", "bibtex": "@misc{\nmeunier2021blackbox,\ntitle={Black-Box Optimization Revisited: Improving Algorithm Selection Wizards through Massive Benchmarking},\nauthor={Laurent Meunier and Herilalaina Rakotoarison and Jeremy Rapin and Paco Wong and Baptiste Roziere and Olivier Teytaud and Antoine Moreau and Carola Doerr},\nyear={2021},\nurl={https://openreview.net/forum?id=4D4Rjrwaw3q}\n}", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer2;AnonReviewer4;AnonReviewer3", "site": "https://openreview.net/forum?id=4D4Rjrwaw3q", "pdf_size": 0, "rating": "5;6;7;9", "confidence": "4;3;3;5", "wc_review": "500;318;215;862", "wc_reply_reviewers": "482;76;82;82", "wc_reply_authors": "1371;1142;724;460", "reply_reviewers": "2;2;2;1", "reply_authors": "3;3;2;2", "rating_avg": [ 6.75, 1.479019945774904 ], "confidence_avg": [ 3.75, 0.82915619758885 ], "wc_review_avg": [ 473.75, 246.29085955430827 ], "wc_reply_reviewers_avg": [ 180.5, 174.0883396439865 ], "wc_reply_authors_avg": [ 924.25, 354.4815756848302 ], "reply_reviewers_avg": [ 1.75, 0.4330127018922193 ], "reply_authors_avg": [ 2.5, 0.5 ], "replies_avg": [ 23, 0 ], "authors#_avg": [ 8, 0 ], "corr_rating_confidence": 0.560611910581388, "gs_citation": 55, "gs_cited_by_link": 
"https://scholar.google.com/scholar?cites=11642004469183775132&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 12, "aff_unique_index": "0;1;2", "aff_unique_norm": "Universit\u00e9 Paris-Dauphine;INRIA;Sorbonne Universit\u00e9", "aff_unique_dep": ";;LIP6", "aff_unique_url": "https://www.univ-paris-dauphine.fr;https://www.inria.fr;https://www.sorbonne-universite.fr", "aff_unique_abbr": "UPD;INRIA;SU", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0;0", "aff_country_unique": "France" }, { "id": "4HGL3H9eL9U", "title": "AT-GAN: An Adversarial Generative Model for Non-constrained Adversarial Examples", "track": "main", "status": "Reject", "tldr": "", "abstract": "With the rapid development of adversarial machine learning, numerous adversarial attack methods have been proposed. Typical attacks are based on a search in the neighborhood of input image to generate a perturbed adversarial example. Since 2017, generative models are adopted for adversarial attacks, and most of them focus on generating adversarial perturbations from input noise or input image. Thus the output is restricted by input for these works. A recent work targets unrestricted adversarial example using generative model but their method is based on a search in the neighborhood of input noise, so actually their output is still constrained by input. In this work, we propose AT-GAN (Adversarial Transfer on Generative Adversarial Net) to train an adversarial generative model that can directly produce adversarial examples. Different from previous works, we aim to learn the distribution of adversarial examples so as to generate semantically meaningful adversaries. AT-GAN achieves this goal by first learning a generative model for real data, followed by transfer learning to obtain the desired generative model. Once trained and transferred, AT-GAN could generate adversarial examples directly and quickly for any input noise, denoted as non-constrained adversarial examples. Extensive experiments and visualizations show that AT-GAN can efficiently generate diverse adversarial examples that are realistic to human perception, and yields higher attack success rates against adversarially trained models.\n\n ", "keywords": "adversarial examples;adversarial attack;generation-based attack;adversarial generative model;non-constrained adversarial examples", "primary_area": "", "supplementary_material": "/attachment/ebde588d468e467cfea7f8052198fa191c3aaf36.zip", "author": "Xiaosen Wang;Kun He;Chuanbiao Song;Liwei Wang;John E. 
Hopcroft", "authorids": "~Xiaosen_Wang1;~Kun_He1;~Chuanbiao_Song2;~Liwei_Wang1;~John_E._Hopcroft1", "gender": "M;F;M;M;M", "homepage": "https://xiaosen-wang.github.io/;http://faculty.hust.edu.cn/hekun/zh_CN/more/1411001/jsjjgd/index.htm;https://scholar.google.com/citations?user=el17bJoAAAAJ&hl=zh-CN;http://www.liweiwang-pku.com/;http://www.cs.cornell.edu/jeh/", "dblp": "241/6284;59/1028-1;228/6660;;h/JohnEHopcroft", "google_scholar": "sVeDOcsAAAAJ;YTQnGJsAAAAJ;el17bJoAAAAJ;VZHxoh8AAAAJ;4Z6vo5QAAAAJ", "orcid": ";0000-0001-7627-4604;;;0000-0001-8681-6075", "linkedin": ";;;;", "or_profile": "~Xiaosen_Wang1;~Kun_He1;~Chuanbiao_Song2;~Liwei_Wang1;~John_E._Hopcroft1", "aff": "Huazhong University of Science and Technology;Huazhong University of Science and Technology;Huazhong University of Science and Technology;Peking University;Department of Computer Science, Cornell University", "aff_domain": "hust.edu.cn;hust.edu.cn;hust.edu.cn;pku.edu.cn;cs.cornell.edu", "position": "MS student;Full Professor;MS student;Full Professor;Full Professor", "bibtex": "@misc{\nwang2021atgan,\ntitle={{\\{}AT{\\}}-{\\{}GAN{\\}}: An Adversarial Generative Model for Non-constrained Adversarial Examples},\nauthor={Xiaosen Wang and Kun He and Chuanbiao Song and Liwei Wang and John E. Hopcroft},\nyear={2021},\nurl={https://openreview.net/forum?id=4HGL3H9eL9U}\n}", "github": "", "project": "", "reviewers": "AnonReviewer4;AnonReviewer1;AnonReviewer2", "site": "https://openreview.net/forum?id=4HGL3H9eL9U", "pdf_size": 0, "rating": "5;6;7", "confidence": "4;4;3", "wc_review": "498;291;105", "wc_reply_reviewers": "0;0;0", "wc_reply_authors": "1380;1000;208", "reply_reviewers": "0;0;0", "reply_authors": "2;2;1", "rating_avg": [ 6.0, 0.816496580927726 ], "confidence_avg": [ 3.6666666666666665, 0.4714045207910317 ], "wc_review_avg": [ 298.0, 160.51791177311023 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 862.6666666666666, 488.2221716482045 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.6666666666666667, 0.4714045207910317 ], "replies_avg": [ 9, 0 ], "authors#_avg": [ 5, 0 ], "corr_rating_confidence": -0.8660254037844385, "gs_citation": 1, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=9904227579441582615&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 0, "aff_unique_index": "0;0;0;1;2", "aff_unique_norm": "Huazhong University of Science and Technology;Peking University;Cornell University", "aff_unique_dep": ";;Department of Computer Science", "aff_unique_url": "http://www.hust.edu.cn;http://www.pku.edu.cn;https://www.cornell.edu", "aff_unique_abbr": "HUST;Peking U;Cornell", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0;0;0;1", "aff_country_unique": "China;United States" }, { "id": "4I5THWNSjC", "title": "BasisNet: Two-stage Model Synthesis for Efficient Inference", "track": "main", "status": "Reject", "tldr": "", "abstract": "We present BasisNet which combines recent advancements in efficient neural network architectures, conditional computation, and early termination in a simple new form. Our approach uses a lightweight model to preview an image and generate input-dependent combination coefficients, which are later used to control the synthesis of a specialist model for making a more accurate final prediction. The two-stage model synthesis strategy can be used with any network architecture and both stages can be jointly trained end to end. 
We validated BasisNet on ImageNet classification with MobileNets as backbone, and demonstrated clear advantage on accuracy-efficiency trade-off over strong baselines such as EfficientNet (Tan & Le, 2019), FBNetV3 (Dai et al., 2020) and OFA (Cai et al., 2019). Specifically, BasisNet-MobileNetV3 obtained 80.3% top-1 accuracy with only 290M Multiply-Add operations (MAdds), halving the computational cost of previous state-of-the-art without sacrificing accuracy. Besides, since the first-stage lightweight model can independently make predictions, inference can be terminated early if the prediction is sufficiently confident. With early termination, the average cost can be further reduced to 198M MAdds while maintaining accuracy of 80.0%.", "keywords": "", "primary_area": "", "supplementary_material": "", "author": "Mingda Zhang;Andrey Zhmoginov;Andrew G. Howard;Brendan Jou;Yukun Zhu;Li Zhang;Rebecca Hwa;Adriana Kovashka", "authorids": "~Mingda_Zhang1;~Andrey_Zhmoginov1;~Andrew_G._Howard1;~Brendan_Jou1;~Yukun_Zhu1;zhl@google.com;~Rebecca_Hwa1;~Adriana_Kovashka1", "gender": "M;M;;M;M;;;F", "homepage": "https://people.cs.pitt.edu/~mzhang/;;;;;;;http://people.cs.pitt.edu/~kovashka/", "dblp": "25/10133;182/1825;139/0987;120/8567;18/10777;;;51/8652.html", "google_scholar": "4aIwj4QAAAAJ;jj6IfzEAAAAJ;_9l8vD8AAAAJ;k7eC8-0AAAAJ;;;;Dl949GoAAAAJ", "orcid": ";;;0000-0001-8033-0330;;;;", "linkedin": ";;;brendanjou/;;;;", "or_profile": "~Mingda_Zhang1;~Andrey_Zhmoginov1;~Andrew_G._Howard1;~Brendan_Jou1;~Yukun_Zhu1;zhl@google.com;~Rebecca_Hwa1;~Adriana_Kovashka1", "aff": "Department of Computer Science, University of Pittsburgh;Google DeepMind;Google;Google DeepMind;Google;;;University of Pittsburgh", "aff_domain": "cs.pitt.edu;google.com;google.com;google.com;google.com;;;pitt.edu", "position": "PhD student;Researcher;Software Engineer;Research Manager;SWE;;;Assistant Professor", "bibtex": "@misc{\nzhang2021basisnet,\ntitle={BasisNet: Two-stage Model Synthesis for Efficient Inference},\nauthor={Mingda Zhang and Andrey Zhmoginov and Andrew G. 
Howard and Brendan Jou and Yukun Zhu and Li Zhang and Rebecca Hwa and Adriana Kovashka},\nyear={2021},\nurl={https://openreview.net/forum?id=4I5THWNSjC}\n}", "github": "", "project": "", "reviewers": "AnonReviewer4;AnonReviewer3;AnonReviewer2", "site": "https://openreview.net/forum?id=4I5THWNSjC", "pdf_size": 0, "rating": "3;6;7", "confidence": "5;4;5", "wc_review": "407;385;250", "wc_reply_reviewers": "0;0;0", "wc_reply_authors": "434;248;467", "reply_reviewers": "0;0;0", "reply_authors": "1;1;1", "rating_avg": [ 5.333333333333333, 1.699673171197595 ], "confidence_avg": [ 4.666666666666667, 0.4714045207910317 ], "wc_review_avg": [ 347.3333333333333, 69.4086129781856 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 383.0, 96.40539403996023 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 10, 0 ], "authors#_avg": [ 8, 0 ], "corr_rating_confidence": -0.2773500981126146, "gs_citation": 8, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=8868779920895586678&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 8, "aff_unique_index": "0;1;1;1;1;0", "aff_unique_norm": "University of Pittsburgh;Google", "aff_unique_dep": "Department of Computer Science;Google DeepMind", "aff_unique_url": "https://www.pitt.edu;https://deepmind.com", "aff_unique_abbr": "Pitt;DeepMind", "aff_campus_unique_index": "1;1", "aff_campus_unique": ";Mountain View", "aff_country_unique_index": "0;1;0;1;0;0", "aff_country_unique": "United States;United Kingdom" }, { "id": "4IU_xHbLiH", "title": "Fast and Differentiable Matrix Inverse and Its Extension to SVD", "track": "main", "status": "Withdraw", "tldr": "", "abstract": "Matrix inverse (Minv) and singular value decomposition (SVD) are among the most widely used matrix operations in massive data analysis, machine learning, and statistics. Although well-studied, they still encounter difficulties in practical use due to inefficiency and non-differentiability. In this paper, we aim at solving efficiency and differentiability issues through learning-based methods. First of all, to perform matrix inverse, we provide a differentiable yet efficient way, named LD-Minv, which is a learnable deep neural network (DNN) with each layer being an $L$-th order matrix polynomial. We show that, with proper initialization, the difference between LD-Minv's output and exact pseudo-inverse is in the order $O(exp{-L^K})$ where $K$ is the depth of the LD-Minv. Moreover, by learning from data, LD-Minv further reduces the difference between the output and the exact pseudo-inverse. We prove that gradient descent finds an $\\epsilon$-error minimum within $O(nKL\\log(1/\\epsilon))$ steps for LD-Minv, where n is the data size. At last, we provide the generalization bound for LD-Minv in both under-parameterized and over-parameterized settings. As an application of LD-Minv, we provide a learning-based optimization method to solve the problem with orthogonality constraints and utilize it to differentiate SVD (D-SVD). We also offer a theoretical generalization guarantee for D-SVD. 
Finally, we demonstrate the superiority of our methods on the synthetic and real data in the supplementary materials.", "keywords": "learning-based optimization;learning-based iterative method;differentiable matrix inverse;differentiable singular value decomposition;convergence;generalization", "primary_area": "", "supplementary_material": "", "author": "Xingyu Xie;Hao Kong;Jianlong Wu;Guangcan Liu;Zhouchen Lin", "authorids": "~Xingyu_Xie1;konghao@pku.edu.cn;~Jianlong_Wu1;~Guangcan_Liu3;~Zhouchen_Lin1", "gender": "M;;;;M", "homepage": ";;https://jlwu1992.github.io/;;https://zhouchenlin.github.io", "dblp": "174/9633;;170/4643;;l/ZhouchenLin", "google_scholar": "BpFCmZMAAAAJ;;XGeEH-IAAAAJ;;https://scholar.google.com.tw/citations?user=TanjFwoAAAAJ", "orcid": ";;;;0000-0003-1493-7569", "linkedin": ";;;;", "or_profile": "~Xingyu_Xie1;konghao@pku.edu.cn;~Jianlong_Wu1;~Guangcan_Liu3;~Zhouchen_Lin1", "aff": "Peking University;;Shandong University;;Peking University", "aff_domain": "pku.edu.cn;;sdu.edu.cn;;pku.edu.cn", "position": "PhD student;;Assistant Professor;;Professor", "bibtex": "", "github": "", "project": "", "reviewers": "AnonReviewer2;AnonReviewer3;AnonReviewer1;AnonReviewer4", "site": "https://openreview.net/forum?id=4IU_xHbLiH", "pdf_size": 0, "rating": "3;5;5;6", "confidence": "4;3;3;3", "wc_review": "560;319;395;259", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "0;0;0;0", "reply_reviewers": "0;0;0;0", "reply_authors": "0;0;0;0", "rating_avg": [ 4.75, 1.0897247358851685 ], "confidence_avg": [ 3.25, 0.4330127018922193 ], "wc_review_avg": [ 383.25, 112.85471855443174 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 0, 0 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 0, 0 ], "replies_avg": [ 5, 0 ], "authors#_avg": [ 5, 0 ], "corr_rating_confidence": -0.9271726499455306, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:DJiUpBK-KgcJ:scholar.google.com/&scioq=Fast+and+Differentiable+Matrix+Inverse+and+Its+Extension+to+SVD&hl=en&as_sdt=0,5", "gs_version_total": 0, "aff_unique_index": "0;1;0", "aff_unique_norm": "Peking University;Shandong University", "aff_unique_dep": ";", "aff_unique_url": "http://www.pku.edu.cn;http://www.sdu.edu.cn", "aff_unique_abbr": "Peking U;SDU", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0;0", "aff_country_unique": "China" }, { "title": "Fooling a Complete Neural Network Verifier", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/3351", "id": "4IwieFS44l", "poster": "", "openreview": "https://openreview.net/forum?id=4IwieFS44l", "slides": "https://iclr.cc/virtual/2021/poster/3351", "video": "https://iclr.cc/virtual/2021/poster/3351", "author_site": "D\u00e1niel Zombori, Bal\u00e1zs B\u00e1nhelyi, Tibor Csendes, Istv\u00e1n Megyeri, M\u00e1rk Jelasity", "tldr": "", "abstract": "The efficient and accurate characterization of the robustness of neural networks to input perturbation is an important open problem. Many approaches exist including heuristic and exact (or complete) methods. Complete methods are expensive but their mathematical formulation guarantees that they provide exact robustness metrics. However, this guarantee is valid only if we assume that the verified network applies arbitrary-precision arithmetic and the verifier is reliable. In practice, however, both the networks and the verifiers apply limited-precision floating point arithmetic. 
In this paper, we show that numerical roundoff errors can be exploited to craft adversarial networks, in which the actual robustness and the robustness computed by a state-of-the-art complete verifier radically differ. We also show that such adversarial networks can be used to insert a backdoor into any network in such a way that the backdoor is completely missed by the verifier. The attack is easy to detect in its naive form but, as we show, the adversarial network can be transformed to make its detection less trivial. We offer a simple defense against our particular attack based on adding a very small perturbation to the network weights. However, our conjecture is that other numerical attacks are possible, and exact verification has to take into account all the details of the computation executed by the verified networks, which makes the problem significantly harder.\n\n", "keywords": "adversarial examples;complete verifiers;numerical errors", "primary_area": "", "supplementary_material": "", "author": "D\u00e1niel Zombori;Bal\u00e1zs B\u00e1nhelyi;Tibor Csendes;Istv\u00e1n Megyeri;M\u00e1rk Jelasity", "authorids": "zomborid@inf.u-szeged.hu;banhelyi@inf.u-szeged.hu;csendes@inf.u-szeged.hu;imegyeri@inf.u-szeged.hu;~M\u00e1rk_Jelasity1", "gender": ";;;;M", "homepage": ";;;;", "dblp": ";;;;99/618", "google_scholar": ";;;;ScDJx68AAAAJ", "orcid": ";;;;0000-0001-9363-1482", "linkedin": ";;;;", "or_profile": "zomborid@inf.u-szeged.hu;banhelyi@inf.u-szeged.hu;csendes@inf.u-szeged.hu;imegyeri@inf.u-szeged.hu;~M\u00e1rk_Jelasity1", "aff": ";;;;University of Szeged", "aff_domain": ";;;;u-szeged.hu", "position": ";;;;Full Professor", "bibtex": "@inproceedings{\nzombori2021fooling,\ntitle={Fooling a Complete Neural Network Verifier},\nauthor={D{\\'a}niel Zombori and Bal{\\'a}zs B{\\'a}nhelyi and Tibor Csendes and Istv{\\'a}n Megyeri and M{\\'a}rk Jelasity},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=4IwieFS44l}\n}", "github": "", "project": "", "reviewers": "AnonReviewer3;AnonReviewer1;AnonReviewer4;AnonReviewer2", "pdf_size": 0, "rating": "6;6;6;7", "confidence": "5;3;4;4", "wc_review": "601;409;315;238", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "544;226;81;231", "reply_reviewers": "0;0;0;0", "reply_authors": "1;1;1;1", "rating_avg": [ 6.25, 0.4330127018922193 ], "confidence_avg": [ 4.0, 0.7071067811865476 ], "wc_review_avg": [ 390.75, 135.65466265484574 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 270.5, 169.0066566736352 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 17, 0 ], "authors#_avg": [ 5, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 22, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=13803733819570187195&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 2, "pdf": "https://openreview.net/pdf?id=4IwieFS44l", "email": ";;;;u-szeged.hu", "author_num": 5, "aff_unique_index": "0", "aff_unique_norm": "University of Szeged", "aff_unique_dep": "", "aff_unique_url": "https://www.sze.hu", "aff_unique_abbr": "SZU", "aff_country_unique_index": "0", "aff_country_unique": "Hungary" }, { "id": "4JLiaohIk9", "title": "Motion Forecasting with Unlikelihood Training", "track": "main", "status": "Reject", "tldr": "", "abstract": "Motion forecasting is essential for making safe and intelligent decisions in robotic applications such as autonomous driving. 
State-of-the-art methods formulate it as a sequence-to-sequence prediction problem, which is solved in an encoder-decoder framework with a maximum likelihood estimation objective. In this paper, we show that the likelihood objective itself results in a model assigning too much probability to trajectories that are unlikely given the contextual information such as maps and states of surrounding agents. This is despite the fact that many state-of-the-art models do take contextual information as part of their input. We propose a new objective, unlikelihood training, which forces generated trajectories that conflict with contextual information to be assigned a lower probability by our model. We demonstrate that our method can significantly improve state-of-the-art models\u2019 performance on challenging real-world trajectory forecasting datasets (nuScenes and Argoverse) by 8% and reduce the standard deviation by up to 50%. The code will be made available.", "keywords": "", "primary_area": "", "supplementary_material": "/attachment/81dabd0141b2e6ca76a898e0ce945c662002cb4c.zip", "author": "Deyao Zhu;Mohamed Zahran;Li Erran Li;Mohamed Elhoseiny", "authorids": "~Deyao_Zhu1;~Mohamed_Zahran1;~Li_Erran_Li1;~Mohamed_Elhoseiny1", "gender": "M;M;;M", "homepage": "https://tsutikgiau.github.io/;;http://www.cs.columbia.edu/~lierranli/;http://www.mohamed-elhoseiny.com", "dblp": "251/6017;;l/ErranLLi.html;125/2894", "google_scholar": "dENNKrsAAAAJ;https://scholar.google.com.eg/citations?user=Wdv4WLYAAAAJ;GkMfzy4AAAAJ;iRBUTOAAAAAJ", "orcid": ";0000-0002-4082-814X;;0000-0001-9659-1551", "linkedin": "deyao-zhu-205774154/;mzahran001/;;mohamed-elhoseiny-8a836215/", "or_profile": "~Deyao_Zhu1;~Mohamed_Zahran1;~Li_Erran_Li1;~Mohamed_Elhoseiny1", "aff": "KAUST;Udacity;Columbia University;KAUST", "aff_domain": "kaust.edu.sa;udacity.com;columbia.edu;kaust.edu.sa", "position": "PhD student;Program Experience Manager;Adjunct Professor;Associate Professor", "bibtex": "@misc{\nzhu2021motion,\ntitle={Motion Forecasting with Unlikelihood Training},\nauthor={Deyao Zhu and Mohamed Zahran and Li Erran Li and Mohamed Elhoseiny},\nyear={2021},\nurl={https://openreview.net/forum?id=4JLiaohIk9}\n}", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer3;AnonReviewer4;AnonReviewer2", "site": "https://openreview.net/forum?id=4JLiaohIk9", "pdf_size": 0, "rating": "4;4;5;6", "confidence": "5;5;4;3", "wc_review": "734;1505;280;241", "wc_reply_reviewers": "0;172;0;0", "wc_reply_authors": "822;1635;239;199", "reply_reviewers": "0;1;0;0", "reply_authors": "2;4;1;1", "rating_avg": [ 4.75, 0.82915619758885 ], "confidence_avg": [ 4.25, 0.82915619758885 ], "wc_review_avg": [ 690.0, 508.8865295918138 ], "wc_reply_reviewers_avg": [ 43.0, 74.47818472546173 ], "wc_reply_authors_avg": [ 723.75, 581.0281296976938 ], "reply_reviewers_avg": [ 0.25, 0.4330127018922193 ], "reply_authors_avg": [ 2.0, 1.224744871391589 ], "replies_avg": [ 17, 0 ], "authors#_avg": [ 4, 0 ], "corr_rating_confidence": -1.0, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:ylGJVYhJzXMJ:scholar.google.com/&scioq=Motion+Forecasting+with+Unlikelihood+Training&hl=en&as_sdt=0,5", "gs_version_total": 0, "aff_unique_index": "0;1;2;0", "aff_unique_norm": "King Abdullah University of Science and Technology;Udacity;Columbia University", "aff_unique_dep": ";;", "aff_unique_url": "https://www.kaust.edu.sa;https://www.udacity.com;https://www.columbia.edu", "aff_unique_abbr": "KAUST;Udacity;Columbia", "aff_campus_unique_index": "", 
"aff_campus_unique": "", "aff_country_unique_index": "0;1;1;0", "aff_country_unique": "Saudi Arabia;United States" }, { "id": "4K_NaDAHc0d", "title": "Unsupervised Task Clustering for Multi-Task Reinforcement Learning", "track": "main", "status": "Reject", "tldr": "", "abstract": "Meta-learning, transfer learning and multi-task learning have recently laid a path towards more generally applicable reinforcement learning agents that are not limited to a single task. However, most existing approaches implicitly assume a uniform similarity between tasks. We argue that this assumption is limiting in settings where the relationship between tasks is unknown a-priori. In his work, we propose a general approach to automatically cluster together similar tasks during training. Our method, inspired by the expectation-maximization algorithm, succeeds at finding clusters of related tasks and uses these to improve sample complexity. We achieve this by designing an agent with multiple policies. In the expectation step, we evaluate the performance of the policies on all tasks and assign each task to the best performing policy. In the maximization step, each policy trains by sampling tasks from its assigned set. This method is intuitive, simple to implement and orthogonal to other multi-task learning algorithms. We show the generality of our approach by evaluating on simple discrete and continuous control tasks, as well as complex bipedal walker tasks and Atari games. Results show improvements in sample complexity as well as a more general applicability when compared to other approaches.", "keywords": "Reinforcement Learning;Multi-Task Learning;Clustering;Expectation-Maximization", "primary_area": "", "supplementary_material": "/attachment/f21cfba6814b0b22c801070f0d0724dfc34a8ef0.zip", "author": "Johannes Ackermann;Oliver Paul Richter;Roger Wattenhofer", "authorids": "~Johannes_Ackermann1;~Oliver_Paul_Richter1;~Roger_Wattenhofer1", "gender": ";M;Not Specified", "homepage": "https://johannesack.github.io/;https://disco.ethz.ch/members/richtero;https://disco.ethz.ch/members/wroger", "dblp": "https://dblp.uni-trier.de/pid/249/9298;;w/RogerWattenhofer", "google_scholar": "2HvSMI8AAAAJ;;https://scholar.google.ch/citations?user=EG3VPm4AAAAJ", "orcid": ";;", "linkedin": ";;roger-wattenhofer-4466731/", "or_profile": "~Johannes_Ackermann1;~Oliver_Paul_Richter1;~Roger_Wattenhofer1", "aff": "Technical University Munich;Swiss Federal Institute of Technology;Swiss Federal Institute of Technology", "aff_domain": "tum.de;ethz.ch;ethz.ch", "position": "MS student;PhD student;Full Professor", "bibtex": "@misc{\nackermann2021unsupervised,\ntitle={Unsupervised Task Clustering for Multi-Task Reinforcement Learning},\nauthor={Johannes Ackermann and Oliver Paul Richter and Roger Wattenhofer},\nyear={2021},\nurl={https://openreview.net/forum?id=4K_NaDAHc0d}\n}", "github": "", "project": "", "reviewers": "AnonReviewer3;AnonReviewer1;AnonReviewer2;AnonReviewer4", "site": "https://openreview.net/forum?id=4K_NaDAHc0d", "pdf_size": 0, "rating": "5;5;5;6", "confidence": "3;5;4;5", "wc_review": "501;515;419;950", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "514;629;736;1179", "reply_reviewers": "0;0;0;0", "reply_authors": "1;1;1;2", "rating_avg": [ 5.25, 0.4330127018922193 ], "confidence_avg": [ 4.25, 0.82915619758885 ], "wc_review_avg": [ 596.25, 207.50346382651063 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 764.5, 251.85958389547142 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.25, 
0.4330127018922193 ], "replies_avg": [ 11, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": 0.5222329678670935, "gs_citation": 9, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=8828738463747046115&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 10, "aff_unique_index": "0;1;1", "aff_unique_norm": "Technical University of Munich;Swiss Federal Institute of Technology", "aff_unique_dep": ";", "aff_unique_url": "https://www.tum.de;https://www.ethz.ch", "aff_unique_abbr": "TUM;ETH Zurich", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;1;1", "aff_country_unique": "Germany;Switzerland" }, { "id": "4LHz4IFGLQ-", "title": "Discrete Word Embedding for Logical Natural Language Understanding", "track": "main", "status": "Withdraw", "tldr": "", "abstract": "We propose an unsupervised neural model for learning a discrete embedding of words.\nUnlike existing discrete embeddings, our binary embedding supports vector arithmetic operations similar to continuous embeddings.\nOur embedding represents each word as a set of propositional statements describing a transition rule in classical/STRIPS planning formalism.\nThis makes the embedding directly compatible with symbolic, state of the art classical planning solvers.", "keywords": "NLP;word embedding;discrete VAE;classical planning;neural symbolic", "primary_area": "", "supplementary_material": "/attachment/a357d360c4c55bc7b9ebe5de354072ece1e89235.zip", "author": "Zilu Tang;Masataro Asai", "authorids": "~Zilu_Tang1;~Masataro_Asai1", "gender": "M;M", "homepage": "https://pootiet.github.io/;https://guicho271828.github.io/", "dblp": "266/2889;149/1319", "google_scholar": "E9g28XEAAAAJ;https://scholar.google.co.jp/citations?user=b4UzH5AAAAAJ", "orcid": ";", "linkedin": "peter-tang-83802495/;masataro-asai-158a0638/", "or_profile": "~Zilu_Tang1;~Masataro_Asai1", "aff": "International Business Machines;IBM Research / MIT-IBM Watson AI Lab", "aff_domain": "ibm.com;ibm.com", "position": "Researcher;Research Staff Member", "bibtex": "", "github": "", "project": "", "reviewers": "AnonReviewer5;AnonReviewer3;AnonReviewer2;AnonReviewer1", "site": "https://openreview.net/forum?id=4LHz4IFGLQ-", "pdf_size": 0, "rating": "3;4;5;5", "confidence": "3;2;3;3", "wc_review": "498;992;257;217", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "370;982;192;348", "reply_reviewers": "0;0;0;0", "reply_authors": "1;2;1;1", "rating_avg": [ 4.25, 0.82915619758885 ], "confidence_avg": [ 2.75, 0.4330127018922193 ], "wc_review_avg": [ 491.0, 308.5781910634645 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 473.0, 301.7764072952026 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.25, 0.4330127018922193 ], "replies_avg": [ 11, 0 ], "authors#_avg": [ 2, 0 ], "corr_rating_confidence": 0.17407765595569782, "gs_citation": 4, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=3513475670377511291&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 2, "aff_unique_index": "0;1", "aff_unique_norm": "International Business Machines Corporation;IBM", "aff_unique_dep": ";Research", "aff_unique_url": "https://www.ibm.com;https://www.ibm.com/research", "aff_unique_abbr": "IBM;IBM", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0", "aff_country_unique": "United States" }, { "id": "4NNQ3l2hbN0", "title": "Search Data Structure Learning", "track": "main", "status": "Reject", "tldr": "", "abstract": "In our modern world, an enormous amount of data surrounds us, 
and we are rarely interested in more than a handful of data points at once. It is like searching for needles in a haystack, and in many cases, there is no better algorithm than a random search, which might not be viable. Previously proposed algorithms for efficient database access are made for particular applications such as finding the min/max, finding all points within a range or finding the k-nearest neighbours. Consequently, there is a lack of versatility concerning what we can search when it comes to a gigantic database. In this work, we propose Search Data Structure Learning (SDSL), a generalization of the standard Search Data Structure (SDS) in which the machine has to learn how to search in the database. To evaluate approaches in this field, we propose a novel metric called Sequential Search Work Ratio (SSWR), a natural way of measuring a search's efficiency and quality. Finally, we inaugurate the field with the Efficient Learnable Binary Access (ELBA), a family of models for Search Data Structure Learning. It requires a means to train two parametric functions and a search data structure for binary codes. For the training, we developed a novel loss function, the F-beta Loss. For the SDS, we describe the Multi-Bernoulli Search (MBS), a novel approach for probabilistic binary codes. Finally, we exhibit the F-beta Loss and the MBS synergy by experimentally showing that it is at least twice as better than using the alternative loss functions of MIHash and HashNet and twenty times better than with another SDS based on the Hamming radius.", "keywords": "Machine Learning;Search Data Structure;Information Retrieval;Binary Embeddings", "primary_area": "", "supplementary_material": "", "author": "Mathieu Duchesneau;Hansenclever Bassani;Alain Tapp", "authorids": "~Mathieu_Duchesneau1;~Hansenclever_Bassani1;tappa@iro.umontreal.ca", "gender": "M;M;", "homepage": ";https://hfbassani.github.io/;", "dblp": ";93/6335;", "google_scholar": "i-ydcvoAAAAJ;https://scholar.google.ca/citations?user=s14pJ00AAAAJ;", "orcid": ";0000-0001-5307-9400;", "linkedin": ";hansbassani/;", "or_profile": "~Mathieu_Duchesneau1;~Hansenclever_Bassani1;tappa@iro.umontreal.ca", "aff": "Montreal Institute for Learning Algorithms, University of Montreal, Universit\u00e9 de Montr\u00e9al;Universidade Federal de Pernambuco, Federal University of Pernambuco;", "aff_domain": "mila.umontreal.ca;cin.ufpe.br;", "position": "PhD student;Assistant Professor;", "bibtex": "@misc{\nduchesneau2021search,\ntitle={Search Data Structure Learning},\nauthor={Mathieu Duchesneau and Hansenclever Bassani and Alain Tapp},\nyear={2021},\nurl={https://openreview.net/forum?id=4NNQ3l2hbN0}\n}", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer2;AnonReviewer4;AnonReviewer3", "site": "https://openreview.net/forum?id=4NNQ3l2hbN0", "pdf_size": 0, "rating": "3;4;4;4", "confidence": "4;3;4;4", "wc_review": "665;192;263;301", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "1065;0;538;237", "reply_reviewers": "0;0;0;0", "reply_authors": "2;0;1;1", "rating_avg": [ 3.75, 0.4330127018922193 ], "confidence_avg": [ 3.75, 0.4330127018922193 ], "wc_review_avg": [ 355.25, 183.06334286251848 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 460.0, 397.9440915505594 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.7071067811865476 ], "replies_avg": [ 9, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": -0.3333333333333333, "gs_citation": -1, "gs_cited_by_link": "", "gs_version_total": -1, 
"aff_unique_index": "0;1", "aff_unique_norm": "University of Montreal;Universidade Federal de Pernambuco", "aff_unique_dep": "Montreal Institute for Learning Algorithms;", "aff_unique_url": "https://www.mila.quebec;https://ufpe.br", "aff_unique_abbr": "MILA;UFPE", "aff_campus_unique_index": "0", "aff_campus_unique": "Montreal;", "aff_country_unique_index": "0;1", "aff_country_unique": "Canada;Brazil" }, { "id": "4NrO5vqdkwj", "title": "Semantic Inference Network for Few-shot Streaming Label Learning", "track": "main", "status": "Withdraw", "tldr": "", "abstract": "Streaming label learning aims to model newly emerged labels for multi-label classification systems, which requires plenty of new label data for training. However, in changing environments, only a small amount of new label data can practically be collected. In this work, we formulate and study few-shot streaming label learning (FSLL), which models emerging new labels with only a few annotated examples by utilizing the knowledge learned from past labels. We propose a meta-learning framework, Semantic Inference Network (SIN), which can learn and infer the semantic correlation between new labels and past labels to adapt FSLL tasks from a few examples effectively. SIN leverages label semantic representation to regularize the output space and acquires label-wise meta-knowledge based on the gradient-based meta-learning. Moreover, SIN incorporates a novel label decision module with a meta-threshold loss to find the optimal confidence thresholds for each new label. Theoretically, we demonstrate that the proposed semantic inference mechanism could constrain the complexity of hypotheses space to reduce the risk of overfitting and achieve better generalizability. Experimentally, extensive empirical results and ablation studies illustrate the superior performance of SIN over the prior state-of-the-art methods on FSLL.", "keywords": "streaming label learning;multi-label learning;smenatci inference;few-shot learning", "primary_area": "", "supplementary_material": "/attachment/6b7d086ddd3f3c9efe4f7c2f9d743dcb8790530c.zip", "author": "Zhen Wang;Liu Liu;Yiqun Duan;Dacheng Tao", "authorids": "~Zhen_Wang9;~Liu_Liu8;~Yiqun_Duan1;~Dacheng_Tao1", "gender": ";F;M;", "homepage": ";;https://github.com/DuanYiqun;", "dblp": ";74/7037-14;248/5526;", "google_scholar": ";FvGjCqEAAAAJ;https://scholar.google.com.au/citations?user=GoQKrD0AAAAJ;", "orcid": ";;;", "linkedin": ";;;", "or_profile": "~Zhen_Wang9;~Liu_Liu8;~Yiqun_Duan1;~Dacheng_Tao1", "aff": ";University of Sydney;University of Technology Sydney;", "aff_domain": ";sydney.edu.au;uts.edu.au;", "position": ";Postdoc;PhD student;", "bibtex": "", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer2;AnonReviewer3;AnonReviewer4", "site": "https://openreview.net/forum?id=4NrO5vqdkwj", "pdf_size": 0, "rating": "4;4;5;8", "confidence": "5;5;4;4", "wc_review": "302;251;669;279", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "0;0;0;0", "reply_reviewers": "0;0;0;0", "reply_authors": "0;0;0;0", "rating_avg": [ 5.25, 1.6393596310755 ], "confidence_avg": [ 4.5, 0.5 ], "wc_review_avg": [ 375.25, 170.55552614911076 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 0, 0 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 0, 0 ], "replies_avg": [ 5, 0 ], "authors#_avg": [ 4, 0 ], "corr_rating_confidence": -0.7624928516630233, "gs_citation": 0, "gs_cited_by_link": 
"https://scholar.google.com/scholar?q=related:Fdnnoes6p38J:scholar.google.com/&scioq=Semantic+Inference+Network+for+Few-shot+Streaming+Label+Learning&hl=en&as_sdt=0,5", "gs_version_total": 0, "aff_unique_index": "0;1", "aff_unique_norm": "University of Sydney;University of Technology Sydney", "aff_unique_dep": ";", "aff_unique_url": "https://www.sydney.edu.au;https://www.uts.edu.au", "aff_unique_abbr": "USYD;UTS", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0", "aff_country_unique": "Australia" }, { "id": "4Nt1F3qf9Gn", "title": "CLOCS: Contrastive Learning of Cardiac Signals Across Space, Time, and Patients", "track": "main", "status": "Reject", "tldr": "", "abstract": "The healthcare industry generates troves of unlabelled physiological data. This data can be exploited via contrastive learning, a self-supervised pre-training method that encourages representations of instances to be similar to one another. We propose a family of contrastive learning methods, CLOCS, that encourages representations across space, time, \\textit{and} patients to be similar to one another. We show that CLOCS consistently outperforms the state-of-the-art methods, BYOL and SimCLR, when performing a linear evaluation of, and fine-tuning on, downstream tasks. We also show that CLOCS achieves strong generalization performance with only 25\\% of labelled training data. Furthermore, our training procedure naturally generates patient-specific representations that can be used to quantify patient-similarity. ", "keywords": "Contrastive learning;physiological signals;healthcare", "primary_area": "", "supplementary_material": "/attachment/7c3e0c5429a691eacd3a74640898fc0e988df17c.zip", "author": "Dani Kiyasseh;Tingting Zhu;David A. Clifton", "authorids": "~Dani_Kiyasseh1;tingting.zhu@eng.ox.ac.uk;~David_A._Clifton1", "gender": ";;M", "homepage": "https://danikiyasseh.github.io/;;http://www.eng.ox.ac.uk/chi", "dblp": ";;89/6424", "google_scholar": "UD1oO4MAAAAJ;;", "orcid": ";;", "linkedin": ";;", "or_profile": "~Dani_Kiyasseh1;tingting.zhu@eng.ox.ac.uk;~David_A._Clifton1", "aff": "University of Oxford;;University of Oxford", "aff_domain": "oxford.ac.uk;;ox.ac.uk", "position": "PhD student;;Full Professor", "bibtex": "@misc{\nkiyasseh2021clocs,\ntitle={{\\{}CLOCS{\\}}: Contrastive Learning of Cardiac Signals Across Space, Time, and Patients},\nauthor={Dani Kiyasseh and Tingting Zhu and David A. 
Clifton},\nyear={2021},\nurl={https://openreview.net/forum?id=4Nt1F3qf9Gn}\n}", "github": "", "project": "", "reviewers": "AnonReviewer4;AnonReviewer1;AnonReviewer2;AnonReviewer3", "site": "https://openreview.net/forum?id=4Nt1F3qf9Gn", "pdf_size": 0, "rating": "4;4;5;7", "confidence": "5;5;3;4", "wc_review": "323;405;615;189", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "569;201;570;253", "reply_reviewers": "0;0;0;0", "reply_authors": "1;1;1;1", "rating_avg": [ 5.0, 1.224744871391589 ], "confidence_avg": [ 4.25, 0.82915619758885 ], "wc_review_avg": [ 383.0, 154.5509624686951 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 398.25, 172.23439697110447 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 10, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": -0.4923659639173309, "gs_citation": 222, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=16333919134757348473&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 9, "aff_unique_index": "0;0", "aff_unique_norm": "University of Oxford", "aff_unique_dep": "", "aff_unique_url": "https://www.ox.ac.uk", "aff_unique_abbr": "Oxford", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0", "aff_country_unique": "United Kingdom" }, { "id": "4P35MfnBQIY", "title": "Consistency and Monotonicity Regularization for Neural Knowledge Tracing", "track": "main", "status": "Reject", "tldr": "", "abstract": "Knowledge Tracing (KT), tracking a human's knowledge acquisition, is a central component in online learning and AI in Education. In this paper, we present a simple, yet effective strategy to improve the generalization ability of KT models: we propose three types of novel data augmentation, coined replacement, insertion, and deletion, along with corresponding regularization losses that impose certain consistency or monotonicity bias on model's predictions for the original and augmented sequence. Extensive experiments on various KT benchmarks show that our regularization scheme significantly improve the prediction performances, under 3 widely-used neural networks and 4 public benchmarks for KT, e.g., it yields 6.3% improvement in AUC under the DKT model and the ASSISTmentsChall dataset. ", "keywords": "knowledge tracing;data augmentation;regularization", "primary_area": "", "supplementary_material": "/attachment/4e3ad5156e989413dcf4d97600a30d985a6de88f.zip", "author": "Seewoo Lee;Youngduck Choi;Juneyoung Park;Byungsoo Kim;Jinwoo Shin", "authorids": "~Seewoo_Lee1;~Youngduck_Choi2;~Juneyoung_Park1;~Byungsoo_Kim1;~Jinwoo_Shin1", "gender": "M;;M;M;M", "homepage": "https://seewoo5.github.io/;https://scholar.google.com/citations?user=wIhH6qYAAAAJ&hl=en;;;https://sites.google.com/site/mijirim/", "dblp": ";;;83/10959;31/7062", "google_scholar": "jdFDv6IAAAAJ;;https://scholar.google.com/citations?hl=en;cHHArokAAAAJ;https://scholar.google.com.tw/citations?user=m3eDp7kAAAAJ", "orcid": ";;;;", "linkedin": ";;jason-juneyoung-park-ph-d-1719702a/;;", "or_profile": "~Seewoo_Lee1;~Youngduck_Choi2;~Juneyoung_Park1;~Byungsoo_Kim1;~Jinwoo_Shin1", "aff": "Riiid! AI Research;Riiid! AI Research;Riiid AI Research;Riiid! 
AI Research;Korea Advanced Institute of Science & Technology", "aff_domain": "riiid.co;riiid.co;riiid.co;riiid.co;kaist.ac.kr", "position": "Research Scientist;Research Scientist;Research Scientist;Research Scientist;Associate Professor", "bibtex": "@misc{\nlee2021consistency,\ntitle={Consistency and Monotonicity Regularization for Neural Knowledge Tracing},\nauthor={Seewoo Lee and Youngduck Choi and Juneyoung Park and Byungsoo Kim and Jinwoo Shin},\nyear={2021},\nurl={https://openreview.net/forum?id=4P35MfnBQIY}\n}", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer4;AnonReviewer3;AnonReviewer2", "site": "https://openreview.net/forum?id=4P35MfnBQIY", "pdf_size": 0, "rating": "4;5;6;7", "confidence": "3;4;2;4", "wc_review": "584;298;159;415", "wc_reply_reviewers": "0;0;0;3", "wc_reply_authors": "1117;380;608;242", "reply_reviewers": "0;0;0;1", "reply_authors": "2;1;1;1", "rating_avg": [ 5.5, 1.118033988749895 ], "confidence_avg": [ 3.25, 0.82915619758885 ], "wc_review_avg": [ 364.0, 156.03044574697594 ], "wc_reply_reviewers_avg": [ 0.75, 1.299038105676658 ], "wc_reply_authors_avg": [ 586.75, 332.87187850582995 ], "reply_reviewers_avg": [ 0.25, 0.4330127018922193 ], "reply_authors_avg": [ 1.25, 0.4330127018922193 ], "replies_avg": [ 13, 0 ], "authors#_avg": [ 5, 0 ], "corr_rating_confidence": 0.1348399724926484, "gs_citation": 4, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=12409952598120338514&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 3, "aff_unique_index": "0;0;1;0;2", "aff_unique_norm": "Riiid! AI Research;Riiid;Korea Advanced Institute of Science and Technology", "aff_unique_dep": "AI Research;AI Research;", "aff_unique_url": "https://www.riiid.com;https://www.riiid.com;https://www.kaist.ac.kr", "aff_unique_abbr": "Riiid!;Riiid;KAIST", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0;0;0;0", "aff_country_unique": "South Korea" }, { "id": "4QpDyzCoH01", "title": "Improving Zero-Shot Neural Architecture Search with Parameters Scoring", "track": "main", "status": "Withdraw", "tldr": "", "abstract": "The exceptional success of deep learning comes at the cost of long training sessions, and a slow iterative process of proposing new architectures that have to be hand-engineered through years of experience. Neural Architecture Search (NAS) is the line of research that tries to automatically design architectures with better performances at a given task. The performance of a network in a task can be predicted by a score, even before the network is trained: this is referred to as zero-shot NAS. However, the existing score remains unreliable for architectures with high accuracy. We develop in this direction by exploring different related scores. We study their time efficiency and we improve on their dependence with the final accuracy, especially for high values of the score. We propose a monotonicity metric to evaluate the adequate relative scoring of the architectures, as a way to avoid imposing a linearity assumption too early. We find that our use of noise improves the score, but a more substantial improvement comes when the evaluation of the score is done in the parameter space. 
We hope this effort will help clarify promising directions to speed up automatic discovery of good neural architectures without training.", "keywords": "", "primary_area": "", "supplementary_material": "", "author": "Luca Celotti;Ismael Balafrej;Emmanuel Calvet", "authorids": "~Luca_Celotti1;ismael.balafrej@usherbrooke.ca;emmanuel.calvet@usherbrooke.ca", "gender": "M;;", "homepage": "https://lucehe.github.io/;;", "dblp": ";;", "google_scholar": ";;", "orcid": ";;", "linkedin": ";;", "or_profile": "~Luca_Celotti1;ismael.balafrej@usherbrooke.ca;emmanuel.calvet@usherbrooke.ca", "aff": "Universit\u00e9 de Sherbrooke;;", "aff_domain": "usherbrooke.ca;;", "position": "PhD student;;", "bibtex": "", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer3;AnonReviewer2;AnonReviewer4", "site": "https://openreview.net/forum?id=4QpDyzCoH01", "pdf_size": 0, "rating": "3;4;5;5", "confidence": "4;4;2;4", "wc_review": "323;278;324;426", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "473;380;293;216", "reply_reviewers": "0;0;0;0", "reply_authors": "1;1;1;1", "rating_avg": [ 4.25, 0.82915619758885 ], "confidence_avg": [ 3.5, 0.8660254037844386 ], "wc_review_avg": [ 337.75, 54.23271614072082 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 340.5, 96.01171803483156 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 9, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": -0.5222329678670935, "gs_citation": 5, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=14299847641312449629&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 0, "aff_unique_index": "0", "aff_unique_norm": "Universit\u00e9 de Sherbrooke", "aff_unique_dep": "", "aff_unique_url": "https://www.usherbrooke.ca", "aff_unique_abbr": "UdeS", "aff_country_unique_index": "0", "aff_country_unique": "Canada" }, { "title": "Teaching with Commentaries", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/2551", "id": "4RbdgBh9gE", "poster": "", "openreview": "https://openreview.net/forum?id=4RbdgBh9gE", "slides": "https://iclr.cc/virtual/2021/poster/2551", "video": "https://iclr.cc/virtual/2021/poster/2551", "author_site": "Aniruddh Raghu, Maithra Raghu, Simon Kornblith, David Duvenaud, Geoffrey Hinton", "tldr": "", "abstract": "Effective training of deep neural networks can be challenging, and there remain many open questions on how to best learn these models. Recently developed methods to improve neural network training examine teaching: providing learned information during the training process to improve downstream model performance. In this paper, we take steps towards extending the scope of teaching. We propose a flexible teaching framework using commentaries, learned meta-information helpful for training on a particular task. We present gradient-based methods to learn commentaries, leveraging recent work on implicit differentiation for scalability. We explore diverse applications of commentaries, from weighting training examples, to parameterising label-dependent data augmentation policies, to representing attention masks that highlight salient image regions. We find that commentaries can improve training speed and/or performance, and provide insights about the dataset and training process. We also observe that commentaries generalise: they can be reused when training new models to obtain performance benefits, suggesting a use-case where commentaries are stored with a dataset and leveraged in future for improved model training. 
", "keywords": "learning to teach;metalearning;hypergradients", "primary_area": "", "supplementary_material": "", "author": "Aniruddh Raghu;Maithra Raghu;Simon Kornblith;David Duvenaud;Geoffrey Hinton", "authorids": "~Aniruddh_Raghu1;~Maithra_Raghu1;~Simon_Kornblith1;~David_Duvenaud2;~Geoffrey_Hinton1", "gender": "M;F;M;M;M", "homepage": "http://aniruddhraghu.com/;http://maithraraghu.com/;;https://www.cs.toronto.edu/~duvenaud/;https://www.cs.toronto.edu/~hinton/bio.html", "dblp": "200/8793;;220/4059;86/9380;10/3248", "google_scholar": "hvnqk7YAAAAJ;tiE4g64AAAAJ;1O3RPmsAAAAJ;https://scholar.google.ca/citations?user=ZLpO3XQAAAAJ;", "orcid": ";;;;", "linkedin": ";;;;", "or_profile": "~Aniruddh_Raghu1;~Maithra_Raghu1;~Simon_Kornblith1;~David_Duvenaud2;~Geoffrey_Hinton1", "aff": "Massachusetts Institute of Technology;Google Brain;Google;Department of Computer Science, University of Toronto;University of Toronto", "aff_domain": "mit.edu;cornell.edu;google.com;cs.toronto.edu;utoronto.ca", "position": "PhD student;Senior Research Scientist;Research Scientist;Assistant Professor;Full Professor", "bibtex": "@inproceedings{\nraghu2021teaching,\ntitle={Teaching with Commentaries},\nauthor={Aniruddh Raghu and Maithra Raghu and Simon Kornblith and David Duvenaud and Geoffrey Hinton},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=4RbdgBh9gE}\n}", "github": "[![github](/images/github_icon.svg) googleinterns/commentaries](https://github.com/googleinterns/commentaries)", "project": "", "reviewers": "AnonReviewer3;AnonReviewer4;AnonReviewer1;AnonReviewer2", "pdf_size": 0, "rating": "5;6;7;7", "confidence": "5;4;3;4", "wc_review": "338;373;441;522", "wc_reply_reviewers": "0;0;65;0", "wc_reply_authors": "557;560;362;240", "reply_reviewers": "0;0;1;0", "reply_authors": "1;1;1;1", "rating_avg": [ 6.25, 0.82915619758885 ], "confidence_avg": [ 4.0, 0.7071067811865476 ], "wc_review_avg": [ 418.5, 70.30113797087498 ], "wc_reply_reviewers_avg": [ 16.25, 28.145825622994256 ], "wc_reply_authors_avg": [ 429.75, 135.78728769660287 ], "reply_reviewers_avg": [ 0.25, 0.4330127018922193 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 12, 0 ], "authors#_avg": [ 5, 0 ], "corr_rating_confidence": -0.8528028654224417, "gs_citation": 31, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=12512263277235607257&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 4, "pdf": "https://openreview.net/pdf?id=4RbdgBh9gE", "email": "mit.edu;cornell.edu;google.com;cs.toronto.edu;utoronto.ca", "author_num": 5, "aff_unique_index": "0;1;1;2;2", "aff_unique_norm": "Massachusetts Institute of Technology;Google;University of Toronto", "aff_unique_dep": ";Google Brain;Department of Computer Science", "aff_unique_url": "https://web.mit.edu;https://brain.google.com;https://www.utoronto.ca", "aff_unique_abbr": "MIT;Google Brain;U of T", "aff_campus_unique_index": "1;1;2", "aff_campus_unique": ";Mountain View;Toronto", "aff_country_unique_index": "0;0;0;1;1", "aff_country_unique": "United States;Canada" }, { "id": "4SZ9Ft--pDl", "title": "Prior-guided Bayesian Optimization", "track": "main", "status": "Reject", "tldr": "", "abstract": "While Bayesian Optimization (BO) is a very popular method for optimizing expensive black-box functions, it fails to leverage the experience of domain experts. This causes BO to waste function evaluations on bad design choices (e.g., machine learning hyperparameters) that the expert already knows to work poorly. 
To address this issue, we introduce Prior-guided Bayesian Optimization (PrBO). PrBO allows users to inject their knowledge into the optimization process in the form of priors about which parts of the input space will yield the best performance, rather than BO\u2019s standard priors over functions (which are much less intuitive for users). PrBO then combines these priors with BO\u2019s standard probabilistic model to form a pseudo-posterior used to select which points to evaluate next. We show that PrBO is around 12x faster than state-of-the-art methods without user priors and 10,000x faster than random search on a common suite of benchmarks, and achieves a new state-of-the-art performance on a real-world hardware design application. We also show that PrBO converges faster even if the user priors are not entirely accurate and that it robustly recovers from misleading priors.", "keywords": "Bayesian Optimization;Automated Machine Learning", "primary_area": "", "supplementary_material": "/attachment/d4b21223a1021cfa956b7086c110f6656a2a969f.zip", "author": "Artur Souza;Luigi Nardi;Leonardo Oliveira;Kunle Olukotun;Marius Lindauer;Frank Hutter", "authorids": "~Artur_Souza1;~Luigi_Nardi1;~Leonardo_Oliveira1;~Kunle_Olukotun1;~Marius_Lindauer1;~Frank_Hutter1", "gender": "M;M;;M;M;M", "homepage": "http://buscatextual.cnpq.br/buscatextual/visualizacv.do;jsessionid=D9822F24D364299D11D8868EF46ADB21.buscatextual_0;;;https://profiles.stanford.edu/oyekunle-olukotun;https://www.ai.uni-hannover.de/de/institut/team/lindauer;http://ml.informatik.uni-freiburg.de/~hutter/", "dblp": ";60/7206;;o/KunleOlukotun;28/9142;89/5383", "google_scholar": ";https://scholar.google.it/citations?user=Kgs3zQoAAAAJ;;https://scholar.google.com.tw/citations?user=IzXDyR8AAAAJ;https://scholar.google.de/citations?user=0Sxx7DUAAAAJ;https://scholar.google.de/citations?user=YUrxwrkAAAAJ", "orcid": ";0000-0002-4601-2264;;;;0000-0002-2037-3694", "linkedin": ";nardiluigi/;;;;frank-hutter-9190b24b/", "or_profile": "~Artur_Souza1;~Luigi_Nardi1;~Leonardo_Oliveira1;~Kunle_Olukotun1;~Marius_Lindauer1;~Frank_Hutter1", "aff": "Universidade Federal de Minas Gerais;Stanford University;Universidade Federal de Minas Gerais, Universidade Federal de Minas Gerais;Stanford University;Leibniz Universit\u00e4t Hannover;Albert-Ludwigs-Universit\u00e4t Freiburg", "aff_domain": "ufmg.br;stanford.edu;dcc.ufmg.br;stanford.edu;uni-hannover.de;uni-freiburg.de", "position": "PhD student;Researcher;Associate Professor;Full Professor;Associate Professor;Full Professor", "bibtex": "@misc{\nsouza2021priorguided,\ntitle={Prior-guided Bayesian Optimization},\nauthor={Artur Souza and Luigi Nardi and Leonardo Oliveira and Kunle Olukotun and Marius Lindauer and Frank Hutter},\nyear={2021},\nurl={https://openreview.net/forum?id=4SZ9Ft--pDl}\n}", "github": "", "project": "", "reviewers": "AnonReviewer5;AnonReviewer3;AnonReviewer1;AnonReviewer2;AnonReviewer4", "site": "https://openreview.net/forum?id=4SZ9Ft--pDl", "pdf_size": 0, "rating": "3;4;4;6;8", "confidence": "4;4;4;5;4", "wc_review": "467;363;480;331;332", "wc_reply_reviewers": "0;0;0;32;0", "wc_reply_authors": "1088;690;465;519;281", "reply_reviewers": "0;0;0;1;0", "reply_authors": "2;1;1;1;1", "rating_avg": [ 5.0, 1.7888543819998317 ], "confidence_avg": [ 4.2, 0.39999999999999997 ], "wc_review_avg": [ 394.6, 65.57011514401968 ], "wc_reply_reviewers_avg": [ 6.4, 12.8 ], "wc_reply_authors_avg": [ 608.6, 272.9180096659068 ], "reply_reviewers_avg": [ 0.2, 0.4 ], "reply_authors_avg": [ 1.2, 0.4000000000000001 ], 
"replies_avg": [ 13, 0 ], "authors#_avg": [ 6, 0 ], "corr_rating_confidence": 0.2795084971874737, "gs_citation": 12, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=14051429574356110814&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 8, "aff_unique_index": "0;1;0;1;2;3", "aff_unique_norm": "Universidade Federal de Minas Gerais;Stanford University;Leibniz Universit\u00e4t Hannover;Albert-Ludwigs-Universit\u00e4t Freiburg", "aff_unique_dep": ";;;", "aff_unique_url": "https://ufmg.br;https://www.stanford.edu;https://www.leibniz.uni-hannover.de/;https://www.uni-freiburg.de", "aff_unique_abbr": "UFMG;Stanford;LUH;Albert-Ludwigs-Universit\u00e4t", "aff_campus_unique_index": "1;1;2", "aff_campus_unique": ";Stanford;Freiburg", "aff_country_unique_index": "0;1;0;1;2;2", "aff_country_unique": "Brazil;United States;Germany" }, { "id": "4SiMia0kjba", "title": "Causal Probabilistic Spatio-temporal Fusion Transformers in Two-sided Ride-Hailing Markets", "track": "main", "status": "Reject", "tldr": "", "abstract": "Achieving accurate spatio-temporal predictions in large-scale systems is extremely valuable in many real-world applications, such as weather forecasts, retail forecasting, and urban traffic forecasting. So far, most existing methods for multi-horizon, multi-task and multi-target predictions select important predicting variables via their correlations with responses, and thus it is highly possible that many forecasting models generated from those methods are not causal, leading to poor interpretability. The aim of this paper is to develop a collaborative causal spatio-temporal fusion transformer, named CausalTrans, to establish the collaborative causal effects of predictors on multiple forecasting targets, such as supply and demand in ride-sharing platforms. Specifically, we integrate the causal attention with the Conditional Average Treatment Effect (CATE) estimation method for causal inference. Moreover, we propose a novel and fast multi-head attention evolved from Taylor expansion instead of softmax, reducing time complexity from $O(\\mathcal{V}^2)$ to $O(\\mathcal{V})$, where $\\mathcal{V}$ is the number of nodes in a graph. We further design a spatial graph fusion mechanism to significantly reduce the parameters' scale. We conduct a wide range of experiments to demonstrate the interpretability of causal attention, the effectiveness of various model components, and the time efficiency of our CausalTrans. As shown in these experiments, our CausalTrans framework can achieve up to 15$\\%$ error reduction compared with various baseline methods. 
", "keywords": "Spatio-temporal Prediction;Causal Inference;Efficient Transformers;Two-sided Markets", "primary_area": "", "supplementary_material": "", "author": "Shixiang Wan;Shikai Luo;Hongtu Zhu", "authorids": "~Shixiang_Wan1;~Shikai_Luo1;~Hongtu_Zhu2", "gender": "M;M;M", "homepage": "https://shixiangwan.github.io/;;https://bigkp.org", "dblp": ";149/2528;03/5683", "google_scholar": "ejCUXzUAAAAJ;9Gfqd4UAAAAJ;https://scholar.google.com/citations?hl=en", "orcid": "0000-0001-8644-9893;;0000-0002-6781-2690", "linkedin": ";shikai-luo-6b08b559/;", "or_profile": "~Shixiang_Wan1;~Shikai_Luo1;~Hongtu_Zhu2", "aff": "Duxiaoman;Tencent;University of North Carolina at Chapel Hill", "aff_domain": "duxiaoman.com;tencent.com;unc.edu", "position": "Researcher;Principal Researcher;Full Professor", "bibtex": "@misc{\nwan2021causal,\ntitle={Causal Probabilistic Spatio-temporal Fusion Transformers in Two-sided Ride-Hailing Markets},\nauthor={Shixiang Wan and Shikai Luo and Hongtu Zhu},\nyear={2021},\nurl={https://openreview.net/forum?id=4SiMia0kjba}\n}", "github": "", "project": "", "reviewers": "AnonReviewer3;AnonReviewer2;AnonReviewer4;AnonReviewer5", "site": "https://openreview.net/forum?id=4SiMia0kjba", "pdf_size": 0, "rating": "2;5;6;6", "confidence": "5;4;3;2", "wc_review": "733;265;381;249", "wc_reply_reviewers": "300;0;0;0", "wc_reply_authors": "1873;294;533;537", "reply_reviewers": "2;0;0;0", "reply_authors": "5;1;1;1", "rating_avg": [ 4.75, 1.6393596310755 ], "confidence_avg": [ 3.5, 1.118033988749895 ], "wc_review_avg": [ 407.0, 194.98717906570164 ], "wc_reply_reviewers_avg": [ 75.0, 129.9038105676658 ], "wc_reply_authors_avg": [ 809.25, 621.9888966050761 ], "reply_reviewers_avg": [ 0.5, 0.8660254037844386 ], "reply_authors_avg": [ 2.0, 1.7320508075688772 ], "replies_avg": [ 16, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": -0.8865926413116155, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:zEjhTguYs-AJ:scholar.google.com/&scioq=Causal+Probabilistic+Spatio-temporal+Fusion+Transformers+in+Two-sided+Ride-Hailing+Markets&hl=en&as_sdt=0,33", "gs_version_total": 2, "aff_unique_index": "0;1;2", "aff_unique_norm": "Duxiaoman;Tencent;University of North Carolina", "aff_unique_dep": ";Tencent Holdings Limited;", "aff_unique_url": ";https://www.tencent.com;https://www.unc.edu", "aff_unique_abbr": ";Tencent;UNC", "aff_campus_unique_index": "1", "aff_campus_unique": ";Chapel Hill", "aff_country_unique_index": "0;0;1", "aff_country_unique": "China;United States" }, { "title": "Differentiable Segmentation of Sequences", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/2993", "id": "4T489T4yav", "poster": "", "openreview": "https://openreview.net/forum?id=4T489T4yav", "slides": "https://iclr.cc/virtual/2021/poster/2993", "video": "https://iclr.cc/virtual/2021/poster/2993", "author_site": "Erik Scharw\u00e4chter, Jonathan Lennartz, Emmanuel M\u00fcller", "tldr": "", "abstract": "Segmented models are widely used to describe non-stationary sequential data with discrete change points. Their estimation usually requires solving a mixed discrete-continuous optimization problem, where the segmentation is the discrete part and all other model parameters are continuous. A number of estimation algorithms have been developed that are highly specialized for their specific model assumptions. 
The dependence on non-standard algorithms makes it hard to integrate segmented models in state-of-the-art deep learning architectures that critically depend on gradient-based optimization techniques. In this work, we formulate a relaxed variant of segmented models that enables joint estimation of all model parameters, including the segmentation, with gradient descent. We build on recent advances in learning continuous warping functions and propose a novel family of warping functions based on the two-sided power (TSP) distribution. TSP-based warping functions are differentiable, have simple closed-form expressions, and can represent segmentation functions exactly. Our formulation includes the important class of segmented generalized linear models as a special case, which makes it highly versatile. We use our approach to model the spread of COVID-19 with Poisson regression, apply it on a change point detection task, and learn classification models with concept drift. The experiments show that our approach effectively learns all these tasks with standard algorithms for gradient descent.", "keywords": "segmented models;segmentation;change point detection;concept drift;warping functions;gradient descent", "primary_area": "", "supplementary_material": "/attachment/9eea5e4e8e48580fce84defe028906d95f0bd3aa.zip", "author": "Erik Scharw\u00e4chter;Jonathan Lennartz;Emmanuel M\u00fcller", "authorids": "~Erik_Scharw\u00e4chter1;jlen@uni-bonn.de;emmanuel.mueller@cs.tu-dortmund.de", "gender": "M;;", "homepage": ";;", "dblp": "130/9973.html;;", "google_scholar": "https://scholar.google.de/citations?user=VAFW_l0AAAAJ;;", "orcid": "0000-0001-8555-2629;;", "linkedin": "erik-scharwaechter/;;", "or_profile": "~Erik_Scharw\u00e4chter1;jlen@uni-bonn.de;emmanuel.mueller@cs.tu-dortmund.de", "aff": "TU Dortmund;;", "aff_domain": "tu-dortmund.de;;", "position": "PhD student;;", "bibtex": "@inproceedings{\nscharw{\\\"a}chter2021differentiable,\ntitle={Differentiable Segmentation of Sequences},\nauthor={Erik Scharw{\\\"a}chter and Jonathan Lennartz and Emmanuel M{\\\"u}ller},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=4T489T4yav}\n}", "github": "[![github](/images/github_icon.svg) diozaka/diffseg](https://github.com/diozaka/diffseg)", "project": "", "reviewers": "AnonReviewer4;AnonReviewer2;AnonReviewer1", "pdf_size": 0, "rating": "6;7;7", "confidence": "3;3;3", "wc_review": "476;240;282", "wc_reply_reviewers": "0;0;80", "wc_reply_authors": "542;358;481", "reply_reviewers": "0;0;1", "reply_authors": "1;1;2", "rating_avg": [ 6.666666666666667, 0.4714045207910317 ], "confidence_avg": [ 3.0, 0.0 ], "wc_review_avg": [ 332.6666666666667, 102.79213112987892 ], "wc_reply_reviewers_avg": [ 26.666666666666668, 37.71236166328253 ], "wc_reply_authors_avg": [ 460.3333333333333, 76.52595783276563 ], "reply_reviewers_avg": [ 0.3333333333333333, 0.4714045207910317 ], "reply_authors_avg": [ 1.3333333333333333, 0.4714045207910317 ], "replies_avg": [ 9, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 2, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=460118456936482519&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 6, "pdf": "https://openreview.net/pdf?id=4T489T4yav", "email": "tu-dortmund.de;;", "author_num": 3, "aff_unique_index": "0", "aff_unique_norm": "Technische Universit\u00e4t Dortmund", "aff_unique_dep": "", "aff_unique_url": "https://www.tu-dortmund.de", "aff_unique_abbr": "TU Dortmund", 
"aff_country_unique_index": "0", "aff_country_unique": "Germany" }, { "title": "Latent Convergent Cross Mapping", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/2939", "id": "4TSiOTkKe5P", "poster": "", "openreview": "https://openreview.net/forum?id=4TSiOTkKe5P", "slides": "https://iclr.cc/virtual/2021/poster/2939", "video": "https://iclr.cc/virtual/2021/poster/2939", "author_site": "Edward De Brouwer, Adam Arany, Jaak Simm, Yves Moreau", "tldr": "", "abstract": "Discovering causal structures of temporal processes is a major tool of scientific inquiry because it helps us better understand and explain the mechanisms driving a phenomenon of interest, thereby facilitating analysis, reasoning, and synthesis for such systems. \nHowever, accurately inferring causal structures within a phenomenon based on observational data only is still an open problem. Indeed, this type of data usually consists in short time series with missing or noisy values for which causal inference is increasingly difficult. In this work, we propose a method to uncover causal relations in chaotic dynamical systems from short, noisy and sporadic time series (that is, incomplete observations at infrequent and irregular intervals) where the classical convergent cross mapping (CCM) fails. Our method works by learning a Neural ODE latent process modeling the state-space dynamics of the time series and by checking the existence of a continuous map between the resulting processes. We provide theoretical analysis and show empirically that Latent-CCM can reliably uncover the true causal pattern, unlike traditional methods.", "keywords": "Causality;Time Series;Chaos;Neural ODE;Missing Values", "primary_area": "", "supplementary_material": "/attachment/f6fa79470cb33e34423d0c0a38dd57330fdfb1bb.zip", "author": "Edward De Brouwer;Adam Arany;Jaak Simm;Yves Moreau", "authorids": "~Edward_De_Brouwer1;~Adam_Arany1;~Jaak_Simm1;~Yves_Moreau2", "gender": "M;;;M", "homepage": "https://edwarddebrouwer.xyz;;;", "dblp": ";178/0111;;", "google_scholar": "-Pm4XtAAAAAJ;QH9zWmAAAAAJ;https://scholar.google.be/citations?user=NFMk1ZYAAAAJ;zWftTEUAAAAJ", "orcid": ";0000-0002-4901-7650;;", "linkedin": "edwarddebrouwer/;;;", "or_profile": "~Edward_De_Brouwer1;~Adam_Arany1;~Jaak_Simm1;~Yves_Moreau2", "aff": "KU Leuven;KU Leuven;KU Leuven;University of Leuven", "aff_domain": "kuleuven.be;kuleuven.be;kuleuven.be;kuleuven.be", "position": "PhD student;Researcher;Research Expert;Professor", "bibtex": "@inproceedings{\nbrouwer2021latent,\ntitle={Latent Convergent Cross Mapping},\nauthor={Edward De Brouwer and Adam Arany and Jaak Simm and Yves Moreau},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=4TSiOTkKe5P}\n}", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer3;AnonReviewer4;AnonReviewer2", "pdf_size": 0, "rating": "6;6;6;7", "confidence": "4;4;2;4", "wc_review": "484;399;287;956", "wc_reply_reviewers": "124;0;0;0", "wc_reply_authors": "1528;1231;999;1189", "reply_reviewers": "1;0;0;0", "reply_authors": "2;2;2;3", "rating_avg": [ 6.25, 0.4330127018922193 ], "confidence_avg": [ 3.5, 0.8660254037844386 ], "wc_review_avg": [ 531.5, 254.84946537122656 ], "wc_reply_reviewers_avg": [ 31.0, 53.693575034635195 ], "wc_reply_authors_avg": [ 1236.75, 189.51566557939213 ], "reply_reviewers_avg": [ 0.25, 0.4330127018922193 ], "reply_authors_avg": [ 2.25, 0.4330127018922193 ], "replies_avg": [ 15, 0 ], "authors#_avg": [ 4, 0 ], 
"corr_rating_confidence": 0.3333333333333333, "gs_citation": 37, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=16959874179055147438&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 2, "pdf": "https://openreview.net/pdf?id=4TSiOTkKe5P", "email": "kuleuven.be;kuleuven.be;kuleuven.be;kuleuven.be", "author_num": 4, "aff_unique_index": "0;0;0;1", "aff_unique_norm": "Katholieke Universiteit Leuven;University of Leuven", "aff_unique_dep": ";", "aff_unique_url": "https://www.kuleuven.be;https://www.kuleuven.be", "aff_unique_abbr": "KU Leuven;KU Leuven", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0;0;0", "aff_country_unique": "Belgium" }, { "id": "4Un_FnHiN8C", "title": "Architecture Agnostic Neural Networks", "track": "main", "status": "Reject", "tldr": "", "abstract": "In this paper, we explore an alternate method for synthesizing neural network architectures, inspired by the brain's stochastic synaptic pruning. During a person\u2019s lifetime, numerous distinct neuronal architectures are responsible for performing the same tasks. This indicates that biological neural networks are, to some degree, architecture agnostic. However, artificial networks rely on their fine-tuned weights and hand-crafted architectures for their remarkable performance. This contrast begs the question: Can we build artificial architecture agnostic neural networks? To ground this study we utilize sparse, binary neural networks that parallel the brain\u2019s circuits. Within this sparse, binary paradigm we sample many binary architectures to create families of architecture agnostic neural networks not trained via backpropagation. These high-performing network families share the same sparsity, distribution of binary weights, and succeed in both static and dynamic tasks. 
In summation, we create an architecture manifold search procedure to discover families of architecture agnostic neural networks.", "keywords": "Architecture Agnostic;Sparse;Binary;Stochastic;Pruning;Biologically Inspired", "primary_area": "", "supplementary_material": "/attachment/3ce65432d64593d8716d6857f4113bc83060c792.zip", "author": "Sabera J Talukder;Guruprasad Raghavan;Yisong Yue", "authorids": "~Sabera_J_Talukder1;~Guruprasad_Raghavan1;~Yisong_Yue1", "gender": ";M;M", "homepage": "https://saberatalukder.com/;;http://www.yisongyue.com", "dblp": ";;28/1244", "google_scholar": "S00bhfIAAAAJ;gUFaHxMAAAAJ;tEk4qo8AAAAJ", "orcid": ";;0000-0001-9127-1989", "linkedin": "sabera-talukder-69600bb1/;;yisongyue/", "or_profile": "~Sabera_J_Talukder1;~Guruprasad_Raghavan1;~Yisong_Yue1", "aff": "California Institute of Technology;California Institute of Technology;California Institute of Technology", "aff_domain": "caltech.edu;caltech.edu;caltech.edu", "position": "PhD student;PhD student;Full Professor", "bibtex": "@misc{\ntalukder2021architecture,\ntitle={Architecture Agnostic Neural Networks},\nauthor={Sabera J Talukder and Guruprasad Raghavan and Yisong Yue},\nyear={2021},\nurl={https://openreview.net/forum?id=4Un_FnHiN8C}\n}", "github": "", "project": "", "reviewers": "AnonReviewer4;AnonReviewer2;AnonReviewer3;AnonReviewer1", "site": "https://openreview.net/forum?id=4Un_FnHiN8C", "pdf_size": 0, "rating": "4;4;5;5", "confidence": "4;4;5;4", "wc_review": "308;375;211;117", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "744;732;450;483", "reply_reviewers": "0;0;0;0", "reply_authors": "1;1;1;1", "rating_avg": [ 4.5, 0.5 ], "confidence_avg": [ 4.25, 0.4330127018922193 ], "wc_review_avg": [ 252.75, 97.68412102281516 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 602.25, 136.316497534231 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 10, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": 0.5773502691896257, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:cvSmpLV0cC4J:scholar.google.com/&scioq=Architecture+Agnostic+Neural+Networks&hl=en&as_sdt=0,33", "gs_version_total": 5, "aff_unique_index": "0;0;0", "aff_unique_norm": "California Institute of Technology", "aff_unique_dep": "", "aff_unique_url": "https://www.caltech.edu", "aff_unique_abbr": "Caltech", "aff_campus_unique_index": "0;0;0", "aff_campus_unique": "Pasadena", "aff_country_unique_index": "0;0;0", "aff_country_unique": "United States" }, { "id": "4VixXVZJkoY", "title": "TRIP: Refining Image-to-Image Translation via Rival Preferences", "track": "main", "status": "Reject", "tldr": "", "abstract": "We propose a new model to refine image-to-image translation via an adversarial ranking process. In particular, we simultaneously train two modules: a generator that translates an input image to the desired image with smooth subtle changes with respect to some specific attributes; and a ranker that ranks rival preferences consisting of the input image and the desired image. Rival preferences refer to the adversarial ranking process: (1) the ranker thinks no difference between the desired image and the input image in terms of the desired attributes; (2) the generator fools the ranker to believe that the desired image changes the attributes over the input image as desired. Real image preferences are introduced to guide the ranker to rank image pairs regarding the interested attributes only. 
With an effective ranker, the generator would \u201cwin\u201d the adversarial game by producing high-quality images that present desired changes over the attributes compared to the input image. The experiments demonstrate that our TRIP can generate high-fidelity images which exhibit smooth changes with the strength of the attributes.", "keywords": "Fine-grained image-to-image translation;GAN;relative attributes;ranker", "primary_area": "", "supplementary_material": "/attachment/3e06f9d7b4a7ebcc20e252bdf5d57849a7f4c2d6.zip", "author": "Yinghua Yao;Yuangang Pan;Ivor Tsang;Xin Yao", "authorids": "~Yinghua_Yao1;~Yuangang_Pan2;~Ivor_Tsang1;~Xin_Yao1", "gender": ";;M;M", "homepage": ";http://www.cs.bham.ac.uk/~xin;https://www.a-star.edu.sg/cfar/about-cfar/management/prof-ivor-tsang;https://yuangang-pan.github.io/", "dblp": "256/0363;;35/5873;215/4933", "google_scholar": ";;rJMOlVsAAAAJ;", "orcid": "0000-0003-3204-0739;;;", "linkedin": ";;;", "or_profile": "~Yinghua_Yao1;~Xin_Yao1;~Ivor_W_Tsang1;~Yuangang_Pan1", "aff": "Southern University of Science and Technology;Southern University of Science and Technology;University of Technology Sydney;University of Technology Sydney", "aff_domain": "sustech.edu.cn;sustech.edu.cn;uts.edu.au;uts.edu.au", "position": "PhD student;Full Professor;Full Professor;Postdoc", "bibtex": "@misc{\nyao2021trip,\ntitle={{\\{}TRIP{\\}}: Refining Image-to-Image Translation via Rival Preferences},\nauthor={Yinghua Yao and Yuangang Pan and Ivor Tsang and Xin Yao},\nyear={2021},\nurl={https://openreview.net/forum?id=4VixXVZJkoY}\n}", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer4;AnonReviewer2;AnonReviewer3", "site": "https://openreview.net/forum?id=4VixXVZJkoY", "pdf_size": 0, "rating": "4;4;5;6", "confidence": "4;4;4;4", "wc_review": "390;1140;374;168", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "396;1267;649;547", "reply_reviewers": "0;0;0;0", "reply_authors": "1;2;1;1", "rating_avg": [ 4.75, 0.82915619758885 ], "confidence_avg": [ 4.0, 0.0 ], "wc_review_avg": [ 518.0, 369.6295442737228 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 714.75, 331.3022600285123 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.25, 0.4330127018922193 ], "replies_avg": [ 12, 0 ], "authors#_avg": [ 4, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:GgN_sut6SygJ:scholar.google.com/&scioq=TRIP:+Refining+Image-to-Image+Translation+via+Rival+Preferences&hl=en&as_sdt=0,5", "gs_version_total": 3, "aff_unique_index": "0;0;1;1", "aff_unique_norm": "Southern University of Science and Technology;University of Technology Sydney", "aff_unique_dep": ";", "aff_unique_url": "https://www.sustech.edu.cn;https://www.uts.edu.au", "aff_unique_abbr": "SUSTech;UTS", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0;1;1", "aff_country_unique": "China;Australia" }, { "id": "4YzI0KpRQtZ", "title": "Streaming Probabilistic Deep Tensor Factorization", "track": "main", "status": "Reject", "tldr": "", "abstract": "Despite the success of existing tensor factorization methods, most of them conduct a multilinear decomposition, and rarely exploit powerful modeling frameworks, like deep neural networks, to capture a variety of complicated interactions in data. More important, for highly expressive, deep factorization, we lack an effective approach to handle streaming data, which are ubiquitous in real-world applications. 
To address these issues, we propose SPIDER, a Streaming ProbabilistIc Deep tEnsoR factorization method. We first use Bayesian neural networks (NNs) to construct a deep tensor factorization model. We assign a spike-and-slab prior over the NN weights to encourage sparsity and prevent overfitting. We then use Taylor expansions and moment matching to approximate the posterior of the NN output and calculate the running model evidence, based on which we develop an efficient streaming posterior inference algorithm in the assumed-density-filtering and expectation propagation framework. Our algorithm provides responsive incremental updates for the posterior of the latent factors and NN weights upon receiving new tensor entries, and meanwhile selects and inhibits redundant/useless weights. We show the advantages of our approach in four real-world applications.", "keywords": "Probabilistic Methods;online learning;tensor factorization", "primary_area": "", "supplementary_material": "/attachment/3b7064c9a1a2b5107c802eaaa6fb7d8b904359fe.zip", "author": "shikai fang;Zheng Wang;Zhimeng pan;Ji Liu;Shandian Zhe", "authorids": "~shikai_fang1;~Zheng_Wang2;z.pan@utah.edu;~Ji_Liu1;~Shandian_Zhe1", "gender": "M;M;;M;", "homepage": "https://www.cs.utah.edu/~shikai/;;;http://jiliu-ml.org;", "dblp": "270/2142;;;51/4433-2.html;", "google_scholar": "h280gfwAAAAJ;;;RRzVwKkAAAAJ;", "orcid": "0009-0006-4527-5878;;;;", "linkedin": "shikai-fang-34b143119/;;;;", "or_profile": "~shikai_fang1;~Zheng_Wang2;z.pan@utah.edu;~Ji_Liu1;~Shandian_Zhe1", "aff": "University of Utah;University of Utah;;;", "aff_domain": "utah.edu;utah.edu;;;", "position": "PhD student;PhD student;;;", "bibtex": "@misc{\nfang2021streaming,\ntitle={Streaming Probabilistic Deep Tensor Factorization},\nauthor={shikai fang and Zheng Wang and Zhimeng pan and Ji Liu and Shandian Zhe},\nyear={2021},\nurl={https://openreview.net/forum?id=4YzI0KpRQtZ}\n}", "github": "", "project": "", "reviewers": "AnonReviewer3;AnonReviewer1;AnonReviewer2;AnonReviewer4", "site": "https://openreview.net/forum?id=4YzI0KpRQtZ", "pdf_size": 0, "rating": "5;5;6;6", "confidence": "3;4;3;2", "wc_review": "312;194;325;429", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "354;749;750;282", "reply_reviewers": "0;0;0;0", "reply_authors": "1;1;1;1", "rating_avg": [ 5.5, 0.5 ], "confidence_avg": [ 3.0, 0.7071067811865476 ], "wc_review_avg": [ 315.0, 83.28565302619654 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 533.75, 217.24683541998948 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 9, 0 ], "authors#_avg": [ 5, 0 ], "corr_rating_confidence": -0.7071067811865475, "gs_citation": 1, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=131462526385897227&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 3, "aff_unique_index": "0;0", "aff_unique_norm": "University of Utah", "aff_unique_dep": "", "aff_unique_url": "https://www.utah.edu", "aff_unique_abbr": "Utah", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0", "aff_country_unique": "United States" }, { "id": "4_57x7xhymn", "title": "Action Concept Grounding Network for Semantically-Consistent Video Generation", "track": "main", "status": "Reject", "tldr": "", "abstract": "Recent works in self-supervised video prediction have mainly focused on passive forecasting and low-level action-conditional prediction, which sidesteps the problem of semantic learning.
We introduce the task of semantic action-conditional video prediction, which can be regarded as an inverse problem of action recognition. The challenge of this new task primarily lies in how to effectively inform the model of semantic action information. To bridge vision and language, we utilize the idea of capsule and propose a novel video prediction model Action Concept Grounding Network (ACGN). Our method is evaluated on two newly designed synthetic datasets, CLEVR-Building-Blocks and Sapien-Kitchen, and experiments show that given different action labels, our ACGN can correctly condition on instructions and generate corresponding future frames without the need for bounding boxes. We further demonstrate our trained model can make out-of-distribution predictions for concurrent actions, be quickly adapted to new object categories and exploit its learnt features for object detection. Additional visualizations can be found at https://iclr-acgn.github.io/ACGN/.", "keywords": "action-conditional video prediction;self-supervised learning;counterfactual generation", "primary_area": "", "supplementary_material": "", "author": "Wei Yu;Wenxin Chen;Animesh Garg", "authorids": "~Wei_Yu10;~Wenxin_Chen1;~Animesh_Garg1", "gender": "M;M;M", "homepage": "https://gnosisyuw.github.io/;;http://animesh.garg.tech", "dblp": ";;123/5728", "google_scholar": "smZffVEAAAAJ;;zp8V7ZMAAAAJ", "orcid": ";;0000-0003-0482-4296", "linkedin": ";wenxinchen11/;animeshgarg/", "or_profile": "~Wei_Yu10;~Wenxin_Chen1;~Animesh_Garg1", "aff": "Department of Computer Science, University of Toronto;;University of Toronto", "aff_domain": "cs.toronto.edu;;toronto.edu", "position": "PhD student;;Assistant Professor", "bibtex": "@misc{\nyu2021action,\ntitle={Action Concept Grounding Network for Semantically-Consistent Video Generation},\nauthor={Wei Yu and Wenxin Chen and Animesh Garg},\nyear={2021},\nurl={https://openreview.net/forum?id=4_57x7xhymn}\n}", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer3;AnonReviewer2", "site": "https://openreview.net/forum?id=4_57x7xhymn", "pdf_size": 0, "rating": "5;5;5", "confidence": "5;4;5", "wc_review": "385;235;365", "wc_reply_reviewers": "0;0;0", "wc_reply_authors": "258;324;1398", "reply_reviewers": "0;0;0", "reply_authors": "1;1;2", "rating_avg": [ 5.0, 0.0 ], "confidence_avg": [ 4.666666666666667, 0.4714045207910317 ], "wc_review_avg": [ 328.3333333333333, 66.4997911442 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 660.0, 522.5399506257871 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.3333333333333333, 0.4714045207910317 ], "replies_avg": [ 11, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:PlnzKK-lhX4J:scholar.google.com/&scioq=Action+Concept+Grounding+Network+for+Semantically-Consistent+Video+Generation&hl=en&as_sdt=0,33", "gs_version_total": 0, "aff_unique_index": "0;0", "aff_unique_norm": "University of Toronto", "aff_unique_dep": "Department of Computer Science", "aff_unique_url": "https://www.utoronto.ca", "aff_unique_abbr": "U of T", "aff_campus_unique_index": "0", "aff_campus_unique": "Toronto;", "aff_country_unique_index": "0;0", "aff_country_unique": "Canada" }, { "id": "4artD3N3xB0", "title": "Bayesian Learning to Optimize: Quantifying the Optimizer Uncertainty", "track": "main", "status": "Reject", "tldr": "", "abstract": "Optimizing an objective function with uncertainty awareness is well-known to improve the accuracy and confidence
of optimization solutions. Meanwhile, another relevant but very different question remains yet open: how to model and quantify the uncertainty of an optimization algorithm itself? To close such a gap, the prerequisite is to consider the optimizers as sampled from a distribution, rather than a few pre-defined and fixed update rules. We first take the novel angle to consider the algorithmic space of optimizers, each being parameterized by a neural network. We then propose a Boltzmann-shaped posterior over this optimizer space, and approximate the posterior locally as Gaussian distributions through variational inference. Our novel model, Bayesian learning to optimize (BL2O) is the first study to recognize and quantify the uncertainty of the optimization algorithm. Our experiments on optimizing test functions, energy functions in protein-protein interactions and loss functions in image classification and data privacy attack demonstrate that, compared to state-of-the-art methods, BL2O improves optimization and uncertainty quantification (UQ) in aforementioned problems as well as calibration and out-of-domain detection in image classification.", "keywords": "Optimizer Uncertainty;Optimization;Uncertainty Quantification", "primary_area": "", "supplementary_material": "", "author": "Yue Cao;Tianlong Chen;Zhangyang Wang;Yang Shen", "authorids": "~Yue_Cao4;~Tianlong_Chen1;~Zhangyang_Wang1;~Yang_Shen4", "gender": "M;M;M;", "homepage": ";https://tianlong-chen.github.io;https://vita-group.github.io;https://shen-lab.github.io/", "dblp": ";;119/4026;95/5308-1.html", "google_scholar": "Q0f5JRAAAAAJ;LE3ctn0AAAAJ;pxFyKAIAAAAJ;https://scholar.google.com/citations?hl=en", "orcid": ";0000-0001-7774-8197;;0000-0002-1703-7796", "linkedin": ";tianlong-chen-783862167/;;", "or_profile": "~Yue_Cao4;~Tianlong_Chen1;~Zhangyang_Wang1;~Yang_Shen4", "aff": "Texas A&M;University of Texas, Austin;University of Texas, Austin;Texas A&M University - College Station", "aff_domain": "tamu.edu;utexas.edu;utexas.edu;tamu.edu", "position": "PhD student;PhD student;Assistant Professor;Assistant Professor", "bibtex": "@misc{\ncao2021bayesian,\ntitle={Bayesian Learning to Optimize: Quantifying the Optimizer Uncertainty},\nauthor={Yue Cao and Tianlong Chen and Zhangyang Wang and Yang Shen},\nyear={2021},\nurl={https://openreview.net/forum?id=4artD3N3xB0}\n}", "github": "", "project": "", "reviewers": "AnonReviewer3;AnonReviewer5;AnonReviewer4", "site": "https://openreview.net/forum?id=4artD3N3xB0", "pdf_size": 0, "rating": "4;5;6", "confidence": "4;2;3", "wc_review": "1247;400;372", "wc_reply_reviewers": "0;0;0", "wc_reply_authors": "734;626;894", "reply_reviewers": "0;0;0", "reply_authors": "1;1;2", "rating_avg": [ 5.0, 0.816496580927726 ], "confidence_avg": [ 3.0, 0.816496580927726 ], "wc_review_avg": [ 673.0, 406.04022789209773 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 751.3333333333334, 110.09490855116275 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.3333333333333333, 0.4714045207910317 ], "replies_avg": [ 8, 0 ], "authors#_avg": [ 4, 0 ], "corr_rating_confidence": -0.5, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:HChRyGu4-F0J:scholar.google.com/&scioq=Bayesian+Learning+to+Optimize:+Quantifying+the+Optimizer+Uncertainty&hl=en&as_sdt=0,33", "gs_version_total": 0, "aff_unique_index": "0;1;1;0", "aff_unique_norm": "Texas A&M University;University of Texas at Austin", "aff_unique_dep": ";", "aff_unique_url": "https://www.tamu.edu;https://www.utexas.edu", 
"aff_unique_abbr": "TAMU;UT Austin", "aff_campus_unique_index": "1;1;2", "aff_campus_unique": ";Austin;College Station", "aff_country_unique_index": "0;0;0;0", "aff_country_unique": "United States" }, { "title": "Multi-Time Attention Networks for Irregularly Sampled Time Series", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/2703", "id": "4c0J6lwQ4_", "poster": "", "openreview": "https://openreview.net/forum?id=4c0J6lwQ4_", "slides": "https://iclr.cc/virtual/2021/poster/2703", "video": "https://iclr.cc/virtual/2021/poster/2703", "author_site": "Satya Narayan Shukla, Benjamin M Marlin", "tldr": "", "abstract": "Irregular sampling occurs in many time series modeling applications where it presents a significant challenge to standard deep learning models. This work is motivated by the analysis of physiological time series data in electronic health records, which are sparse, irregularly sampled, and multivariate. In this paper, we propose a new deep learning framework for this setting that we call Multi-Time Attention Networks. Multi-Time Attention Networks learn an embedding of continuous time values and use an attention mechanism to produce a fixed-length representation of a time series containing a variable number of observations. We investigate the performance of this framework on interpolation and classification tasks using multiple datasets. Our results show that the proposed approach performs as well or better than a range of baseline and recently proposed models while offering significantly faster training times than current state-of-the-art methods.", "keywords": "irregular sampling;multivariate time series;attention;missing data", "primary_area": "", "supplementary_material": "", "author": "Satya Narayan Shukla;Benjamin Marlin", "authorids": "~Satya_Narayan_Shukla1;~Benjamin_Marlin1", "gender": "M;M", "homepage": "https://satyanshukla.github.io/;https://groups.cs.umass.edu/marlin/", "dblp": "161/3356;03/7058.html", "google_scholar": "l1tsmesAAAAJ;ey960FIAAAAJ", "orcid": ";0000-0002-2626-3410", "linkedin": ";", "or_profile": "~Satya_Narayan_Shukla1;~Benjamin_Marlin1", "aff": "College of Information and Computer Sciences, University of Massachusetts, Amherst;University of Massachusetts at Amherst", "aff_domain": "cs.umass.edu;umass.edu", "position": "PhD student;Associate Professor", "bibtex": "@inproceedings{\nshukla2021multitime,\ntitle={Multi-Time Attention Networks for Irregularly Sampled Time Series},\nauthor={Satya Narayan Shukla and Benjamin Marlin},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=4c0J6lwQ4_}\n}", "github": "", "project": "", "reviewers": "AnonReviewer3;AnonReviewer2;AnonReviewer4;AnonReviewer1", "pdf_size": 0, "rating": "6;7;7;7", "confidence": "4;4;4;3", "wc_review": "319;311;262;177", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "186;700;85;320", "reply_reviewers": "0;0;0;0", "reply_authors": "1;1;1;1", "rating_avg": [ 6.75, 0.4330127018922193 ], "confidence_avg": [ 3.75, 0.4330127018922193 ], "wc_review_avg": [ 267.25, 56.49059656261385 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 322.75, 233.2116795960271 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 9, 0 ], "authors#_avg": [ 2, 0 ], "corr_rating_confidence": -0.3333333333333333, "gs_citation": 249, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=6069781928255471893&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 4, 
"pdf": "https://openreview.net/pdf?id=4c0J6lwQ4_", "email": "cs.umass.edu;umass.edu", "author_num": 2, "aff_unique_index": "0;0", "aff_unique_norm": "University of Massachusetts Amherst", "aff_unique_dep": "College of Information and Computer Sciences", "aff_unique_url": "https://www.umass.edu", "aff_unique_abbr": "UMass Amherst", "aff_campus_unique_index": "0;0", "aff_campus_unique": "Amherst", "aff_country_unique_index": "0;0", "aff_country_unique": "United States" }, { "id": "4c3WeBTErrE", "title": "Jumpy Recurrent Neural Networks", "track": "main", "status": "Reject", "tldr": "", "abstract": "Recurrent neural networks (RNNs) can learn complex, long-range structure in time series data simply by predicting one point at a time. Because of this ability, they have enjoyed widespread adoption in commercial and academic contexts. Yet RNNs have a fundamental limitation: they represent time as a series of discrete, uniform time steps. As a result, they force a tradeoff between temporal resolution and the computational expense of predicting far into the future. To resolve this tension, we propose a Jumpy RNN model which does not predict state transitions over uniform intervals of time. Instead, it predicts a sequence of linear dynamics functions in latent space and intervals of time over which their predictions can be expected to be accurate. This structure enables our model to jump over long time intervals while retaining the ability to produce fine-grained or continuous-time predictions when necessary. In simple physics simulations, our model can skip over long spans of predictable motion and focus on key events such as collisions between two balls. On a set of physics tasks including coordinate and pixel observations of a small-scale billiards environment, our model matches the performance of a baseline RNN while using a fifth of the compute. On a real-world weather forecasting dataset, it makes more accurate predictions while using fewer sampling steps. 
When used for model-based planning, our method matches a baseline RNN while using half the compute.", "keywords": "RNNs;temporal abstraction;planning;intuitive physics", "primary_area": "", "supplementary_material": "/attachment/3b3e243ee3dca0599057bc2975956abdc43d7782.zip", "author": "Samuel James Greydanus;Stefan Lee;Alan Fern", "authorids": "~Samuel_James_Greydanus1;~Stefan_Lee1;~Alan_Fern1", "gender": "M;;M", "homepage": "https://greydanus.github.io/about.html;;http://www.eecs.oregonstate.edu/~afern", "dblp": "205/2640;;49/6764", "google_scholar": "SECnlpMAAAAJ;;https://scholar.google.com.tw/citations?user=GaKxFrcAAAAJ", "orcid": ";;", "linkedin": ";;", "or_profile": "~Samuel_James_Greydanus1;~Stefan_Lee1;~Alan_Fern1", "aff": ";;", "aff_domain": ";;", "position": ";;", "bibtex": "@misc{\ngreydanus2021jumpy,\ntitle={Jumpy Recurrent Neural Networks},\nauthor={Samuel James Greydanus and Stefan Lee and Alan Fern},\nyear={2021},\nurl={https://openreview.net/forum?id=4c3WeBTErrE}\n}", "github": "", "project": "", "reviewers": "AnonReviewer3;AnonReviewer4;AnonReviewer1;AnonReviewer2", "site": "https://openreview.net/forum?id=4c3WeBTErrE", "pdf_size": 0, "rating": "5;5;5;7", "confidence": "4;4;5;4", "wc_review": "561;529;705;438", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "809;1300;1064;344", "reply_reviewers": "0;0;0;0", "reply_authors": "1;2;2;1", "rating_avg": [ 5.5, 0.8660254037844386 ], "confidence_avg": [ 4.25, 0.4330127018922193 ], "wc_review_avg": [ 558.25, 95.99316381909703 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 879.25, 354.4681755813912 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.5, 0.5 ], "replies_avg": [ 11, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": -0.3333333333333333, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:xwS7CjFy3MMJ:scholar.google.com/&scioq=Jumpy+Recurrent+Neural+Networks&hl=en&as_sdt=0,33", "gs_version_total": 0 }, { "id": "4cC0HFuVd2d", "title": "Decoy-enhanced Saliency Maps", "track": "main", "status": "Reject", "tldr": "", "abstract": "Saliency methods can make deep neural network predictions more interpretable by identifying a set of critical features in an input sample, such as pixels that contribute most strongly to a prediction made by an image classifier. Unfortunately, recent evidence suggests that many saliency methods poorly perform, especially in situations where gradients are saturated, inputs contain adversarial perturbations, or predictions rely upon inter-feature dependence. To address these issues, we propose a framework that improves the robustness of saliency methods by following a two-step procedure. First, we introduce a perturbation mechanism that subtly varies the input sample without changing its intermediate representations. Using this approach, we can gather a corpus of perturbed data samples while ensuring that the perturbed and original input samples follow the same distribution. Second, we compute saliency maps for the perturbed samples and propose a new method to aggregate saliency maps. With this design, we offset the gradient saturation influence upon interpretation. From a theoretical perspective, we show that the aggregated saliency map not only captures inter-feature dependence but, more importantly, is robust against previously described adversarial perturbation methods. 
Following our theoretical analysis, we present experimental results suggesting that, both qualitatively and quantitatively, our saliency method outperforms existing methods, in a variety of applications.", "keywords": "Deep neural network;Explainable AI;Saliency methods;Decoys", "primary_area": "", "supplementary_material": "", "author": "Yang Young Lu;Wenbo Guo;Xinyu Xing;William Noble", "authorids": "~Yang_Young_Lu1;~Wenbo_Guo1;~Xinyu_Xing1;~William_Noble1", "gender": "M;M;M;M", "homepage": "https://batmen-lab.github.io/;https://henrygwb.github.io/;http://xinyuxing.org/;http://noble.gs.washington.edu", "dblp": "197/8508;144/1238-2.html;;08/978", "google_scholar": "C4CLJQgAAAAJ;KyPheRMAAAAJ;https://scholar.google.com.tw/citations?user=WBN9c6kAAAAJ;plt2_DsAAAAJ", "orcid": ";;;0000-0001-7283-4715", "linkedin": ";;;", "or_profile": "~Yang_Young_Lu1;~Wenbo_Guo1;~Xinyu_Xing1;~William_Noble1", "aff": "University of Washington, Seattle;Pennsylvania State University;Pennsylvania State University;University of Washington, Seattle", "aff_domain": "uw.edu;psu.edu;psu.edu;uw.edu", "position": "Postdoc;PhD student;Assistant Professor;Professor", "bibtex": "@misc{\nlu2021decoyenhanced,\ntitle={Decoy-enhanced Saliency Maps },\nauthor={Yang Young Lu and Wenbo Guo and Xinyu Xing and William Noble},\nyear={2021},\nurl={https://openreview.net/forum?id=4cC0HFuVd2d}\n}", "github": "", "project": "", "reviewers": "AnonReviewer2;AnonReviewer1;AnonReviewer3", "site": "https://openreview.net/forum?id=4cC0HFuVd2d", "pdf_size": 0, "rating": "6;6;7", "confidence": "4;2;3", "wc_review": "2290;343;535", "wc_reply_reviewers": "462;0;0", "wc_reply_authors": "2035;451;420", "reply_reviewers": "2;0;0", "reply_authors": "4;1;1", "rating_avg": [ 6.333333333333333, 0.4714045207910317 ], "confidence_avg": [ 3.0, 0.816496580927726 ], "wc_review_avg": [ 1056.0, 876.0833293699864 ], "wc_reply_reviewers_avg": [ 154.0, 217.78888860545663 ], "wc_reply_authors_avg": [ 968.6666666666666, 754.1177332191278 ], "reply_reviewers_avg": [ 0.6666666666666666, 0.9428090415820634 ], "reply_authors_avg": [ 2.0, 1.4142135623730951 ], "replies_avg": [ 12, 0 ], "authors#_avg": [ 4, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:FA22RsRIur4J:scholar.google.com/&scioq=Decoy-enhanced+Saliency+Maps&hl=en&as_sdt=0,5", "gs_version_total": 0, "aff_unique_index": "0;1;1;0", "aff_unique_norm": "University of Washington;Pennsylvania State University", "aff_unique_dep": ";", "aff_unique_url": "https://www.washington.edu;https://www.psu.edu", "aff_unique_abbr": "UW;PSU", "aff_campus_unique_index": "0;0", "aff_campus_unique": "Seattle;", "aff_country_unique_index": "0;0;0;0", "aff_country_unique": "United States" }, { "id": "4dFyyAdWbis", "title": "Scaling Unsupervised Domain Adaptation through Optimal Collaborator Selection and Lazy Discriminator Synchronization", "track": "main", "status": "Withdraw", "tldr": "", "abstract": "Breakthroughs in unsupervised domain adaptation (uDA) have opened up the possibility of adapting models from a label-rich source domain to unlabeled target domains. Prior uDA works have primarily focused on improving adaptation accuracy between the given source and target domains, and considerably less attention has been paid to the challenges that arise when uDA is deployed in practical settings. 
This paper puts forth a novel and complementary perspective, and investigates the algorithmic challenges that arise when uDA is deployed in a distributed ML system with multiple target domains. We propose two algorithms: i) a Collaborator Selection algorithm which selects an optimal collaborator for each target domain, and makes uDA systems more accurate and flexible; ii) a distributed training strategy that allows adversarial uDA algorithms to train in a privacy-preserving manner. We provide theoretical justifications and empirical results to show that our solution significantly boosts the performance of uDA in practical settings.", "keywords": "unsupervised domain adaptation;systems for machine learning", "primary_area": "", "supplementary_material": "", "author": "Akhil Mathur;Shaoduo Gan;Anton Isopoussu;Fahim Kawsar;Nadia Berthouze;Nicholas Donald Lane", "authorids": "~Akhil_Mathur1;~Shaoduo_Gan1;~Anton_Isopoussu1;fahim.kawsar@nokia-bell-labs.com;~Nadia_Berthouze1;~Nicholas_Donald_Lane1", "gender": "M;M;;;F;", "homepage": "https://akhilmathurs.github.io/;;https://aisopous.github.io/;;https://uclic.ucl.ac.uk/people/nadia-berthouze/;", "dblp": ";180/6128;;;b/NadiaBianchiBerthouze;", "google_scholar": ";Gy9ZnBcAAAAJ;https://scholar.google.co.uk/citations?user=B7Nk1GYAAAAJ;;https://scholar.google.com.tw/citations?user=gk2VyMwAAAAJ;", "orcid": ";;;;;", "linkedin": ";;;;;", "or_profile": "~Akhil_Mathur1;~Shaoduo_Gan1;~Anton_Isopoussu1;fahim.kawsar@nokia-bell-labs.com;~Nadia_Berthouze1;~Nicholas_Donald_Lane1", "aff": "Bell Labs;Swiss Federal Institute of Technology;;;University College London, University of London;", "aff_domain": "nokia-bell-labs.com;ethz.ch;;;ucl.ac.uk;", "position": "Principal Researcher;PhD student;;;Full Professor;", "bibtex": "", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer3;AnonReviewer2", "site": "https://openreview.net/forum?id=4dFyyAdWbis", "pdf_size": 0, "rating": "2;6;6", "confidence": "4;3;5", "wc_review": "359;218;441", "wc_reply_reviewers": "0;0;0", "wc_reply_authors": "0;0;0", "reply_reviewers": "0;0;0", "reply_authors": "0;0;0", "rating_avg": [ 4.666666666666667, 1.8856180831641267 ], "confidence_avg": [ 4.0, 0.816496580927726 ], "wc_review_avg": [ 339.3333333333333, 92.09536120541337 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 0, 0 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 0, 0 ], "replies_avg": [ 5, 0 ], "authors#_avg": [ 6, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 1, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=6483548357316619910&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 0, "aff_unique_index": "0;1;2", "aff_unique_norm": "Bell Labs;Swiss Federal Institute of Technology;University College London", "aff_unique_dep": ";;", "aff_unique_url": "https://www.bell-labs.com;https://www.ethz.ch;https://www.ucl.ac.uk", "aff_unique_abbr": "Bell Labs;ETH Zurich;UCL", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;1;2", "aff_country_unique": "United States;Switzerland;United Kingdom" }, { "title": "Evaluations and Methods for Explanation through Robustness Analysis", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/3083", "id": "4dXmpCDGNp7", "poster": "", "openreview": "https://openreview.net/forum?id=4dXmpCDGNp7", "slides": "https://iclr.cc/virtual/2021/poster/3083", "video": "https://iclr.cc/virtual/2021/poster/3083", "author_site": "Cheng-Yu Hsieh, Chih-Kuan Yeh, Xuanqing Liu, Pradeep K 
Ravikumar, Seungyeon Kim, Sanjiv Kumar, Cho-Jui Hsieh", "tldr": "", "abstract": "Feature-based explanations, which provide the importance of each feature towards the model prediction, are arguably one of the most intuitive ways to explain a model. In this paper, we establish a novel set of evaluation criteria for such feature-based explanations by robustness analysis. In contrast to existing evaluations which require us to specify some way to \"remove\" features that could inevitably introduce biases and artifacts, we make use of the subtler notion of smaller adversarial perturbations. By optimizing towards our proposed evaluation criteria, we obtain new explanations that are loosely necessary and sufficient for a prediction. We further extend the explanation to extract the set of features that would move the current prediction to a target class by adopting targeted adversarial attack for the robustness analysis. Through experiments across multiple domains and a user study, we validate the usefulness of our evaluation criteria and our derived explanations.", "keywords": "Interpretability;Explanations;Adversarial Robustness", "primary_area": "", "supplementary_material": "/attachment/5a27bdbc422e25feee0636d2f66090b082fe1ecf.zip", "author": "Cheng-Yu Hsieh;Chih-Kuan Yeh;Xuanqing Liu;Pradeep Kumar Ravikumar;Seungyeon Kim;Sanjiv Kumar;Cho-Jui Hsieh", "authorids": "~Cheng-Yu_Hsieh1;~Chih-Kuan_Yeh1;~Xuanqing_Liu1;~Pradeep_Kumar_Ravikumar1;~Seungyeon_Kim1;~Sanjiv_Kumar1;~Cho-Jui_Hsieh1", "gender": "M;M;M;M;;;M", "homepage": "https://chengyuhsieh.github.io/;https://chihkuanyeh.github.io/;;http://www.cs.cmu.edu/~pradeepr/;https://www.seungyeon.ai;http://www.sanjivk.com/;http://web.cs.ucla.edu/~chohsieh/index.html", "dblp": "40/4421;;205/2594;94/3594;74/7997-1.html;;14/2770", "google_scholar": "WXX6ZwwAAAAJ;;;https://scholar.google.com.tw/citations?user=Q4DTPw4AAAAJ;zbcN_QIAAAAJ;https://scholar.google.com/citations?hl=en;Wy89g4IAAAAJ", "orcid": ";;;;;;", "linkedin": ";;;;;;", "or_profile": "~Cheng-Yu_Hsieh1;~Chih-Kuan_Yeh1;~Xuanqing_Liu1;~Pradeep_Kumar_Ravikumar1;~Seungyeon_Kim1;~Sanjiv_Kumar1;~Cho-Jui_Hsieh1", "aff": "University of Washington;School of Computer Science, Carnegie Mellon University;University of California, Los Angeles;School of Computer Science, Carnegie Mellon University;Google;Google;University of California, Los Angeles", "aff_domain": "washington.edu;cs.cmu.edu;ucla.edu;cs.cmu.edu;google.com;google.com;ucla.edu", "position": "PhD student;PhD student;PhD student;Associate Professor;Researcher;Research Scientist;Assistant Professor", "bibtex": "@inproceedings{\nhsieh2021evaluations,\ntitle={Evaluations and Methods for Explanation through Robustness Analysis},\nauthor={Cheng-Yu Hsieh and Chih-Kuan Yeh and Xuanqing Liu and Pradeep Kumar Ravikumar and Seungyeon Kim and Sanjiv Kumar and Cho-Jui Hsieh},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=4dXmpCDGNp7}\n}", "github": "", "project": "", "reviewers": "AnonReviewer4;AnonReviewer3;AnonReviewer1;AnonReviewer2", "pdf_size": 0, "rating": "6;7;7;7", "confidence": "3;4;3;4", "wc_review": "450;366;511;917", "wc_reply_reviewers": "36;0;0;81", "wc_reply_authors": "734;904;969;2012", "reply_reviewers": "1;0;0;1", "reply_authors": "1;2;2;4", "rating_avg": [ 6.75, 0.4330127018922193 ], "confidence_avg": [ 3.5, 0.5 ], "wc_review_avg": [ 561.0, 211.88558233159708 ], "wc_reply_reviewers_avg": [ 29.25, 33.29695932063467 ], "wc_reply_authors_avg": [ 1154.75, 502.3163221516896 ],
"reply_reviewers_avg": [ 0.5, 0.5 ], "reply_authors_avg": [ 2.25, 1.0897247358851685 ], "replies_avg": [ 18, 0 ], "authors#_avg": [ 7, 0 ], "corr_rating_confidence": 0.5773502691896257, "gs_citation": 69, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=15196298478913450721&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 12, "pdf": "https://openreview.net/pdf?id=4dXmpCDGNp7", "email": "washington.edu;cs.cmu.edu;ucla.edu;cs.cmu.edu;google.com;google.com;ucla.edu", "author_num": 7, "aff_unique_index": "0;1;2;1;3;3;2", "aff_unique_norm": "University of Washington;Carnegie Mellon University;University of California, Los Angeles;Google", "aff_unique_dep": ";School of Computer Science;;Google", "aff_unique_url": "https://www.washington.edu;https://www.cmu.edu;https://www.ucla.edu;https://www.google.com", "aff_unique_abbr": "UW;CMU;UCLA;Google", "aff_campus_unique_index": "1;2;1;3;3;2", "aff_campus_unique": ";Pittsburgh;Los Angeles;Mountain View", "aff_country_unique_index": "0;0;0;0;0;0;0", "aff_country_unique": "United States" }, { "id": "4emQEegFhSy", "title": "Adaptive Multi-model Fusion Learning for Sparse-Reward Reinforcement Learning", "track": "main", "status": "Reject", "tldr": "", "abstract": "In this paper, we consider intrinsic reward generation for sparse-reward reinforcement learning based on model prediction errors. In typical model-prediction-error-based intrinsic reward generation, an agent has a learning model for the underlying environment. Then intrinsic reward is designed as the error between the model prediction and the actual outcome of the environment, based on the fact that for less-visited or non-visited states, the learned model yields larger prediction errors, promoting exploration helpful for reinforcement learning. This paper generalizes this model-prediction-error-based intrinsic reward generation method to multiple prediction models. We propose a new adaptive fusion method relevant to the multiple-model case, which learns optimal prediction-error fusion across the learning phase to enhance the overall learning performance. 
Numerical results show that for representative locomotion tasks, the proposed intrinsic reward generation method outperforms most of the previous methods, and the gain is significant in some tasks.", "keywords": "sparse-reward RL;intrinsic reward generation;adaptive fusion;information geometry;scale-free property", "primary_area": "", "supplementary_material": "", "author": "Giseung Park;Whiyoung Jung;Sungho Choi;Youngchul Sung", "authorids": "~Giseung_Park1;~Whiyoung_Jung1;~Sungho_Choi1;~Youngchul_Sung1", "gender": "M;M;M;M", "homepage": "https://sites.google.com/view/giseung-park;;https://sites.google.com/view/sisrelkaist/members/shchoi;https://sites.google.com/view/youngchulsung", "dblp": "233/3816;256/1642;60/1680;17/6798", "google_scholar": ";72La2OEAAAAJ;https://scholar.google.com/citations?view_op=list_works;-9D2k3UAAAAJ", "orcid": "0000-0002-9737-4142;;;0000-0003-4536-6690", "linkedin": ";;;", "or_profile": "~Giseung_Park1;~Whiyoung_Jung1;~Sungho_Choi1;~Youngchul_Sung1", "aff": "Korea Advanced Institute of Science & Technology;Korea Advanced Institute of Science & Technology;Korea Advanced Institute of Science & Technology;Korea Advanced Institute of Science & Technology", "aff_domain": "kaist.ac.kr;kaist.ac.kr;kaist.ac.kr;kaist.ac.kr", "position": "PhD student;PhD student;PhD student;Full Professor", "bibtex": "@misc{\npark2021adaptive,\ntitle={Adaptive Multi-model Fusion Learning for Sparse-Reward Reinforcement Learning},\nauthor={Giseung Park and Whiyoung Jung and Sungho Choi and Youngchul Sung},\nyear={2021},\nurl={https://openreview.net/forum?id=4emQEegFhSy}\n}", "github": "", "project": "", "reviewers": "AnonReviewer4;AnonReviewer3;AnonReviewer2;AnonReviewer1", "site": "https://openreview.net/forum?id=4emQEegFhSy", "pdf_size": 0, "rating": "5;5;6;7", "confidence": "3;4;3;4", "wc_review": "282;405;251;327", "wc_reply_reviewers": "0;0;0;15", "wc_reply_authors": "531;743;279;723", "reply_reviewers": "0;0;0;1", "reply_authors": "1;1;1;1", "rating_avg": [ 5.75, 0.82915619758885 ], "confidence_avg": [ 3.5, 0.5 ], "wc_review_avg": [ 316.25, 57.92829619451965 ], "wc_reply_reviewers_avg": [ 3.75, 6.49519052838329 ], "wc_reply_authors_avg": [ 569.0, 186.77258899528056 ], "reply_reviewers_avg": [ 0.25, 0.4330127018922193 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 13, 0 ], "authors#_avg": [ 4, 0 ], "corr_rating_confidence": 0.30151134457776363, "gs_citation": 1, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=6036050580112472775&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 2, "aff_unique_index": "0;0;0;0", "aff_unique_norm": "Korea Advanced Institute of Science and Technology", "aff_unique_dep": "", "aff_unique_url": "https://www.kaist.ac.kr", "aff_unique_abbr": "KAIST", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0;0;0", "aff_country_unique": "South Korea" }, { "id": "4f04RAhMUo6", "title": "PODS: Policy Optimization via Differentiable Simulation", "track": "main", "status": "Reject", "tldr": "", "abstract": "Current reinforcement learning (RL) methods use simulation models as simple black-box oracles. In this paper, with the goal of improving the performance exhibited by RL algorithms, we explore a systematic way of leveraging the additional information provided by an emerging class of differentiable simulators. Building on concepts established by Deterministic Policy Gradients (DPG) methods, the neural network policies learned with our approach represent deterministic actions. 
In a departure from standard methodologies, however, learning these policies does not hinge on approximations of the value function that must be learned concurrently in an actor-critic fashion. Instead, we exploit differentiable simulators to directly compute the analytic gradient of a policy's value function with respect to the actions it outputs. This, in turn, allows us to efficiently perform locally optimal policy improvement iterations. Compared against other state-of-the-art RL methods, we show that with minimal hyper-parameter tuning our approach consistently leads to better asymptotic behavior across a set of payload manipulation tasks that demand high precision.", "keywords": "Reinforcement Learning;Decision and Control;Planning;Robotics.", "primary_area": "", "supplementary_material": "/attachment/36f5c4bee93aa08390e3794bebaa4c2cfe745f21.zip", "author": "Miguel Angel Zamora Mora;Momchil Peychev;Sehoon Ha;Martin Vechev;Stelian Coros", "authorids": "~Miguel_Angel_Zamora_Mora1;mpeychev@ethz.ch;sehoon.ha@gmail.com;~Martin_Vechev1;~Stelian_Coros1", "gender": ";;;M;M", "homepage": "https://inf.ethz.ch/people/people-atoz/person-detail.MjYwMzE0.TGlzdC8zMDQsLTIxNDE4MTU0NjA=.html;;;https://www.sri.inf.ethz.ch/people/martin;http://crl.ethz.ch/index.html", "dblp": "178/7796;;;93/2189.html;", "google_scholar": "SieIuzwAAAAJ;;;https://scholar.google.ch/citations?user=aZ1Rh50AAAAJ;sX31JjwAAAAJ", "orcid": ";;;;", "linkedin": ";;;;", "or_profile": "~Miguel_Angel_Zamora_Mora1;mpeychev@ethz.ch;sehoon.ha@gmail.com;~Martin_Vechev1;~Stelian_Coros1", "aff": "Swiss Federal Institute of Technology;;;;ETHZ - ETH Zurich", "aff_domain": "ethz.ch;;;;ethz.ch", "position": "PhD student;;;;Associate Professor", "bibtex": "@misc{\nmora2021pods,\ntitle={{\\{}PODS{\\}}: Policy Optimization via Differentiable Simulation},\nauthor={Miguel Angel Zamora Mora and Momchil Peychev and Sehoon Ha and Martin Vechev and Stelian Coros},\nyear={2021},\nurl={https://openreview.net/forum?id=4f04RAhMUo6}\n}", "github": "", "project": "", "reviewers": "AnonReviewer4;AnonReviewer1;AnonReviewer3", "site": "https://openreview.net/forum?id=4f04RAhMUo6", "pdf_size": 0, "rating": "4;6;6", "confidence": "5;4;4", "wc_review": "252;539;493", "wc_reply_reviewers": "0;0;0", "wc_reply_authors": "665;405;320", "reply_reviewers": "0;0;0", "reply_authors": "1;1;1", "rating_avg": [ 5.333333333333333, 0.9428090415820634 ], "confidence_avg": [ 4.333333333333333, 0.4714045207910317 ], "wc_review_avg": [ 428.0, 125.85971025974384 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 463.3333333333333, 146.76133308500877 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 7, 0 ], "authors#_avg": [ 5, 0 ], "corr_rating_confidence": -0.9999999999999997, "gs_citation": 59, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=16993263954160231905&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 10, "aff_unique_index": "0;1", "aff_unique_norm": "Swiss Federal Institute of Technology;ETH Zurich", "aff_unique_dep": ";", "aff_unique_url": "https://www.ethz.ch;https://www.ethz.ch", "aff_unique_abbr": "ETH Zurich;ETHZ", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0", "aff_country_unique": "Switzerland" }, { "id": "4hA23Eld-HU", "title": "Learning to Control on the Fly", "track": "main", "status": "Withdraw", "tldr": "", "abstract": "This paper proposes an algorithm which learns to control on the fly.
The proposed algorithm has no access to the transition law of the environment, which is actually linear with bounded random noise, and learns to make decisions directly online without training phases or sub-optimal policies as the initial input. Neither estimating the system parameters nor the value functions online, the proposed algorithm adapts the ellipsoid method into the online decision making setting. By adding linear constraints when the feasibility of the decision variable is violated, the volume of the decision variable domain can be collapsed and we upper bound the number of online linear constraints needed for the convergence of the state to be around the desired state under the bounded random state noise. The algorithm is also proved to be of constant bounded online regret given certain range of the bound of the random noise.", "keywords": "online decision making;convergence;regret bound;bounded random noise", "primary_area": "", "supplementary_material": "", "author": "Zhanzhan Zhao", "authorids": "~Zhanzhan_Zhao1", "gender": "F", "homepage": "http://www-personal.umich.edu/~zhanzhao/index.html", "dblp": "", "google_scholar": "Bv6a-IIAAAAJ", "orcid": "", "linkedin": "zhanzhan-zhao-301401117?challengeId=AQFU4uYIuceNiwAAAXTLFCc4rS79ZvF1heQ3PhT8pAIlt9uVU1JyDYHv6cQ8wqDt0nw3CSPB_417LNDrdBmmeej5mqYLzc1b3Q&submissionId=8527b636-bc5f-3816-3dfe-d72640258672", "or_profile": "~Zhanzhan_Zhao1", "aff": "Georgia Institute of Technology", "aff_domain": "gatech.edu", "position": "PhD student", "bibtex": "", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer4;AnonReviewer2;AnonReviewer3", "site": "https://openreview.net/forum?id=4hA23Eld-HU", "pdf_size": 0, "rating": "3;3;4;4", "confidence": "3;5;5;4", "wc_review": "278;179;464;645", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "0;0;0;0", "reply_reviewers": "0;0;0;0", "reply_authors": "0;0;0;0", "rating_avg": [ 3.5, 0.5 ], "confidence_avg": [ 4.25, 0.82915619758885 ], "wc_review_avg": [ 391.5, 178.57561423665885 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 0, 0 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 0, 0 ], "replies_avg": [ 5, 0 ], "authors#_avg": [ 1, 0 ], "corr_rating_confidence": 0.30151134457776363, "gs_citation": -1, "gs_cited_by_link": "", "gs_version_total": -1, "aff_unique_index": "0", "aff_unique_norm": "Georgia Institute of Technology", "aff_unique_dep": "", "aff_unique_url": "https://www.gatech.edu", "aff_unique_abbr": "Georgia Tech", "aff_country_unique_index": "0", "aff_country_unique": "United States" }, { "id": "4jJI8Sqwz9V", "title": "Generalization and Stability of GANs: A theory and promise from data augmentation", "track": "main", "status": "Withdraw", "tldr": "", "abstract": "The instability when training generative adversarial networks (GANs) is a notoriously difficult issue, and the generalization of GANs remains open. In this paper, we will analyze various sources of instability which not only come from the discriminator but also the generator. We then point out that the requirement of Lipschitz continuity on both the discriminator and generator leads to generalization and stability for GANs. As a consequence, this work naturally provides a generalization bound for a large class of existing models and explains the success of recent large-scale generators. Finally, we show why data augmentation can ensure Lipschitz continuity on both the discriminator and generator. 
This work therefore provides a theoretical basis for a simple way to ensure generalization in GANs, explaining the highly successful use of data augmentation for GANs in practice.", "keywords": "generative adversarial networks;generalization;stability;data augmentation", "primary_area": "", "supplementary_material": "/attachment/c614508b3cf3ce6d4427ee328fb5ae4929cd226b.zip", "author": "Khoat Than;Nghia Vu", "authorids": "~Khoat_Than1;vutrungnghiahust99@gmail.com", "gender": "M;", "homepage": "https://users.soict.hust.edu.vn/khoattq/;", "dblp": "118/4726;", "google_scholar": ";", "orcid": ";", "linkedin": ";", "or_profile": "~Khoat_Than1;vutrungnghiahust99@gmail.com", "aff": "VinAI Research;", "aff_domain": "vinai.io;", "position": "Scientist;", "bibtex": "", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer4;AnonReviewer3;AnonReviewer2", "site": "https://openreview.net/forum?id=4jJI8Sqwz9V", "pdf_size": 0, "rating": "3;3;4;4", "confidence": "4;2;4;4", "wc_review": "319;88;877;792", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "0;0;0;0", "reply_reviewers": "0;0;0;0", "reply_authors": "0;0;0;0", "rating_avg": [ 3.5, 0.5 ], "confidence_avg": [ 3.5, 0.8660254037844386 ], "wc_review_avg": [ 519.0, 327.2819885053255 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 0, 0 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 0, 0 ], "replies_avg": [ 5, 0 ], "authors#_avg": [ 2, 0 ], "corr_rating_confidence": 0.5773502691896257, "gs_citation": 1, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=2697120748266474859&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 0, "aff_unique_index": "0", "aff_unique_norm": "VinAI Research", "aff_unique_dep": "", "aff_unique_url": "https://www.vinai.io/", "aff_unique_abbr": "VinAI", "aff_country_unique_index": "0", "aff_country_unique": "Vietnam" }, { "id": "4jXnFYaDOuD", "title": "Importance-based Multimodal Autoencoder", "track": "main", "status": "Reject", "tldr": "", "abstract": "Integrating information from multiple modalities (e.g., verbal, acoustic and visual data) into meaningful representations has seen great progress in recent years. However, two challenges are not sufficiently addressed by current approaches: (1) computationally efficient training of multimodal autoencoder networks which are robust in the absence of modalities, and (2) unsupervised learning of important subspaces in each modality which are correlated with other modalities. In this paper we propose the IMA (Importance-based Multimodal Autoencoder) model, a scalable model that learns modality importances and robust multimodal representations through a novel cross-covariance based loss function. We conduct experiments on MNIST-TIDIGITS a multimodal dataset of spoken and image digits,and on IEMOCAP, a multimodal emotion corpus. The IMA model is able to distinguish digits from uncorrelated noise, and word-level importances learnt that correspond to the separation between function and emotional words. 
The multimodal representations learnt by IMA are also competitive with state-of-the-art baseline approaches on downstream tasks.", "keywords": "Neural networks;Speech analysis;Multimodal;Autoencoders;Representation Learning", "primary_area": "", "supplementary_material": "", "author": "Sayan Ghosh;Eugene Laksana;Louis-Philippe Morency;Stefan Scherer", "authorids": "~Sayan_Ghosh1;~Eugene_Laksana1;~Louis-Philippe_Morency1;~Stefan_Scherer1", "gender": ";;M;M", "homepage": ";;https://www.cs.cmu.edu/~morency/;", "dblp": "67/6126-4;172/0899;31/739;60/5336", "google_scholar": "WC_NlykAAAAJ;;https://scholar.google.com.tw/citations?user=APgaFK0AAAAJ;rbGxNYwAAAAJ", "orcid": ";;0000-0001-6376-7696;0000-0002-0280-5393", "linkedin": ";;morency?challengeId=AQELGK_OvMa0vwAAAY72L-VV4X9hW8juuY80VHVeeSGHZ1PJHeeEa5LTFoeTmDGU0t1OL07MXJTYC9EAi6qgPDd2z9ztnbdFYA&submissionId=09a0ff34-04ac-c717-bef7-8c9c8811b463&challengeSource=AgFhxWkU3q7v4wAAAY72L-1xRE0eG-BnZUNE9e3eAG95pgOCZ9u1nxEg-1dK2Dw&challegeType=AgHMzV0lqKgEFwAAAY72L-11X6DHMd3V_A3Iur8XZeyYF2-oBzoufs8&memberId=AgH4yz7pZ_riCgAAAY72L-146jmR2pdr3dmhy2icxBtEQzQ&recognizeDevice=AgFDCNyrhKiFSAAAAY72L-16m7z2EH2t0ueWmMKjyk1_ZJAkfFVe;", "or_profile": "~Sayan_Ghosh1;~Eugene_Laksana1;~Louis-Philippe_Morency1;~Stefan_Scherer1", "aff": "Meta Facebook;;Carnegie Mellon University;Embodied, Inc.", "aff_domain": "fb.com;;cmu.edu;embodied.com", "position": "Research Scientist;;Associate Professor;CTO", "bibtex": "@misc{\nghosh2021importancebased,\ntitle={Importance-based Multimodal Autoencoder},\nauthor={Sayan Ghosh and Eugene Laksana and Louis-Philippe Morency and Stefan Scherer},\nyear={2021},\nurl={https://openreview.net/forum?id=4jXnFYaDOuD}\n}", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer3;AnonReviewer2;AnonReviewer4", "site": "https://openreview.net/forum?id=4jXnFYaDOuD", "pdf_size": 0, "rating": "5;6;6;7", "confidence": "5;4;3;4", "wc_review": "487;776;226;216", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "530;1161;513;393", "reply_reviewers": "0;0;0;0", "reply_authors": "1;2;1;1", "rating_avg": [ 6.0, 0.7071067811865476 ], "confidence_avg": [ 4.0, 0.7071067811865476 ], "wc_review_avg": [ 426.25, 229.30370145289848 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 649.25, 300.140279702675 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.25, 0.4330127018922193 ], "replies_avg": [ 12, 0 ], "authors#_avg": [ 4, 0 ], "corr_rating_confidence": -0.5, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:pBT1kufSnm0J:scholar.google.com/&scioq=Importance-based+Multimodal+Autoencoder&hl=en&as_sdt=0,33", "gs_version_total": 0, "aff_unique_index": "0;1;2", "aff_unique_norm": "Meta;Carnegie Mellon University;Embodied, Inc.", "aff_unique_dep": "Meta Platforms, Inc.;;", "aff_unique_url": "https://meta.com;https://www.cmu.edu;", "aff_unique_abbr": "Meta;CMU;", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0;0", "aff_country_unique": "United States" }, { "id": "4ja9sJJygb", "title": "Exploiting Weight Redundancy in CNNs: Beyond Pruning and Quantization", "track": "main", "status": "Withdraw", "tldr": "", "abstract": "Pruning and quantization are proven methods for improving the performance and storage efficiency of convolutional neural networks (CNNs). Pruning removes near-zero weights in tensors and masks weak connections between neurons in neighbouring layers. 
Quantization reduces the precision of weights by replacing them with numerically similar values that require less storage. In this paper we identify another form of redundancy in CNN weight tensors, in the form of repeated patterns of similar values. We observe that pruning and quantization both tend to drastically increase the number of repeated patterns in the weight tensors.\nWe investigate several compression schemes to take advantage of this structure in CNN weight data, including multiple forms of Huffman coding, and other approaches inspired by block sparse matrix formats. We evaluate our approach on several well-known CNNs and find that we can achieve compaction ratios of 1.4x to 3.1x in addition to the saving from pruning and quantization.", "keywords": "Sparse tensor;tensor compaction;beyond pruning and quantization", "primary_area": "", "supplementary_material": "", "author": "Yuan Wen;David Gregg", "authorids": "~Yuan_Wen1;david.gregg@cs.tcd.ie", "gender": ";", "homepage": ";", "dblp": ";", "google_scholar": "8CHUcJEAAAAJ;", "orcid": ";", "linkedin": ";", "or_profile": "~Yuan_Wen1;david.gregg@cs.tcd.ie", "aff": "Trinity College, Dublin;", "aff_domain": "tcd.ie;", "position": "Postdoc;", "bibtex": "", "github": "", "project": "", "reviewers": "AnonReviewer3;AnonReviewer5;AnonReviewer1;AnonReviewer4;AnonReviewer2", "site": "https://openreview.net/forum?id=4ja9sJJygb", "pdf_size": 0, "rating": "3;3;4;4;5", "confidence": "4;4;4;3;5", "wc_review": "500;79;286;378;272", "wc_reply_reviewers": "0;0;0;0;0", "wc_reply_authors": "0;0;0;0;0", "reply_reviewers": "0;0;0;0;0", "reply_authors": "0;0;0;0;0", "rating_avg": [ 3.8, 0.7483314773547882 ], "confidence_avg": [ 4.0, 0.6324555320336759 ], "wc_review_avg": [ 303.0, 138.4629914453678 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 0, 0 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 0, 0 ], "replies_avg": [ 6, 0 ], "authors#_avg": [ 2, 0 ], "corr_rating_confidence": 0.42257712736425823, "gs_citation": 4, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=4781432107567461629&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 4, "aff_unique_index": "0", "aff_unique_norm": "Trinity College Dublin", "aff_unique_dep": "", "aff_unique_url": "https://www.tcd.ie", "aff_unique_abbr": "TCD", "aff_campus_unique_index": "0", "aff_campus_unique": "Dublin", "aff_country_unique_index": "0", "aff_country_unique": "Ireland" }, { "id": "4kWGWoFGA_H", "title": "Beyond the Pixels: Exploring the Effects of Bit-Level Network and File Corruptions on Video Model Robustness", "track": "main", "status": "Reject", "tldr": "", "abstract": "We investigate the robustness of video machine learning models to bit-level network and file corruptions, which can arise from network transmission failures or hardware errors, and explore defenses against such corruptions. We simulate network and file corruptions at multiple corruption levels, and find that bit-level corruptions can cause substantial performance drops on common action recognition and multi-object tracking tasks. We explore two types of defenses against bit-level corruptions: corruption-agnostic and corruption-aware defenses. We find that corruption-agnostic defenses such as adversarial training have limited effectiveness, performing up to 11.3 accuracy points worse than a no-defense baseline. 
In response, we propose Bit-corruption Augmented Training (BAT), a corruption-aware baseline that exploits knowledge of bit-level corruptions to enforce model invariance to such corruptions. BAT outperforms corruption-agnostic defenses, recovering up to 7.1 accuracy points over a no-defense baseline on highly-corrupted videos while maintaining competitive performance on clean/near-clean data.", "keywords": "robustness;machine learning;file corruption;network corruption;video", "primary_area": "", "supplementary_material": "/attachment/0436e04bacbc07e976d0ffc54cf0beb85c0653fe.zip", "author": "Trenton Chang;Daniel Yang Fu;Yixuan Li", "authorids": "~Trenton_Chang1;~Daniel_Yang_Fu1;~Yixuan_Li1", "gender": ";;F", "homepage": ";;http://pages.cs.wisc.edu/~sharonli/", "dblp": ";;144/6087-1", "google_scholar": ";;https://scholar.google.com/citations?hl=en", "orcid": ";;", "linkedin": ";;liyixuan", "or_profile": "~Trenton_Chang1;~Daniel_Yang_Fu1;~Yixuan_Li1", "aff": ";;Cornell University", "aff_domain": ";;cornell.edu", "position": ";;Graduate Student", "bibtex": "@misc{\nchang2021beyond,\ntitle={Beyond the Pixels: Exploring the Effects of Bit-Level Network and File Corruptions on Video Model Robustness},\nauthor={Trenton Chang and Daniel Yang Fu and Yixuan Li},\nyear={2021},\nurl={https://openreview.net/forum?id=4kWGWoFGA_H}\n}", "github": "", "project": "", "reviewers": "AnonReviewer4;AnonReviewer3;AnonReviewer2;AnonReviewer1", "site": "https://openreview.net/forum?id=4kWGWoFGA_H", "pdf_size": 0, "rating": "3;4;4;6", "confidence": "4;3;2;3", "wc_review": "435;205;545;284", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "667;295;445;527", "reply_reviewers": "0;0;0;0", "reply_authors": "1;1;1;1", "rating_avg": [ 4.25, 1.0897247358851685 ], "confidence_avg": [ 3.0, 0.7071067811865476 ], "wc_review_avg": [ 367.25, 131.7580642693266 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 483.5, 134.70244986636285 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 10, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": -0.3244428422615251, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:iaG-WBECj7UJ:scholar.google.com/&scioq=Beyond+the+Pixels:+Exploring+the+Effects+of+Bit-Level+Network+and+File+Corruptions+on+Video+Model+Robustness&hl=en&as_sdt=0,5", "gs_version_total": 0, "aff_unique_index": "0", "aff_unique_norm": "Cornell University", "aff_unique_dep": "", "aff_unique_url": "https://www.cornell.edu", "aff_unique_abbr": "Cornell", "aff_country_unique_index": "0", "aff_country_unique": "United States" }, { "id": "4mkxyuPcFt", "title": "Disentangling Adversarial Robustness in Directions of the Data Manifold", "track": "main", "status": "Reject", "tldr": "", "abstract": "Using generative models (GAN or VAE) to craft adversarial examples, i.e. generative adversarial examples, has received increasing attention in recent years. Previous studies showed that the generative adversarial examples work differently compared to that of the regular adversarial examples in many aspects, such as attack rates, perceptibility, and generalization. But the reasons causing the differences between regular and generative adversarial examples are unclear. In this work, we study the theoretical properties of the attacking mechanisms of the two kinds of adversarial examples in the Gaussian mixture data model case. We prove that adversarial robustness can be disentangled in directions of the data manifold. Specifically, we find that: 1. 
Regular adversarial examples attack in directions of small variance of the data manifold, while generative adversarial examples attack in directions of large variance. 2. Standard adversarial training increases model robustness by extending the data manifold boundary in directions of small variance, while on the contrary, adversarial training with generative adversarial examples increases model robustness by extending the data manifold boundary in directions of large variance. In experiments, we demonstrate that these phenomena also exist on real datasets. Finally, we study the robustness trade-off between generative and regular adversarial examples. We show that the conflict between regular and generative adversarial examples is much smaller than the conflict between regular adversarial examples of different norms.", "keywords": "Adversarial Robustness;Adversarial Training;Generative Models", "primary_area": "", "supplementary_material": "", "author": "Jiancong Xiao;Liusha Yang;Zhi-Quan Luo", "authorids": "~Jiancong_Xiao1;~Liusha_Yang1;~Zhi-Quan_Luo1", "gender": "M;;M", "homepage": "https://jiancongxiao.github.io;;", "dblp": "330/4306;;", "google_scholar": "_vGY3joAAAAJ;;dW3gcXoAAAAJ", "orcid": ";;", "linkedin": ";yang-liusha-7ba26461/;", "or_profile": "~Jiancong_Xiao1;~Liusha_Yang1;~Zhi-Quan_Luo1", "aff": "The Chinese University of Hong Kong, Shenzhen;;The Chinese University of Hong Kong, Shenzhen", "aff_domain": "cuhk.edu.cn;;cuhk.edu.cn", "position": "PhD student;;Full Professor", "bibtex": "@misc{\nxiao2021disentangling,\ntitle={Disentangling Adversarial Robustness in Directions of the Data Manifold},\nauthor={Jiancong Xiao and Liusha Yang and Zhi-Quan Luo},\nyear={2021},\nurl={https://openreview.net/forum?id=4mkxyuPcFt}\n}", "github": "", "project": "", "reviewers": "AnonReviewer3;AnonReviewer1;AnonReviewer4;AnonReviewer2", "site": "https://openreview.net/forum?id=4mkxyuPcFt", "pdf_size": 0, "rating": "4;5;6;6", "confidence": "3;4;5;3", "wc_review": "920;725;647;262", "wc_reply_reviewers": "194;0;0;0", "wc_reply_authors": "563;523;258;349", "reply_reviewers": "1;0;0;0", "reply_authors": "1;1;1;1", "rating_avg": [ 5.25, 0.82915619758885 ], "confidence_avg": [ 3.75, 0.82915619758885 ], "wc_review_avg": [ 638.5, 239.03399339842858 ], "wc_reply_reviewers_avg": [ 48.5, 84.00446416709055 ], "wc_reply_authors_avg": [ 423.25, 124.80059094411372 ], "reply_reviewers_avg": [ 0.25, 0.4330127018922193 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 11, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": 0.4545454545454545, "gs_citation": 1, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=8756986766019850335&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 0, "aff_unique_index": "0;0", "aff_unique_norm": "Chinese University of Hong Kong", "aff_unique_dep": "", "aff_unique_url": "https://www.cuhk.edu.cn", "aff_unique_abbr": "CUHK", "aff_campus_unique_index": "0;0", "aff_campus_unique": "Shenzhen", "aff_country_unique_index": "0;0", "aff_country_unique": "China" }, { "id": "4pN0NjwSoPR", "title": "Differentiable Dynamic Quantization with Mixed Precision and Adaptive Resolution", "track": "main", "status": "Reject", "tldr": "", "abstract": "Model quantization discretizes the weights and activations of a deep neural network (DNN).\nUnlike previous methods that manually define the quantization hyperparameters such as precision (\\ie bitwidth), dynamic range (\\ie minimum and maximum discrete values) and stepsize (\\ie interval between discrete values),\nthis work proposes a
novel approach to differentiably learn all of them, named Differentiable Dynamic Quantization (DDQ), which possesses several appealing benefits. (1) Unlike previous works that applied the rounding operation to discretize values, DDQ provides a unified perspective by formulating discretization as a matrix-vector product, where different values of the matrix and vector represent different quantization methods such as mixed precision and soft quantization, and their values can be learned differentiably from training data, allowing different hidden layers in a DNN to use different quantization methods. \n(2) DDQ is hardware-friendly, where all variables can be computed by using low-precision matrix-vector multiplication, making it applicable to a wide spectrum of hardware.\n(3) The matrix variable is carefully reparameterized to reduce its number of parameters from O(2^{b^2}) to O(\\log2^b), where b is the bit width.\nExtensive experiments show that DDQ outperforms prior art on various advanced networks and benchmarks. For instance, compared to the full-precision models, MobileNetv2 trained with DDQ achieves comparable top-1 accuracy on ImageNet (71.7% vs 71.9%), while ResNet18 trained with DDQ increases accuracy by 0.5%.\nThese results relatively improve recent state-of-the-art quantization methods by 70% and 140% compared to the full-precision models.\n", "keywords": "", "primary_area": "", "supplementary_material": "", "author": "Zhaoyang Zhang;Wenqi Shao;Jinwei Gu;Xiaogang Wang;Ping Luo", "authorids": "~Zhaoyang_Zhang1;~Wenqi_Shao2;~Jinwei_Gu1;~Xiaogang_Wang2;~Ping_Luo2", "gender": "M;M;M;M;", "homepage": "https://zzyfd.github.io/#/;https://wqshao126.github.io/;http://www.gujinwei.org;http://www.ee.cuhk.edu.hk/~xgwang/;http://luoping.me/", "dblp": ";227/3122;61/1140;91/6236-1.html;54/4989-2.html", "google_scholar": "Pf6o7uAAAAAJ;Bs9mrwwAAAAJ;k_T8t30AAAAJ;https://scholar.google.com.hk/citations?user=-B5JgjsAAAAJ;https://scholar.google.com.hk/citations?hl=en", "orcid": ";;0000-0001-8705-8237;;0000-0002-6685-7950", "linkedin": ";;;;", "or_profile": "~Zhaoyang_Zhang1;~Wenqi_Shao2;~Jinwei_Gu1;~Xiaogang_Wang2;~Luo_Ping2", "aff": "The Chinese University of Hong Kong;The Chinese University of Hong Kong;SenseBrain Technology;The Chinese University of Hong Kong;The University of Hong Kong", "aff_domain": "cuhk.edu.hk;cuhk.edu.hk;sensebrain.ai;cuhk.edu.hk;hku.hk", "position": "PhD student;PhD student;R&D Executive Director;Full Professor;Assistant Professor", "bibtex": "", "github": "", "project": "", "reviewers": "AnonReviewer3;AnonReviewer1;AnonReviewer4;AnonReviewer2", "site": "https://openreview.net/forum?id=4pN0NjwSoPR", "pdf_size": 0, "rating": "4;5;6;6", "confidence": "5;4;4;5", "wc_review": "346;327;364;428", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "0;0;0;0", "reply_reviewers": "0;0;0;0", "reply_authors": "0;0;0;0", "rating_avg": [ 5.25, 0.82915619758885 ], "confidence_avg": [ 4.5, 0.5 ], "wc_review_avg": [ 366.25, 37.97614382740828 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 0, 0 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 0, 0 ], "replies_avg": [ 6, 0 ], "authors#_avg": [ 5, 0 ], "corr_rating_confidence": -0.30151134457776363, "gs_citation": 36, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=10702319906241740586&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 4, "aff_unique_index": "0;0;1;0;2", "aff_unique_norm": "Chinese University of Hong Kong;SenseBrain Technology;University of Hong Kong", "aff_unique_dep": ";;",
"aff_unique_url": "https://www.cuhk.edu.hk;;https://www.hku.hk", "aff_unique_abbr": "CUHK;;HKU", "aff_campus_unique_index": "0;0;0;0", "aff_campus_unique": "Hong Kong SAR;", "aff_country_unique_index": "0;0;0;0", "aff_country_unique": "China;" }, { "id": "4q8qGBf4Zxb", "title": "Network Architecture Search for Domain Adaptation", "track": "main", "status": "Reject", "tldr": "", "abstract": "Deep networks have been used to learn transferable representations for domain adaptation. Existing deep domain adaptation methods systematically employ popular hand-crafted networks designed specifically for image-classification tasks, leading to sub-optimal domain adaptation performance. In this paper, we present Neural Architecture Search for Domain Adaptation (NASDA), a principle framework that leverages differentiable neural architecture search to derive the optimal network architecture for domain adaptation task. NASDA is designed with two novel training strategies: neural architecture search with multi-kernel Maximum Mean Discrepancy to derive the optimal architecture, and adversarial training between a feature generator and a batch of classifiers to consolidate the feature generator. We demonstrate experimentally that NASDA leads to state-of-the-art performance on several domain adaptation benchmarks.", "keywords": "", "primary_area": "", "supplementary_material": "/attachment/9c6a0374165f478e264d2e69ceeb8dce8e20ac4a.zip", "author": "Yichen Li;Xingchao Peng", "authorids": "~Yichen_Li2;~Xingchao_Peng1", "gender": "F;M", "homepage": ";http://cs-people.bu.edu/xpeng/", "dblp": ";http://dblp.uni-trier.de/pers/hd/p/Peng:Xingchao", "google_scholar": "https://scholar.google.com/citations?hl=en;66lkylsAAAAJ", "orcid": "0000-0002-5659-8748;", "linkedin": ";", "or_profile": "~Yichen_Li2;~Xingchao_Peng1", "aff": "Stanford University;Boston University", "aff_domain": "stanford.edu; ", "position": "MS student;Graduate Student", "bibtex": "@misc{\nli2021network,\ntitle={Network Architecture Search for Domain Adaptation},\nauthor={Yichen Li and Xingchao Peng},\nyear={2021},\nurl={https://openreview.net/forum?id=4q8qGBf4Zxb}\n}", "github": "", "project": "", "reviewers": "AnonReviewer3;AnonReviewer2;AnonReviewer4;AnonReviewer1", "site": "https://openreview.net/forum?id=4q8qGBf4Zxb", "pdf_size": 0, "rating": "4;4;4;6", "confidence": "5;4;5;3", "wc_review": "798;434;473;385", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "0;0;0;0", "reply_reviewers": "0;0;0;0", "reply_authors": "0;0;0;0", "rating_avg": [ 4.5, 0.8660254037844386 ], "confidence_avg": [ 4.25, 0.82915619758885 ], "wc_review_avg": [ 522.5, 162.08716790665449 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 0, 0 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 0, 0 ], "replies_avg": [ 5, 0 ], "authors#_avg": [ 2, 0 ], "corr_rating_confidence": -0.8703882797784891, "gs_citation": 17, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=4509408709580056100&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 3, "aff_unique_index": "0;1", "aff_unique_norm": "Stanford University;Boston University", "aff_unique_dep": ";", "aff_unique_url": "https://www.stanford.edu;https://www.bu.edu", "aff_unique_abbr": "Stanford;BU", "aff_campus_unique_index": "0", "aff_campus_unique": "Stanford;", "aff_country_unique_index": "0;0", "aff_country_unique": "United States" }, { "title": "Scalable Bayesian Inverse Reinforcement Learning", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/3152", "id": 
"4qR3coiNaIv", "poster": "", "openreview": "https://openreview.net/forum?id=4qR3coiNaIv", "slides": "https://iclr.cc/virtual/2021/poster/3152", "video": "https://iclr.cc/virtual/2021/poster/3152", "author_site": "Alex Chan, Mihaela van der Schaar", "tldr": "", "abstract": "Bayesian inference over the reward presents an ideal solution to the ill-posed nature of the inverse reinforcement learning problem. Unfortunately current methods generally do not scale well beyond the small tabular setting due to the need for an inner-loop MDP solver, and even non-Bayesian methods that do themselves scale often require extensive interaction with the environment to perform well, being inappropriate for high stakes or costly applications such as healthcare. In this paper we introduce our method, Approximate Variational Reward Imitation Learning (AVRIL), that addresses both of these issues by jointly learning an approximate posterior distribution over the reward that scales to arbitrarily complicated state spaces alongside an appropriate policy in a completely offline manner through a variational approach to said latent reward. Applying our method to real medical data alongside classic control simulations, we demonstrate Bayesian reward inference in environments beyond the scope of current methods, as well as task performance competitive with focused offline imitation learning algorithms.", "keywords": "Bayesian;Inverse reinforcement learning;Imitation Learning", "primary_area": "", "supplementary_material": "", "author": "Alex James Chan;Mihaela van der Schaar", "authorids": "~Alex_James_Chan1;~Mihaela_van_der_Schaar2", "gender": "M;F", "homepage": "https://alexjchan.com;https://www.vanderschaar-lab.com", "dblp": "268/6948;", "google_scholar": "yfy_BGIAAAAJ;DZ3S--MAAAAJ", "orcid": ";", "linkedin": "alex-chan-040081131/;", "or_profile": "~Alex_James_Chan1;~Mihaela_van_der_Schaar2", "aff": "University of Cambridge;University of California, Los Angeles", "aff_domain": "cam.ac.uk;ucla.edu", "position": "PhD student;Full Professor", "bibtex": "@inproceedings{\nchan2021scalable,\ntitle={Scalable Bayesian Inverse Reinforcement Learning},\nauthor={Alex James Chan and Mihaela van der Schaar},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=4qR3coiNaIv}\n}", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer3;AnonReviewer4;AnonReviewer2", "pdf_size": 0, "rating": "6;6;7;7", "confidence": "3;3;4;4", "wc_review": "622;345;1183;186", "wc_reply_reviewers": "0;0;535;0", "wc_reply_authors": "402;376;1109;212", "reply_reviewers": "0;0;3;0", "reply_authors": "1;1;4;1", "rating_avg": [ 6.5, 0.5 ], "confidence_avg": [ 3.5, 0.5 ], "wc_review_avg": [ 584.0, 379.39754875328333 ], "wc_reply_reviewers_avg": [ 133.75, 231.66179551233733 ], "wc_reply_authors_avg": [ 524.75, 345.0922883809489 ], "reply_reviewers_avg": [ 0.75, 1.299038105676658 ], "reply_authors_avg": [ 1.75, 1.299038105676658 ], "replies_avg": [ 17, 0 ], "authors#_avg": [ 2, 0 ], "corr_rating_confidence": 1.0, "gs_citation": 86, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=12621774369321605222&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 5, "pdf": "https://openreview.net/pdf?id=4qR3coiNaIv", "email": "cam.ac.uk;ucla.edu", "author_num": 2, "aff_unique_index": "0;1", "aff_unique_norm": "University of Cambridge;University of California, Los Angeles", "aff_unique_dep": ";", "aff_unique_url": "https://www.cam.ac.uk;https://www.ucla.edu", "aff_unique_abbr": 
"Cambridge;UCLA", "aff_campus_unique_index": "0;1", "aff_campus_unique": "Cambridge;Los Angeles", "aff_country_unique_index": "0;1", "aff_country_unique": "United Kingdom;United States" }, { "id": "4qgEGwOtxU", "title": "Importance and Coherence: Methods for Evaluating Modularity in Neural Networks", "track": "main", "status": "Reject", "tldr": "", "abstract": "As deep neural networks become more advanced and widely-used, it is important to understand their inner workings. Toward this goal, modular interpretations are appealing because they offer flexible levels of abstraction aside from standard architectural building blocks (e.g., neurons, channels, layers). In this paper, we consider the problem of assessing how functionally interpretable a given partitioning of neurons is. We propose two proxies for this: importance which reflects how crucial sets of neurons are to network performance, and coherence which reflects how consistently their neurons associate with input/output features. To measure these proxies, we develop a set of statistical methods based on techniques that have conventionally been used for the interpretation of individual neurons. We apply these methods on partitionings generated by a spectral clustering algorithm which uses a graph representation of the network's neurons and weights. We show that despite our partitioning algorithm using neither activations nor gradients, it reveals clusters with a surprising amount of importance and coherence. Together, these results support the use of modular interpretations, and graph-based partitionings in particular, for interpretability.", "keywords": "interpretability;modularity", "primary_area": "", "supplementary_material": "/attachment/0bf63933bc515a916ebf0e7e8e8d45e3f81dc181.zip", "author": "Shlomi Hod;Stephen Casper;Daniel Filan;Cody Wild;Andrew Critch;Stuart Russell", "authorids": "~Shlomi_Hod1;~Stephen_Casper1;~Daniel_Filan1;~Cody_Wild1;~Andrew_Critch1;~Stuart_Russell1", "gender": ";M;M;M;M;F", "homepage": "https://shlomi.hod.xyz;https://stephencasper.com/;https://danielfilan.com/;http://acritch.com/;https://people.eecs.berkeley.edu/~russell/;https://scholar.google.com/citations?user=VcsUv5kAAAAJ&hl=en", "dblp": ";255/5295.html;;;;", "google_scholar": "s_WPt74AAAAJ;N4aglP4AAAAJ;9eoaiXMAAAAJ;F3_yOXUAAAAJ;https://scholar.google.com.tw/citations?user=KJGrjCAAAAAJ;VcsUv5kAAAAJ", "orcid": "0000-0002-0387-4542;0000-0003-0084-1937;;;;", "linkedin": "shlomi-hod/;;;acritch;;", "or_profile": "~Shlomi_Hod1;~Stephen_Casper1;~Daniel_Filan1;~Andrew_Critch1;~Stuart_Russell1;~Cody_Wild2", "aff": "Boston University;Harvard University;University of California, Berkeley;University of California, Berkeley;University of California, Berkeley;University of California, Berkeley", "aff_domain": "bu.edu;harvard.edu;berkeley.edu;berkeley.edu;berkeley.edu;berkeley.edu", "position": "PhD student;Undergrad student;PhD student;Postdoc;Full Professor;Research Engineer", "bibtex": "@misc{\nhod2021importance,\ntitle={Importance and Coherence: Methods for Evaluating Modularity in Neural Networks},\nauthor={Shlomi Hod and Stephen Casper and Daniel Filan and Cody Wild and Andrew Critch and Stuart Russell},\nyear={2021},\nurl={https://openreview.net/forum?id=4qgEGwOtxU}\n}", "github": "", "project": "", "reviewers": "AnonReviewer4;AnonReviewer3;AnonReviewer1", "site": "https://openreview.net/forum?id=4qgEGwOtxU", "pdf_size": 0, "rating": "4;4;5", "confidence": "4;3;4", "wc_review": "812;283;629", "wc_reply_reviewers": "304;426;387", "wc_reply_authors": 
"860;899;894", "reply_reviewers": "1;2;1", "reply_authors": "2;3;2", "rating_avg": [ 4.333333333333333, 0.4714045207910317 ], "confidence_avg": [ 3.6666666666666665, 0.4714045207910317 ], "wc_review_avg": [ 574.6666666666666, 219.35410236013874 ], "wc_reply_reviewers_avg": [ 372.3333333333333, 50.87457343528516 ], "wc_reply_authors_avg": [ 884.3333333333334, 17.326921891156033 ], "reply_reviewers_avg": [ 1.3333333333333333, 0.4714045207910317 ], "reply_authors_avg": [ 2.3333333333333335, 0.4714045207910317 ], "replies_avg": [ 18, 0 ], "authors#_avg": [ 6, 0 ], "corr_rating_confidence": 0.4999999999999999, "gs_citation": 2, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=3546007472992114508&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 0, "aff_unique_index": "0;1;2;2;2;2", "aff_unique_norm": "Boston University;Harvard University;University of California, Berkeley", "aff_unique_dep": ";;", "aff_unique_url": "https://www.bu.edu;https://www.harvard.edu;https://www.berkeley.edu", "aff_unique_abbr": "BU;Harvard;UC Berkeley", "aff_campus_unique_index": "1;1;1;1", "aff_campus_unique": ";Berkeley", "aff_country_unique_index": "0;0;0;0;0;0", "aff_country_unique": "United States" }, { "id": "4rsTcjH7co", "title": "Autoencoder Image Interpolation by Shaping the Latent Space", "track": "main", "status": "Reject", "tldr": "", "abstract": "One of the fascinating properties of deep learning is the ability of the network to reveal the underlying factors characterizing elements in datasets of different types. Autoencoders represent an effective approach for computing these factors. Autoencoders have been studied in the context of enabling interpolation between data points by decoding convex combinations of latent vectors. However, this interpolation often leads to artifacts or produces unrealistic results during reconstruction. We argue that these incongruities are due to the structure of the latent space and to the fact that such naively interpolated latent vectors deviate from the data manifold. In this paper, we propose a regularization technique that shapes the latent representation to follow a manifold that is consistent with the training images and that forces the manifold to be smooth and locally convex. 
This regularization not only enables faithful interpolation between data points, as we show herein, but can also be used as a general regularization technique to avoid overfitting or to produce new samples for data augmentation", "keywords": "deep learning;autoencoders;deep generative models;representation learning", "primary_area": "", "supplementary_material": "", "author": "Alon Oring;Zohar Yakhini;Yacov Hel-Or", "authorids": "~Alon_Oring1;zohar.yakhini@gmail.com;~Yacov_Hel-Or1", "gender": "M;;M", "homepage": ";;https://faculty.runi.ac.il/toky/", "dblp": ";;", "google_scholar": ";;KkKdy9EAAAAJ", "orcid": ";;", "linkedin": "oringa/;;", "or_profile": "~Alon_Oring1;zohar.yakhini@gmail.com;~Yacov_Hel-Or1", "aff": "Interdisciplinary Center Herzliya;;Reichman University", "aff_domain": "idc.ac.il;;runi.ac.il", "position": "MS student;;Full Professor", "bibtex": "@misc{\noring2021autoencoder,\ntitle={Autoencoder Image Interpolation by Shaping the Latent Space},\nauthor={Alon Oring and Zohar Yakhini and Yacov Hel-Or},\nyear={2021},\nurl={https://openreview.net/forum?id=4rsTcjH7co}\n}", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer3;AnonReviewer2;AnonReviewer4", "site": "https://openreview.net/forum?id=4rsTcjH7co", "pdf_size": 0, "rating": "5;6;6;7", "confidence": "4;4;4;4", "wc_review": "400;417;1211;192", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "335;305;1462;14", "reply_reviewers": "0;0;0;0", "reply_authors": "1;1;2;1", "rating_avg": [ 6.0, 0.7071067811865476 ], "confidence_avg": [ 4.0, 0.0 ], "wc_review_avg": [ 555.0, 388.9646513502223 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 529.0, 553.0655476523556 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.25, 0.4330127018922193 ], "replies_avg": [ 11, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 47, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=16963776820747024074&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 6, "aff_unique_index": "0;1", "aff_unique_norm": "Interdisciplinary Center Herzliya;Reichman University", "aff_unique_dep": ";", "aff_unique_url": "https://www.ich.haifa.ac.il/;https://www Reichman.ac.il", "aff_unique_abbr": "ICH;RUT", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0", "aff_country_unique": "Israel" }, { "id": "4sCyjwaVtZ9", "title": "Whitening and second order optimization both destroy information about the dataset, and can make generalization impossible", "track": "main", "status": "Reject", "tldr": "", "abstract": "Machine learning is predicated on the concept of generalization: a model achieving low error on a sufficiently large training set should also perform well on novel samples from the same distribution. We show that both data whitening and second order optimization can harm or entirely prevent generalization. In general, model training harnesses information contained in the sample-sample second moment matrix of a dataset. For a general class of models, namely models with a fully connected first layer, we prove that the information contained in this matrix is the only information which can be used to generalize. Models trained using whitened data, or with certain second order optimization schemes, have less access to this information; in the high dimensional regime they have no access at all, resulting in poor or nonexistent generalization ability. 
We experimentally verify these predictions for several architectures, and further demonstrate that generalization continues to be harmed even when theoretical requirements are relaxed. However, we also show experimentally that regularized second order optimization can provide a practical tradeoff, where training is accelerated but less information is lost, and generalization can in some circumstances even improve.", "keywords": "whitening;second order optimization;deep networks;generalization", "primary_area": "", "supplementary_material": "/attachment/8b546fc5fa675b537b4b78fd622ab6c6093711b6.zip", "author": "Neha S. Wadia;Daniel Duckworth;Samuel Stern Schoenholz;Ethan Dyer;Jascha Sohl-Dickstein", "authorids": "~Neha_S._Wadia1;~Daniel_Duckworth1;~Samuel_Stern_Schoenholz1;~Ethan_Dyer1;~Jascha_Sohl-Dickstein2", "gender": "F;M;M;M;M", "homepage": "https://neha-wadia.github.io;;https://samschoenholz.wordpress.com/;;http://sohldickstein.com", "dblp": ";10/8371.html;190/7108;;51/7117", "google_scholar": "5qC5g3MAAAAJ;2fWmq-4AAAAJ;mk-zQBsAAAAJ;;-3zYIjQAAAAJ", "orcid": ";;;;", "linkedin": ";dduckworth/;samuel-schoenholz-379830a0;;", "or_profile": "~Neha_S._Wadia1;~Daniel_Duckworth1;~Samuel_Stern_Schoenholz1;~Ethan_Dyer1;~Jascha_Sohl-Dickstein1", "aff": "University of California, Berkeley;Google;Google;Google;Google", "aff_domain": "berkeley.edu;google.com;google.com;google.com;google.com", "position": "PhD student;Researcher;Research Scientist;Staff;Research Scientist", "bibtex": "@misc{\nwadia2021whitening,\ntitle={Whitening and second order optimization both destroy information about the dataset, and can make generalization impossible},\nauthor={Neha S. Wadia and Daniel Duckworth and Samuel Stern Schoenholz and Ethan Dyer and Jascha Sohl-Dickstein},\nyear={2021},\nurl={https://openreview.net/forum?id=4sCyjwaVtZ9}\n}", "github": "", "project": "", "reviewers": "AnonReviewer2;AnonReviewer1;AnonReviewer3;AnonReviewer4", "site": "https://openreview.net/forum?id=4sCyjwaVtZ9", "pdf_size": 0, "rating": "4;4;7;7", "confidence": "4;5;3;4", "wc_review": "830;686;789;455", "wc_reply_reviewers": "381;0;17;72", "wc_reply_authors": "1938;1059;864;1079", "reply_reviewers": "2;0;1;1", "reply_authors": "4;2;2;3", "rating_avg": [ 5.5, 1.5 ], "confidence_avg": [ 4.0, 0.7071067811865476 ], "wc_review_avg": [ 690.0, 145.46649098675613 ], "wc_reply_reviewers_avg": [ 117.5, 154.44173658697315 ], "wc_reply_authors_avg": [ 1235.0, 414.47617543110965 ], "reply_reviewers_avg": [ 1.0, 0.7071067811865476 ], "reply_authors_avg": [ 2.75, 0.82915619758885 ], "replies_avg": [ 20, 0 ], "authors#_avg": [ 5, 0 ], "corr_rating_confidence": -0.7071067811865476, "gs_citation": 10, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=993954729910250313&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 0, "aff_unique_index": "0;1;1;1;1", "aff_unique_norm": "University of California, Berkeley;Google", "aff_unique_dep": ";Google", "aff_unique_url": "https://www.berkeley.edu;https://www.google.com", "aff_unique_abbr": "UC Berkeley;Google", "aff_campus_unique_index": "0;1;1;1;1", "aff_campus_unique": "Berkeley;Mountain View", "aff_country_unique_index": "0;0;0;0;0", "aff_country_unique": "United States" }, { "id": "4vDf4Qtodh", "title": "InstantEmbedding: Efficient Local Node Representations", "track": "main", "status": "Reject", "tldr": "", "abstract": "In this paper, we introduce InstantEmbedding, an efficient method for generating single-node representations using local PageRank computations. 
We prove that our approach produces globally consistent representations in sublinear time. We demonstrate this empirically by conducting extensive experiments on real-world datasets with over a billion edges. Our experiments confirm that InstantEmbedding requires drastically less computation time (over 9,000 times faster) and less memory (by over 8,000 times) to produce a single node\u2019s embedding than traditional methods including DeepWalk, node2vec, VERSE, and FastRP. We also show that our method produces high quality representations, demonstrating results that meet or exceed the state of the art for unsupervised representation learning on tasks like node classification and link prediction.", "keywords": "Node Embedding;Structural Graph Representations;Graph Embedding;Local Algorithms", "primary_area": "", "supplementary_material": "", "author": "Stefan Postavaru;Anton Tsitsulin;Filipe Miguel Goncalves de Almeida;Yingtao Tian;Silvio Lattanzi;Bryan Perozzi", "authorids": "~Stefan_Postavaru1;~Anton_Tsitsulin1;~Filipe_Miguel_Goncalves_de_Almeida1;~Yingtao_Tian1;~Silvio_Lattanzi1;~Bryan_Perozzi1", "gender": "M;M;M;;M;", "homepage": ";http://tsitsul.in;https://www.linkedin.com/in/fmgda/;https://alantian.net/;https://sites.google.com/site/silviolattanzi/;http://www.perozzi.net/", "dblp": ";217/1668;;180/5335;46/6611;91/10813", "google_scholar": "ZrvSHigAAAAJ;https://scholar.google.com/citations?hl=en;;17Fe5K0AAAAJ;vxUZ4AUAAAAJ;rZgbMs4AAAAJ", "orcid": ";;;;;", "linkedin": "%C8%99tefan-post%C4%83varu-00222287/;atsitsulin/;;;;", "or_profile": "~Stefan_Postavaru1;~Anton_Tsitsulin1;~Filipe_Miguel_Goncalves_de_Almeida1;~Yingtao_Tian1;~Silvio_Lattanzi1;~Bryan_Perozzi1", "aff": "Google;University of Bonn;Google;Google;Google;Google", "aff_domain": "google.com;uni-bonn.de;google.com;google.com;google.com;google.com", "position": "AI Resident;PhD student;Software Engineer;Research Scientist;Researcher;Researcher", "bibtex": "@misc{\npostavaru2021instantembedding,\ntitle={InstantEmbedding: Efficient Local Node Representations},\nauthor={Stefan Postavaru and Anton Tsitsulin and Filipe Miguel Goncalves de Almeida and Yingtao Tian and Silvio Lattanzi and Bryan Perozzi},\nyear={2021},\nurl={https://openreview.net/forum?id=4vDf4Qtodh}\n}", "github": "", "project": "", "reviewers": "AnonReviewer3;AnonReviewer4;AnonReviewer2;AnonReviewer1", "site": "https://openreview.net/forum?id=4vDf4Qtodh", "pdf_size": 0, "rating": "4;4;6;6", "confidence": "3;5;4;3", "wc_review": "284;1260;352;584", "wc_reply_reviewers": "0;0;67;0", "wc_reply_authors": "447;1216;503;657", "reply_reviewers": "0;0;1;0", "reply_authors": "1;2;2;1", "rating_avg": [ 5.0, 1.0 ], "confidence_avg": [ 3.75, 0.82915619758885 ], "wc_review_avg": [ 620.0, 385.8808106138475 ], "wc_reply_reviewers_avg": [ 16.75, 29.011851026778693 ], "wc_reply_authors_avg": [ 705.75, 304.46294930582275 ], "reply_reviewers_avg": [ 0.25, 0.4330127018922193 ], "reply_authors_avg": [ 1.5, 0.5 ], "replies_avg": [ 12, 0 ], "authors#_avg": [ 6, 0 ], "corr_rating_confidence": -0.30151134457776363, "gs_citation": 22, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=9539548960153198818&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 4, "aff_unique_index": "0;1;0;0;0;0", "aff_unique_norm": "Google;University of Bonn", "aff_unique_dep": "Google;", "aff_unique_url": "https://www.google.com;https://www.uni-bonn.de/", "aff_unique_abbr": "Google;UBonn", "aff_campus_unique_index": "0;0;0;0;0", "aff_campus_unique": "Mountain View;", "aff_country_unique_index": 
"0;1;0;0;0;0", "aff_country_unique": "United States;Germany" }, { "id": "4xzY5yod28y", "title": "Scheduled Restart Momentum for Accelerated Stochastic Gradient Descent", "track": "main", "status": "Reject", "tldr": "", "abstract": "Stochastic gradient descent (SGD) algorithms, with constant momentum and its variants such as Adam, are the optimization methods of choice for training deep neural networks (DNNs). There is great interest in speeding up the convergence of these methods due to their high computational expense. Nesterov accelerated gradient (NAG) with a time-varying momentum, denoted as NAG below, improves the convergence rate of gradient descent (GD) for convex optimization using a specially designed momentum; however, it accumulates error when an inexact gradient is used (such as in SGD), slowing convergence at best and diverging at worst. In this paper, we propose scheduled restart SGD (SRSGD), a new NAG-style scheme for training DNNs. SRSGD replaces the constant momentum in SGD by the increasing momentum in NAG but stabilizes the iterations by resetting the momentum to zero according to a schedule. Using a variety of models and benchmarks for image classification, we demonstrate that, in training DNNs, SRSGD significantly improves convergence and generalization; for instance, in training ResNet-200 for ImageNet classification, SRSGD achieves an error rate of 20.93% vs. the benchmark of 22.13%. These improvements become more significant as the network grows deeper. Furthermore, on both CIFAR and ImageNet, SRSGD reaches similar or even better error rates with significantly fewer training epochs compared to the SGD baseline.", "keywords": "Nesterov Accelerated Gradient;Deep Learning;Image Classification", "primary_area": "", "supplementary_material": "/attachment/13d2013616fa3f98c07e97ff5103eff5a23252a6.zip", "author": "Bao Wang;Tan Minh Nguyen;Tao Sun;Andrea Bertozzi;Richard Baraniuk;Stanley Osher", "authorids": "~Bao_Wang1;~Tan_Minh_Nguyen1;~Tao_Sun7;~Andrea_Bertozzi1;~Richard_Baraniuk1;~Stanley_Osher1", "gender": "M;M;M;F;;M", "homepage": "https://www.math.utah.edu/~bwang/index.html;https://tanmnguyen89.github.io/;;http://www.math.ucla.edu/~bertozzi;http://richb.rice.edu/;https://www.math.ucla.edu/~sjo/", "dblp": ";255/4725;74/3590-5;80/2099.html;32/2804;", "google_scholar": ";OizOh88AAAAJ;fPNZpAe5WXIC;VJPRn1oAAAAJ;https://scholar.google.com.tw/citations?user=N-BBA20AAAAJ;", "orcid": ";;;0000-0003-0396-7391;;", "linkedin": ";;;;richard-baraniuk;", "or_profile": "~Bao_Wang1;~Tan_Minh_Nguyen1;~Tao_Sun7;~Andrea_Bertozzi1;~Richard_Baraniuk1;~Stanley_Osher1", "aff": "University of Utah;University of California, Los Angeles;National University of Defense Technology;University of California, Los Angeles;William Marsh Rice University;University of California, Los Angeles", "aff_domain": "utah.edu;ucla.edu;nudt.edu.cn;math.ucla.edu;rice.edu;ucla.edu", "position": "Assistant Professor;Postdoc;Assistant Professor;Full Professor;C. 
Sidney Burrus Professor;Full Professor", "bibtex": "@misc{\nwang2021scheduled,\ntitle={Scheduled Restart Momentum for Accelerated Stochastic Gradient Descent},\nauthor={Bao Wang and Tan Minh Nguyen and Tao Sun and Andrea Bertozzi and Richard Baraniuk and Stanley Osher},\nyear={2021},\nurl={https://openreview.net/forum?id=4xzY5yod28y}\n}", "github": "", "project": "", "reviewers": "AnonReviewer4;AnonReviewer1;AnonReviewer5;AnonReviewer3;AnonReviewer2", "site": "https://openreview.net/forum?id=4xzY5yod28y", "pdf_size": 0, "rating": "4;5;5;6;6", "confidence": "4;3;4;4;4", "wc_review": "641;199;421;619;775", "wc_reply_reviewers": "340;88;186;191;465", "wc_reply_authors": "2315;1346;1068;1314;2216", "reply_reviewers": "1;1;1;1;1", "reply_authors": "4;3;3;3;4", "rating_avg": [ 5.2, 0.7483314773547882 ], "confidence_avg": [ 3.8, 0.39999999999999997 ], "wc_review_avg": [ 531.0, 200.85019292995466 ], "wc_reply_reviewers_avg": [ 254.0, 132.71473166156048 ], "wc_reply_authors_avg": [ 1651.8, 511.19483565466504 ], "reply_reviewers_avg": [ 1.0, 0.0 ], "reply_authors_avg": [ 3.4, 0.4898979485566356 ], "replies_avg": [ 30, 0 ], "authors#_avg": [ 6, 0 ], "corr_rating_confidence": 0.13363062095621225, "gs_citation": 54, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=8136235344143223162&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 10, "aff_unique_index": "0;1;2;1;3;1", "aff_unique_norm": "University of Utah;University of California, Los Angeles;National University of Defense Technology;Rice University", "aff_unique_dep": ";;;", "aff_unique_url": "https://www.utah.edu;https://www.ucla.edu;http://www.nudt.edu.cn/;https://www.rice.edu", "aff_unique_abbr": "Utah;UCLA;NUDT;Rice", "aff_campus_unique_index": "1;1;1", "aff_campus_unique": ";Los Angeles", "aff_country_unique_index": "0;0;1;0;0;0", "aff_country_unique": "United States;China" }, { "id": "4zr9e5xwZ9Y", "title": "Distributed Training of Graph Convolutional Networks using Subgraph Approximation", "track": "main", "status": "Reject", "tldr": "", "abstract": "Modern machine learning techniques are successfully being adapted to data modeled as graphs. However, many real-world graphs are typically very large and do not fit in memory, often making the problem of training machine learning models on them intractable. Distributed training has been successfully employed to alleviate memory problems and speed up training in machine learning domains in which the input data is assumed to be independent and identically distributed (i.i.d.). However, distributing the training of non-i.i.d. data such as graphs that are used as training inputs in Graph Convolutional Networks (GCNs) causes accuracy problems since information is lost at the graph partitioning boundaries.\n\nIn this paper, we propose a training strategy that mitigates the information lost across multiple partitions of a graph through a subgraph approximation scheme. Our proposed approach augments each sub-graph with a small amount of edge and vertex information that is approximated from all other sub-graphs. The subgraph approximation approach helps the distributed training system converge at single-machine accuracy, while keeping the memory footprint low and minimizing synchronization overhead between the machines. 
", "keywords": "", "primary_area": "", "supplementary_material": "", "author": "Alexandra Angerd;Keshav Balasubramanian;Murali Annavaram", "authorids": "~Alexandra_Angerd1;keshavba@usc.edu;~Murali_Annavaram1", "gender": "F;;M", "homepage": ";;http://annavar.am", "dblp": ";;02/5812", "google_scholar": ";;https://scholar.google.com/citations?hl=en", "orcid": ";;0000-0002-4633-6867", "linkedin": "alexandra-angerd-2b776337/;;", "or_profile": "~Alexandra_Angerd1;keshavba@usc.edu;~Murali_Annavaram1", "aff": "Chalmers University;;University of Southern California", "aff_domain": "chalmers.se;;usc.edu", "position": "PhD student;;Full Professor", "bibtex": "@misc{\nangerd2021distributed,\ntitle={Distributed Training of Graph Convolutional Networks using Subgraph Approximation},\nauthor={Alexandra Angerd and Keshav Balasubramanian and Murali Annavaram},\nyear={2021},\nurl={https://openreview.net/forum?id=4zr9e5xwZ9Y}\n}", "github": "", "project": "", "reviewers": "AnonReviewer4;AnonReviewer1;AnonReviewer2;AnonReviewer3", "site": "https://openreview.net/forum?id=4zr9e5xwZ9Y", "pdf_size": 0, "rating": "4;4;5;5", "confidence": "4;5;4;4", "wc_review": "315;280;467;627", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "0;0;0;0", "reply_reviewers": "0;0;0;0", "reply_authors": "0;0;0;0", "rating_avg": [ 4.5, 0.5 ], "confidence_avg": [ 4.25, 0.4330127018922193 ], "wc_review_avg": [ 422.25, 137.53431390020455 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 0, 0 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 0, 0 ], "replies_avg": [ 5, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": -0.5773502691896257, "gs_citation": 11, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=1624271646938805175&as_sdt=400005&sciodt=0,14&hl=en", "gs_version_total": 3, "aff_unique_index": "0;1", "aff_unique_norm": "Chalmers University of Technology;University of Southern California", "aff_unique_dep": ";", "aff_unique_url": "https://www.chalmers.se;https://www.usc.edu", "aff_unique_abbr": "Chalmers;USC", "aff_campus_unique_index": "1", "aff_campus_unique": ";Los Angeles", "aff_country_unique_index": "0;1", "aff_country_unique": "Sweden;United States" }, { "id": "510f7KAPmYR", "title": "Intra-layer Neural Architecture Search", "track": "main", "status": "Withdraw", "tldr": "", "abstract": "We propose an efficient neural architecture search (NAS) algorithm with a flexible search space that encompasses layer operations down to individual weights. This work addresses NAS challenges in a search space of weight connections within layers, specifically the large number of architecture variations compared to a high-level search space with predetermined layer types. Our algorithm continuously evolves network architecture by adding new candidate parameters (weights and biases) using a first-order estimation based on their gradients at 0. Training is decoupled into alternating steps: adjusting network weights holding architecture constant, and adjusting network architecture holding weights constant. We explore additional applications by extend this method for multi-task learning with shared parameters. On the CIFAR-10 dataset, our evolved network achieves an accuracy of 97.42\\% with 5M parameters, and 93.75\\% with 500K parameters. 
On the ImageNet dataset, we achieve 76.6\\% top-1 and 92.5\\% top-5 accuracy with a search restriction of 8.5M parameters.", "keywords": "Neural Networks;Neural Architecture Search;Training Algorithms", "primary_area": "", "supplementary_material": "", "author": "Dong Kai Wang;Nam Sung Kim", "authorids": "~Dong_Kai_Wang1;~Nam_Sung_Kim1", "gender": "M;", "homepage": "https://wdongkai.github.io/;", "dblp": ";", "google_scholar": "sDXzsb4AAAAJ;", "orcid": ";", "linkedin": ";", "or_profile": "~Dong_Kai_Wang1;~Nam_Sung_Kim1", "aff": "University of Illinois, Urbana Champaign;University of Wisconsin-Madison", "aff_domain": "illinois.edu;", "position": "PhD student;", "bibtex": "", "github": "", "project": "", "reviewers": "", "site": "https://openreview.net/forum?id=510f7KAPmYR", "pdf_size": 0, "rating": "", "confidence": "", "wc_review": "", "wc_reply_reviewers": "", "wc_reply_authors": "", "reply_reviewers": "", "reply_authors": "", "rating_avg": [ 0, 0 ], "confidence_avg": [ 0, 0 ], "wc_review_avg": [ 0, 0 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 0, 0 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 0, 0 ], "replies_avg": [ 1, 0 ], "authors#_avg": [ 2, 0 ], "corr_rating_confidence": 0, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:84zkIHVZ1pcJ:scholar.google.com/&scioq=Intra-layer+Neural+Architecture+Search&hl=en&as_sdt=0,33", "gs_version_total": 0, "aff_unique_index": "0;1", "aff_unique_norm": "University of Illinois Urbana-Champaign;University of Wisconsin-Madison", "aff_unique_dep": ";", "aff_unique_url": "https://illinois.edu;https://www.wisc.edu", "aff_unique_abbr": "UIUC;UW-Madison", "aff_campus_unique_index": "0;1", "aff_campus_unique": "Urbana-Champaign;Madison", "aff_country_unique_index": "0;0", "aff_country_unique": "United States" }, { "id": "53WS781RzT9", "title": "The Impact of the Mini-batch Size on the Dynamics of SGD: Variance and Beyond", "track": "main", "status": "Reject", "tldr": "", "abstract": "We study mini-batch stochastic gradient descent (SGD) dynamics under linear regression and deep linear networks by focusing on the variance of the gradients only given the initial weights and mini-batch size, which is the first study of this nature. In the linear regression case, we show that in each iteration the norm of the gradient is a decreasing function of the mini-batch size $b$ and thus the variance of the stochastic gradient estimator is a decreasing function of $b$. For deep neural networks with $L_2$ loss we show that the variance of the gradient is a polynomial in $1/b$. The results theoretically back the important intuition that smaller batch sizes yield larger variance of the stochastic gradients and lower loss function values, which is a common belief among researchers. The proof techniques exhibit a relationship between stochastic gradient estimators and initial weights, which is useful for further research on the dynamics of SGD. We empirically provide insights into our results on various datasets and commonly used deep network structures. 
We further discuss possible extensions of the approaches we build in studying the generalization ability of the deep learning models.", "keywords": "", "primary_area": "", "supplementary_material": "/attachment/d1df8d44631e48fc053ddc8c6ecbf9ed11ca472e.zip", "author": "Xin Qian;Diego Klabjan", "authorids": "~Xin_Qian2;~Diego_Klabjan1", "gender": "M;M", "homepage": "http://www.notavailable.com;http://dynresmanagement.com/index.html", "dblp": ";17/105", "google_scholar": ";TaQZ_VUAAAAJ", "orcid": ";0000-0003-4213-9281", "linkedin": ";diegoklabjan", "or_profile": "~Xin_Qian2;~Diego_Klabjan1", "aff": "Northwestern University, Northwestern University;Northwestern University", "aff_domain": "u.northwestern.edu;u.northwestern.edu", "position": "PhD student;Full Professor", "bibtex": "@misc{\nqian2021the,\ntitle={The Impact of the Mini-batch Size on the Dynamics of {\\{}SGD{\\}}: Variance and Beyond},\nauthor={Xin Qian and Diego Klabjan},\nyear={2021},\nurl={https://openreview.net/forum?id=53WS781RzT9}\n}", "github": "", "project": "", "reviewers": "AnonReviewer3;AnonReviewer2;AnonReviewer1;AnonReviewer4", "site": "https://openreview.net/forum?id=53WS781RzT9", "pdf_size": 0, "rating": "3;4;5;6", "confidence": "5;4;4;4", "wc_review": "418;166;388;781", "wc_reply_reviewers": "0;0;83;0", "wc_reply_authors": "922;362;495;685", "reply_reviewers": "0;0;1;0", "reply_authors": "2;1;1;1", "rating_avg": [ 4.5, 1.118033988749895 ], "confidence_avg": [ 4.25, 0.4330127018922193 ], "wc_review_avg": [ 438.25, 220.5293347833798 ], "wc_reply_reviewers_avg": [ 20.75, 35.94005425705421 ], "wc_reply_authors_avg": [ 616.0, 210.68578499746963 ], "reply_reviewers_avg": [ 0.25, 0.4330127018922193 ], "reply_authors_avg": [ 1.25, 0.4330127018922193 ], "replies_avg": [ 13, 0 ], "authors#_avg": [ 2, 0 ], "corr_rating_confidence": -0.7745966692414834, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:6-NlDEt1Gy8J:scholar.google.com/&scioq=The+Impact+of+the+Mini-batch+Size+on+the+Dynamics+of+SGD:+Variance+and+Beyond&hl=en&as_sdt=0,5", "gs_version_total": 2, "aff_unique_index": "0;0", "aff_unique_norm": "Northwestern University", "aff_unique_dep": "", "aff_unique_url": "https://www.northwestern.edu", "aff_unique_abbr": "NU", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0", "aff_country_unique": "United States" }, { "id": "54-QTuqSLyn", "title": "Mitigating Mode Collapse by Sidestepping Catastrophic Forgetting", "track": "main", "status": "Reject", "tldr": "", "abstract": "Generative Adversarial Networks (GANs) are a class of generative models used for various applications, but they have been known to suffer from the mode collapse problem, in which some modes of the target distribution are ignored by the generator. Investigative study using a new data generation procedure indicates that the mode collapse of the generator is driven by the discriminator\u2019s inability to maintain classification accuracy on previously seen samples, a phenomenon called Catastrophic Forgetting in continual learning. Motivated by this observation, we introduce a novel training procedure that dynamically spawns additional discriminators to remember previous modes of generation. 
On several datasets, we show that our training scheme can be plugged into existing GAN frameworks to mitigate mode collapse and improve standard metrics for GAN evaluation.", "keywords": "mode collapse;catastrophic forgetting;multi-adversarial training", "primary_area": "", "supplementary_material": "/attachment/5484b45d4b97c083392dcecc03c001481da78392.zip", "author": "Karttikeya Mangalam;Rohin Garg;Jathushan Rajasegaran;Taesung Park", "authorids": "~Karttikeya_Mangalam1;~Rohin_Garg1;~Jathushan_Rajasegaran1;~Taesung_Park2", "gender": "M;M;M;M", "homepage": "http://karttikeya.github.io/;;https://brjathu.github.io/;https://taesung.me", "dblp": "200/8205;;211/4065;55/4543", "google_scholar": "2l1fWEoAAAAJ;;https://scholar.google.com.sg/citations?user=Ctp3igcAAAAJ;hHkuxSUAAAAJ", "orcid": ";;;", "linkedin": ";rohin-garg-953643151/;;", "or_profile": "~Karttikeya_Mangalam1;~Rohin_Garg1;~Jathushan_Rajasegaran1;~Taesung_Park1", "aff": "University of California, Berkeley;;;University of California, Berkeley", "aff_domain": "berkeley.edu;;;berkeley.edu", "position": "PhD student;;;PhD student", "bibtex": "@misc{\nmangalam2021mitigating,\ntitle={Mitigating Mode Collapse by Sidestepping Catastrophic Forgetting},\nauthor={Karttikeya Mangalam and Rohin Garg and Jathushan Rajasegaran and Taesung Park},\nyear={2021},\nurl={https://openreview.net/forum?id=54-QTuqSLyn}\n}", "github": "", "project": "", "reviewers": "AnonReviewer4;AnonReviewer2;AnonReviewer3;AnonReviewer1", "site": "https://openreview.net/forum?id=54-QTuqSLyn", "pdf_size": 0, "rating": "4;5;6;7", "confidence": "5;3;4;3", "wc_review": "1031;348;810;400", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "0;0;0;0", "reply_reviewers": "0;0;0;0", "reply_authors": "0;0;0;0", "rating_avg": [ 5.5, 1.118033988749895 ], "confidence_avg": [ 3.75, 0.82915619758885 ], "wc_review_avg": [ 647.25, 284.7958698787607 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 0, 0 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 0, 0 ], "replies_avg": [ 6, 0 ], "authors#_avg": [ 4, 0 ], "corr_rating_confidence": -0.674199862463242, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:zqSzJ2-sV-kJ:scholar.google.com/&scioq=Mitigating+Mode+Collapse+by+Sidestepping+Catastrophic+Forgetting&hl=en&as_sdt=0,5", "gs_version_total": 0, "aff_unique_index": "0;0", "aff_unique_norm": "University of California, Berkeley", "aff_unique_dep": "", "aff_unique_url": "https://www.berkeley.edu", "aff_unique_abbr": "UC Berkeley", "aff_campus_unique_index": "0;0", "aff_campus_unique": "Berkeley", "aff_country_unique_index": "0;0", "aff_country_unique": "United States" }, { "id": "5B8YAz6W3eX", "title": "Apollo: An Adaptive Parameter-wised Diagonal Quasi-Newton Method for Nonconvex Stochastic Optimization", "track": "main", "status": "Reject", "tldr": "", "abstract": "In this paper, we introduce Apollo, a quasi-Newton method for nonconvex stochastic optimization, which dynamically incorporates the curvature of the loss function by approximating the Hessian via a diagonal matrix. Algorithmically, Apollo requires only first-order gradients and updates the approximation of the Hessian diagonally such that it satisfies the weak secant relation. To handle nonconvexity, we replace the Hessian with its absolute value, the computation of which is also efficient under our diagonal approximation, yielding an optimization algorithm with linear complexity for both time and memory. 
Experimentally, through three tasks on vision and language, we show that Apollo achieves significant improvements over other stochastic optimization methods, including SGD and variants of Adam, in terms of both convergence speed and generalization performance.", "keywords": "Optimization;Stochastic Optimization;Nonconvex;Quasi-Newton;Neural Network;Deep Learning", "primary_area": "", "supplementary_material": "", "author": "Xuezhe Ma", "authorids": "~Xuezhe_Ma1", "gender": "M", "homepage": "https://xuezhemax.github.io/", "dblp": "127/0230", "google_scholar": "6_MQLIcAAAAJ", "orcid": "", "linkedin": "xuezhe-ma-b5354731", "or_profile": "~Xuezhe_Ma1", "aff": "USC/ISI", "aff_domain": "isi.edu", "position": "Assistant Professor", "bibtex": "@misc{\nma2021apollo,\ntitle={Apollo: An Adaptive Parameter-wised Diagonal Quasi-Newton Method for Nonconvex Stochastic Optimization},\nauthor={Xuezhe Ma},\nyear={2021},\nurl={https://openreview.net/forum?id=5B8YAz6W3eX}\n}", "github": "", "project": "", "reviewers": "AnonReviewer3;AnonReviewer1;AnonReviewer2;AnonReviewer4", "site": "https://openreview.net/forum?id=5B8YAz6W3eX", "pdf_size": 0, "rating": "4;4;5;5", "confidence": "4;4;3;4", "wc_review": "558;592;388;847", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "361;200;342;536", "reply_reviewers": "0;0;0;0", "reply_authors": "1;1;2;1", "rating_avg": [ 4.5, 0.5 ], "confidence_avg": [ 3.75, 0.4330127018922193 ], "wc_review_avg": [ 596.25, 164.1072438986165 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 359.75, 119.26939045706573 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.25, 0.4330127018922193 ], "replies_avg": [ 12, 0 ], "authors#_avg": [ 1, 0 ], "corr_rating_confidence": -0.5773502691896257, "gs_citation": 50, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=13451820388334628730&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 4, "aff_unique_index": "0", "aff_unique_norm": "University of Southern California", "aff_unique_dep": "", "aff_unique_url": "https://isi.usc.edu", "aff_unique_abbr": "USC", "aff_campus_unique_index": "0", "aff_campus_unique": "ISI", "aff_country_unique_index": "0", "aff_country_unique": "United States" }, { "id": "5Dj8rVRg9Ui", "title": "Using Synthetic Data to Improve the Long-range Forecasting of Time Series Data", "track": "main", "status": "Reject", "tldr": "", "abstract": "Effective long-range forecasting of time series data remains an unsolved and open problem. One possible approach is to use generative models to improve long-range forecasting, but the challenge then is how to generate high-quality synthetic data. In this paper, we propose a conditional Wasserstein GAN with Gradient and Error Penalty (cWGAN-GEP), aiming to generate accurate synthetic data that preserves the temporal dynamics between the conditioning input and generated data. By using such synthetic data, we develop a long-range forecasting method called Generative Forecasting (GenF). GenF consists of three key components: (i) a cWGAN-GEP based generator, to generate synthetic data for the next few time steps. (ii) a predictor which makes long-range predictions based on generated and observed data. (iii) an information theoretic clustering (ITC) algorithm to better train the cWGAN-GEP based generator and the predictor. Our experimental results on three public datasets demonstrate that GenF significantly outperforms a diverse range of state-of-the-art benchmarks and classical approaches. 
In most cases, we find an improvement of at least 10% over all studied methods. Lastly, we conduct an ablation study to demonstrate the effectiveness of the cWGAN-GEP and the ITC algorithm.", "keywords": "long-range time series data prediction;Generative Adversarial Network;Long Short-term Memory", "primary_area": "", "supplementary_material": "", "author": "Shiyu Liu;Mehul Motani", "authorids": "~Shiyu_Liu1;~Mehul_Motani1", "gender": ";M", "homepage": "https://nil.com;https://mehulmotani.github.io/", "dblp": ";83/4035", "google_scholar": ";https://scholar.google.com.sg/citations?user=Bm9BwEQAAAAJ", "orcid": ";", "linkedin": ";", "or_profile": "~Shiyu_Liu1;~Mehul_Motani1", "aff": "National University of Singapore;National University of Singapore", "aff_domain": "nus.edu.sg;nus.edu.sg", "position": "PhD student;Associate Professor", "bibtex": "@misc{\nliu2021using,\ntitle={Using Synthetic Data to Improve the Long-range Forecasting of Time Series Data},\nauthor={Shiyu Liu and Mehul Motani},\nyear={2021},\nurl={https://openreview.net/forum?id=5Dj8rVRg9Ui}\n}", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer2;AnonReviewer4", "site": "https://openreview.net/forum?id=5Dj8rVRg9Ui", "pdf_size": 0, "rating": "5;5;6", "confidence": "4;3;4", "wc_review": "360;231;814", "wc_reply_reviewers": "0;0;0", "wc_reply_authors": "587;413;840", "reply_reviewers": "0;0;0", "reply_authors": "1;1;2", "rating_avg": [ 5.333333333333333, 0.4714045207910317 ], "confidence_avg": [ 3.6666666666666665, 0.4714045207910317 ], "wc_review_avg": [ 468.3333333333333, 250.03244233943366 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 613.3333333333334, 175.31368711224144 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.3333333333333333, 0.4714045207910317 ], "replies_avg": [ 9, 0 ], "authors#_avg": [ 2, 0 ], "corr_rating_confidence": 0.4999999999999999, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:6gadgMhJwQIJ:scholar.google.com/&scioq=Using+Synthetic+Data+to+Improve+the+Long-range+Forecasting+of+Time+Series+Data&hl=en&as_sdt=0,5", "gs_version_total": 0, "aff_unique_index": "0;0", "aff_unique_norm": "National University of Singapore", "aff_unique_dep": "", "aff_unique_url": "https://www.nus.edu.sg", "aff_unique_abbr": "NUS", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0", "aff_country_unique": "Singapore" }, { "id": "5FRJWsiLRmA", "title": "Reservoir Transformers", "track": "main", "status": "Reject", "tldr": "", "abstract": "We demonstrate that transformers obtain impressive performance even when some of the layers are randomly initialized and never updated. Inspired by old and well-established ideas in machine learning, we explore a variety of non-linear reservoir layers interspersed with regular transformer layers, and show improvements in wall-clock compute time until convergence, as well as overall performance, on various machine translation and (masked) language modelling tasks.", "keywords": "", "primary_area": "", "supplementary_material": "", "author": "Sheng Shen;Alexei Baevski;Ari S. 
Morcos;Kurt Keutzer;Michael Auli;Douwe Kiela", "authorids": "sheng.s@berkeley.edu;~Alexei_Baevski1;~Ari_S._Morcos1;~Kurt_Keutzer1;~Michael_Auli1;~Douwe_Kiela1", "gender": ";;;M;;M", "homepage": ";;;https://people.eecs.berkeley.edu/~keutzer/;;https://douwekiela.github.io", "dblp": ";;;k/KurtKeutzer.html;;136/9140", "google_scholar": ";;;ID9QePIAAAAJ;;Q0piorUAAAAJ", "orcid": ";;;0000-0003-3868-8501;;", "linkedin": ";;;kurtkeutzer/;;", "or_profile": "sheng.s@berkeley.edu;~Alexei_Baevski1;~Ari_S._Morcos1;~Kurt_Keutzer1;~Michael_Auli1;~Douwe_Kiela1", "aff": ";;;University of California, Berkeley;;Facebook AI Research", "aff_domain": ";;;berkeley.edu;;fb.com", "position": ";;;Full Professor;;Research Scientist", "bibtex": "@misc{\nshen2021reservoir,\ntitle={Reservoir Transformers},\nauthor={Sheng Shen and Alexei Baevski and Ari S. Morcos and Kurt Keutzer and Michael Auli and Douwe Kiela},\nyear={2021},\nurl={https://openreview.net/forum?id=5FRJWsiLRmA}\n}", "github": "", "project": "", "reviewers": "AnonReviewer2;AnonReviewer5;AnonReviewer1", "site": "https://openreview.net/forum?id=5FRJWsiLRmA", "pdf_size": 0, "rating": "5;5;7", "confidence": "4;3;4", "wc_review": "653;331;434", "wc_reply_reviewers": "0;0;0", "wc_reply_authors": "655;335;677", "reply_reviewers": "0;0;0", "reply_authors": "1;1;1", "rating_avg": [ 5.666666666666667, 0.9428090415820634 ], "confidence_avg": [ 3.6666666666666665, 0.4714045207910317 ], "wc_review_avg": [ 472.6666666666667, 134.269215467367 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 555.6666666666666, 156.2931718136002 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 7, 0 ], "authors#_avg": [ 6, 0 ], "corr_rating_confidence": 0.49999999999999983, "gs_citation": 32, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=7233417758951236944&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 9, "aff_unique_index": "0;1", "aff_unique_norm": "University of California, Berkeley;Meta", "aff_unique_dep": ";Facebook AI Research", "aff_unique_url": "https://www.berkeley.edu;https://research.facebook.com", "aff_unique_abbr": "UC Berkeley;FAIR", "aff_campus_unique_index": "0", "aff_campus_unique": "Berkeley;", "aff_country_unique_index": "0;0", "aff_country_unique": "United States" }, { "id": "5IqTrksw9S", "title": "GLUECode: A Benchmark for Source Code Machine Learning Models", "track": "main", "status": "Reject", "tldr": "", "abstract": "A multitude of machine learning models for source code have been proposed in the recent years capturing various aspects of the inherent rich structure and semantics of code. However, these models are commonly designed to perform well on a single task, failing to capture code's multifaceted nature. To address this, we present GLUECode, Global and Local Understanding Evaluation of Code, a benchmark of diverse tasks to evaluate machine learning models of source code. \n\nCrucially, GLUECode accounts for the distinct characteristics of source code: (1) source code is highly structured and (2) source code is often composed of multiple interacting entities. Existing tasks incentivize researchers to create models and code representations that perform well on a single task - commonly focusing on local reasoning. 
GLUECode aims to allow researchers to experiment with multiple local and global source code representations, and evaluate these models on their ability to capture the diverse characteristics of source code, thus driving the community towards building robust source code models incorporating global reasoning. \n\nWe present results for several baselines. The GLUECode tasks are challenging for the evaluated baselines; no model achieves convincing performance across all tasks. This indicates that there is ample room for progress on GLUECode.", "keywords": "benchmark;source code;code understanding;deep learning", "primary_area": "", "supplementary_material": "", "author": "Anjan Karmakar;Julian Aron Prenner;Miltiadis Allamanis;Romain Robbes", "authorids": "~Anjan_Karmakar1;julianaron.prenner@unibz.it;~Miltiadis_Allamanis1;rrobbes@unibz.it", "gender": "M;;;", "homepage": ";;;", "dblp": ";;;", "google_scholar": ";;;", "orcid": ";;;", "linkedin": "theanjankarmakar/;;;", "or_profile": "~Anjan_Karmakar1;julianaron.prenner@unibz.it;~Miltiadis_Allamanis1;rrobbes@unibz.it", "aff": "Free University of Bozen-Bolzano;;;", "aff_domain": "unibz.it;;;", "position": "PhD student;;;", "bibtex": "@misc{\nkarmakar2021gluecode,\ntitle={{\\{}GLUEC{\\}}ode: A Benchmark for Source Code Machine Learning Models},\nauthor={Anjan Karmakar and Julian Aron Prenner and Miltiadis Allamanis and Romain Robbes},\nyear={2021},\nurl={https://openreview.net/forum?id=5IqTrksw9S}\n}", "github": "", "project": "", "reviewers": "AnonReviewer4;AnonReviewer2;AnonReviewer1;AnonReviewer3", "site": "https://openreview.net/forum?id=5IqTrksw9S", "pdf_size": 0, "rating": "4;4;4;6", "confidence": "4;5;3;5", "wc_review": "506;810;306;371", "wc_reply_reviewers": "0;0;136;0", "wc_reply_authors": "1130;1179;1496;469", "reply_reviewers": "0;0;2;0", "reply_authors": "2;2;3;1", "rating_avg": [ 4.5, 0.8660254037844386 ], "confidence_avg": [ 4.25, 0.82915619758885 ], "wc_review_avg": [ 498.25, 193.9076777747596 ], "wc_reply_reviewers_avg": [ 34.0, 58.88972745734183 ], "wc_reply_authors_avg": [ 1068.5, 373.5468511445385 ], "reply_reviewers_avg": [ 0.5, 0.8660254037844386 ], "reply_authors_avg": [ 2.0, 0.7071067811865476 ], "replies_avg": [ 16, 0 ], "authors#_avg": [ 4, 0 ], "corr_rating_confidence": 0.5222329678670935, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:U6Mfa0GuJuUJ:scholar.google.com/&scioq=GLUECode:+A+Benchmark+for+Source+Code+Machine+Learning+Models&hl=en&as_sdt=0,5", "gs_version_total": 0, "aff_unique_index": "0", "aff_unique_norm": "Free University of Bozen-Bolzano", "aff_unique_dep": "", "aff_unique_url": "https://www.unibz.it", "aff_unique_abbr": "UNIBZ", "aff_country_unique_index": "0", "aff_country_unique": "Italy" }, { "id": "5IxMM3wSLDm", "title": "Novelty Detection with Rotated Contrastive Predictive Coding", "track": "main", "status": "Withdraw", "tldr": "", "abstract": "The current dominant paradigm for novelty detection relies on a learned model\u2019s capability to recover the regularities. To this end, reconstruction-based learning is often used in which the normality of an observation is expressed in how well it can be reconstructed. However, this can be limiting as anomalous data can be reconstructed well if enough common features are shared between normal and anomalous data. In this paper, we pursue an alternative approach wherein the normality is measured by a contrastive learning objective. 
Specifically, we propose Rotated Contrastive Predictive Coding (Rotated CPC) where the model operates on rotated images and simultaneously learns to predict the future in latent space. Normality score is thus measured as how predictive the representations are and the score\u2019s robustness is further improved by ensembling predictions on multiple rotations of the input signal. We demonstrate the efficacy of this formulation across a variety of benchmark datasets where our method outperforms state-of-the-art methods.", "keywords": "", "primary_area": "", "supplementary_material": "", "author": "Dong Huk Park;Trevor Darrell", "authorids": "~Dong_Huk_Park2;~Trevor_Darrell2", "gender": "M;M", "homepage": ";https://people.eecs.berkeley.edu/~trevor/", "dblp": "182/1826;d/TrevorDarrell", "google_scholar": "_kJ-zUYAAAAJ;https://scholar.google.com.tw/citations?user=bh-uRFMAAAAJ", "orcid": ";", "linkedin": ";", "or_profile": "~Dong_Huk_Park2;~trevor_darrell1", "aff": "University of California, Berkeley;Electrical Engineering & Computer Science Department", "aff_domain": "berkeley.edu;eecs.berkeley.edu", "position": "PhD student;Professor", "bibtex": "", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer4;AnonReviewer3", "site": "https://openreview.net/forum?id=5IxMM3wSLDm", "pdf_size": 0, "rating": "3;4;6", "confidence": "5;4;4", "wc_review": "453;221;1106", "wc_reply_reviewers": "0;0;0", "wc_reply_authors": "0;0;0", "reply_reviewers": "0;0;0", "reply_authors": "0;0;0", "rating_avg": [ 4.333333333333333, 1.247219128924647 ], "confidence_avg": [ 4.333333333333333, 0.4714045207910317 ], "wc_review_avg": [ 593.3333333333334, 374.67882542548654 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 0, 0 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 0, 0 ], "replies_avg": [ 4, 0 ], "authors#_avg": [ 2, 0 ], "corr_rating_confidence": -0.7559289460184544, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:PFPAbUwmZZYJ:scholar.google.com/&scioq=Novelty+Detection+with+Rotated+Contrastive+Predictive+Coding&hl=en&as_sdt=0,33", "gs_version_total": 0, "aff_unique_index": "0;1", "aff_unique_norm": "University of California, Berkeley;Electrical Engineering & Computer Science Department", "aff_unique_dep": ";Electrical Engineering & Computer Science", "aff_unique_url": "https://www.berkeley.edu;", "aff_unique_abbr": "UC Berkeley;", "aff_campus_unique_index": "0", "aff_campus_unique": "Berkeley;", "aff_country_unique_index": "0", "aff_country_unique": "United States;" }, { "id": "5JnS8wROG9", "title": "On the Inductive Bias of a CNN for Distributions with Orthogonal Patterns", "track": "main", "status": "Reject", "tldr": "", "abstract": "Training overparameterized convolutional neural networks with gradient based optimization is the most successful learning method for image classification. However, their generalization properties are far from understood. In this work, we consider a simplified image classification task where images contain orthogonal patches and are learned with a 3-layer overparameterized convolutional network and stochastic gradient descent (SGD). We empirically identify a novel phenomenon of SGD in our setting, where the dot-product between the learned pattern detectors and their detected patterns are governed by the pattern statistics in the training set. We call this phenomenon Pattern Statistics Inductive Bias (PSI) and empirically verify it in a large number of instances. 
We prove that in our setting, if a learning algorithm satisfies PSI then its sample complexity is $O(d^2\\log(d))$ where $d$ is the filter dimension. In contrast, we show a VC dimension lower bound which is exponential in $d$. We perform experiments with overparameterized CNNs on a variant of MNIST with non-orthogonal patches, and show that the empirical observations are in line with our analysis.", "keywords": "Deep learning theory;generalization;overparemeterization;CNN", "primary_area": "", "supplementary_material": "", "author": "Alon Brutzkus;Amir Globerson", "authorids": "~Alon_Brutzkus1;~Amir_Globerson1", "gender": "M;M", "homepage": ";http://www.cs.tau.ac.il/~gamir/", "dblp": "161/7411;08/4162.html", "google_scholar": "m1wmXdgAAAAJ;https://scholar.google.com.tw/citations?user=5JserkUAAAAJ", "orcid": ";", "linkedin": ";", "or_profile": "~Alon_Brutzkus1;~Amir_Globerson1", "aff": "Tel Aviv University;Tel Aviv University", "aff_domain": "tau.ac.il;tau.ac.il", "position": "PhD student;Associate Professor", "bibtex": "@misc{\nbrutzkus2021on,\ntitle={On the Inductive Bias of a {\\{}CNN{\\}} for Distributions with Orthogonal Patterns},\nauthor={Alon Brutzkus and Amir Globerson},\nyear={2021},\nurl={https://openreview.net/forum?id=5JnS8wROG9}\n}", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer2;AnonReviewer3;AnonReviewer4", "site": "https://openreview.net/forum?id=5JnS8wROG9", "pdf_size": 0, "rating": "5;5;6;6", "confidence": "5;3;3;5", "wc_review": "266;472;316;234", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "716;780;240;105", "reply_reviewers": "0;0;0;0", "reply_authors": "1;1;1;1", "rating_avg": [ 5.5, 0.5 ], "confidence_avg": [ 4.0, 1.0 ], "wc_review_avg": [ 322.0, 91.40021881811882 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 460.25, 292.5580070686837 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 9, 0 ], "authors#_avg": [ 2, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 1, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=7729073939333284652&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 0, "aff_unique_index": "0;0", "aff_unique_norm": "Tel Aviv University", "aff_unique_dep": "", "aff_unique_url": "https://www.tau.ac.il", "aff_unique_abbr": "TAU", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0", "aff_country_unique": "Israel" }, { "id": "5K8ZG9twKY", "title": "Efficient Estimators for Heavy-Tailed Machine Learning", "track": "main", "status": "Reject", "tldr": "", "abstract": "A dramatic improvement in data collection technologies has aided in procuring massive amounts of unstructured and heterogeneous datasets. This has consequently led to a prevalence of heavy-tailed distributions across a broad range of tasks in machine learning. In this work, we perform thorough empirical studies to show that modern machine learning models such as generative adversarial networks and invertible flow models are plagued with such ill-behaved distributions during the phase of training them. To alleviate this problem, we develop a computationally-efficient estimator for mean estimation with provable guarantees which can handle such ill-behaved distributions. We provide specific consequences of our theory for supervised learning tasks such as linear regression and generalized linear models. 
Furthermore, we study the performance of our algorithm on synthetic tasks and real-world experiments and show that our methods convincingly outperform a variety of practical baselines.", "keywords": "", "primary_area": "", "supplementary_material": "/attachment/07ffc3cf72a24f03420b1b92c334228fa45704c7.zip", "author": "Vishwak Srinivasan;Adarsh Prasad;Sivaraman Balakrishnan;Pradeep Kumar Ravikumar", "authorids": "~Vishwak_Srinivasan1;~Adarsh_Prasad1;~Sivaraman_Balakrishnan1;~Pradeep_Kumar_Ravikumar1", "gender": "M;;M;M", "homepage": "https://www.mit.edu/~vishwaks;;http://www.stat.cmu.edu/~siva/;http://www.cs.cmu.edu/~pradeepr/", "dblp": "211/7746;154/6615;52/10671;94/3594", "google_scholar": "MW4-PPgAAAAJ;;o7yFQXUAAAAJ;https://scholar.google.com.tw/citations?user=Q4DTPw4AAAAJ", "orcid": ";;;", "linkedin": ";;;", "or_profile": "~Vishwak_Srinivasan1;~Adarsh_Prasad1;~Sivaraman_Balakrishnan1;~Pradeep_Kumar_Ravikumar1", "aff": "Massachusetts Institute of Technology;Carnegie Mellon University;Carnegie Mellon University;School of Computer Science, Carnegie Mellon University", "aff_domain": "mit.edu;cmu.edu;cmu.edu;cs.cmu.edu", "position": "PhD student;PhD student;Assistant Professor;Associate Professor", "bibtex": "@misc{\nsrinivasan2021efficient,\ntitle={Efficient Estimators for Heavy-Tailed Machine Learning},\nauthor={Vishwak Srinivasan and Adarsh Prasad and Sivaraman Balakrishnan and Pradeep Kumar Ravikumar},\nyear={2021},\nurl={https://openreview.net/forum?id=5K8ZG9twKY}\n}", "github": "", "project": "", "reviewers": "AnonReviewer4;AnonReviewer3;AnonReviewer1;AnonReviewer2", "site": "https://openreview.net/forum?id=5K8ZG9twKY", "pdf_size": 0, "rating": "5;6;6;6", "confidence": "5;3;4;4", "wc_review": "1280;351;535;215", "wc_reply_reviewers": "0;0;0;129", "wc_reply_authors": "860;561;429;789", "reply_reviewers": "0;0;0;1", "reply_authors": "2;1;1;2", "rating_avg": [ 5.75, 0.4330127018922193 ], "confidence_avg": [ 4.0, 0.7071067811865476 ], "wc_review_avg": [ 595.25, 411.3273483492193 ], "wc_reply_reviewers_avg": [ 32.25, 55.858638544096294 ], "wc_reply_authors_avg": [ 659.75, 173.0626692848576 ], "reply_reviewers_avg": [ 0.25, 0.4330127018922193 ], "reply_authors_avg": [ 1.5, 0.5 ], "replies_avg": [ 13, 0 ], "authors#_avg": [ 4, 0 ], "corr_rating_confidence": -0.816496580927726, "gs_citation": 4, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=12278262507436466706&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 0, "aff_unique_index": "0;1;1;1", "aff_unique_norm": "Massachusetts Institute of Technology;Carnegie Mellon University", "aff_unique_dep": ";", "aff_unique_url": "https://web.mit.edu;https://www.cmu.edu", "aff_unique_abbr": "MIT;CMU", "aff_campus_unique_index": "1", "aff_campus_unique": ";Pittsburgh", "aff_country_unique_index": "0;0;0;0", "aff_country_unique": "United States" }, { "id": "5L8XMh667qz", "title": "Encoded Prior Sliced Wasserstein AutoEncoder for learning latent manifold representations", "track": "main", "status": "Reject", "tldr": "", "abstract": "While variational autoencoders have been successful in a variety of tasks, the use of conventional Gaussian or Gaussian mixture priors are limited in their ability to encode underlying structure of data in the latent representation.\nIn this work, we introduce an Encoded Prior Sliced Wasserstein AutoEncoder (EPSWAE) wherein an additional prior-encoder network facilitates learning an embedding of the data manifold which preserves topological and geometric properties of the data, thus improving the structure of 
latent space.\nThe autoencoder and prior-encoder networks are iteratively trained using the Sliced Wasserstein (SW) distance, which efficiently measures the distance between two \\textit{arbitrary} sampleable distributions without being constrained to a specific form as in the KL divergence, and without requiring expensive adversarial training.\nTo improve the representation, we use (1) a structural consistency term in the loss that encourages isometry between feature space and latent space and (2) a nonlinear variant of the SW distance which averages over random nonlinear shearing.\nThe effectiveness of the learned manifold encoding is best explored by traversing the latent space through interpolations along \\textit{geodesics} which generate samples that lie on the manifold and hence are advantageous compared to standard Euclidean interpolation.\nTo this end, we introduce a graph-based algorithm for interpolating along network-geodesics in latent space by maximizing the density of samples along the path while minimizing total energy. We use the 3D-spiral data to show that the prior does indeed encode the geometry underlying the data and to demonstrate the advantages of the network-algorithm for interpolation.\nAdditionally, we apply our framework to MNIST, and CelebA datasets, and show that outlier generations, latent representations, and geodesic interpolations are comparable to the state of the art.", "keywords": "VAE;sliced Wasserstein distance;latent representation;interpolation;manifold embedding;geodesics;network algorithm", "primary_area": "", "supplementary_material": "/attachment/6420da408dcbd67a6850493430c8a09015955ae9.zip", "author": "Sanjukta Krishnagopal;Jacob Bedrossian", "authorids": "~Sanjukta_Krishnagopal1;jacob@math.umd.edu", "gender": "F;", "homepage": "https://networks.cs.ucsb.edu/;", "dblp": ";", "google_scholar": ";", "orcid": "0000-0002-1556-404X;", "linkedin": ";", "or_profile": "~Sanjukta_Krishnagopal1;jacob@math.umd.edu", "aff": "University College London;", "aff_domain": "ucl.ac.uk;", "position": "Postdoc;", "bibtex": "@misc{\nkrishnagopal2021encoded,\ntitle={Encoded Prior Sliced Wasserstein AutoEncoder for learning latent manifold representations},\nauthor={Sanjukta Krishnagopal and Jacob Bedrossian},\nyear={2021},\nurl={https://openreview.net/forum?id=5L8XMh667qz}\n}", "github": "", "project": "", "reviewers": "AnonReviewer4;AnonReviewer1;AnonReviewer3", "site": "https://openreview.net/forum?id=5L8XMh667qz", "pdf_size": 0, "rating": "5;5;7", "confidence": "4;4;4", "wc_review": "747;379;1576", "wc_reply_reviewers": "263;0;202", "wc_reply_authors": "1831;1262;2249", "reply_reviewers": "1;0;1", "reply_authors": "3;2;4", "rating_avg": [ 5.666666666666667, 0.9428090415820634 ], "confidence_avg": [ 4.0, 0.0 ], "wc_review_avg": [ 900.6666666666666, 500.607852737272 ], "wc_reply_reviewers_avg": [ 155.0, 112.39513631232744 ], "wc_reply_authors_avg": [ 1780.6666666666667, 404.5098542955687 ], "reply_reviewers_avg": [ 0.6666666666666666, 0.4714045207910317 ], "reply_authors_avg": [ 3.0, 0.816496580927726 ], "replies_avg": [ 16, 0 ], "authors#_avg": [ 2, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:bbgEPcAEArgJ:scholar.google.com/&scioq=Encoded+Prior+Sliced+Wasserstein+AutoEncoder+for+learning+latent+manifold+representations&hl=en&as_sdt=0,5", "gs_version_total": 3, "aff_unique_index": "0", "aff_unique_norm": "University College London", "aff_unique_dep": "", "aff_unique_url": 
"https://www.ucl.ac.uk", "aff_unique_abbr": "UCL", "aff_country_unique_index": "0", "aff_country_unique": "United Kingdom" }, { "title": "Colorization Transformer", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/2844", "id": "5NA1PinlGFu", "poster": "", "openreview": "https://openreview.net/forum?id=5NA1PinlGFu", "slides": "https://iclr.cc/virtual/2021/poster/2844", "video": "https://iclr.cc/virtual/2021/poster/2844", "author_site": "Manoj Kumar, Dirk Weissenborn, Nal Kalchbrenner", "tldr": "", "abstract": "We present the Colorization Transformer, a novel approach for diverse high fidelity image colorization based on self-attention. Given a grayscale image, the colorization proceeds in three steps. We first use a conditional autoregressive transformer to produce a low resolution coarse coloring of the grayscale image. Our architecture adopts conditional transformer layers to effectively condition grayscale input. Two subsequent fully parallel networks upsample the coarse colored low resolution image into a finely colored high resolution image. Sampling from the Colorization Transformer produces diverse colorings whose fidelity outperforms the previous state-of-the-art on colorising ImageNet based on FID results and based on a human evaluation in a Mechanical Turk test. Remarkably, in more than 60\\% of cases human evaluators prefer the highest rated among three generated colorings over the ground truth. The code and pre-trained checkpoints for Colorization Transformer are publicly available at https://github.com/google-research/google-research/tree/master/coltran", "keywords": "", "primary_area": "", "supplementary_material": "", "author": "Manoj Kumar;Dirk Weissenborn;Nal Kalchbrenner", "authorids": "~Manoj_Kumar1;~Dirk_Weissenborn1;~Nal_Kalchbrenner1", "gender": ";;", "homepage": "https://mechcoder.github.io/;;", "dblp": ";134/9095;", "google_scholar": "https://scholar.google.nl/citations?user=XQJN7dsAAAAJ;https://scholar.google.de/citations?user=DSQ-9ZwAAAAJ;https://scholar.google.co.uk/citations?user=LFyg0tAAAAAJ", "orcid": ";;", "linkedin": ";;", "or_profile": "~Manoj_Kumar1;~Dirk_Weissenborn1;~Nal_Kalchbrenner1", "aff": "Google;Google;", "aff_domain": "google.com;google.com;", "position": "Research Engineer;Research Scientist;", "bibtex": "@inproceedings{\nkumar2021colorization,\ntitle={Colorization Transformer},\nauthor={Manoj Kumar and Dirk Weissenborn and Nal Kalchbrenner},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=5NA1PinlGFu}\n}", "github": "[![github](/images/github_icon.svg) google-research/google-research](https://github.com/google-research/google-research) + [![Papers with Code](/images/pwc_icon.svg) 1 community implementation](https://paperswithcode.com/paper/?openreview=5NA1PinlGFu)", "project": "", "reviewers": "AnonReviewer3;AnonReviewer4;AnonReviewer1;AnonReviewer2", "pdf_size": 0, "rating": "5;6;7;7", "confidence": "4;4;4;4", "wc_review": "322;2071;412;267", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "952;2254;536;381", "reply_reviewers": "0;0;0;0", "reply_authors": "2;3;1;1", "rating_avg": [ 6.25, 0.82915619758885 ], "confidence_avg": [ 4.0, 0.0 ], "wc_review_avg": [ 768.0, 754.0659785456443 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 1030.75, 736.4602416288336 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.75, 0.82915619758885 ], "replies_avg": [ 14, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": 0.0, 
"gs_citation": 231, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=11966816380914397894&as_sdt=400005&sciodt=0,14&hl=en", "gs_version_total": 4, "pdf": "https://openreview.net/pdf?id=5NA1PinlGFu", "email": "google.com;google.com;", "author_num": 3, "aff_unique_index": "0;0", "aff_unique_norm": "Google", "aff_unique_dep": "Google", "aff_unique_url": "https://www.google.com", "aff_unique_abbr": "Google", "aff_campus_unique_index": "0;0", "aff_campus_unique": "Mountain View", "aff_country_unique_index": "0;0", "aff_country_unique": "United States" }, { "title": "Learning Better Structured Representations Using Low-rank Adaptive Label Smoothing", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/3097", "id": "5NsEIflpbSv", "poster": "", "openreview": "https://openreview.net/forum?id=5NsEIflpbSv", "slides": "https://iclr.cc/virtual/2021/poster/3097", "video": "https://iclr.cc/virtual/2021/poster/3097", "author_site": "Asish Ghoshal, Xilun Chen, Sonal Gupta, Luke Zettlemoyer, Yashar Mehdad", "tldr": "", "abstract": "Training with soft targets instead of hard targets has been shown to improve performance and calibration of deep neural networks. Label smoothing is a popular way of computing soft targets, where one-hot encoding of a class is smoothed with a uniform distribution. Owing to its simplicity, label smoothing has found wide-spread use for training deep neural networks on a wide variety of tasks, ranging from image and text classification to machine translation and semantic parsing. Complementing recent empirical justification for label smoothing, we obtain PAC-Bayesian generalization bounds for label smoothing and show that the generalization error depends on the choice of the noise (smoothing) distribution. Then we propose low-rank adaptive label smoothing (LORAS): a simple yet novel method for training with learned soft targets that generalizes label smoothing and adapts to the latent structure of the label space in structured prediction tasks. Specifically, we evaluate our method on semantic parsing tasks and show that training with appropriately smoothed soft targets can significantly improve accuracy and model calibration, especially in low-resource settings. Used in conjunction with pre-trained sequence-to-sequence models, our method achieves state of the art performance on four semantic parsing data sets. 
LORAS can be used with any model, improves performance and implicit model calibration without increasing the number of model parameters, and can be scaled to problems with large label spaces containing tens of thousands of labels.", "keywords": "label smoothing;calibration;semantic parsing;structured prediction", "primary_area": "", "supplementary_material": "", "author": "Asish Ghoshal;Xilun Chen;Sonal Gupta;Luke Zettlemoyer;Yashar Mehdad", "authorids": "~Asish_Ghoshal2;~Xilun_Chen1;sonalgupta@fb.com;~Luke_Zettlemoyer1;~Yashar_Mehdad2", "gender": ";;;M;", "homepage": ";https://xilunchen.com;;https://www.cs.washington.edu/people/faculty/lsz/;", "dblp": "https://dblp.uni-trier.de/pers/hd/g/Ghoshal:Asish;96/10207-2.html;;21/6793;", "google_scholar": "xJnThbEAAAAJ;eUk_hy8AAAAJ;;https://scholar.google.com.tw/citations?user=UjpbO6IAAAAJ;", "orcid": ";;;;", "linkedin": ";;;luke-zettlemoyer-a0109b226/;", "or_profile": "~Asish_Ghoshal2;~Xilun_Chen1;sonalgupta@fb.com;~Luke_Zettlemoyer1;~Yashar_Mehdad2", "aff": "Meta AI;Meta FAIR;;Meta;", "aff_domain": "meta.com;meta.com;;meta.com;", "position": "Research Scientist;Research Scientist;;Researcher;", "bibtex": "@inproceedings{\nghoshal2021learning,\ntitle={Learning Better Structured Representations Using Low-rank Adaptive Label Smoothing},\nauthor={Asish Ghoshal and Xilun Chen and Sonal Gupta and Luke Zettlemoyer and Yashar Mehdad},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=5NsEIflpbSv}\n}", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer3;AnonReviewer2;AnonReviewer4", "pdf_size": 0, "rating": "6;6;6;7", "confidence": "4;2;5;4", "wc_review": "224;547;350;483", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "313;216;402;219", "reply_reviewers": "0;0;0;0", "reply_authors": "1;1;1;1", "rating_avg": [ 6.25, 0.4330127018922193 ], "confidence_avg": [ 3.75, 1.0897247358851685 ], "wc_review_avg": [ 401.0, 124.46887161053561 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 287.5, 76.75447869668584 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 10, 0 ], "authors#_avg": [ 5, 0 ], "corr_rating_confidence": 0.13245323570650439, "gs_citation": 21, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=3621655159422216463&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 2, "pdf": "https://openreview.net/pdf?id=5NsEIflpbSv", "email": "meta.com;meta.com;;meta.com;", "author_num": 5, "aff_unique_index": "0;0;0", "aff_unique_norm": "Meta", "aff_unique_dep": "Meta AI", "aff_unique_url": "https://meta.com", "aff_unique_abbr": "Meta", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0;0", "aff_country_unique": "United States" }, { "id": "5PiSFHhRe2C", "title": "Meta Auxiliary Labels with Constituent-based Transformer for Aspect-based Sentiment Analysis", "track": "main", "status": "Reject", "tldr": "", "abstract": "Aspect based sentiment analysis (ABSA) is a challenging natural language processing task that could benefit from syntactic information. Previous work exploit dependency parses to improve performance on the task, but this requires the existence of good dependency parsers. In this paper, we build a constituent-based transformer for ABSA that can induce constituents without constituent parsers. We also apply meta auxiliary learning to generate labels on edges between tokens, supervised by the objective of the ABSA task. 
Without input from dependency parsers, our models outperform previous work on three Twitter data sets and match previous work closely on two review data sets.", "keywords": "Natural Language Processing;Sentiment Analysis", "primary_area": "", "supplementary_material": "", "author": "Ling Min Serena Khoo;Hai Leong Chieu", "authorids": "~Ling_Min_Serena_Khoo1;~Hai_Leong_Chieu1", "gender": "M;F", "homepage": "http://chaileon.github.io/;https://github.com/serenaklm", "dblp": "38/4132;", "google_scholar": "https://scholar.google.com.sg/citations?user=9QO16LcAAAAJ;xNB5IuMAAAAJ", "orcid": "0009-0003-6396-7614;", "linkedin": ";", "or_profile": "~Hai_Leong_Chieu1;~Serena_Khoo1", "aff": "DSO National Laboratories;DSO National Laboratories", "aff_domain": "dso.org.sg;dso.org.sg", "position": "Researcher;Member of Technical Staff", "bibtex": "@misc{\nkhoo2021meta,\ntitle={Meta Auxiliary Labels with Constituent-based Transformer for Aspect-based Sentiment Analysis},\nauthor={Ling Min Serena Khoo and Hai Leong Chieu},\nyear={2021},\nurl={https://openreview.net/forum?id=5PiSFHhRe2C}\n}", "github": "", "project": "", "reviewers": "AnonReviewer2;AnonReviewer1;AnonReviewer3", "site": "https://openreview.net/forum?id=5PiSFHhRe2C", "pdf_size": 0, "rating": "2;3;4", "confidence": "4;5;4", "wc_review": "667;296;492", "wc_reply_reviewers": "0;0;0", "wc_reply_authors": "645;611;697", "reply_reviewers": "0;0;0", "reply_authors": "1;1;1", "rating_avg": [ 3.0, 0.816496580927726 ], "confidence_avg": [ 4.333333333333333, 0.4714045207910317 ], "wc_review_avg": [ 485.0, 151.54097355720884 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 651.0, 35.364765892999586 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 7, 0 ], "authors#_avg": [ 2, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 1, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=7987129057586525835&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 0, "aff_unique_index": "0;0", "aff_unique_norm": "DSO National Laboratories", "aff_unique_dep": "", "aff_unique_url": "https://www.dso.org.sg", "aff_unique_abbr": "DSO", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0", "aff_country_unique": "Singapore" }, { "id": "5SST78xEh4A", "title": "ProSelfLC: Progressive Self Label Correction for Training Robust Deep Neural Networks", "track": "main", "status": "Withdraw", "tldr": "", "abstract": "To train robust deep neural networks (DNNs), we systematically study several target modification approaches, which include output regularisation, self and non-self label correction (LC). Two key issues are discovered: (1) Self LC is the most appealing as it exploits its own knowledge and requires no extra models. However, how to automatically decide the trust degree of a learner as training goes is not well answered in the literature? (2) Some methods penalise while the others reward low-entropy predictions, prompting us to ask which one is better?\n\nTo resolve the first issue, taking two well-accepted propositions\u2013deep neural networks learn meaningful patterns before fitting noise (Arpit et al., 2017) and minimum entropy regularisation principle (Grandvalet & Bengio, 2006)\u2013we propose a novel end-to-end method named ProSelfLC, which is designed according to learning time and entropy. 
Specifically, given a data point, we progressively increase trust in its predicted label distribution versus its annotated one if a model has been trained for enough time and the prediction is of low entropy (high confidence). For the second issue, according to ProSelfLC, we empirically prove that it is better to redefine a meaningful low-entropy status and optimise the learner toward it. This serves as a defence of entropy minimisation. \n\nWe demonstrate the effectiveness of ProSelfLC through extensive experiments in both clean and noisy settings.", "keywords": "label correction;entropy minimisation;maximum entropy;confidence penalty;knowledge distillation;regularization;label noise", "primary_area": "", "supplementary_material": "", "author": "Xinshao Wang;Yang Hua;Elyor Kodirov;David A. Clifton;Neil M. Robertson", "authorids": "~Xinshao_Wang1;~Yang_Hua2;~Elyor_Kodirov1;~David_A._Clifton1;~Neil_M._Robertson1", "gender": "M;M;M;M;", "homepage": "https://xinshaoamoswang.github.io/about/;https://pure.qub.ac.uk/en/persons/yang-hua;;http://www.eng.ox.ac.uk/chi;https://pure.qub.ac.uk/en/persons/neil-robertson", "dblp": "230/3751;;123/2306;89/6424;26/7169", "google_scholar": "yOBhB7UAAAAJ;N0tFi8MAAAAJ;https://scholar.google.co.uk/citations?user=8DaEpdoAAAAJ;;https://scholar.google.co.uk/citations?user=vMTJBGEAAAAJ", "orcid": "0000-0001-8907-8258;0000-0001-5536-503X;;;", "linkedin": "xinshaowang/;;;;", "or_profile": "~Xinshao_Wang1;~Yang_Hua2;~Elyor_Kodirov1;~David_A._Clifton1;~Neil_M._Robertson1", "aff": "University of Oxford;Queen's University Belfast;;University of Oxford;Queen's University Belfast", "aff_domain": "eng.ox.ac.uk;qub.ac.uk;;ox.ac.uk;qub.ac.uk", "position": "Visit scholar;Assistant Professor;;Full Professor;Full Professor", "bibtex": "", "github": "", "project": "", "reviewers": "AnonReviewer2;AnonReviewer3;AnonReviewer1;AnonReviewer4", "site": "https://openreview.net/forum?id=5SST78xEh4A", "pdf_size": 0, "rating": "4;4;5;5", "confidence": "4;4;3;4", "wc_review": "120;510;537;248", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "189;0;0;0", "reply_reviewers": "0;0;0;0", "reply_authors": "1;0;0;0", "rating_avg": [ 4.5, 0.5 ], "confidence_avg": [ 3.75, 0.4330127018922193 ], "wc_review_avg": [ 353.75, 175.93802175766328 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 47.25, 81.83940065762945 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 0.25, 0.4330127018922193 ], "replies_avg": [ 7, 0 ], "authors#_avg": [ 5, 0 ], "corr_rating_confidence": -0.5773502691896257, "gs_citation": 80, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=14790390926220732400&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 10, "aff_unique_index": "0;1;0;1", "aff_unique_norm": "University of Oxford;Queen's University Belfast", "aff_unique_dep": ";", "aff_unique_url": "https://www.ox.ac.uk;https://www.qub.ac.uk", "aff_unique_abbr": "Oxford;QUB", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0;0;0", "aff_country_unique": "United Kingdom" }, { "id": "5Spjp0zDYt", "title": "Failure Modes of Variational Autoencoders and Their Effects on Downstream Tasks", "track": "main", "status": "Reject", "tldr": "", "abstract": "Variational Auto-encoders (VAEs) are deep generative latent variable models that are widely used for a number of downstream tasks. 
While it has been demonstrated that VAE training can suffer from a number of pathologies, existing literature lacks characterizations of exactly when these pathologies occur and how they impact down-stream task performance. In this paper we concretely characterize conditions under which VAE training exhibits pathologies and connect these failure modes to undesirable effects on specific downstream tasks, such as learning compressed and disentangled representations, adversarial robustness and semi-supervised learning.", "keywords": "Variational Autoencoders;Variational Inference;VAE;Approximate Inference;Semi-Supervision", "primary_area": "", "supplementary_material": "/attachment/98375c5fecd7a2fdf5e3bb34b32c48075593f316.zip", "author": "Yaniv Yacoby;Weiwei Pan;Finale Doshi-Velez", "authorids": "~Yaniv_Yacoby1;~Weiwei_Pan1;~Finale_Doshi-Velez1", "gender": ";;F", "homepage": "https://yanivyacoby.github.io/;;https://finale.seas.harvard.edu/", "dblp": ";;64/7056", "google_scholar": "nEhVgawAAAAJ;;https://scholar.google.com/citations?hl=en", "orcid": ";;", "linkedin": ";;", "or_profile": "~Yaniv_Yacoby1;~Weiwei_Pan1;~Finale_Doshi-Velez1", "aff": "Harvard University;Harvard University;Harvard University", "aff_domain": "harvard.edu;harvard.edu;harvard.edu", "position": "PhD student;Postdoc;Professor", "bibtex": "@misc{\nyacoby2021failure,\ntitle={Failure Modes of Variational Autoencoders and Their Effects on Downstream Tasks},\nauthor={Yaniv Yacoby and Weiwei Pan and Finale Doshi-Velez},\nyear={2021},\nurl={https://openreview.net/forum?id=5Spjp0zDYt}\n}", "github": "", "project": "", "reviewers": "AnonReviewer3;AnonReviewer4;AnonReviewer2;AnonReviewer1", "site": "https://openreview.net/forum?id=5Spjp0zDYt", "pdf_size": 0, "rating": "4;5;5;5", "confidence": "5;3;3;3", "wc_review": "1713;646;305;619", "wc_reply_reviewers": "0;307;0;39", "wc_reply_authors": "760;757;409;749", "reply_reviewers": "0;1;0;1", "reply_authors": "1;1;1;1", "rating_avg": [ 4.75, 0.4330127018922193 ], "confidence_avg": [ 3.5, 0.8660254037844386 ], "wc_review_avg": [ 820.75, 532.2942677692481 ], "wc_reply_reviewers_avg": [ 86.5, 128.29750582143052 ], "wc_reply_authors_avg": [ 668.75, 150.0206235822262 ], "reply_reviewers_avg": [ 0.5, 0.5 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 11, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": -1.0, "gs_citation": 26, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=10882037403625080405&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 3, "aff_unique_index": "0;0;0", "aff_unique_norm": "Harvard University", "aff_unique_dep": "", "aff_unique_url": "https://www.harvard.edu", "aff_unique_abbr": "Harvard", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0;0", "aff_country_unique": "United States" }, { "id": "5USOVm2HkfG", "title": "Jointly-Trained State-Action Embedding for Efficient Reinforcement Learning", "track": "main", "status": "Reject", "tldr": "", "abstract": "While reinforcement learning has achieved considerable successes in recent years, state-of-the-art models are often still limited by the size of state and action spaces. Model-free reinforcement learning approaches use some form of state representations and the latest work has explored embedding techniques for actions, both with the aim of achieving better generalization and applicability. However, these approaches consider only states or actions, ignoring the interaction between them when generating embedded representations. 
In this work, we propose a new approach for jointly learning embeddings for states and actions that combines aspects of model-free and model-based reinforcement learning, which can be applied in both discrete and continuous domains. Specifically, we use a model of the environment to obtain embeddings for states and actions and present a generic architecture that uses these to learn a policy. In this way, the embedded representations obtained via our approach enable better generalization over both states and actions by capturing similarities in the embedding spaces. Evaluations of our approach on several gaming, robotic control, and recommender systems show it significantly outperforms state-of-the-art models in both discrete/continuous domains with large state/action spaces, thus confirming its efficacy and the overall superior performance.", "keywords": "reinforcement learning;embedding;representation learning;state-action embedding", "primary_area": "", "supplementary_material": "", "author": "Paul Julian Pritz;Liang Ma;Kin Leung", "authorids": "~Paul_Julian_Pritz1;~Liang_Ma4;~Kin_Leung1", "gender": "M;;M", "homepage": "https://paulpritz.com/;;http://www.commsp.ee.ic.ac.uk/~kkleung/", "dblp": ";;", "google_scholar": "QgQIRl4AAAAJ;;https://scholar.google.com.tw/citations?user=IpG80kwAAAAJ", "orcid": ";;", "linkedin": ";liang-ma-2935162b;", "or_profile": "~Paul_Julian_Pritz1;~Liang_Ma4;~Kin_Leung1", "aff": "Imperial College London;Dataminr;Imperial College London", "aff_domain": "imperial.ac.uk;dataminr.com;", "position": "PhD student;Research Scientist;Full Professor", "bibtex": "@misc{\npritz2021jointlytrained,\ntitle={Jointly-Trained State-Action Embedding for Efficient Reinforcement Learning},\nauthor={Paul Julian Pritz and Liang Ma and Kin Leung},\nyear={2021},\nurl={https://openreview.net/forum?id=5USOVm2HkfG}\n}", "github": "", "project": "", "reviewers": "AnonReviewer5;AnonReviewer4;AnonReviewer3;AnonReviewer1;AnonReviewer2", "site": "https://openreview.net/forum?id=5USOVm2HkfG", "pdf_size": 0, "rating": "3;4;5;5;6", "confidence": "3;4;4;5;2", "wc_review": "494;1254;457;968;602", "wc_reply_reviewers": "0;0;0;756;197", "wc_reply_authors": "1147;767;759;2257;529", "reply_reviewers": "0;0;0;5;2", "reply_authors": "2;2;1;8;3", "rating_avg": [ 4.6, 1.0198039027185568 ], "confidence_avg": [ 3.6, 1.019803902718557 ], "wc_review_avg": [ 755.0, 308.1246501012212 ], "wc_reply_reviewers_avg": [ 190.6, 292.81502693680187 ], "wc_reply_authors_avg": [ 1091.8, 615.4238864392573 ], "reply_reviewers_avg": [ 1.4, 1.9595917942265424 ], "reply_authors_avg": [ 3.2, 2.4819347291981715 ], "replies_avg": [ 32, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": -0.15384615384615383, "gs_citation": 2, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=17056360192418438184&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 0, "aff_unique_index": "0;1;0", "aff_unique_norm": "Imperial College London;Dataminr", "aff_unique_dep": ";", "aff_unique_url": "https://www.imperial.ac.uk;https://www.dataminr.com", "aff_unique_abbr": "ICL;Dataminr", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;1;0", "aff_country_unique": "United Kingdom;United States" }, { "id": "5UY7aZ_h37", "title": "Transferring Inductive Biases through Knowledge Distillation", "track": "main", "status": "Reject", "tldr": "", "abstract": "Having the right inductive biases can be crucial in many tasks or scenarios where data or computing resources are a limiting factor, or where training data 
is not perfectly representative of the conditions at test time. However, defining, designing, and efficiently adapting inductive biases is not necessarily straightforward. Inductive biases of a model affect its generalisation behaviour and influence the solution it converges to from different aspects. In this paper, we investigate the power of knowledge distillation in transferring the effects of inductive biases of a teacher model to a student model, when they have different architectures.\nWe consider different families of models: LSTMs vs. Transformers and CNNs vs. MLPs, in the context of tasks and scenarios with linguistics and vision applications, where having the right inductive biases is critical. We train our models in different setups: no knowledge distillation, self-distillation, and distillation using a teacher with a better inductive bias for the task at hand. We show that in the latter setup, compared to no distillation and self-distillation, we can not only improve the performance of the students, but the solutions they converge to also become similar to those of their teachers with respect to a wide range of properties, including different task-specific performance metrics, per-sample behaviour of the models, representational similarity and how the representational space of the models evolves during training, performance on out-of-distribution datasets, confidence calibration, and finally whether the converged solutions fall within the same basins of attraction.", "keywords": "Knowledge Distillation;Inductive Biases;Analyzing and Understanding Neural Networks;Recurrent Inductive Bias", "primary_area": "", "supplementary_material": "/attachment/f03e695e0ebae7db371838afcd296230e69734e7.zip", "author": "Samira Abnar;Mostafa Dehghani;Willem H. Zuidema", "authorids": "~Samira_Abnar1;~Mostafa_Dehghani1;~Willem_H._Zuidema1", "gender": "Unspecified;M;M", "homepage": "https://samiraabnar.github.io/;http://mostafadehghani.com/;https://staff.fnwi.uva.nl/w.zuidema/", "dblp": "150/5405;125/4062;67/1016", "google_scholar": "https://scholar.google.nl/citations?user=jbxwjgMAAAAJ;https://scholar.google.nl/citations?user=MiHOX3QAAAAJ;MBkG_FYAAAAJ", "orcid": ";;0000-0002-2362-5447", "linkedin": ";;", "or_profile": "~Samira_Abnar1;~Mostafa_Dehghani1;~Willem_Zuidema1", "aff": "University of Amsterdam;Google DeepMind;University of Amsterdam", "aff_domain": "uva.nl;google.com;uva.nl", "position": "PhD student;Research Scientist;Associate Professor", "bibtex": "@misc{\nabnar2021transferring,\ntitle={Transferring Inductive Biases through Knowledge Distillation},\nauthor={Samira Abnar and Mostafa Dehghani and Willem H.
Zuidema},\nyear={2021},\nurl={https://openreview.net/forum?id=5UY7aZ_h37}\n}", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer3;AnonReviewer4;AnonReviewer2", "site": "https://openreview.net/forum?id=5UY7aZ_h37", "pdf_size": 0, "rating": "3;5;5;7", "confidence": "5;4;3;4", "wc_review": "614;548;243;1031", "wc_reply_reviewers": "470;0;0;939", "wc_reply_authors": "4141;1224;612;3423", "reply_reviewers": "7;0;0;5", "reply_authors": "11;2;1;5", "rating_avg": [ 5.0, 1.4142135623730951 ], "confidence_avg": [ 4.0, 0.7071067811865476 ], "wc_review_avg": [ 609.0, 280.9741981036693 ], "wc_reply_reviewers_avg": [ 352.25, 389.32658206189825 ], "wc_reply_authors_avg": [ 2350.0, 1470.3341456961407 ], "reply_reviewers_avg": [ 3.0, 3.082207001484488 ], "reply_authors_avg": [ 4.75, 3.897114317029974 ], "replies_avg": [ 38, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": -0.5, "gs_citation": 74, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=9909360969004511118&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 5, "aff_unique_index": "0;1;0", "aff_unique_norm": "University of Amsterdam;Google", "aff_unique_dep": ";Google DeepMind", "aff_unique_url": "https://www.uva.nl;https://deepmind.com", "aff_unique_abbr": "UvA;DeepMind", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;1;0", "aff_country_unique": "Netherlands;United Kingdom" }, { "id": "5WcLI0e3cAY", "title": "K-PLUG: KNOWLEDGE-INJECTED PRE-TRAINED LANGUAGE MODEL FOR NATURAL LANGUAGE UNDERSTANDING AND GENERATION", "track": "main", "status": "Reject", "tldr": "", "abstract": "Existing pre-trained language models (PLMs) have demonstrated the effectiveness of self-supervised learning for a broad range of natural language processing (NLP) tasks. However, most of them are not explicitly aware of domain-specific knowledge, which is essential for downstream tasks in many domains, such as tasks in e-commerce scenarios. In this paper,\u00a0we propose K-PLUG, a knowledge-injected pre-trained language model based on the encoder-decoder transformer that can be transferred to both natural language understanding and generation tasks. We verify our method in a diverse range of e-commerce scenarios that require domain-specific knowledge. Specifically, we propose five knowledge-aware self-supervised pre-training objectives to formulate the learning of domain-specific knowledge, including e-commerce domain-specific knowledge-bases, aspects of product entities, categories of product entities, and unique selling propositions of product entities. K-PLUG achieves new state-of-the-art results on a suite of domain-specific NLP tasks, including product knowledge base completion, abstractive product summarization, and multi-turn dialogue, significantly outperforms baselines across the board, which demonstrates that the proposed method effectively learns a diverse set of domain-specific knowledge for both language understanding and generation tasks. 
The code, data, and models will be publicly available.\n", "keywords": "", "primary_area": "", "supplementary_material": "", "author": "Song Xu;Haoran Li;Peng Yuan;Yujia Wang;Youzheng Wu;Xiaodong He;Ying Liu;Bowen Zhou", "authorids": "xusong28@jd.com;~Haoran_Li1;yuanpeng29@jd.com;wangyujia15@jd.com;~Youzheng_Wu2;~Xiaodong_He1;liu.ying@ruc.edu.cn;~Bowen_Zhou1", "gender": ";M;;;M;M;;", "homepage": ";;;;;;;", "dblp": ";50/10038-1;;;01/3620;03/3923-1;;", "google_scholar": ";JFg9QC4AAAAJ;;;fWrjVnQAAAAJ;W5WbqgoAAAAJ;;", "orcid": ";;;;;;;", "linkedin": ";;;;;;;", "or_profile": "xusong28@jd.com;~Haoran_Li1;yuanpeng29@jd.com;wangyujia15@jd.com;~Youzheng_Wu2;~Xiaodong_He1;liu.ying@ruc.edu.cn;~Bowen_Zhou1", "aff": ";JD AI Research;;;JD AI Research;JD AI Research;;", "aff_domain": ";jd.com;;;jd.com;jd.com;;", "position": ";Researcher;;;Full Professor;Director;;", "bibtex": "@misc{\nxu2021kplug,\ntitle={K-{\\{}PLUG{\\}}: {\\{}KNOWLEDGE{\\}}-{\\{}INJECTED{\\}} {\\{}PRE{\\}}-{\\{}TRAINED{\\}} {\\{}LANGUAGE{\\}} {\\{}MODEL{\\}} {\\{}FOR{\\}} {\\{}NATURAL{\\}} {\\{}LANGUAGE{\\}} {\\{}UNDERSTANDING{\\}} {\\{}AND{\\}} {\\{}GENERATION{\\}}},\nauthor={Song Xu and Haoran Li and Peng Yuan and Yujia Wang and Youzheng Wu and Xiaodong He and Ying Liu and Bowen Zhou},\nyear={2021},\nurl={https://openreview.net/forum?id=5WcLI0e3cAY}\n}", "github": "", "project": "", "reviewers": "AnonReviewer4;AnonReviewer1;AnonReviewer3;AnonReviewer2", "site": "https://openreview.net/forum?id=5WcLI0e3cAY", "pdf_size": 0, "rating": "4;5;5;6", "confidence": "3;4;4;4", "wc_review": "436;314;327;427", "wc_reply_reviewers": "0;0;0;220", "wc_reply_authors": "641;255;403;751", "reply_reviewers": "0;0;0;2", "reply_authors": "1;1;1;2", "rating_avg": [ 5.0, 0.7071067811865476 ], "confidence_avg": [ 3.75, 0.4330127018922193 ], "wc_review_avg": [ 376.0, 55.780821076782296 ], "wc_reply_reviewers_avg": [ 55.0, 95.26279441628824 ], "wc_reply_authors_avg": [ 512.5, 194.7376440239534 ], "reply_reviewers_avg": [ 0.5, 0.8660254037844386 ], "reply_authors_avg": [ 1.25, 0.4330127018922193 ], "replies_avg": [ 13, 0 ], "authors#_avg": [ 8, 0 ], "corr_rating_confidence": 0.816496580927726, "gs_citation": 23, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=16031537424486199023&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 4, "aff_unique_index": "0;0;0", "aff_unique_norm": "JD", "aff_unique_dep": "JD AI Research", "aff_unique_url": "https://www.jd.com", "aff_unique_abbr": "JD AI", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0;0", "aff_country_unique": "China" }, { "title": "Generalized Multimodal ELBO", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/2632", "id": "5Y21V0RDBV", "poster": "", "openreview": "https://openreview.net/forum?id=5Y21V0RDBV", "slides": "https://iclr.cc/virtual/2021/poster/2632", "video": "https://iclr.cc/virtual/2021/poster/2632", "author_site": "Thomas Sutter, Imant Daunhawer, Julia E Vogt", "tldr": "", "abstract": "Multiple data types naturally co-occur when describing real-world phenomena and learning from them is a long-standing goal in machine learning research. However, existing self-supervised generative models approximating an ELBO are not able to fulfill all desired requirements of multimodal models: their posterior approximation functions lead to a trade-off between the semantic coherence and the ability to learn the joint data distribution. 
We propose a new, generalized ELBO formulation for multimodal data that overcomes these limitations. The new objective encompasses two previous methods as special cases and combines their benefits without compromises. In extensive experiments, we demonstrate the advantage of the proposed method compared to state-of-the-art models in self-supervised, generative learning tasks.", "keywords": "Multimodal;VAE;ELBO;self-supervised;generative learning", "primary_area": "", "supplementary_material": "/attachment/15cf4838bf8bb88476c3f31216c2a5ef6d0a6319.zip", "author": "Thomas M. Sutter;Imant Daunhawer;Julia E Vogt", "authorids": "~Thomas_Marco_Sutter1;~Imant_Daunhawer2;~Julia_E_Vogt1", "gender": ";;F", "homepage": "https://mds.inf.ethz.ch/;https://mds.inf.ethz.ch/team/detail/imant-daunhawer/;http://mds.inf.ethz.ch", "dblp": "259/0609;259/0541;13/8412", "google_scholar": "eySN1UkAAAAJ;;UoeV-8kAAAAJ", "orcid": ";;", "linkedin": ";;julia-vogt-50b53895", "or_profile": "~Thomas_Marco_Sutter1;~Imant_Daunhawer2;~Julia_E_Vogt1", "aff": "ETH Zurich;Swiss Federal Institute of Technology;Swiss Federal Institute of Technology", "aff_domain": "ethz.ch;ethz.ch;ethz.ch", "position": "PhD student;PhD student;Assistant Professor", "bibtex": "@inproceedings{\nsutter2021generalized,\ntitle={Generalized Multimodal {ELBO}},\nauthor={Thomas M. Sutter and Imant Daunhawer and Julia E Vogt},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=5Y21V0RDBV}\n}", "github": "[![github](/images/github_icon.svg) thomassutter/MoPoE](https://github.com/thomassutter/MoPoE)", "project": "", "reviewers": "AnonReviewer2;AnonReviewer3;AnonReviewer4;AnonReviewer1", "pdf_size": 0, "rating": "6;6;6;7", "confidence": "3;4;4;4", "wc_review": "331;439;199;853", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "0;0;0;0", "reply_reviewers": "0;0;0;0", "reply_authors": "0;0;0;0", "rating_avg": [ 6.25, 0.4330127018922193 ], "confidence_avg": [ 3.75, 0.4330127018922193 ], "wc_review_avg": [ 455.5, 244.72995321374128 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 0, 0 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 0, 0 ], "replies_avg": [ 5, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": 0.3333333333333333, "gs_citation": 115, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=17699698224745360599&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 8, "pdf": "https://openreview.net/pdf?id=5Y21V0RDBV", "email": "ethz.ch;ethz.ch;ethz.ch", "author_num": 3, "aff_unique_index": "0;1;1", "aff_unique_norm": "ETH Zurich;Swiss Federal Institute of Technology", "aff_unique_dep": ";", "aff_unique_url": "https://www.ethz.ch;https://www.ethz.ch", "aff_unique_abbr": "ETHZ;ETH Zurich", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0;0", "aff_country_unique": "Switzerland" }, { "id": "5ZFeGYBBPgs", "title": "Second-Moment Loss: A Novel Regression Objective for Improved Uncertainties", "track": "main", "status": "Reject", "tldr": "", "abstract": "Quantification of uncertainty is one of the most promising approaches to establish safe machine learning. Despite its importance, it is far from being generally solved, especially for neural networks. One of the most commonly used approaches so far is Monte Carlo dropout, which is computationally cheap and easy to apply in practice. However, it can underestimate the uncertainty. 
We propose a new objective, referred to as second-moment loss (SML), to address this issue. While the full network is encouraged to model the mean, the dropout networks are explicitly used to optimize the model variance. We analyze the performance of the new objective on various toy and UCI regression datasets. Comparing to the state-of-the-art of deep ensembles, SML leads to comparable prediction accuracies and uncertainty estimates while only requiring a single model. Under distribution shift, we observe moderate improvements. From a safety perspective also the study of worst-case uncertainties is crucial. In this regard we improve considerably. Finally, we show that SML can be successfully applied to SqueezeDet, a modern object detection network. We improve on its uncertainty-related scores while not deteriorating regression quality. As a side result, we introduce an intuitive Wasserstein distance-based uncertainty measure that is non-saturating and thus allows to resolve quality differences between any two uncertainty estimates.", "keywords": "regression;uncertainty quantification;uncertainty evaluation;dropout", "primary_area": "", "supplementary_material": "/attachment/82266707a82bfb31b92cf206a01d9c1925c0d004.zip", "author": "Joachim Sicking;Maram Akila;Maximilian Alexander Pintz;Tim Wirtz;Asja Fischer;Stefan Wrobel", "authorids": "~Joachim_Sicking1;maram.akila@iais.fraunhofer.de;maximilian.alexander.pintz@iais.fraunhofer.de;~Tim_Wirtz1;~Asja_Fischer1;~Stefan_Wrobel1", "gender": ";;;M;F;", "homepage": ";;;;;", "dblp": ";;;;76/8485;w/StefanWrobel", "google_scholar": ";;;;FyZbyIUAAAAJ;https://scholar.google.com/citations?hl=de", "orcid": ";;;;0000-0002-1916-7033;", "linkedin": ";;;;;", "or_profile": "~Joachim_Sicking1;maram.akila@iais.fraunhofer.de;maximilian.alexander.pintz@iais.fraunhofer.de;~Tim_Wirtz1;~Asja_Fischer1;~Stefan_Wrobel1", "aff": ";;;;Ruhr-Universit\u00e4t Bochum;Rheinische Friedrich-Wilhelms-Universit\u00e4t Bonn, Rheinische Friedrich-Wilhelms Universit\u00e4t Bonn", "aff_domain": ";;;;ruhr-uni-bochum.de;cs.uni-bonn.de", "position": ";;;;Full Professor;Full Professor", "bibtex": "@misc{\nsicking2021secondmoment,\ntitle={Second-Moment Loss: A Novel Regression Objective for Improved Uncertainties},\nauthor={Joachim Sicking and Maram Akila and Maximilian Alexander Pintz and Tim Wirtz and Asja Fischer and Stefan Wrobel},\nyear={2021},\nurl={https://openreview.net/forum?id=5ZFeGYBBPgs}\n}", "github": "", "project": "", "reviewers": "AnonReviewer2;AnonReviewer4;AnonReviewer3", "site": "https://openreview.net/forum?id=5ZFeGYBBPgs", "pdf_size": 0, "rating": "4;5;6", "confidence": "5;3;3", "wc_review": "279;312;471", "wc_reply_reviewers": "0;0;0", "wc_reply_authors": "961;706;875", "reply_reviewers": "0;0;0", "reply_authors": "2;1;2", "rating_avg": [ 5.0, 0.816496580927726 ], "confidence_avg": [ 3.6666666666666665, 0.9428090415820634 ], "wc_review_avg": [ 354.0, 83.8212383587835 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 847.3333333333334, 105.92555037488462 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.6666666666666667, 0.4714045207910317 ], "replies_avg": [ 12, 0 ], "authors#_avg": [ 6, 0 ], "corr_rating_confidence": -0.8660254037844387, "gs_citation": 1, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=8421505000164312098&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 2, "aff_unique_index": "0;1", "aff_unique_norm": "Ruhr-Universit\u00e4t Bochum;Rheinische Friedrich-Wilhelms-Universit\u00e4t Bonn", "aff_unique_dep": ";", 
"aff_unique_url": "https://www.ruhr-uni-bochum.de;https://www.uni-bonn.de", "aff_unique_abbr": "RUB;Uni Bonn", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0", "aff_country_unique": "Germany" }, { "id": "5Zl7WYi7Ndj", "title": "A Unified Framework to Analyze and Design the Nonlocal Blocks for Neural Networks", "track": "main", "status": "Withdraw", "tldr": "", "abstract": "The nonlocal-based blocks are designed for capturing long-range spatial-temporal dependencies in computer vision tasks. Although having shown excellent performances, they lack the mechanism to encode the rich, structured information among elements in an image. \nIn this paper, to theoretically analyze the property of these nonlocal-based blocks, we provide a unified framework to interpret them, where we view them as a graph filter generated on a fully-connected graph. When choosing Chebyshev graph filter, a generalized formulation can be derived for explaining the existing nonlocal-based blocks (e.g. nonlocal block, nonlocal stage, double attention block) and uses to analyze their irrationality. Furthermore, by removing the irrationality, we propose an efficient and robust Chebyshev spectral nonlocal block, which can be more flexibly inserted into deep neural networks than the existing nonlocal blocks. Experimental results demonstrate the clear-cut improvements and practical applicabilities of the proposed spectral nonlocal blocks on image classification (Cifar-10/100, ImageNet), fine-grained image classification (CUB-200), action recognition (UCF-101) tasks.", "keywords": "Nonlocal Block;Image Classification;Action Recognition;Graph Neural Network", "primary_area": "", "supplementary_material": "/attachment/c099c422d9f16b4714c5f89469fb90ab572badfe.zip", "author": "Lei Zhu;Qi She;Changhu Wang", "authorids": "~Lei_Zhu10;~Qi_She1;~Changhu_Wang3", "gender": "M;M;M", "homepage": ";http://sheqi.mystrikingly.com/;https://changhu.wang", "dblp": "99/549-12;171/7773;30/3393", "google_scholar": "https://scholar.google.com/citations?hl=zh-CN;iHoGTt4AAAAJ;DsVZkjAAAAAJ", "orcid": ";0000-0002-4490-2941;", "linkedin": ";;", "or_profile": "~Lei_Zhu10;~Qi_She1;~Changhu_Wang1", "aff": "Peking University;Bytedance AI Lab;ByteDance Inc.", "aff_domain": "pku.edu.cn;bytedance.com;bytedance.com", "position": "PhD student;Research Scientist;Director", "bibtex": "", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer2;AnonReviewer4;AnonReviewer3", "site": "https://openreview.net/forum?id=5Zl7WYi7Ndj", "pdf_size": 0, "rating": "5;5;5;6", "confidence": "4;5;4;4", "wc_review": "464;222;620;468", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "0;0;0;0", "reply_reviewers": "0;0;0;0", "reply_authors": "0;0;0;0", "rating_avg": [ 5.25, 0.4330127018922193 ], "confidence_avg": [ 4.25, 0.4330127018922193 ], "wc_review_avg": [ 443.5, 142.50877165985258 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 0, 0 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 0, 0 ], "replies_avg": [ 5, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": -0.3333333333333333, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:60lYnBmF1xgJ:scholar.google.com/&scioq=A+Unified+Framework+to+Analyze+and+Design+the+Nonlocal+Blocks+for+Neural+Networks&hl=en&as_sdt=0,33", "gs_version_total": 0, "aff_unique_index": "0;1;1", "aff_unique_norm": "Peking University;ByteDance", "aff_unique_dep": ";AI Lab", "aff_unique_url": 
"http://www.pku.edu.cn;https://www.bytedance.com", "aff_unique_abbr": "Peking U;Bytedance AI Lab", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0;0", "aff_country_unique": "China" }, { "id": "5aDnCA_RXS", "title": "Neural Networks Preserve Invertibility Across Iterations: A Possible Source of Implicit Data Augmentation", "track": "main", "status": "Withdraw", "tldr": "", "abstract": "Determining what kind of representations neural networks learn, and how this may relate to generalization, remains a challenging problem. Previous work has utilized a rich set of methods to invert layer representations of neural networks, i.e. given some reference activation $\\Phi_0$ and a layer function $r_{\\ell}$, find $x$ which minimizes $||\\Phi_0 - r_{\\ell}(x)||^2$ . We show that neural networks can preserve invertibility across several iterations. That is, it is possible to interpret activations produced in some later iteration in the context of the layer function of the current iteration. For convolutional and fully connected networks, the lower layers maintain such a consistent representation for several iterations, while in the higher layers invertibility holds for fewer iterations. Adding skip connections such as those found in Resnet allows even higher layers to preserve invertibility across several iterations. We believe the fact that higher layers may interpret weight changes made by lower layers as changes to the data may produce implicit data augmentation. This implicit data augmentation may eventually yield some insight into why neural networks can generalize even with so many parameters.", "keywords": "", "primary_area": "", "supplementary_material": "", "author": "Arushi Gupta", "authorids": "~Arushi_Gupta1", "gender": "", "homepage": "", "dblp": "", "google_scholar": "", "orcid": "", "linkedin": "", "or_profile": "~Arushi_Gupta1", "aff": "Department of Computer Science, Princeton University", "aff_domain": "cs.princeton.edu", "position": "PhD student", "bibtex": "", "github": "", "project": "", "reviewers": "AnonReviewer4;AnonReviewer2;AnonReviewer1;AnonReviewer3", "site": "https://openreview.net/forum?id=5aDnCA_RXS", "pdf_size": 0, "rating": "2;4;4;5", "confidence": "4;4;4;3", "wc_review": "535;477;574;307", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "0;0;0;0", "reply_reviewers": "0;0;0;0", "reply_authors": "0;0;0;0", "rating_avg": [ 3.75, 1.0897247358851685 ], "confidence_avg": [ 3.75, 0.4330127018922193 ], "wc_review_avg": [ 473.25, 102.00091911350603 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 0, 0 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 0, 0 ], "replies_avg": [ 5, 0 ], "authors#_avg": [ 1, 0 ], "corr_rating_confidence": -0.6622661785325219, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:Tzj3uOoH3eEJ:scholar.google.com/&scioq=Neural+Networks+Preserve+Invertibility+Across+Iterations:+A+Possible+Source+of+Implicit+Data+Augmentation&hl=en&as_sdt=0,33", "gs_version_total": 0, "aff_unique_index": "0", "aff_unique_norm": "Princeton University", "aff_unique_dep": "Department of Computer Science", "aff_unique_url": "https://www.princeton.edu", "aff_unique_abbr": "Princeton", "aff_country_unique_index": "0", "aff_country_unique": "United States" }, { "id": "5fJ0qcwBNr0", "title": "A Gradient-based Kernel Approach for Efficient Network Architecture Search", "track": "main", "status": "Reject", "tldr": "", "abstract": "It is widely accepted that vanishing and exploding 
gradient values are the main reason behind the difficulty of deep network training.\nIn this work, we take a further step to understand the optimization of deep networks and find that both gradient correlations and gradient values have strong impacts on model training. \nInspired by our new finding, we explore a simple yet effective network architecture search (NAS) approach that leverages gradient correlation and gradient values to find well-performing architectures. To be specific, we first formulate these two terms into a unified gradient-based kernel and then select architectures with the largest kernels at initialization as the final networks. \nThe new approach replaces the expensive ``train-then-test'' evaluation paradigm with a new lightweight function according to the gradient-based kernel at initialization.\nExperiments show that our approach achieves competitive results with orders of magnitude faster than ``train-then-test'' paradigms on image classification tasks. Furthermore, the extremely low search cost enables its wide applications. It also obtains performance improvements on two text classification tasks.", "keywords": "NAS", "primary_area": "", "supplementary_material": "/attachment/f358d2658148d5b0bdd50d7ca15b8f3053f3f17e.zip", "author": "Jingjing Xu;Liang Zhao;Junyang Lin;Xu Sun;Hongxia Yang", "authorids": "~Jingjing_Xu1;~Liang_Zhao8;junyang.ljy@alibaba-inc.com;~Xu_Sun1;~Hongxia_Yang2", "gender": "F;M;;M;F", "homepage": ";https://zhao1iang.github.io/;;https://xusun.org/;https://www4.comp.polyu.edu.hk/~hongxyang/", "dblp": "25/624;63/5422-3.html;;37/1971-1;", "google_scholar": ";https://scholar.google.com/citations?hl=zh-CN;;https://scholar.google.com/citations?hl=en;iJlC5mMAAAAJ", "orcid": ";;;;", "linkedin": ";;;;", "or_profile": "~Jingjing_Xu1;~Liang_Zhao8;junyang.ljy@alibaba-inc.com;~Xu_Sun1;~Hongxia_Yang2", "aff": ";Peking University;;Peking University;Alibaba Group", "aff_domain": ";pku.edu.cn;;pku.edu.cn;alibaba-inc.com", "position": ";MS student;;Associate Professor;Principal Researcher", "bibtex": "@misc{\nxu2021a,\ntitle={A Gradient-based Kernel Approach for Efficient Network Architecture Search},\nauthor={Jingjing Xu and Liang Zhao and Junyang Lin and Xu Sun and Hongxia Yang},\nyear={2021},\nurl={https://openreview.net/forum?id=5fJ0qcwBNr0}\n}", "github": "", "project": "", "reviewers": "AnonReviewer3;AnonReviewer1;AnonReviewer2;AnonReviewer5", "site": "https://openreview.net/forum?id=5fJ0qcwBNr0", "pdf_size": 0, "rating": "3;4;4;4", "confidence": "4;4;5;5", "wc_review": "474;454;385;464", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "0;0;0;0", "reply_reviewers": "0;0;0;0", "reply_authors": "0;0;0;0", "rating_avg": [ 3.75, 0.4330127018922193 ], "confidence_avg": [ 4.5, 0.5 ], "wc_review_avg": [ 444.25, 34.93118234471888 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 0, 0 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 0, 0 ], "replies_avg": [ 5, 0 ], "authors#_avg": [ 5, 0 ], "corr_rating_confidence": 0.5773502691896257, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:z_4WKFEM9z4J:scholar.google.com/&scioq=A+Gradient-based+Kernel+Approach+for+Efficient+Network+Architecture+Search&hl=en&as_sdt=0,5", "gs_version_total": 0, "aff_unique_index": "0;0;1", "aff_unique_norm": "Peking University;Alibaba Group", "aff_unique_dep": ";", "aff_unique_url": "http://www.pku.edu.cn;https://www.alibaba.com", "aff_unique_abbr": "Peking U;Alibaba", "aff_campus_unique_index": "", "aff_campus_unique": "", 
"aff_country_unique_index": "0;0;0", "aff_country_unique": "China" }, { "id": "5g5x0eVdRg", "title": "DHOG: Deep Hierarchical Object Grouping", "track": "main", "status": "Reject", "tldr": "", "abstract": "Unsupervised learning of categorical representations using data augmentations appears to be a promising approach and has proven useful for finding suitable representations for downstream tasks. However current state-of-the-art methods require preprocessing (e.g. Sobel edge detection) to work. We introduce a mutual information minimization strategy for unsupervised learning from augmentations, that prevents learning from locking on to easy to find, yet unimportant, representations at the expense of more informative ones requiring more complex processing. We demonstrate specifically that this process learns representations which capture higher mutual information between augmentations, and demonstrate that these representations are better suited to the downstream exemplar task of clustering. We obtain substantial accuracy improvements on CIFAR-10, CIFAR-100-20, and SVHN.", "keywords": "Unsupervised learning;Deep neural networks;clustering", "primary_area": "", "supplementary_material": "", "author": "Luke Nicholas Darlow;Amos Storkey", "authorids": "~Luke_Nicholas_Darlow1;~Amos_Storkey1", "gender": ";Not Specified", "homepage": ";http://homepages.inf.ed.ac.uk/amos/", "dblp": ";", "google_scholar": ";", "orcid": ";", "linkedin": ";", "or_profile": "~Luke_Nicholas_Darlow1;~Amos_Storkey1", "aff": ";University of Edinburgh", "aff_domain": ";ed.ac.uk", "position": ";Full Professor", "bibtex": "@misc{\ndarlow2021dhog,\ntitle={{\\{}DHOG{\\}}: Deep Hierarchical Object Grouping},\nauthor={Luke Nicholas Darlow and Amos Storkey},\nyear={2021},\nurl={https://openreview.net/forum?id=5g5x0eVdRg}\n}", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer4;AnonReviewer3;AnonReviewer2", "site": "https://openreview.net/forum?id=5g5x0eVdRg", "pdf_size": 0, "rating": "3;4;4;6", "confidence": "4;2;4;4", "wc_review": "367;413;226;318", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "698;247;529;570", "reply_reviewers": "0;0;0;0", "reply_authors": "1;1;1;1", "rating_avg": [ 4.25, 1.0897247358851685 ], "confidence_avg": [ 3.5, 0.8660254037844386 ], "wc_review_avg": [ 331.0, 69.30728677419135 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 511.0, 164.6739202181086 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 10, 0 ], "authors#_avg": [ 2, 0 ], "corr_rating_confidence": 0.13245323570650439, "gs_citation": 10, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=1984477709004999415&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 3, "aff_unique_index": "0", "aff_unique_norm": "University of Edinburgh", "aff_unique_dep": "", "aff_unique_url": "https://www.ed.ac.uk", "aff_unique_abbr": "Edinburgh", "aff_country_unique_index": "0", "aff_country_unique": "United Kingdom" }, { "id": "5i4vRgoZauw", "title": "Wiring Up Vision: Minimizing Supervised Synaptic Updates Needed to Produce a Primate Ventral Stream", "track": "main", "status": "Reject", "tldr": "", "abstract": "After training on large datasets, certain deep neural networks are surprisingly good models of the neural mechanisms of adult primate visual object recognition. Nevertheless, these models are poor models of the development of the visual system because they posit millions of sequential, precisely coordinated synaptic updates, each based on a labeled image. 
While ongoing research is pursuing the use of unsupervised proxies for labels, we here explore a complementary strategy of reducing the required number of supervised synaptic updates to produce an adult-like ventral visual stream (as judged by the match to V1, V2, V4, IT, and behavior). Such models might require less precise machinery and energy expenditure to coordinate these updates and would thus move us closer to viable neuroscientific hypotheses about how the visual system wires itself up. Relative to the current leading model of the adult ventral stream, we here demonstrate that the total number of supervised weight updates can be substantially reduced using three complementary strategies: First, we find that only 2% of supervised updates (epochs and images) are needed to achieve ~80% of the match to adult ventral stream. Second, by improving the random distribution of synaptic connectivity, we find that 54% of the brain match can already be achieved \u201cat birth\" (i.e. no training at all). Third, we find that, by training only ~5% of model synapses, we can still achieve nearly 80% of the match to the ventral stream. When these three strategies are applied in combination, we find that these new models achieve ~80% of a fully trained model's match to the brain, while using two orders of magnitude fewer supervised synaptic updates. These results reflect first steps in modeling not just primate adult visual processing during inference, but also how the ventral visual stream might be \"wired up\" by evolution (a model's \"birth\" state) and by developmental learning (a model's updates based on visual experience).", "keywords": "computational neuroscience;primate ventral stream;convolutional neural networks;biologically plausible learning", "primary_area": "", "supplementary_material": "", "author": "Franziska Geiger;Martin Schrimpf;Tiago Marques;James J. DiCarlo", "authorids": "franzigeiger94@googlemail.com;~Martin_Schrimpf1;tmarques@mit.edu;~James_J._DiCarlo1", "gender": ";;;M", "homepage": ";http://mschrimpf.com/;;http://dicarlolab.mit.edu", "dblp": ";190/7063;;80/7658", "google_scholar": ";RiZ-RdwAAAAJ;;", "orcid": ";0000-0001-7766-7223;;0000-0002-1592-5896", "linkedin": ";mschrimpf/;;james-j-dicarlo/", "or_profile": "franzigeiger94@googlemail.com;~Martin_Schrimpf1;tmarques@mit.edu;~James_J._DiCarlo1", "aff": ";Massachusetts Institute of Technology;;Massachusetts Institute of Technology", "aff_domain": ";mit.edu;;mit.edu", "position": ";PhD student;;Full Professor", "bibtex": "@misc{\ngeiger2021wiring,\ntitle={Wiring Up Vision: Minimizing Supervised Synaptic Updates Needed to Produce a Primate Ventral Stream},\nauthor={Franziska Geiger and Martin Schrimpf and Tiago Marques and James J. 
DiCarlo},\nyear={2021},\nurl={https://openreview.net/forum?id=5i4vRgoZauw}\n}", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer3;AnonReviewer2;AnonReviewer4", "site": "https://openreview.net/forum?id=5i4vRgoZauw", "pdf_size": 0, "rating": "3;6;6;8", "confidence": "4;4;5;3", "wc_review": "546;758;868;574", "wc_reply_reviewers": "0;0;500;0", "wc_reply_authors": "1141;988;1149;491", "reply_reviewers": "0;0;1;0", "reply_authors": "2;2;2;1", "rating_avg": [ 5.75, 1.7853571071357126 ], "confidence_avg": [ 4.0, 0.7071067811865476 ], "wc_review_avg": [ 686.5, 132.71303628506132 ], "wc_reply_reviewers_avg": [ 125.0, 216.50635094610965 ], "wc_reply_authors_avg": [ 942.25, 268.3126674236608 ], "reply_reviewers_avg": [ 0.25, 0.4330127018922193 ], "reply_authors_avg": [ 1.75, 0.4330127018922193 ], "replies_avg": [ 15, 0 ], "authors#_avg": [ 4, 0 ], "corr_rating_confidence": -0.39605901719066966, "gs_citation": 15, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=8208290480111546994&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 9, "aff_unique_index": "0;0", "aff_unique_norm": "Massachusetts Institute of Technology", "aff_unique_dep": "", "aff_unique_url": "https://web.mit.edu", "aff_unique_abbr": "MIT", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0", "aff_country_unique": "United States" }, { "title": "Empirical Analysis of Unlabeled Entity Problem in Named Entity Recognition", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/2529", "id": "5jRVa89sZk", "poster": "", "openreview": "https://openreview.net/forum?id=5jRVa89sZk", "slides": "https://iclr.cc/virtual/2021/poster/2529", "video": "https://iclr.cc/virtual/2021/poster/2529", "author_site": "Yangming Li, lemao liu, Shuming Shi", "tldr": "", "abstract": "In many scenarios, named entity recognition (NER) models severely suffer from unlabeled entity problem, where the entities of a sentence may not be fully annotated. Through empirical studies performed on synthetic datasets, we find two causes of performance degradation. One is the reduction of annotated entities and the other is treating unlabeled entities as negative instances. The first cause has less impact than the second one and can be mitigated by adopting pretraining language models. The second cause seriously misguides a model in training and greatly affects its performances. Based on the above observations, we propose a general approach, which can almost eliminate the misguidance brought by unlabeled entities. The key idea is to use negative sampling that, to a large extent, avoids training NER models with unlabeled entities. Experiments on synthetic datasets and real-world datasets show that our model is robust to unlabeled entity problem and surpasses prior baselines. 
On well-annotated datasets, our model is competitive with the state-of-the-art method.", "keywords": "Named Entity Recognition;Unlabeled Entity Problem;Negative Sampling", "primary_area": "", "supplementary_material": "", "author": "Yangming Li;lemao liu;Shuming Shi", "authorids": "~Yangming_Li1;~lemao_liu1;~Shuming_Shi1", "gender": ";M;M", "homepage": ";https://lemaoliu.github.io/homepage/;", "dblp": ";41/10887.html;s/ShumingShi", "google_scholar": ";;Lg31AKMAAAAJ", "orcid": ";;", "linkedin": ";;", "or_profile": "~Yangming_Li1;~lemao_liu1;~Shuming_Shi1", "aff": ";Tencent;Tencent AI Lab", "aff_domain": ";tencent.com;tencent.com", "position": ";Researcher;Principal Researcher", "bibtex": "@inproceedings{\nli2021empirical,\ntitle={Empirical Analysis of Unlabeled Entity Problem in Named Entity Recognition},\nauthor={Yangming Li and lemao liu and Shuming Shi},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=5jRVa89sZk}\n}", "github": "[![github](/images/github_icon.svg) LeePleased/NegSampling-NER](https://github.com/LeePleased/NegSampling-NER)", "project": "", "reviewers": "AnonReviewer3;AnonReviewer4;AnonReviewer2;AnonReviewer1", "pdf_size": 0, "rating": "5;6;7;8", "confidence": "4;4;4;4", "wc_review": "255;498;108;213", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "779;846;189;497", "reply_reviewers": "0;0;0;0", "reply_authors": "2;2;1;1", "rating_avg": [ 6.5, 1.118033988749895 ], "confidence_avg": [ 4.0, 0.0 ], "wc_review_avg": [ 268.5, 142.9099366734168 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 577.75, 259.8589761774644 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.5, 0.5 ], "replies_avg": [ 11, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 82, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=2091894969577971912&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 3, "pdf": "https://openreview.net/pdf?id=5jRVa89sZk", "email": ";tencent.com;tencent.com", "author_num": 3, "aff_unique_index": "0;0", "aff_unique_norm": "Tencent", "aff_unique_dep": "Tencent Holdings Limited", "aff_unique_url": "https://www.tencent.com", "aff_unique_abbr": "Tencent", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0", "aff_country_unique": "China" }, { "title": "Loss Function Discovery for Object Detection via Convergence-Simulation Driven Search", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/2832", "id": "5jzlpHvvRk", "poster": "", "openreview": "https://openreview.net/forum?id=5jzlpHvvRk", "slides": "https://iclr.cc/virtual/2021/poster/2832", "video": "https://iclr.cc/virtual/2021/poster/2832", "author_site": "Peidong Liu, Gengwei Zhang, Bochao Wang, Hang Xu, Xiaodan Liang, Yong Jiang, Zhenguo Li", "tldr": "", "abstract": "Designing proper loss functions for vision tasks has been a long-standing research direction to advance the capability of existing models. For object detection, the well-established classification and regression loss functions have been carefully designed by considering diverse learning challenges (e.g. class imbalance, hard negative samples, and scale variances). Inspired by the recent progress in network architecture search, it is interesting to explore the possibility of discovering new loss function formulations via directly searching the primitive operation combinations. 
So that the learned losses not only fit for diverse object detection challenges to alleviate huge human efforts, but also have better alignment with evaluation metric and good mathematical convergence property. Beyond the previous auto-loss works on face recognition and image classification, our work makes the first attempt to discover new loss functions for the challenging object detection from primitive operation levels and finds the searched losses are insightful. We propose an effective convergence-simulation driven evolutionary search algorithm, called CSE-Autoloss, for speeding up the search progress by regularizing the mathematical rationality of loss candidates via two progressive convergence simulation modules: convergence property verification and model optimization simulation. CSE-Autoloss involves the search space (i.e. 21 mathematical operators, 3 constant-type inputs, and 3 variable-type inputs) that cover a wide range of the possible variants of existing losses and discovers best-searched loss function combination within a short time (around 1.5 wall-clock days with 20x speedup in comparison to the vanilla evolutionary algorithm). We conduct extensive evaluations of loss function search on popular detectors and validate the good generalization capability of searched losses across diverse architectures and various datasets. Our experiments show that the best-discovered loss function combinations outperform default combinations (Cross-entropy/Focal loss for classification and L1 loss for regression) by 1.1% and 0.8% in terms of mAP for two-stage and one-stage detectors on COCO respectively. Our searched losses are available at https://github.com/PerdonLiu/CSE-Autoloss.", "keywords": "Object detection;AutoML;Evolutionary algorithm;Loss function search", "primary_area": "", "supplementary_material": "", "author": "Peidong Liu;Gengwei Zhang;Bochao Wang;Hang Xu;Xiaodan Liang;Yong Jiang;Zhenguo Li", "authorids": "~Peidong_Liu2;~Gengwei_Zhang1;~Bochao_Wang2;~Hang_Xu1;~Xiaodan_Liang2;~Yong_Jiang3;~Zhenguo_Li1", "gender": "M;M;M;F;M;M;M", "homepage": "https://perdonliu.github.io/;https://gengdavid.github.io/;;https://www.sysu-hcp.net/;;http://www.ee.columbia.edu/~zgli/;https://github.com/sergeywong/", "dblp": ";226/6522;;;74/1552-1.html;23/6479;", "google_scholar": "pNBIQ8wAAAAJ;YcikIekAAAAJ;https://scholar.google.com.hk/citations?user=J_8TX6sAAAAJ;voxznZAAAAAJ;;XboZC1AAAAAJ;", "orcid": ";0000-0003-1823-502X;0000-0003-3645-8972;;;;", "linkedin": ";;;;;;", "or_profile": "~Peidong_Liu2;~Gengwei_Zhang1;~Hang_Xu1;~Xiaodan_Liang2;~Yong_Jiang3;~Zhenguo_Li1;~Sergey_Wong1", "aff": "Tsinghua University;University of Technology Sydney;Huawei Noah\u2018s Ark Lab;SUN YAT-SEN UNIVERSITY;Tsinghua University;Huawei Noah's Ark Lab;Huawei Noah\u2018s Ark Lab", "aff_domain": "tsinghua.edu.cn;student.uts.edu.au;huawei.com;sysu.edu.cn;tsinghua.edu.cn;huawei.com;huawei.com", "position": "MS student;PhD student;Researcher;Associate Professor;Full Professor;Principal Researcher;Researcher", "bibtex": "@inproceedings{\nliu2021loss,\ntitle={Loss Function Discovery for Object Detection via Convergence-Simulation Driven Search},\nauthor={Peidong Liu and Gengwei Zhang and Bochao Wang and Hang Xu and Xiaodan Liang and Yong Jiang and Zhenguo Li},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=5jzlpHvvRk}\n}", "github": "[![github](/images/github_icon.svg) PerdonLiu/CSE-Autoloss](https://github.com/PerdonLiu/CSE-Autoloss)", "project": "", 
"reviewers": "AnonReviewer3;AnonReviewer4;AnonReviewer1;AnonReviewer2", "pdf_size": 0, "rating": "5;6;6;7", "confidence": "4;4;4;5", "wc_review": "223;269;278;434", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "0;0;0;0", "reply_reviewers": "0;0;0;0", "reply_authors": "0;0;0;0", "rating_avg": [ 6.0, 0.7071067811865476 ], "confidence_avg": [ 4.25, 0.4330127018922193 ], "wc_review_avg": [ 301.0, 79.57072325924906 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 0, 0 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 0, 0 ], "replies_avg": [ 5, 0 ], "authors#_avg": [ 7, 0 ], "corr_rating_confidence": 0.816496580927726, "gs_citation": 40, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=14213953429648652822&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 5, "pdf": "https://openreview.net/pdf?id=5jzlpHvvRk", "email": "tsinghua.edu.cn;student.uts.edu.au;huawei.com;sysu.edu.cn;tsinghua.edu.cn;huawei.com;huawei.com", "author_num": 7, "aff_unique_index": "0;1;2;3;0;2;2", "aff_unique_norm": "Tsinghua University;University of Technology Sydney;Huawei;Sun Yat-sen University", "aff_unique_dep": ";;Noah's Ark Lab;", "aff_unique_url": "https://www.tsinghua.edu.cn;https://www.uts.edu.au;https://www.huawei.com;http://www.sysu.edu.cn", "aff_unique_abbr": "THU;UTS;Huawei;SYSU", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;1;0;0;0;0;0", "aff_country_unique": "China;Australia" }, { "title": "Autoregressive Entity Retrieval", "status": "Spotlight", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/2642", "id": "5k8F6UU39V", "poster": "", "openreview": "https://openreview.net/forum?id=5k8F6UU39V", "slides": "https://iclr.cc/virtual/2021/poster/2642", "video": "https://iclr.cc/virtual/2021/poster/2642", "author_site": "Nicola De Cao, Gautier Izacard, Sebastian Riedel, Fabio Petroni", "tldr": "", "abstract": "Entities are at the center of how we represent and aggregate knowledge. For instance, Encyclopedias such as Wikipedia are structured by entities (e.g., one per Wikipedia article). The ability to retrieve such entities given a query is fundamental for knowledge-intensive tasks such as entity linking and open-domain question answering. One way to understand current approaches is as classifiers among atomic labels, one for each entity. Their weight vectors are dense entity representations produced by encoding entity meta information such as their descriptions. This approach leads to several shortcomings: (i) context and entity affinity is mainly captured through a vector dot product, potentially missing fine-grained interactions between the two; (ii) a large memory footprint is needed to store dense representations when considering large entity sets; (iii) an appropriately hard set of negative data has to be subsampled at training time. In this work, we propose GENRE, the first system that retrieves entities by generating their unique names, left to right, token-by-token in an autoregressive fashion and conditioned on the context. This enables us to mitigate the aforementioned technical issues since: (i) the autoregressive formulation allows us to directly capture relations between context and entity name, effectively cross encoding both; (ii) the memory footprint is greatly reduced because the parameters of our encoder-decoder architecture scale with vocabulary size, not entity count; (iii) the exact softmax loss can be efficiently computed without the need to subsample negative data. 
We show the efficacy of the approach, experimenting with more than 20 datasets on entity disambiguation, end-to-end entity linking and document retrieval tasks, achieving new state-of-the-art or very competitive results while using a tiny fraction of the memory footprint of competing systems. Finally, we demonstrate that new entities can be added by simply specifying their unambiguous name. Code and pre-trained models at https://github.com/facebookresearch/GENRE.", "keywords": "entity retrieval;document retrieval;autoregressive language model;entity linking;end-to-end entity linking;entity disambiguation;constrained beam search", "primary_area": "", "supplementary_material": "", "author": "Nicola De Cao;Gautier Izacard;Sebastian Riedel;Fabio Petroni", "authorids": "~Nicola_De_Cao1;~Gautier_Izacard1;~Sebastian_Riedel1;~Fabio_Petroni2", "gender": "M;Unspecified;M;M", "homepage": "https://nicola-decao.github.io;;https://www.riedelcastro.org/;http://www.fabiopetroni.com/", "dblp": "218/6626;222/3621;18/3348-1.html;118/5349", "google_scholar": "CqTR3sIAAAAJ;https://scholar.google.com/citations?view_op=list_works;https://scholar.google.com.tw/citations?user=AcCtcrsAAAAJ;https://scholar.google.it/citations?user=vxQc2L4AAAAJ", "orcid": ";;;", "linkedin": "nicoladecao;;;petronifabio/", "or_profile": "~Nicola_De_Cao1;~Gautier_Izacard1;~Sebastian_Riedel1;~Fabio_Petroni2", "aff": "Meta Facebook;Meta Facebook;Meta Facebook;", "aff_domain": "fb.com;fb.com;fb.com;", "position": "Intern;PhD student;Researcher;", "bibtex": "@inproceedings{\ncao2021autoregressive,\ntitle={Autoregressive Entity Retrieval},\nauthor={Nicola De Cao and Gautier Izacard and Sebastian Riedel and Fabio Petroni},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=5k8F6UU39V}\n}", "github": "[![github](/images/github_icon.svg) facebookresearch/GENRE](https://github.com/facebookresearch/GENRE) + [![Papers with Code](/images/pwc_icon.svg) 1 community implementation](https://paperswithcode.com/paper/?openreview=5k8F6UU39V)", "project": "", "reviewers": "AnonReviewer4;AnonReviewer3;AnonReviewer2;AnonReviewer1", "pdf_size": 0, "rating": "7;8;8;8", "confidence": "5;4;4;4", "wc_review": "291;308;517;181", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "492;330;248;463", "reply_reviewers": "0;0;0;0", "reply_authors": "1;1;1;1", "rating_avg": [ 7.75, 0.4330127018922193 ], "confidence_avg": [ 4.25, 0.4330127018922193 ], "wc_review_avg": [ 324.25, 121.49356978869294 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 383.25, 99.13973723991808 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 10, 0 ], "authors#_avg": [ 4, 0 ], "corr_rating_confidence": -1.0, "gs_citation": 596, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=12682955665631142454&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 6, "pdf": "https://openreview.net/pdf?id=5k8F6UU39V", "email": "fb.com;fb.com;fb.com;", "author_num": 4, "aff_unique_index": "0;0;0", "aff_unique_norm": "Meta", "aff_unique_dep": "Meta Platforms, Inc.", "aff_unique_url": "https://meta.com", "aff_unique_abbr": "Meta", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0;0", "aff_country_unique": "United States" }, { "title": "Spatially Structured Recurrent Modules", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/3333", "id": "5l9zj5G7vDY", "poster": "", "openreview": 
"https://openreview.net/forum?id=5l9zj5G7vDY", "slides": "https://iclr.cc/virtual/2021/poster/3333", "video": "https://iclr.cc/virtual/2021/poster/3333", "author_site": "Nasim Rahaman, Anirudh Goyal, Muhammad Waleed Gondal, Manuel Wuthrich, Stefan Bauer, Yash Sharma, Yoshua Bengio, Bernhard Schoelkopf", "tldr": "", "abstract": "Capturing the structure of a data-generating process by means of appropriate inductive biases can help in learning models that generalise well and are robust to changes in the input distribution. While methods that harness spatial and temporal structures find broad application, recent work has demonstrated the potential of models that leverage sparse and modular structure using an ensemble of sparingly interacting modules. In this work, we take a step towards dynamic models that are capable of simultaneously exploiting both modular and spatiotemporal structures. To this end, we model the dynamical system as a collection of autonomous but sparsely interacting sub-systems that interact according to a learned topology which is informed by the spatial structure of the underlying system. This gives rise to a class of models that are well suited for capturing the dynamics of systems that only offer local views into their state, along with corresponding spatial locations of those views. On the tasks of video prediction from cropped frames and multi-agent world modelling from partial observations in the challenging Starcraft2 domain, we find our models to be more robust to the number of available views and better capable of generalisation to novel tasks without additional training than strong baselines that perform equally well or better on the training distribution. ", "keywords": "spatio-temporal modelling;modular architectures;recurrent neural networks;partially observed environments", "primary_area": "", "supplementary_material": "/attachment/2714bd119bf11d6a4ff4b4c4981257e90973dd60.zip", "author": "Nasim Rahaman;Anirudh Goyal;Muhammad Waleed Gondal;Manuel Wuthrich;Stefan Bauer;Yash Sharma;Yoshua Bengio;Bernhard Sch\u00f6lkopf", "authorids": "~Nasim_Rahaman1;~Anirudh_Goyal1;~Muhammad_Waleed_Gondal1;~Manuel_Wuthrich1;~Stefan_Bauer1;~Yash_Sharma1;~Yoshua_Bengio1;~Bernhard_Sch\u00f6lkopf1", "gender": "M;M;M;M;;;M;", "homepage": ";https://anirudh9119.github.io/;https://www.is.mpg.de/person/wgondal;;https://cifar.ca/bios/stefan-bauer/;http://www.yash-sharma.com;http://yoshuabengio.org;", "dblp": "222/3165;172/1039;;https://dblp.uni-trier.de/pers/hd/w/W=uuml=thrich:Manuel;;121/9967-1;56/953;", "google_scholar": "https://scholar.google.de/citations?user=iH9DuY0AAAAJ;krrh6OUAAAAJ;https://scholar.google.de/citations?user=KJTsSAQAAAAJ;;O-oICE8AAAAJ;AlGCn8wAAAAJ;kukA0LcAAAAJ;", "orcid": ";;;;;;;", "linkedin": "https://de.linkedin.com/in/nasim-rahaman/de;;;;;yashjsharma/;yoshuabengio/?originalSubdomain=ca;", "or_profile": "~Nasim_Rahaman1;~Anirudh_Goyal1;~Muhammad_Waleed_Gondal1;~Manuel_Wuthrich1;~Stefan_Bauer1;~Yash_Sharma1;~Yoshua_Bengio1;~Bernhard_Sch\u00f6lkopf1", "aff": "Max Planck Institute for Intelligent Systems, Max-Planck Institute;University of Montreal;Max Planck Institute for Intelligent Systems, Max-Planck Institute;Max Planck Institute for Intelligent Systems;Max Planck Institute for Intelligent Systems, Max-Planck Institute;University of Tuebingen;University of Montreal;", "aff_domain": "tuebingen.mpg.de;umontreal.ca;tuebingen.mpg.de;mpg.tuebingen.de;tuebingen.mpg.de;uni-tuebingen.de;umontreal.ca;", "position": "PhD student;PhD student;PhD student;Postdoc;Research 
Group Leader;PhD student;Full Professor;", "bibtex": "@inproceedings{\nrahaman2021spatially,\ntitle={Spatially Structured Recurrent Modules},\nauthor={Nasim Rahaman and Anirudh Goyal and Muhammad Waleed Gondal and Manuel Wuthrich and Stefan Bauer and Yash Sharma and Yoshua Bengio and Bernhard Sch{\\\"o}lkopf},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=5l9zj5G7vDY}\n}", "github": "", "project": "", "reviewers": "AnonReviewer4;AnonReviewer1;AnonReviewer2;AnonReviewer3", "pdf_size": 0, "rating": "6;6;7;7", "confidence": "4;3;4;4", "wc_review": "1255;751;761;642", "wc_reply_reviewers": "0;48;0;0", "wc_reply_authors": "1994;1224;646;1064", "reply_reviewers": "0;1;0;0", "reply_authors": "4;4;2;3", "rating_avg": [ 6.5, 0.5 ], "confidence_avg": [ 3.75, 0.4330127018922193 ], "wc_review_avg": [ 852.25, 237.165949284462 ], "wc_reply_reviewers_avg": [ 12.0, 20.784609690826528 ], "wc_reply_authors_avg": [ 1232.0, 487.9364712746937 ], "reply_reviewers_avg": [ 0.25, 0.4330127018922193 ], "reply_authors_avg": [ 3.25, 0.82915619758885 ], "replies_avg": [ 20, 0 ], "authors#_avg": [ 8, 0 ], "corr_rating_confidence": 0.5773502691896257, "gs_citation": 4, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=10539564784260285471&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 7, "pdf": "https://openreview.net/pdf?id=5l9zj5G7vDY", "email": "tuebingen.mpg.de;umontreal.ca;tuebingen.mpg.de;mpg.tuebingen.de;tuebingen.mpg.de;uni-tuebingen.de;umontreal.ca;", "author_num": 8, "aff_unique_index": "0;1;0;0;0;2;1", "aff_unique_norm": "Max Planck Institute for Intelligent Systems;University of Montreal;University of Tuebingen", "aff_unique_dep": "Intelligent Systems;;", "aff_unique_url": "https://www.mpi-is.mpg.de;https://wwwumontreal.ca;https://www.uni-tuebingen.de/", "aff_unique_abbr": "MPI-IS;UM;Uni T\u00fcbingen", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;1;0;0;0;0;1", "aff_country_unique": "Germany;Canada" }, { "title": "Enforcing robust control guarantees within neural network policies", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/2899", "id": "5lhWG3Hj2By", "poster": "", "openreview": "https://openreview.net/forum?id=5lhWG3Hj2By", "slides": "https://iclr.cc/virtual/2021/poster/2899", "video": "https://iclr.cc/virtual/2021/poster/2899", "author_site": "Priya Donti, Melrose Roderick, Mahyar Fazlyab, Zico Kolter", "tldr": "", "abstract": "When designing controllers for safety-critical systems, practitioners often face a challenging tradeoff between robustness and performance. While robust control methods provide rigorous guarantees on system stability under certain worst-case disturbances, they often yield simple controllers that perform poorly in the average (non-worst) case. In contrast, nonlinear control methods trained using deep learning have achieved state-of-the-art performance on many control tasks, but often lack robustness guarantees. In this paper, we propose a technique that combines the strengths of these two approaches: constructing a generic nonlinear control policy class, parameterized by neural networks, that nonetheless enforces the same provable robustness criteria as robust control. Specifically, our approach entails integrating custom convex-optimization-based projection layers into a neural network-based policy. 
We demonstrate the power of this approach on several domains, improving in average-case performance over existing robust control methods and in worst-case stability over (non-robust) deep RL methods.", "keywords": "robust control;reinforcement learning;differentiable optimization", "primary_area": "", "supplementary_material": "", "author": "Priya L. Donti;Melrose Roderick;Mahyar Fazlyab;J Zico Kolter", "authorids": "~Priya_L._Donti1;mroderick@cmu.edu;mahyarfa@seas.upenn.edu;~J_Zico_Kolter1", "gender": "F;;;", "homepage": "https://priyadonti.com/;;;", "dblp": "198/0500;;;", "google_scholar": "PfRSkfEAAAAJ;;;", "orcid": ";;;", "linkedin": "priya-donti/;;;", "or_profile": "~Priya_L._Donti1;mroderick@cmu.edu;mahyarfa@seas.upenn.edu;~J_Zico_Kolter1", "aff": "Carnegie Mellon University;;;", "aff_domain": "cmu.edu;;;", "position": "PhD student;;;", "bibtex": "@inproceedings{\ndonti2021enforcing,\ntitle={Enforcing robust control guarantees within neural network policies},\nauthor={Priya L. Donti and Melrose Roderick and Mahyar Fazlyab and J Zico Kolter},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=5lhWG3Hj2By}\n}", "github": "[![github](/images/github_icon.svg) locuslab/robust-nn-control](https://github.com/locuslab/robust-nn-control)", "project": "", "reviewers": "AnonReviewer4;AnonReviewer1;AnonReviewer2;AnonReviewer3", "pdf_size": 0, "rating": "6;6;6;6", "confidence": "4;4;4;3", "wc_review": "312;357;351;173", "wc_reply_reviewers": "0;0;0;15", "wc_reply_authors": "632;448;1834;644", "reply_reviewers": "0;0;0;1", "reply_authors": "2;2;4;1", "rating_avg": [ 6.0, 0.0 ], "confidence_avg": [ 3.75, 0.4330127018922193 ], "wc_review_avg": [ 298.25, 74.34841962005649 ], "wc_reply_reviewers_avg": [ 3.75, 6.49519052838329 ], "wc_reply_authors_avg": [ 889.5, 550.8128084930488 ], "reply_reviewers_avg": [ 0.25, 0.4330127018922193 ], "reply_authors_avg": [ 2.25, 1.0897247358851685 ], "replies_avg": [ 15, 0 ], "authors#_avg": [ 4, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 84, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=18128961654135874405&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 9, "pdf": "https://openreview.net/pdf?id=5lhWG3Hj2By", "email": "cmu.edu;;;", "author_num": 4, "aff_unique_index": "0", "aff_unique_norm": "Carnegie Mellon University", "aff_unique_dep": "", "aff_unique_url": "https://www.cmu.edu", "aff_unique_abbr": "CMU", "aff_country_unique_index": "0", "aff_country_unique": "United States" }, { "title": "VAEBM: A Symbiosis between Variational Autoencoders and Energy-based Models", "status": "Spotlight", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/2615", "id": "5m3SEczOV8L", "poster": "", "openreview": "https://openreview.net/forum?id=5m3SEczOV8L", "slides": "https://iclr.cc/virtual/2021/poster/2615", "video": "https://iclr.cc/virtual/2021/poster/2615", "author_site": "Zhisheng Xiao, Karsten Kreis, Jan Kautz, Arash Vahdat", "tldr": "", "abstract": "Energy-based models (EBMs) have recently been successful in representing complex distributions of small images. However, sampling from them requires expensive Markov chain Monte Carlo (MCMC) iterations that mix slowly in high dimensional pixel space. Unlike EBMs, variational autoencoders (VAEs) generate samples quickly and are equipped with a latent space that enables fast traversal of the data manifold. 
However, VAEs tend to assign high probability density to regions in data space outside the actual data distribution and often fail at generating sharp images. In this paper, we propose VAEBM, a symbiotic composition of a VAE and an EBM that offers the best of both worlds. VAEBM captures the overall mode structure of the data distribution using a state-of-the-art VAE and it relies on its EBM component to explicitly exclude non-data-like regions from the model and refine the image samples. Moreover, the VAE component in VAEBM allows us to speed up MCMC updates by reparameterizing them in the VAE's latent space. Our experimental results show that VAEBM outperforms state-of-the-art VAEs and EBMs in generative quality on several benchmark image datasets by a large margin. It can generate high-quality images as large as 256$\\times$256 pixels with short MCMC chains. We also demonstrate that VAEBM provides complete mode coverage and performs well in out-of-distribution detection. ", "keywords": "Energy-based Models;Variational Auto-encoder;MCMC", "primary_area": "", "supplementary_material": "", "author": "Zhisheng Xiao;Karsten Kreis;Jan Kautz;Arash Vahdat", "authorids": "~Zhisheng_Xiao1;~Karsten_Kreis1;~Jan_Kautz1;~Arash_Vahdat3", "gender": "M;;;M", "homepage": "https://xavierxiao.github.io;https://karstenkreis.github.io/;http://jankautz.com;http://latentspace.cc/", "dblp": ";238/6834;48/6214;92/8108", "google_scholar": "3Wex6VIAAAAJ;https://scholar.google.de/citations?user=rFd-DiAAAAAJ;P9FclNEAAAAJ;https://scholar.google.ca/citations?user=p9-nlRIAAAAJ", "orcid": ";;;", "linkedin": ";karstenkreis;;", "or_profile": "~Zhisheng_Xiao1;~Karsten_Kreis1;~Jan_Kautz1;~Arash_Vahdat3", "aff": "University of Chicago;NVIDIA;NVIDIA;NVIDIA", "aff_domain": "uchicago.edu;nvidia.com;nvidia.com;nvidia.com", "position": "PhD student;Research Scientist;VP Research;Research Scientist", "bibtex": "@inproceedings{\nxiao2021vaebm,\ntitle={{\\{}VAEBM{\\}}: A Symbiosis between Variational Autoencoders and Energy-based Models},\nauthor={Zhisheng Xiao and Karsten Kreis and Jan Kautz and Arash Vahdat},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=5m3SEczOV8L}\n}", "github": "[![github](/images/github_icon.svg) NVlabs/VAEBM](https://github.com/NVlabs/VAEBM)", "project": "", "reviewers": "AnonReviewer1;AnonReviewer2;AnonReviewer3;AnonReviewer4", "pdf_size": 0, "rating": "6;7;7;8", "confidence": "3;4;5;4", "wc_review": "145;250;251;264", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "370;493;734;385", "reply_reviewers": "0;0;0;0", "reply_authors": "1;1;1;1", "rating_avg": [ 7.0, 0.7071067811865476 ], "confidence_avg": [ 4.0, 0.7071067811865476 ], "wc_review_avg": [ 227.5, 47.95049530505394 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 495.5, 145.64425838322634 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 10, 0 ], "authors#_avg": [ 4, 0 ], "corr_rating_confidence": 0.5, "gs_citation": 137, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=16833810899074704050&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 3, "pdf": "https://openreview.net/pdf?id=5m3SEczOV8L", "email": "uchicago.edu;nvidia.com;nvidia.com;nvidia.com", "author_num": 4, "aff_unique_index": "0;1;1;1", "aff_unique_norm": "University of Chicago;NVIDIA", "aff_unique_dep": ";NVIDIA Corporation", "aff_unique_url": "https://www.uchicago.edu;https://www.nvidia.com", "aff_unique_abbr": "UChicago;NVIDIA", 
"aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0;0;0", "aff_country_unique": "United States" }, { "id": "5mhViEOQxaV", "title": "Controllable Pareto Multi-Task Learning", "track": "main", "status": "Reject", "tldr": "", "abstract": "A multi-task learning (MTL) system aims at solving multiple related tasks at the same time. With a fixed model capacity, the tasks would be conflicted with each other, and the system usually has to make a trade-off among learning all of them together. Multiple models with different preferences over tasks have to be trained and stored for many real-world applications where the trade-off has to be made online. This work proposes a novel controllable Pareto multi-task learning framework, to enable the system to make real-time trade-off switch among different tasks with a single model. To be specific, we formulate the MTL as a preference-conditioned multiobjective optimization problem, for which there is a parametric mapping from the preferences to the Pareto stationary solutions. A single hypernetwork-based multi-task neural network is built to learn all tasks with different trade-off preferences among them, where the hypernetwork generates the model parameters conditioned on the preference. At the inference time, MTL practitioners can easily control the model performance based on different trade-off preferences in real-time. Experiments on different applications demonstrate that the proposed model is efficient for solving various multi-task learning problems. ", "keywords": "Multi-Task Learning;Multi-Objective Optimization", "primary_area": "", "supplementary_material": "/attachment/472d0aa4a37d5ac7d2a5e3d0dfebb8bbdbe85ff4.zip", "author": "Xi Lin;Zhiyuan Yang;Qingfu Zhang;Sam Kwong", "authorids": "~Xi_Lin2;~Zhiyuan_Yang2;qingfu.zhang@cityu.edu.hk;~Sam_Kwong1", "gender": "M;;;M", "homepage": "https://xi-l.github.io/;;;https://scholars.ln.edu.hk/en/persons/sam-tak-wu-kwong", "dblp": "43/489-1;;;18/30", "google_scholar": "QB_MUboAAAAJ;;;_PVI6EAAAAAJ", "orcid": ";;;0000-0001-7484-7261", "linkedin": ";;;", "or_profile": "~Xi_Lin2;~Zhiyuan_Yang2;qingfu.zhang@cityu.edu.hk;~Sam_Kwong1", "aff": "City University of Hong Kong;;;", "aff_domain": "cityu.edu.hk;;;", "position": "Postdoc;;;", "bibtex": "@misc{\nlin2021controllable,\ntitle={Controllable Pareto Multi-Task Learning},\nauthor={Xi Lin and Zhiyuan Yang and Qingfu Zhang and Sam Kwong},\nyear={2021},\nurl={https://openreview.net/forum?id=5mhViEOQxaV}\n}", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer3;AnonReviewer4", "site": "https://openreview.net/forum?id=5mhViEOQxaV", "pdf_size": 0, "rating": "4;5;7", "confidence": "5;4;4", "wc_review": "199;146;496", "wc_reply_reviewers": "0;0;0", "wc_reply_authors": "1693;1366;2317", "reply_reviewers": "0;0;0", "reply_authors": "3;3;4", "rating_avg": [ 5.333333333333333, 1.247219128924647 ], "confidence_avg": [ 4.333333333333333, 0.4714045207910317 ], "wc_review_avg": [ 280.3333333333333, 154.0266932132941 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 1792.0, 394.5047528230807 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 3.3333333333333335, 0.4714045207910317 ], "replies_avg": [ 17, 0 ], "authors#_avg": [ 4, 0 ], "corr_rating_confidence": -0.7559289460184544, "gs_citation": 87, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=5523361401444540559&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 3, "aff_unique_index": "0", "aff_unique_norm": "City University of Hong Kong", 
"aff_unique_dep": "", "aff_unique_url": "https://www.cityu.edu.hk", "aff_unique_abbr": "CityU", "aff_campus_unique_index": "0", "aff_campus_unique": "Hong Kong SAR", "aff_country_unique_index": "0", "aff_country_unique": "China" }, { "id": "5qK0RActG1x", "title": "Democratizing Evaluation of Deep Model Interpretability through Consensus", "track": "main", "status": "Reject", "tldr": "", "abstract": "Deep learning interpretability tools, such as (Bau et al., 2017; Ribeiro et al., 2016; Smilkov et al., 2017), have been proposed to explain and visualize the ways that deep neural networks make predictions. The success of these methods highly relies on human subjective interpretations, i.e., the ground truth of interpretations, such as feature importance ranking or locations of visual objects, when evaluating the interpretability of the deep models on a speci\ufb01c task. For tasks that the ground truth of interpretations is not available, we propose a novel framework Consensus incorporating an ensemble of deep models as the committee for interpretability evaluation. Given any task/dataset, Consensus \ufb01rst obtains the interpretation results using existing tools, e.g., LIME (Ribeiro et al., 2016), for every model in the committee, then aggregates the results from the entire committee and approximates the \u201cground truth\u201d of interpretations through voting. With such approximated ground truth, Consensus evaluates the interpretability of a model through matching its interpretation result and the approximated one, and ranks the matching scores together with committee members, so as to pursue the absolute and relative interpretability evaluation results. We carry out extensive experiments to validate Consensus on various datasets. The results show that Consensus can precisely identify the interpretability for a wide range of models on ubiquitous datasets that the ground truth is not available. Robustness analyses further demonstrate the advantage of the proposed framework to reach the consensus of interpretations through simple voting and evaluate the interpretability of deep models. 
Through the proposed Consensus framework, the interpretability evaluation has been democratized without the need of ground truth as criterion.", "keywords": "interpretability evaluation;deep model interpretability", "primary_area": "", "supplementary_material": "/attachment/3bbd2b102c3d0226e2ce985f922784848b186a5f.zip", "author": "Xuhong Li;Haoyi Xiong;Siyu Huang;Shilei Ji;Yanjie Fu;Dejing Dou", "authorids": "~Xuhong_Li3;~Haoyi_Xiong1;~Siyu_Huang2;jishilei@baidu.com;~Yanjie_Fu2;~Dejing_Dou1", "gender": ";M;M;;;", "homepage": ";https://sites.google.com/site/haoyixiongshomepage/;https://siyuhuang.github.io;;;", "dblp": ";06/2700;146/9031.html;;;", "google_scholar": ";f_Kcie0AAAAJ;hQN7Zn0AAAAJ;;;", "orcid": ";;;;;", "linkedin": ";;;;;", "or_profile": "~Xuhong_Li3;~Haoyi_Xiong1;~Siyu_Huang2;jishilei@baidu.com;~Yanjie_Fu2;~Dejing_Dou1", "aff": ";Baidu;Baidu Research;;;", "aff_domain": ";baidu.com;baidu.com;;;", "position": ";Principal Researcher;Research Scientist;;;", "bibtex": "@misc{\nli2021democratizing,\ntitle={Democratizing Evaluation of Deep Model Interpretability through Consensus},\nauthor={Xuhong Li and Haoyi Xiong and Siyu Huang and Shilei Ji and Yanjie Fu and Dejing Dou},\nyear={2021},\nurl={https://openreview.net/forum?id=5qK0RActG1x}\n}", "github": "", "project": "", "reviewers": "AnonReviewer4;AnonReviewer3;AnonReviewer2;AnonReviewer1", "site": "https://openreview.net/forum?id=5qK0RActG1x", "pdf_size": 0, "rating": "3;4;5;6", "confidence": "3;4;5;3", "wc_review": "598;768;660;112", "wc_reply_reviewers": "0;131;32;0", "wc_reply_authors": "970;1510;809;151", "reply_reviewers": "0;1;1;0", "reply_authors": "3;3;3;1", "rating_avg": [ 4.5, 1.118033988749895 ], "confidence_avg": [ 3.75, 0.82915619758885 ], "wc_review_avg": [ 534.5, 251.40157119636305 ], "wc_reply_reviewers_avg": [ 40.75, 53.71859547679928 ], "wc_reply_authors_avg": [ 860.0, 484.73755785992074 ], "reply_reviewers_avg": [ 0.5, 0.5 ], "reply_authors_avg": [ 2.5, 0.8660254037844386 ], "replies_avg": [ 18, 0 ], "authors#_avg": [ 6, 0 ], "corr_rating_confidence": 0.1348399724926484, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:bF3lX77MAMoJ:scholar.google.com/&scioq=Democratizing+Evaluation+of+Deep+Model+Interpretability+through+Consensus&hl=en&as_sdt=0,5", "gs_version_total": 0, "aff_unique_index": "0;0", "aff_unique_norm": "Baidu", "aff_unique_dep": "Baidu, Inc.", "aff_unique_url": "https://www.baidu.com", "aff_unique_abbr": "Baidu", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0", "aff_country_unique": "China" }, { "id": "5rc0K0ezhqI", "title": "Unpacking Information Bottlenecks: Surrogate Objectives for Deep Learning", "track": "main", "status": "Reject", "tldr": "", "abstract": "The Information Bottleneck principle offers both a mechanism to explain how deep neural networks train and generalize, as well as a regularized objective with which to train models. However, multiple competing objectives are proposed in the literature, and the information-theoretic quantities used in these objectives are difficult to compute for large deep neural networks, which in turn limits their use as a training objective. In this work, we review these quantities, compare and unify previously proposed objectives, which allows us to develop surrogate objectives more friendly to optimization without relying on cumbersome tools such as density estimation. 
We find that these surrogate objectives allow us to apply the information bottleneck to modern neural network architectures. We demonstrate our insights on MNIST, CIFAR-10 and ImageNette with modern DNN architectures (ResNets).", "keywords": "deep learning;information bottleneck;information theory", "primary_area": "", "supplementary_material": "/attachment/1dac13c514f4f785ee13387b8ef1962076223e81.zip", "author": "Andreas Kirsch;Clare Lyle;Yarin Gal", "authorids": "~Andreas_Kirsch1;~Clare_Lyle1;~Yarin_Gal1", "gender": ";;", "homepage": "https://www.blackhc.net;;http://www.cs.ox.ac.uk/people/yarin.gal/website//", "dblp": "56/2914-2;192/1910;67/9076", "google_scholar": "WYQVZpYAAAAJ;;https://scholar.google.co.uk/citations?user=SIayDoQAAAAJ", "orcid": "0000-0001-8244-7700;;", "linkedin": "blackhc;;", "or_profile": "~Andreas_Kirsch1;~Clare_Lyle1;~Yarin_Gal1", "aff": "University of Oxford;University of Oxford;University of Oxford", "aff_domain": "ox.ac.uk;ox.ac.uk;ox.ac.uk", "position": "PhD student;PhD student;Associate Professor", "bibtex": "@misc{\nkirsch2021unpacking,\ntitle={Unpacking Information Bottlenecks: Surrogate Objectives for Deep Learning},\nauthor={Andreas Kirsch and Clare Lyle and Yarin Gal},\nyear={2021},\nurl={https://openreview.net/forum?id=5rc0K0ezhqI}\n}", "github": "", "project": "", "reviewers": "AnonReviewer3;AnonReviewer4;AnonReviewer1;AnonReviewer2", "site": "https://openreview.net/forum?id=5rc0K0ezhqI", "pdf_size": 0, "rating": "4;6;6;8", "confidence": "4;4;4;3", "wc_review": "329;561;410;236", "wc_reply_reviewers": "0;24;0;0", "wc_reply_authors": "1405;481;921;162", "reply_reviewers": "0;1;0;0", "reply_authors": "2;1;2;1", "rating_avg": [ 6.0, 1.4142135623730951 ], "confidence_avg": [ 3.75, 0.4330127018922193 ], "wc_review_avg": [ 384.0, 119.30423295088904 ], "wc_reply_reviewers_avg": [ 6.0, 10.392304845413264 ], "wc_reply_authors_avg": [ 742.25, 468.00928142505893 ], "reply_reviewers_avg": [ 0.25, 0.4330127018922193 ], "reply_authors_avg": [ 1.5, 0.5 ], "replies_avg": [ 14, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": -0.816496580927726, "gs_citation": 2, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=11859636823431577632&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 2, "aff_unique_index": "0;0;0", "aff_unique_norm": "University of Oxford", "aff_unique_dep": "", "aff_unique_url": "https://www.ox.ac.uk", "aff_unique_abbr": "Oxford", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0;0", "aff_country_unique": "United Kingdom" }, { "id": "5slGDu_bVc6", "title": "Learning from deep model via exploring local targets", "track": "main", "status": "Reject", "tldr": "", "abstract": "Deep neural networks often have huge number of parameters, which posts challenges in deployment in application scenarios with limited memory and computation capacity. Knowledge distillation is one approach to derive compact models from bigger ones. \nHowever, it has been observed that a converged heavy teacher model is strongly constrained for learning a compact student network and could make the optimization subject to poor local optima. In this paper, we propose proKT, a new model-agnostic method by projecting the supervision signals of a teacher model into the student's parameter space. Such projection is implemented by decomposing the training objective into local intermediate targets with approximate mirror descent technique. 
The proposed method could be less sensitive to the quirks during optimization, which could result in a better local optimum. Experiments on both image and text datasets show that our proposed proKT consistently achieves state-of-the-art performance compared to all existing knowledge distillation methods. ", "keywords": "", "primary_area": "", "supplementary_material": "/attachment/fa850b70eaac3888832747c0c071b9556dabbfe0.zip", "author": "Wenxian Shi;Yuxuan Song;Hao Zhou;Bohan Li;Lei Li", "authorids": "~Wenxian_Shi1;~Yuxuan_Song2;~Hao_Zhou6;libohan.05@bytedance.com;~Lei_Li11", "gender": ";M;M;;M", "homepage": ";https://yuxuansong.com;;;https://www.cs.cmu.edu/~leili", "dblp": ";;;;13/7007-5.html", "google_scholar": ";xlnZ1OIAAAAJ;;;BYXqAlwAAAAJ", "orcid": ";;;;0000-0003-3095-9776", "linkedin": ";;;;", "or_profile": "~Wenxian_Shi1;~Yuxuan_Song2;~Hao_Zhou6;libohan.05@bytedance.com;~Lei_Li11", "aff": ";Bytedance;;;ByteDance AI Lab", "aff_domain": ";bytedance.com;;;bytedance.com", "position": ";Researcher;;;Director", "bibtex": "@misc{\nshi2021learning,\ntitle={Learning from deep model via exploring local targets},\nauthor={Wenxian Shi and Yuxuan Song and Hao Zhou and Bohan Li and Lei Li},\nyear={2021},\nurl={https://openreview.net/forum?id=5slGDu_bVc6}\n}", "github": "", "project": "", "reviewers": "AnonReviewer3;AnonReviewer1;AnonReviewer4;AnonReviewer2", "site": "https://openreview.net/forum?id=5slGDu_bVc6", "pdf_size": 0, "rating": "3;4;4;5", "confidence": "5;5;5;3", "wc_review": "416;1125;358;500", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "173;573;175;246", "reply_reviewers": "0;0;0;0", "reply_authors": "1;1;1;1", "rating_avg": [ 4.0, 0.7071067811865476 ], "confidence_avg": [ 4.5, 0.8660254037844386 ], "wc_review_avg": [ 599.75, 307.4267189103771 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 291.75, 165.02026390719413 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 9, 0 ], "authors#_avg": [ 5, 0 ], "corr_rating_confidence": -0.816496580927726, "gs_citation": 4, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=15167321464114917961&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 0, "aff_unique_index": "0;0", "aff_unique_norm": "ByteDance", "aff_unique_dep": "", "aff_unique_url": "https://www.bytedance.com", "aff_unique_abbr": "Bytedance", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0", "aff_country_unique": "China" }, { "id": "5tJMTHv0l8g", "title": "Stego Networks: Information Hiding on Deep Neural Networks", "track": "main", "status": "Reject", "tldr": "", "abstract": "The best way of keeping a secret is to pretend there is not one. In this spirit, a class of techniques called steganography aims to hide secret messages on various media leaving as little detectable trace as possible. This paper considers neural networks as novel steganographic cover media, which we call stego networks, that can be used to hide one's secret messages. Although there have been numerous attempts to hide information in the output of neural networks, techniques for hiding information in the neural network parameters themselves have not been actively studied in the literature. The widespread use of deep learning models in various cloud computing platforms and millions of mobile devices as of today implies the importance of safety issues regarding stego networks among deep learning researchers and practitioners. 
In response, this paper presents the advantages of stego networks over other types of stego media in terms of security and capacity. We provide observations that the fraction bits of some typical network parameters in a floating-point representation tend to follow uniform distributions and explain how it can help a secret sender to encrypt messages that are indistinguishable from the original content. We demonstrate that network parameters can embed a large amount of secret information. Even the most significant fraction bits can be used for hiding secrets without inducing noticeable performance degradation while making it significantly hard to remove secrets by perturbing insignificant bits. Finally, we discuss possible use cases of stego networks and methods to detect or remove secrets from stego networks.", "keywords": "Steganography;Information Hiding;Security", "primary_area": "", "supplementary_material": "/attachment/99ad07e92397c9b23941479fa76f84c7896c2247.zip", "author": "Youngwoo Cho;Beomsoo Kim;Jaegul Choo", "authorids": "~Youngwoo_Cho1;~Beomsoo_Kim2;~Jaegul_Choo1", "gender": "M;M;M", "homepage": ";;https://sites.google.com/site/jaegulchoo/", "dblp": "276/6715;;07/2074", "google_scholar": "Ys4ejKUAAAAJ;;GHJYsLEAAAAJ", "orcid": "0000-0001-6082-9468;;", "linkedin": "youngwoo-cho;beomsoo-kim-6713a870/;", "or_profile": "~Youngwoo_Cho1;~Beomsoo_Kim2;~Jaegul_Choo1", "aff": "Korea Advanced Institute of Science & Technology;Korea Advanced Institute of Science & Technology;Korea Advanced Institute of Science & Technology", "aff_domain": "kaist.ac.kr;kaist.ac.kr;kaist.ac.kr", "position": "PhD student;MS student;Associate Professor", "bibtex": "@misc{\ncho2021stego,\ntitle={Stego Networks: Information Hiding on Deep Neural Networks},\nauthor={Youngwoo Cho and Beomsoo Kim and Jaegul Choo},\nyear={2021},\nurl={https://openreview.net/forum?id=5tJMTHv0l8g}\n}", "github": "", "project": "", "reviewers": "AnonReviewer3;AnonReviewer1;AnonReviewer4", "site": "https://openreview.net/forum?id=5tJMTHv0l8g", "pdf_size": 0, "rating": "3;7;7", "confidence": "3;3;4", "wc_review": "687;317;752", "wc_reply_reviewers": "0;0;0", "wc_reply_authors": "1553;508;811", "reply_reviewers": "0;0;0", "reply_authors": "3;2;2", "rating_avg": [ 5.666666666666667, 1.8856180831641267 ], "confidence_avg": [ 3.3333333333333335, 0.4714045207910317 ], "wc_review_avg": [ 585.3333333333334, 191.58693993995402 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 957.3333333333334, 438.98848377706776 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 2.3333333333333335, 0.4714045207910317 ], "replies_avg": [ 13, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": 0.5, "gs_citation": 1, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=15478265186508846041&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 0, "aff_unique_index": "0;0;0", "aff_unique_norm": "Korea Advanced Institute of Science and Technology", "aff_unique_dep": "", "aff_unique_url": "https://www.kaist.ac.kr", "aff_unique_abbr": "KAIST", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0;0", "aff_country_unique": "South Korea" }, { "id": "5vShUEyjmm", "title": "Dual Contradistinctive Generative Autoencoder", "track": "main", "status": "Withdraw", "tldr": "", "abstract": "We present a new generative autoencoder model with dual contradistinctive losses to improve generative autoencoder that performs simultaneous inference (reconstruction) and synthesis (generation). 
We name our model dual contradistinctive generative autoencoder (DC-VAE) that integrates an instance-level discriminative loss (maintaining the instance-level fidelity for the reconstruction/synthesis) with a set-level adversarial loss (encouraging the set-level fidelity for the reconstruction/synthesis), both being contradistinctive. There also exists a mathematical connection between the instance-based classification and instance-level conditional distribution. DC-VAE achieves competitive results in three tasks, including image synthesis, image reconstruction, and representation learning. DC-VAE is applicable to various tasks in computer vision and machine learning.", "keywords": "", "primary_area": "", "supplementary_material": "", "author": "Gaurav Parmar;Dacheng Li;Kwonjoon Lee;Zhuowen Tu", "authorids": "~Gaurav_Parmar1;~Dacheng_Li1;~Kwonjoon_Lee1;~Zhuowen_Tu1", "gender": "M;;M;", "homepage": "https://gauravparmar.com/;;https://kjunelee.github.io;", "dblp": "239/9682;;127/7948;", "google_scholar": "https://scholar.google.com/citations?view_op=list_works;;C6Wu8M0AAAAJ;", "orcid": ";;0000-0002-1433-551X;", "linkedin": ";;;", "or_profile": "~Gaurav_Parmar1;~Dacheng_Li1;~Kwonjoon_Lee1;~Zhuowen_Tu1", "aff": "Carnegie Mellon University;;University of California, San Diego;", "aff_domain": "cmu.edu;;ucsd.edu;", "position": "MS student;;PhD student;", "bibtex": "", "github": "", "project": "", "reviewers": "AnonReviewer2;AnonReviewer3;AnonReviewer1;AnonReviewer4", "site": "https://openreview.net/forum?id=5vShUEyjmm", "pdf_size": 0, "rating": "3;5;5;6", "confidence": "4;5;4;4", "wc_review": "318;619;214;580", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "752;778;455;632", "reply_reviewers": "0;0;0;0", "reply_authors": "1;1;1;1", "rating_avg": [ 4.75, 1.0897247358851685 ], "confidence_avg": [ 4.25, 0.4330127018922193 ], "wc_review_avg": [ 432.75, 171.31166772873354 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 654.25, 127.53896463434224 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 9, 0 ], "authors#_avg": [ 4, 0 ], "corr_rating_confidence": 0.13245323570650439, "gs_citation": 103, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=2829217617158687899&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 9, "aff_unique_index": "0;1", "aff_unique_norm": "Carnegie Mellon University;University of California, San Diego", "aff_unique_dep": ";", "aff_unique_url": "https://www.cmu.edu;https://www.ucsd.edu", "aff_unique_abbr": "CMU;UCSD", "aff_campus_unique_index": "1", "aff_campus_unique": ";San Diego", "aff_country_unique_index": "0;0", "aff_country_unique": "United States" }, { "id": "5wmNjjvGOXh", "title": "Selfish Sparse RNN Training", "track": "main", "status": "Reject", "tldr": "", "abstract": "Sparse neural networks have been widely applied to reduce the necessary resource requirements to train and deploy over-parameterized deep neural networks. For inference acceleration, methods that induce sparsity from a pre-trained dense network (dense-to-sparse) work effectively. Recently, dynamic sparse training (DST) has been proposed to train sparse neural networks without pre-training a large and dense network (sparse-to-sparse), so that the training process can also be accelerated. However, previous sparse-to-sparse methods mainly focus on Multilayer Perceptron Networks (MLPs) and Convolutional Neural Networks (CNNs), failing to match the performance of dense-to-sparse methods in Recurrent Neural Networks (RNNs) setting. 
In this paper, we propose an approach to train sparse RNNs with a fixed parameter count in one single run, without compromising performance. During training, we allow RNN layers to have a non-uniform redistribution across cell weights for a better regularization. Further, we introduce SNT-ASGD, a variant of the averaged stochastic gradient optimizer, which significantly improves the performance of all sparse training methods for RNNs. Using these strategies, we achieve state-of-the-art sparse training results, even better than dense model results, with various types of RNNs on Penn TreeBank and Wikitext-2 datasets.", "keywords": "dynamic sparse training;sparse neural networks;dynamic sparse RNN training;recurrent neural networks", "primary_area": "", "supplementary_material": "/attachment/c06e76f7ccb5be017669a10d6f2ecb4e3a35b88d.zip", "author": "Shiwei Liu;Decebal Constantin Mocanu;Yulong Pei;Mykola Pechenizkiy", "authorids": "~Shiwei_Liu2;~Decebal_Constantin_Mocanu1;~Yulong_Pei1;~Mykola_Pechenizkiy1", "gender": "M;M;;M", "homepage": "https://shiweiliuiiiiiii.github.io/;https://wwwen.uni.lu/recherche/fstm/dcs/members/decebal_constantin_mocanu;;http://www.win.tue.nl/~mpechen/", "dblp": "234/8697-3.html;133/7764;;37/4649", "google_scholar": "73IbXtsAAAAJ;RlQgUwEAAAAJ;;https://scholar.google.com.tw/citations?user=F0uFT_kAAAAJ", "orcid": ";0000-0002-5636-7683;;0000-0003-4955-0743", "linkedin": ";;;mpechen/", "or_profile": "~Shiwei_Liu2;~Decebal_Constantin_Mocanu1;~Yulong_Pei1;~Mykola_Pechenizkiy1", "aff": "Eindhoven University of Technology;University of Twente;;Eindhoven University of Technology", "aff_domain": "tue.nl;utwente.nl;;tue.nl", "position": "PhD student;Assistant Professor;;Full Professor", "bibtex": "@misc{\nliu2021selfish,\ntitle={Selfish Sparse {\\{}RNN{\\}} Training},\nauthor={Shiwei Liu and Decebal Constantin Mocanu and Yulong Pei and Mykola Pechenizkiy},\nyear={2021},\nurl={https://openreview.net/forum?id=5wmNjjvGOXh}\n}", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer4;AnonReviewer3;AnonReviewer2", "site": "https://openreview.net/forum?id=5wmNjjvGOXh", "pdf_size": 0, "rating": "4;6;7;7", "confidence": "3;3;5;3", "wc_review": "295;545;534;575", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "762;983;689;784", "reply_reviewers": "0;0;0;0", "reply_authors": "1;2;1;1", "rating_avg": [ 6.0, 1.224744871391589 ], "confidence_avg": [ 3.5, 0.8660254037844386 ], "wc_review_avg": [ 487.25, 112.00530121382648 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 804.5, 108.89100054641797 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.25, 0.4330127018922193 ], "replies_avg": [ 11, 0 ], "authors#_avg": [ 4, 0 ], "corr_rating_confidence": 0.4714045207910316, "gs_citation": 48, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=14857851775115975297&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 10, "aff_unique_index": "0;1;0", "aff_unique_norm": "Eindhoven University of Technology;University of Twente", "aff_unique_dep": ";", "aff_unique_url": "https://www.tue.nl;https://www.utwente.nl", "aff_unique_abbr": "TU/e;UT", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0;0", "aff_country_unique": "Netherlands" }, { "id": "5xaInvrGWp", "title": "Adversarially Robust Federated Learning for Neural Networks", "track": "main", "status": "Reject", "tldr": "", "abstract": "In federated learning, data is distributed among local clients which collaboratively train a prediction model using secure 
aggregation. To preserve the privacy of the clients, the federated learning paradigm requires each client to maintain a private local training data set, and only uploads its summarized model updates to the server. In this work, we show that this paradigm could lead to a vulnerable model, which collapses in performance when the corrupted data samples (under adversarial manipulations) are used for prediction after model deployment. To improve model robustness, we first decompose the aggregation error of the central server into bias and variance, and then, propose a robust federated learning framework, named Fed_BVA, that performs on-device adversarial training using the bias-variance oriented adversarial examples supplied by the server via asymmetrical communications. The experiments are conducted on multiple benchmark data sets using several prevalent neural network models, and the empirical results show that our framework is robust against white-box and black-box adversarial corruptions under both IID and non-IID settings. ", "keywords": "federated learning;adversarial training;robustness;bias-variance decomposition", "primary_area": "", "supplementary_material": "/attachment/1283696530c9503b94cce779c478f75cf73d1bbd.zip", "author": "Yao Zhou;Jun Wu;Jingrui He", "authorids": "~Yao_Zhou3;~Jun_Wu3;~Jingrui_He1", "gender": "M;M;F", "homepage": "https://publish.illinois.edu/yaozhou3/;https://junwu6.github.io/;https://www.hejingrui.org", "dblp": "19/8104.html;20/3894-19.html;34/2685", "google_scholar": "-SEKavEAAAAJ;TZXUS-oAAAAJ;hXpZynkAAAAJ", "orcid": "0000-0002-9575-2832;0000-0002-1512-524X;0000-0002-6429-6272", "linkedin": ";jun-wu-08a962176/;", "or_profile": "~Yao_Zhou3;~Jun_Wu3;~Jingrui_He1", "aff": "University of Illinois, Urbana-Champaign;University of Illinois, Urbana Champaign;University of Illinois, Urbana Champaign", "aff_domain": "uiuc.edu;illinois.edu;illinois.edu", "position": "PhD student;PhD student;Associate Professor", "bibtex": "@misc{\nzhou2021adversarially,\ntitle={Adversarially Robust Federated Learning for Neural Networks},\nauthor={Yao Zhou and Jun Wu and Jingrui He},\nyear={2021},\nurl={https://openreview.net/forum?id=5xaInvrGWp}\n}", "github": "", "project": "", "reviewers": "AnonReviewer3;AnonReviewer4;AnonReviewer1;AnonReviewer2", "site": "https://openreview.net/forum?id=5xaInvrGWp", "pdf_size": 0, "rating": "4;4;5;6", "confidence": "4;4;3;3", "wc_review": "788;425;443;808", "wc_reply_reviewers": "56;0;121;0", "wc_reply_authors": "557;766;791;727", "reply_reviewers": "1;0;1;0", "reply_authors": "1;1;2;1", "rating_avg": [ 4.75, 0.82915619758885 ], "confidence_avg": [ 3.5, 0.5 ], "wc_review_avg": [ 616.0, 182.2484567835898 ], "wc_reply_reviewers_avg": [ 44.25, 49.861683685972736 ], "wc_reply_authors_avg": [ 710.25, 91.37115244977487 ], "reply_reviewers_avg": [ 0.5, 0.5 ], "reply_authors_avg": [ 1.25, 0.4330127018922193 ], "replies_avg": [ 12, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": -0.9045340337332909, "gs_citation": 6, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=15087215646959843517&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 0, "aff_unique_index": "0;1;1", "aff_unique_norm": "University of Illinois;University of Illinois Urbana-Champaign", "aff_unique_dep": ";", "aff_unique_url": "https://illinois.edu;https://illinois.edu", "aff_unique_abbr": "UIUC;UIUC", "aff_campus_unique_index": "0;0;0", "aff_campus_unique": "Urbana-Champaign", "aff_country_unique_index": "0;0;0", "aff_country_unique": "United States" }, { "id": 
"5zErZzsW2U1", "title": "Category Disentangled Context: Turning Category-irrelevant Features Into Treasures", "track": "main", "status": "Withdraw", "tldr": "", "abstract": "Deep neural networks have achieved great success in computer vision, thanks to their ability in extracting category-relevant semantic features. On the contrary, irrelevant features (e.g., background and confusing parts) are usually considered to be harmful. In this paper, we bring a new perspective on the potential benefits brought by irrelevant features: they could act as references to help identify relevant ones. Therefore, (1) we formulate a novel Category Disentangled Context (CDC) and develop an adversarial deep network to encode it; (2) we investigate utilizing the CDC to improve image classification with the attention mechanism as a bridge. Extensive comparisons on four benchmarks with various backbone networks demonstrate that the CDC could bring remarkable improvements consistently, validating the usefulness of irrelevant features.", "keywords": "", "primary_area": "", "supplementary_material": "", "author": "Keke Tang;Guodong Wei;Jie Zhu;Yuexin Ma;Runnan Chen;Zhaoquan Gu;Wenping Wang", "authorids": "~Keke_Tang2;~Guodong_Wei1;~Jie_Zhu2;~Yuexin_Ma2;~Runnan_Chen1;~Zhaoquan_Gu2;~Wenping_Wang1", "gender": "M;;;F;M;M;M", "homepage": "https://tangbohu.github.io/;https://www.researchgate.net/profile/Guodong_Wei6;;http://yuexinma.me/aboutme.html;https://scholar.google.com.hk/citations?hl=en&user=Uq2DuzkAAAAJ&view_op=list_works&sortby=pubdate;;https://engineering.tamu.edu/cse/profiles/Wang-Wenping.html", "dblp": "162/3984;;;209/5925;232/1849;128/8237;", "google_scholar": "9Lk6HpQAAAAJ;;;;https://scholar.google.com.hk/citations?hl=en;v-vA6Z8AAAAJ;28shvv0AAAAJ", "orcid": "0000-0003-0377-1022;;0000-0001-9798-3346;;;;0000-0002-2284-3952", "linkedin": ";;;;;;", "or_profile": "~Keke_Tang2;~Guodong_Wei1;~Jie_Zhu2;~Yuexin_Ma2;~Runnan_Chen1;~Zhaoquan_Gu2;~Wenping_Wang1", "aff": "Guangzhou University;South China University of Technology;;ShanghaiTech University;Tencent Youtu Lab;Guangzhou University, China, Tsinghua University;Texas A&M University - College Station", "aff_domain": "gzhu.edu.cn;scut.edu.cn;;shanghaitech.edu.cn;tencent.com;gzhu.edu.cn;tamu.edu", "position": "Associate Professor;PhD student;;Assistant Professor;Intern;Full Professor;Full Professor", "bibtex": "", "github": "", "project": "", "reviewers": "AnonReviewer4;AnonReviewer1;AnonReviewer3;AnonReviewer2", "site": "https://openreview.net/forum?id=5zErZzsW2U1", "pdf_size": 0, "rating": "4;5;5;6", "confidence": "4;5;4;4", "wc_review": "384;162;226;216", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "0;0;0;0", "reply_reviewers": "0;0;0;0", "reply_authors": "0;0;0;0", "rating_avg": [ 5.0, 0.7071067811865476 ], "confidence_avg": [ 4.25, 0.4330127018922193 ], "wc_review_avg": [ 247.0, 82.75868534480233 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 0, 0 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 0, 0 ], "replies_avg": [ 5, 0 ], "authors#_avg": [ 7, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:6l0s1ZN9-NMJ:scholar.google.com/&scioq=Category+Disentangled+Context:+Turning+Category-irrelevant+Features+Into+Treasures&hl=en&as_sdt=0,5", "gs_version_total": 0, "aff_unique_index": "0;1;2;3;0;4", "aff_unique_norm": "Guangzhou University;South China University of Technology;ShanghaiTech University;Tencent;Texas A&M University", "aff_unique_dep": ";;;Youtu 
Lab;", "aff_unique_url": "http://www.gzhu.edu.cn;https://www.scut.edu.cn;https://www.shanghaitech.edu.cn;https://www.tencent.com;https://www.tamu.edu", "aff_unique_abbr": "GU;SCUT;ShanghaiTech;Tencent;TAMU", "aff_campus_unique_index": "1", "aff_campus_unique": ";College Station", "aff_country_unique_index": "0;0;0;0;0;1", "aff_country_unique": "China;United States" }, { "id": "60GDNLY-m3M", "title": "Rethinking Pseudo-labeled Sample Mining for Semi-Supervised Object Detection", "track": "main", "status": "Withdraw", "tldr": "", "abstract": "Consistency-based method has been proved effective for semi-supervised learning (SSL). However, the impact of the pseudo-labeled samples' quality as well as the mining strategies for high quality training sample have rarely been studied in SSL. An intuitive idea is to select pseudo-labeled training samples by threshold. We find it essential the selection of these thresholds to the final result of SSL. Following this discovery, we propose SEAT (Score Ensemble with Adaptive Threshold), a simple and efficient semi-supervised learning object detection method, in which the high confidence pseudo-labels are selected for self-training. Apart from confidence score as the indicator of the sample's quality, we also introduce the scores of temporal consistency and augmentation consistency. The scores provide a more comprehensive description to the quality of each sample. To cope with the data distribution difference among categories, the adaptive threshold strategy is used to automatically determine the sample mining threshold for each category. We conduct experiments on PASCAL-VOC and MSCOCO, extensive results show that our method is competitive and can be easily combined with consistency-based methods.", "keywords": "semi-supervised learning;object detection;sample mining;threshold;consistency", "primary_area": "", "supplementary_material": "", "author": "Duo Li;Sanli Tang;Zhanzhan Cheng;Shiliang Pu;Yi Niu;Wenming Tan;Fei Wu;Xiaokang Yang", "authorids": "liduo6@hikvision.com;~Sanli_Tang1;~Zhanzhan_Cheng1;~Shiliang_Pu1;~Yi_Niu1;~Wenming_Tan1;~Fei_Wu2;~Xiaokang_Yang1", "gender": ";M;M;M;;M;;M", "homepage": ";https://github.com/tangsanli5201/;;;;;https://person.zju.edu.cn/wufei;https://icne.sjtu.edu.cn/info/1064/1078.htm", "dblp": ";227/3326;163/6485;155/3173;42/7663;224/0172;84/3254-1;06/3071-1.html", "google_scholar": ";;YrafmOUAAAAJ;https://scholar.google.com.hk/citations?user=NWR_wpoAAAAJ;;https://scholar.google.com/citations?hl=en;XJLn4MYAAAAJ;yDEavdMAAAAJ", "orcid": ";;;;;0000-0003-1338-4536;;0000-0003-4029-3322", "linkedin": ";;;;;;;", "or_profile": "liduo6@hikvision.com;~Sanli_Tang1;~Zhanzhan_Cheng1;~Shiliang_Pu1;~Yi_Niu1;~Wenming_Tan1;~Fei_Wu2;~Xiaokang_Yang1", "aff": ";Hikvision Research Institute;Zhejiang University;;Hikvision Research Institute;Hikvision Research Institute;Zhejiang University;Shanghai Jiaotong University", "aff_domain": ";hikvision.com;zju.edu.cn;;hikvision.com;hikvision.com;zju.edu.cn;sjtu.edu.cn", "position": ";Researcher;PhD student;;Researcher;Researcher;Full Professor;Full Professor", "bibtex": "", "github": "", "project": "", "reviewers": "", "site": "https://openreview.net/forum?id=60GDNLY-m3M", "pdf_size": 0, "rating": "", "confidence": "", "wc_review": "", "wc_reply_reviewers": "", "wc_reply_authors": "", "reply_reviewers": "", "reply_authors": "", "rating_avg": [ 0, 0 ], "confidence_avg": [ 0, 0 ], "wc_review_avg": [ 0, 0 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 0, 0 ], "reply_reviewers_avg": [ 0, 0 ], 
"reply_authors_avg": [ 0, 0 ], "replies_avg": [ 1, 0 ], "authors#_avg": [ 8, 0 ], "corr_rating_confidence": 0, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:tOV0yAe-12UJ:scholar.google.com/&scioq=Rethinking+Pseudo-labeled+Sample+Mining+for+Semi-Supervised+Object+Detection&hl=en&as_sdt=0,5", "gs_version_total": 0, "aff_unique_index": "0;1;0;0;1;2", "aff_unique_norm": "Hikvision Research Institute;Zhejiang University;Shanghai Jiao Tong University", "aff_unique_dep": ";;", "aff_unique_url": "https://www.hikvision.com/cn/;https://www.zju.edu.cn;https://www.sjtu.edu.cn", "aff_unique_abbr": "Hikvision;ZJU;SJTU", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0;0;0;0;0", "aff_country_unique": "China" }, { "title": "Meta-learning with negative learning rates", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/3183", "id": "60j5LygnmD", "poster": "", "openreview": "https://openreview.net/forum?id=60j5LygnmD", "slides": "https://iclr.cc/virtual/2021/poster/3183", "video": "https://iclr.cc/virtual/2021/poster/3183", "tldr": "", "abstract": "Deep learning models require a large amount of data to perform well. When data is scarce for a target task, we can transfer the knowledge gained by training on similar tasks to quickly learn the target. A successful approach is meta-learning, or \"learning to learn\" a distribution of tasks, where \"learning\" is represented by an outer loop, and \"to learn\" by an inner loop of gradient descent. However, a number of recent empirical studies argue that the inner loop is unnecessary and more simple models work equally well or even better. We study the performance of MAML as a function of the learning rate of the inner loop, where zero learning rate implies that there is no inner loop. Using random matrix theory and exact solutions of linear models, we calculate an algebraic expression for the test loss of MAML applied to mixed linear regression and nonlinear regression with overparameterized models. Surprisingly, while the optimal learning rate for adaptation is positive, we find that the optimal learning rate for training is always negative, a setting that has never been considered before. Therefore, not only does the performance increase by decreasing the learning rate to zero, as suggested by recent work, but it can be increased even further by decreasing the learning rate to negative\nvalues. 
These results help clarify under what circumstances meta-learning performs best.", "keywords": "Meta-learning", "primary_area": "", "supplementary_material": "/attachment/e34c29f2a8252f67626a8ad3d84f86309d577ff8.zip", "author": "Alberto Bernacchia", "authorids": "~Alberto_Bernacchia1", "gender": "", "homepage": "", "dblp": "68/9669", "google_scholar": "n48pFqcAAAAJ", "orcid": "", "linkedin": "", "or_profile": "~Alberto_Bernacchia1", "aff": "MediaTek Research", "aff_domain": "mtkresearch.com", "position": "Team Lead", "bibtex": "@inproceedings{\nbernacchia2021metalearning,\ntitle={Meta-learning with negative learning rates},\nauthor={Alberto Bernacchia},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=60j5LygnmD}\n}", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer4;AnonReviewer3;AnonReviewer2", "pdf_size": 0, "rating": "6;6;6;8", "confidence": "3;4;4;4", "wc_review": "263;376;1061;204", "wc_reply_reviewers": "153;0;346;0", "wc_reply_authors": "371;281;1046;47", "reply_reviewers": "1;0;1;0", "reply_authors": "2;2;2;1", "rating_avg": [ 6.5, 0.8660254037844386 ], "confidence_avg": [ 3.75, 0.4330127018922193 ], "wc_review_avg": [ 476.0, 343.35768522052916 ], "wc_reply_reviewers_avg": [ 124.75, 142.19243123317077 ], "wc_reply_authors_avg": [ 436.25, 371.3727069939308 ], "reply_reviewers_avg": [ 0.5, 0.5 ], "reply_authors_avg": [ 1.75, 0.4330127018922193 ], "replies_avg": [ 14, 0 ], "authors#_avg": [ 1, 0 ], "corr_rating_confidence": 0.3333333333333333, "gs_citation": 22, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=15617189750031025588&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 7, "pdf": "https://openreview.net/pdf?id=60j5LygnmD", "email": "mtkresearch.com", "author_num": 1, "aff_unique_index": "0", "aff_unique_norm": "MediaTek", "aff_unique_dep": "Research", "aff_unique_url": "", "aff_unique_abbr": "" }, { "id": "65MxtdJwEnl", "title": "Neural CDEs for Long Time Series via the Log-ODE Method", "track": "main", "status": "Reject", "tldr": "", "abstract": "Neural Controlled Differential Equations (Neural CDEs) are the continuous-time analogue of an RNN, just as Neural ODEs are analogous to ResNets. However just like RNNs, training Neural CDEs can be difficult for long time series. Here, we propose to apply a technique drawn from stochastic analysis, namely the log-ODE method. Instead of using the original input sequence, our procedure summarises the information over local time intervals via the log-signature map, and uses the resulting shorter stream of log-signatures as the new input. This represents a length/channel trade-off. 
In doing so we demonstrate efficacy on problems of length up to 17k observations and observe significant training speed-ups, improvements in model performance, and reduced memory requirements compared to the existing algorithm.", "keywords": "CDE;neural differential equation;time series;long time series;log-ODE", "primary_area": "", "supplementary_material": "/attachment/dccad0e0737a821360019f6aa4e9affaecd59f72.zip", "author": "James Morrill;Patrick Kidger;Cristopher Salvi;James Foster;Terry Lyons", "authorids": "morrill@maths.ox.ac.uk;~Patrick_Kidger1;salvi@maths.ox.ac.uk;foster@maths.ox.ac.uk;tlyons@maths.ox.ac.uk", "gender": ";;;;", "homepage": ";https://kidger.site/;;;", "dblp": ";241/7262;;;", "google_scholar": ";https://scholar.google.co.uk/citations?user=5cCLsNQAAAAJ;;;", "orcid": ";;;;", "linkedin": ";;;;", "or_profile": "morrill@maths.ox.ac.uk;~Patrick_Kidger1;salvi@maths.ox.ac.uk;foster@maths.ox.ac.uk;tlyons@maths.ox.ac.uk", "aff": ";University of Oxford;;;", "aff_domain": ";ox.ac.uk;;;", "position": ";PhD student;;;", "bibtex": "@misc{\nmorrill2021neural,\ntitle={Neural {\\{}CDE{\\}}s for Long Time Series via the Log-{\\{}ODE{\\}} Method},\nauthor={James Morrill and Patrick Kidger and Cristopher Salvi and James Foster and Terry Lyons},\nyear={2021},\nurl={https://openreview.net/forum?id=65MxtdJwEnl}\n}", "github": "", "project": "", "reviewers": "AnonReviewer4;AnonReviewer5;AnonReviewer1", "site": "https://openreview.net/forum?id=65MxtdJwEnl", "pdf_size": 0, "rating": "5;6;7", "confidence": "5;4;4", "wc_review": "857;329;501", "wc_reply_reviewers": "582;88;49", "wc_reply_authors": "1122;345;351", "reply_reviewers": "1;1;1", "reply_authors": "2;1;1", "rating_avg": [ 6.0, 0.816496580927726 ], "confidence_avg": [ 4.333333333333333, 0.4714045207910317 ], "wc_review_avg": [ 562.3333333333334, 219.87471179944473 ], "wc_reply_reviewers_avg": [ 239.66666666666666, 242.58927337282572 ], "wc_reply_authors_avg": [ 606.0, 364.87532117149277 ], "reply_reviewers_avg": [ 1.0, 0.0 ], "reply_authors_avg": [ 1.3333333333333333, 0.4714045207910317 ], "replies_avg": [ 11, 0 ], "authors#_avg": [ 5, 0 ], "corr_rating_confidence": -0.8660254037844385, "gs_citation": 19, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=16892352006425975841&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 4, "aff_unique_index": "0", "aff_unique_norm": "University of Oxford", "aff_unique_dep": "", "aff_unique_url": "https://www.ox.ac.uk", "aff_unique_abbr": "Oxford", "aff_country_unique_index": "0", "aff_country_unique": "United Kingdom" }, { "id": "65_RUwah5kr", "title": "Multi-level Graph Matching Networks for Deep and Robust Graph Similarity Learning", "track": "main", "status": "Withdraw", "tldr": "", "abstract": "While the celebrated graph neural networks yield effective representations for individual nodes of a graph, there has been relatively less success in extending to graph similarity learning. Recent works have considered either global-level graph-graph interactions or low-level node-node interactions, ignoring the rich cross-level interactions (e.g., between nodes of a graph and the other whole graph). In this paper, we propose a Multi-level Graph Matching Network (MGMN) for computing the graph similarity between any pair of graph-structured objects in an end-to-end fashion. 
The proposed MGMN model consists of a node-graph matching network for effectively learning cross-level interactions between nodes of a graph and the other whole graph, and a siamese graph neural network to learn global-level interactions between two graphs. Furthermore, to bridge the gap of the lack of standard graph similarity learning benchmark, we have created and collected a set of datasets for both graph-graph classification and regression tasks with different sizes in order to evaluate the robustness of models. Our comprehensive experiments demonstrate that MGMN consistently outperforms state-of-the-art baselines on these graph similarity learning benchmarks. ", "keywords": "Semi-supervised Learning;Graph Neural Network;Graph Similarity Learning", "primary_area": "", "supplementary_material": "", "author": "Xiang Ling;Lingfei Wu;Saizhuo Wang;Tengfei Ma;Fangli Xu;Alex X. Liu;Chunming Wu;Shouling Ji", "authorids": "~Xiang_Ling1;~Lingfei_Wu1;~Saizhuo_Wang1;~Tengfei_Ma1;~Fangli_Xu2;alexliu@antfin.com;wuchunming@zju.edu.cn;~Shouling_Ji1", "gender": "M;;M;M;;;;M", "homepage": ";https://sites.google.com/view/teddy-lfwu/;https://saizhuo.wang;https://sites.google.com/site/matf0123/;https://www.linkedin.com/in/lily-xu-2018/;;;https://nesa.zju.edu.cn/", "dblp": "26/5329-1;27/9060;;94/9023-1;89/10932.html;;;07/8388", "google_scholar": "5gaFkzAAAAAJ;https://scholar.google.com/citations?hl=en;;9OvNakkAAAAJ;TFxZdJ0AAAAJ;;;https://scholar.google.com.vn/citations?hl=en", "orcid": ";;;0000-0002-1086-529X;;;;0000-0003-4268-372X", "linkedin": ";;;;;;;", "or_profile": "~Xiang_Ling1;~Lingfei_Wu1;~Saizhuo_Wang1;~Tengfei_Ma1;~Fangli_Xu2;alexliu@antfin.com;wuchunming@zju.edu.cn;~Shouling_Ji1", "aff": "Zhejiang University;International Business Machines;Hong Kong University of Science and Technology;International Business Machines;Squirrel AI Learning;;;Zhejiang University", "aff_domain": "zju.edu.cn;ibm.com;ust.hk;ibm.com;yixue.us;;;zju.edu.cn", "position": "PhD student;Research Staff Member;PhD student;Researcher;Machine Learning Engineer;;;Full Professor", "bibtex": "", "github": "", "project": "", "reviewers": "AnonReviewer4;AnonReviewer2;AnonReviewer1;AnonReviewer3;AnonReviewer5", "site": "https://openreview.net/forum?id=65_RUwah5kr", "pdf_size": 0, "rating": "4;4;5;5;5", "confidence": "4;5;4;4;4", "wc_review": "368;371;284;452;245", "wc_reply_reviewers": "0;0;0;0;0", "wc_reply_authors": "0;0;0;0;0", "reply_reviewers": "0;0;0;0;0", "reply_authors": "0;0;0;0;0", "rating_avg": [ 4.6, 0.48989794855663565 ], "confidence_avg": [ 4.2, 0.39999999999999997 ], "wc_review_avg": [ 344.0, 72.62231062146122 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 0, 0 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 0, 0 ], "replies_avg": [ 6, 0 ], "authors#_avg": [ 8, 0 ], "corr_rating_confidence": -0.6123724356957946, "gs_citation": 2, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=11848454039629741818&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 0, "aff_unique_index": "0;1;2;1;3;0", "aff_unique_norm": "Zhejiang University;International Business Machines Corporation;Hong Kong University of Science and Technology;Squirrel Ai Learning", "aff_unique_dep": ";;;", "aff_unique_url": "https://www.zju.edu.cn;https://www.ibm.com;https://www.ust.hk;https://www.squirrelai.com/", "aff_unique_abbr": "ZJU;IBM;HKUST;", "aff_campus_unique_index": "1", "aff_campus_unique": ";Hong Kong SAR", "aff_country_unique_index": "0;1;0;1;0;0", "aff_country_unique": "China;United States" }, { "id": 
"65sCF5wmhpv", "title": "Learning to Observe with Reinforcement Learning", "track": "main", "status": "Reject", "tldr": "", "abstract": "We consider a decision making problem where an autonomous agent decides on which actions to take based on the observations it collects from the environment. We are interested in revealing the information structure of the observation space illustrating which type of observations are the most important (such as position versus velocity) and the dependence of this on the state of agent (such as at the bottom versus top of a hill). We approach this problem by associating a cost with collecting observations which increases with the accuracy. We adopt a reinforcement learning (RL) framework where the RL agent learns to adjust the accuracy of the observations alongside learning to perform the original task. We consider both the scenario where the accuracy can be adjusted continuously and also the scenario where the agent has to choose between given preset levels, such as taking a sample perfectly or not taking a sample at all. In contrast to the existing work that mostly focuses on sample efficiency during training, our focus is on the behaviour during the actual task. Our results illustrate that the RL agent can learn to use the observation space efficiently and obtain satisfactory performance in the original task while collecting effectively smaller amount of data. By uncovering the relative usefulness of different types of observations and trade-offs within, these results also provide insights for further design of active data acquisition schemes. ", "keywords": "Reinforcement learning;observation strategies;active data collection", "primary_area": "", "supplementary_material": "", "author": "Mehmet Koseoglu;Ece Kunduracioglu;Ayca Ozcelikkale", "authorids": "~Mehmet_Koseoglu1;ecekundura@gmail.com;~Ayca_Ozcelikkale1", "gender": "M;;", "homepage": "https://web.cs.hacettepe.edu.tr/~mkoseoglu/;;https://sites.google.com/site/aycaozcelikkale", "dblp": ";;", "google_scholar": ";;", "orcid": ";;", "linkedin": ";;", "or_profile": "~Mehmet_Koseoglu1;ecekundura@gmail.com;~Ayca_Ozcelikkale1", "aff": "Hacettepe University;;Uppsala University", "aff_domain": "cs.hacettepe.edu.tr;;uu.se", "position": "Associate Professor;;Associate Professor", "bibtex": "@misc{\nkoseoglu2021learning,\ntitle={Learning to Observe with Reinforcement Learning},\nauthor={Mehmet Koseoglu and Ece Kunduracioglu and Ayca Ozcelikkale},\nyear={2021},\nurl={https://openreview.net/forum?id=65sCF5wmhpv}\n}", "github": "", "project": "", "reviewers": "AnonReviewer4;AnonReviewer1;AnonReviewer3;AnonReviewer2", "site": "https://openreview.net/forum?id=65sCF5wmhpv", "pdf_size": 0, "rating": "4;4;5;6", "confidence": "5;3;3;4", "wc_review": "476;265;532;187", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "611;496;401;343", "reply_reviewers": "0;0;0;0", "reply_authors": "1;1;1;1", "rating_avg": [ 4.75, 0.82915619758885 ], "confidence_avg": [ 3.75, 0.82915619758885 ], "wc_review_avg": [ 365.0, 143.08563869235795 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 462.75, 101.53416912547223 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 10, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": -0.0909090909090909, "gs_citation": 1, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=8943319262874339898&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 3, "aff_unique_index": "0;1", "aff_unique_norm": "Hacettepe University;Uppsala 
University", "aff_unique_dep": ";", "aff_unique_url": "https://www.hacettepe.edu.tr;https://www.uu.se", "aff_unique_abbr": "Hacettepe;UU", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;1", "aff_country_unique": "T\u00fcrkiye;Sweden" }, { "id": "66H4g_OHdnl", "title": "Revealing the Structure of Deep Neural Networks via Convex Duality", "track": "main", "status": "Reject", "tldr": "", "abstract": "We study regularized deep neural networks (DNNs) and introduce a convex analytic framework to characterize the structure of the hidden layers. We show that a set of optimal hidden layer weights for a norm regularized DNN training problem can be explicitly found as the extreme points of a convex set. For the special case of deep linear networks with $K$ outputs, we prove that each optimal weight matrix is rank-$K$ and aligns with the previous layers via duality. More importantly, we apply the same characterization to deep ReLU networks with whitened data and prove the same weight alignment holds. As a corollary, we prove that norm regularized deep ReLU networks yield spline interpolation for one-dimensional datasets which was previously known only for two-layer networks. Furthermore, we provide closed-form solutions for the optimal layer weights when data is rank-one or whitened. We then verify our theory via numerical experiments.", "keywords": "Convex optimization;non-convex optimization;deep learning;convex duality;regularization;ReLU activation;linear networks", "primary_area": "", "supplementary_material": "/attachment/18d179765825d2e45c28ec4186536f83e26e0f63.zip", "author": "Tolga Ergen;Mert Pilanci", "authorids": "~Tolga_Ergen1;~Mert_Pilanci3", "gender": "M;M", "homepage": "https://tolgaergen.github.io/;https://stanford.edu/~pilanci/", "dblp": "202/7477.html;45/8056", "google_scholar": "https://scholar.google.com.tr/citations?user=T1pWaCsAAAAJ;aSAS-aAAAAAJ", "orcid": "0000-0003-4806-0224;", "linkedin": ";mert-pilanci-ba615743/", "or_profile": "~Tolga_Ergen1;~Mert_Pilanci3", "aff": "Stanford University;Stanford University", "aff_domain": "stanford.edu;stanford.edu", "position": "PhD student;Assistant Professor", "bibtex": "@misc{\nergen2021revealing,\ntitle={Revealing the Structure of Deep Neural Networks via Convex Duality},\nauthor={Tolga Ergen and Mert Pilanci},\nyear={2021},\nurl={https://openreview.net/forum?id=66H4g_OHdnl}\n}", "github": "", "project": "", "reviewers": "AnonReviewer3;AnonReviewer1;AnonReviewer4;AnonReviewer2", "site": "https://openreview.net/forum?id=66H4g_OHdnl", "pdf_size": 0, "rating": "3;6;6;8", "confidence": "5;4;4;3", "wc_review": "906;260;500;277", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "960;647;1020;50", "reply_reviewers": "0;0;0;0", "reply_authors": "2;1;2;1", "rating_avg": [ 5.75, 1.7853571071357126 ], "confidence_avg": [ 4.0, 0.7071067811865476 ], "wc_review_avg": [ 485.75, 260.4576501084197 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 669.25, 384.5538811402116 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.5, 0.5 ], "replies_avg": [ 11, 0 ], "authors#_avg": [ 2, 0 ], "corr_rating_confidence": -0.9901475429766743, "gs_citation": 78, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=2628178598188433341&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 8, "aff_unique_index": "0;0", "aff_unique_norm": "Stanford University", "aff_unique_dep": "", "aff_unique_url": "https://www.stanford.edu", "aff_unique_abbr": "Stanford", "aff_campus_unique_index": "0;0", 
"aff_campus_unique": "Stanford", "aff_country_unique_index": "0;0", "aff_country_unique": "United States" }, { "id": "67ChnrC0ybo", "title": "Stabilizing DARTS with Amended Gradient Estimation on Architectural Parameters", "track": "main", "status": "Withdraw", "tldr": "", "abstract": "Despite the great advantage in search efficiency, DARTS often suffers weak stability, which reflects in the large variation among individual trials as well as the sensitivity to the hyper-parameters of the search process. This paper owes such instability to an optimization gap between the super-network and its sub-networks, namely, improving the validation accuracy of the super-network does not necessarily lead to a higher expectation on the performance of the sampled sub-networks. Then, we point out that the gap is due to the inaccurate estimation of the architectural gradients, based on which we propose an amended estimation method. Mathematically, our method guarantees a bounded error from the true gradients while the original estimation does not. Our approach bridges the gap from two aspects, namely, amending the estimation on the architectural gradients, and unifying the hyper-parameter settings in the search and re-training stages. Experiments on CIFAR10, ImageNet, and Penn Treebank demonstrate that our approach largely improves search stability and, more importantly, enables DARTS-based approaches to explore much larger search spaces that have not been investigated before.\n", "keywords": "Neural Architecture Search;DARTS;Gradient Estimation", "primary_area": "", "supplementary_material": "", "author": "Kaifeng Bi;Lingxi Xie;Changping Hu;Xin Chen;Longhui Wei;Qi Tian", "authorids": "bikaifeng1@huawei.com;~Lingxi_Xie1;hcp06@mails.tsinghua.edu.cn;chenxin180@huawei.com;~Longhui_Wei1;~Qi_Tian3", "gender": ";M;;;M;M", "homepage": ";http://lingxixie.com/;;;https://joinwei-pku.github.io/longhuiwei.github.io/;https://www.qitian1987.com/index.html", "dblp": ";123/2869;;;206/6179;78/1467-1.html", "google_scholar": ";EEMm7hwAAAAJ;;;thhnAhIAAAAJ;https://scholar.google.com/citations?hl=en", "orcid": ";;;;;0000-0002-7252-5047", "linkedin": ";;;;;", "or_profile": "bikaifeng1@huawei.com;~Lingxi_Xie1;hcp06@mails.tsinghua.edu.cn;chenxin180@huawei.com;~Longhui_Wei1;~Qi_Tian3", "aff": ";Huawei Technologies Ltd.;;;University of Science and Technology of China;Huawei Technologies Ltd.", "aff_domain": ";huawei.com;;;ustc.edu.cn;huawei.com", "position": ";Researcher;;;PhD student;Principal Researcher", "bibtex": "", "github": "", "project": "", "reviewers": "AnonReviewer3;AnonReviewer1;AnonReviewer4;AnonReviewer2", "site": "https://openreview.net/forum?id=67ChnrC0ybo", "pdf_size": 0, "rating": "4;4;5;6", "confidence": "4;5;3;4", "wc_review": "849;605;208;289", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "0;0;0;0", "reply_reviewers": "0;0;0;0", "reply_authors": "0;0;0;0", "rating_avg": [ 4.75, 0.82915619758885 ], "confidence_avg": [ 4.0, 0.7071067811865476 ], "wc_review_avg": [ 487.75, 255.93492825325737 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 0, 0 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 0, 0 ], "replies_avg": [ 5, 0 ], "authors#_avg": [ 6, 0 ], "corr_rating_confidence": -0.42640143271122083, "gs_citation": 65, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=14776804219341807316&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 4, "aff_unique_index": "0;1;0", "aff_unique_norm": "Huawei;University of Science and Technology of China", "aff_unique_dep": "Huawei 
Technologies;", "aff_unique_url": "https://www.huawei.com;http://www.ustc.edu.cn", "aff_unique_abbr": "Huawei;USTC", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0;0", "aff_country_unique": "China" }, { "id": "67q9f8gChCF", "title": "Learning Efficient Planning-based Rewards for Imitation Learning", "track": "main", "status": "Reject", "tldr": "", "abstract": "Imitation learning from limited demonstrations is challenging. Most inverse reinforcement learning (IRL) methods are unable to perform as good as the demonstrator, especially in a high-dimensional environment, e.g, the Atari domain. To address this challenge, we propose a novel reward learning method, which streamlines a differential planning module with dynamics modeling. Our method learns useful planning computations with a meaningful reward function that focuses on the resulting region of an agent executing an action. Such a planning-based reward function leads to policies with better generalization ability. Empirical results with multiple network architectures and reward instances show that our method can outperform state-of-the-art IRL methods on multiple Atari games and continuous control tasks. Our method achieves performance that is averagely 1,139.1% of the demonstration. ", "keywords": "", "primary_area": "", "supplementary_material": "/attachment/ed05b25076f28f5ecc1253e5a4cb0ed4e623535c.zip", "author": "Xingrui Yu;Yueming Lyu;Ivor Tsang", "authorids": "~Xingrui_Yu1;~Yueming_Lyu1;~Ivor_Tsang1", "gender": "M;M;M", "homepage": "https://xingruiyu.github.io/;https://yueminglyu.github.io/;https://www.a-star.edu.sg/cfar/about-cfar/management/prof-ivor-tsang", "dblp": "211/1926;;35/5873", "google_scholar": "a1UeOvUAAAAJ;uQXB6-oAAAAJ;rJMOlVsAAAAJ", "orcid": "0000-0002-8941-2698;;", "linkedin": "xingrui-yu-180450114/;;", "or_profile": "~Xingrui_Yu1;~Yueming_Lyu1;~Ivor_W_Tsang1", "aff": "University of Technology Sydney;University of Technology Sydney;University of Technology Sydney", "aff_domain": "uts.edu.au;uts.edu.au;uts.edu.au", "position": "PhD student;PhD student;Full Professor", "bibtex": "@misc{\nyu2021learning,\ntitle={Learning Efficient Planning-based Rewards for Imitation Learning},\nauthor={Xingrui Yu and Yueming Lyu and Ivor Tsang},\nyear={2021},\nurl={https://openreview.net/forum?id=67q9f8gChCF}\n}", "github": "", "project": "", "reviewers": "AnonReviewer4;AnonReviewer1;AnonReviewer2;AnonReviewer3", "site": "https://openreview.net/forum?id=67q9f8gChCF", "pdf_size": 0, "rating": "5;6;6;6", "confidence": "3;3;2;3", "wc_review": "807;502;293;437", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "2005;895;794;683", "reply_reviewers": "0;0;0;0", "reply_authors": "3;2;1;1", "rating_avg": [ 5.75, 0.4330127018922193 ], "confidence_avg": [ 2.75, 0.4330127018922193 ], "wc_review_avg": [ 509.75, 187.54382821090115 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 1094.25, 531.1409299799818 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.75, 0.82915619758885 ], "replies_avg": [ 12, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": -0.3333333333333333, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:Uz6N7-0xic0J:scholar.google.com/&scioq=Learning+Efficient+Planning-based+Rewards+for+Imitation+Learning&hl=en&as_sdt=0,5", "gs_version_total": 0, "aff_unique_index": "0;0;0", "aff_unique_norm": "University of Technology Sydney", "aff_unique_dep": "", "aff_unique_url": "https://www.uts.edu.au", "aff_unique_abbr": "UTS", 
"aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0;0", "aff_country_unique": "Australia" }, { "id": "68747kJ0qKt", "title": "On Dropout, Overfitting, and Interaction Effects in Deep Neural Networks", "track": "main", "status": "Reject", "tldr": "", "abstract": "We examine Dropout through the perspective of interactions. Given $N$ variables, there are $\\mathcal{O}(N^2)$ possible pairwise interactions, $\\mathcal{O}(N^3)$ possible 3-way interactions, i.e. $\\mathcal{O}(N^k)$ possible interactions of $k$ variables. Conversely, the probability of an interaction of $k$ variables surviving Dropout at rate $p$ is $\\mathcal{O}((1-p)^k)$. In this paper, we show that these rates cancel, and as a result, Dropout selectively regularizes against learning higher-order interactions. We prove this new perspective analytically for Input Dropout and empirically for Activation Dropout. This perspective on Dropout has several practical implications: (1) higher Dropout rates should be used when we need stronger regularization against spurious high-order interactions, (2) caution must be used when interpreting Dropout-based feature saliency measures, and (3) networks trained with Input Dropout are biased estimators, even with infinite data. We also compare Dropout to regularization via weight decay and early stopping and find that it is difficult to obtain the same regularization against high-order interactions with these methods.", "keywords": "Dropout;Interaction Effects;Neural Networks;Functional ANOVA", "primary_area": "", "supplementary_material": "", "author": "Ben Lengerich;Eric Xing;Rich Caruana", "authorids": "~Ben_Lengerich1;~Eric_Xing1;~Rich_Caruana1", "gender": ";M;M", "homepage": "http://web.mit.edu/~blengeri/www/;http://www.cs.cmu.edu/~epxing/;", "dblp": "203/8210;36/3855;", "google_scholar": "a1Ck1CMAAAAJ;https://scholar.google.com.tw/citations?user=5pKTRxEAAAAJ;https://scholar.google.com/scholar?hl=en", "orcid": "0000-0001-8690-9554;;", "linkedin": ";;", "or_profile": "~Ben_Lengerich1;~Eric_Xing1;~Rich_Caruana1", "aff": "Massachusetts Institute of Technology;School of Computer Science, Carnegie Mellon University;", "aff_domain": "mit.edu;cs.cmu.edu;", "position": "Postdoc;Full Professor;", "bibtex": "@misc{\nlengerich2021on,\ntitle={On Dropout, Overfitting, and Interaction Effects in Deep Neural Networks},\nauthor={Ben Lengerich and Eric Xing and Rich Caruana},\nyear={2021},\nurl={https://openreview.net/forum?id=68747kJ0qKt}\n}", "github": "", "project": "", "reviewers": "AnonReviewer4;AnonReviewer5;AnonReviewer3", "site": "https://openreview.net/forum?id=68747kJ0qKt", "pdf_size": 0, "rating": "4;4;7", "confidence": "3;4;5", "wc_review": "296;1620;320", "wc_reply_reviewers": "0;0;0", "wc_reply_authors": "184;629;97", "reply_reviewers": "0;0;0", "reply_authors": "1;1;1", "rating_avg": [ 5.0, 1.4142135623730951 ], "confidence_avg": [ 4.0, 0.816496580927726 ], "wc_review_avg": [ 745.3333333333334, 618.5603356899704 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 303.3333333333333, 233.00405337437564 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 7, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": 0.8660254037844387, "gs_citation": 14, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=18282622351597402461&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 4, "aff_unique_index": "0;1", "aff_unique_norm": "Massachusetts Institute of Technology;Carnegie Mellon University", 
"aff_unique_dep": ";School of Computer Science", "aff_unique_url": "https://web.mit.edu;https://www.cmu.edu", "aff_unique_abbr": "MIT;CMU", "aff_campus_unique_index": "1", "aff_campus_unique": ";Pittsburgh", "aff_country_unique_index": "0;0", "aff_country_unique": "United States" }, { "id": "68NusZqF0da", "title": "Weakly Supervised Formula Learner for Solving Mathematical Problems", "track": "main", "status": "Withdraw", "tldr": "", "abstract": "Mathematical reasoning task is a subset of the natural language question answering task. Several approaches have been proposed in existing work to solve mathematical reasoning problems. Among them, the two-phase solution to first predict formulas from questions and then calculate answers from formulas has achieved desirable performance. However, this design results in the reliance on annotated formulas as the intermediate labels for training. In this work, we put forward a brand-new idea to enable the models to explore the formulas by themselves to eliminate the reliance on formula annotations. To realize this, we proposed Weakly Supervised Formula Leaner, a learning framework that can autonomously search for the optimal formulas through the training process and continuously update itself. Our experiment is conducted on a typical mathematical dataset MathQA. The result shows that our models learning with weak supervision outperform the baseline methods.", "keywords": "mathematical reasoning;weakly supervised learning", "primary_area": "", "supplementary_material": "/attachment/c52a07c31d07850cd7529c05bb1a6b2a2f1aadaf.zip", "author": "Yuxuan Wu;Hideki Nakayama", "authorids": "~Yuxuan_Wu1;~Hideki_Nakayama1", "gender": "M;M", "homepage": "http://www.nlab.ci.i.u-tokyo.ac.jp/members-e.html;https://www.nlab.ci.i.u-tokyo.ac.jp/index-e.html", "dblp": "207/7149;09/1592", "google_scholar": ";lZAYGJoAAAAJ", "orcid": ";", "linkedin": ";", "or_profile": "~Yuxuan_Wu1;~Hideki_Nakayama1", "aff": "The University of Tokyo, Tokyo Institute of Technology;The University of Tokyo", "aff_domain": "u-tokyo.ac.jp;u-tokyo.ac.jp", "position": "PhD student;Associate Professor", "bibtex": "", "github": "", "project": "", "reviewers": "", "site": "https://openreview.net/forum?id=68NusZqF0da", "pdf_size": 0, "rating": "", "confidence": "", "wc_review": "", "wc_reply_reviewers": "", "wc_reply_authors": "", "reply_reviewers": "", "reply_authors": "", "rating_avg": [ 0, 0 ], "confidence_avg": [ 0, 0 ], "wc_review_avg": [ 0, 0 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 0, 0 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 0, 0 ], "replies_avg": [ 1, 0 ], "authors#_avg": [ 2, 0 ], "corr_rating_confidence": 0, "gs_citation": 4, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=3096951935982501780&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 3, "aff_unique_index": "0;0", "aff_unique_norm": "University of Tokyo", "aff_unique_dep": "", "aff_unique_url": "https://www.u-tokyo.ac.jp", "aff_unique_abbr": "UTokyo", "aff_campus_unique_index": "0", "aff_campus_unique": "Tokyo;", "aff_country_unique_index": "0;0", "aff_country_unique": "Japan" }, { "id": "69EFStdgTD2", "title": "Secure Byzantine-Robust Machine Learning", "track": "main", "status": "Reject", "tldr": "", "abstract": "Increasingly machine learning systems are being deployed to edge servers and devices (e.g. mobile phones) and trained in a collaborative manner. 
Such distributed/federated/decentralized training raises a number of concerns about the robustness, privacy, and security of the procedure. While extensive work has been done in tackling robustness, privacy, or security individually, their combination has rarely been studied. In this paper, we propose a secure multi-server protocol that offers both input privacy and Byzantine-robustness. In addition, this protocol is communication-efficient, fault-tolerant, and enjoys local differential privacy.", "keywords": "Byzantine robustness;distributed learning;secure aggregation", "primary_area": "", "supplementary_material": "/attachment/bdfc7c6927b9c338bd2e80795457c56a7c9a937e.zip", "author": "Lie He;Sai Praneeth Karimireddy;Martin Jaggi", "authorids": "~Lie_He1;~Sai_Praneeth_Karimireddy1;~Martin_Jaggi1", "gender": "M;M;M", "homepage": "https://liehe.github.io/;https://spkreddy.org;https://mlo.epfl.ch", "dblp": "225/5245;217/3342;17/4402", "google_scholar": "rIAYxaMAAAAJ;wKJeOQoAAAAJ;https://scholar.google.ch/citations?user=r1TJBr8AAAAJ", "orcid": ";;0000-0003-1579-5558", "linkedin": ";;", "or_profile": "~Lie_He1;~Sai_Praneeth_Karimireddy1;~Martin_Jaggi1", "aff": "EPFL - EPF Lausanne;Swiss Federal Institute of Technology Lausanne;EPFL", "aff_domain": "epfl.ch;epfl.ch;epfl.ch", "position": "PhD student;PhD student;Assistant Professor", "bibtex": "@misc{\nhe2021secure,\ntitle={Secure Byzantine-Robust Machine Learning},\nauthor={Lie He and Sai Praneeth Karimireddy and Martin Jaggi},\nyear={2021},\nurl={https://openreview.net/forum?id=69EFStdgTD2}\n}", "github": "", "project": "", "reviewers": "AnonReviewer4;AnonReviewer1;AnonReviewer3;AnonReviewer2", "site": "https://openreview.net/forum?id=69EFStdgTD2", "pdf_size": 0, "rating": "3;5;6;7", "confidence": "5;2;4;3", "wc_review": "304;362;287;623", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "658;525;160;173", "reply_reviewers": "0;0;0;0", "reply_authors": "1;1;1;1", "rating_avg": [ 5.25, 1.479019945774904 ], "confidence_avg": [ 3.5, 1.118033988749895 ], "wc_review_avg": [ 394.0, 135.10551432121488 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 379.0, 217.68899834396777 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 9, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": -0.529150262212918, "gs_citation": 74, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=5376880607059530852&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 3, "aff_unique_index": "0;1;0", "aff_unique_norm": "EPFL;Swiss Federal Institute of Technology Lausanne", "aff_unique_dep": ";", "aff_unique_url": "https://www.epfl.ch;https://www.epfl.ch", "aff_unique_abbr": "EPFL;EPFL", "aff_campus_unique_index": "0;0", "aff_campus_unique": "Lausanne;", "aff_country_unique_index": "0;0;0", "aff_country_unique": "Switzerland" }, { "title": "Partitioned Learned Bloom Filters", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/2672", "id": "6BRLOfrMhW", "poster": "", "openreview": "https://openreview.net/forum?id=6BRLOfrMhW", "slides": "https://iclr.cc/virtual/2021/poster/2672", "video": "https://iclr.cc/virtual/2021/poster/2672", "author_site": "Kapil Vaidya, Eric Knorr, Michael Mitzenmacher, Tim Kraska", "tldr": "", "abstract": "Bloom filters are space-efficient probabilistic data structures that are used to test whether an element is a member of a set, and may return false positives. 
Recently, variations referred to as learned Bloom filters were developed that can provide improved performance in terms of the rate of false positives, by using a learned model for the represented set. However, previous methods for learned Bloom filters do not take full advantage of the learned model. Here we show how to frame the problem of optimal model utilization as an optimization problem, and using our framework derive algorithms that can achieve near-optimal performance in many cases.", "keywords": "optimization;data structures;algorithms;theory;learned algorithms", "primary_area": "", "supplementary_material": "", "author": "Kapil Vaidya;Eric Knorr;Michael Mitzenmacher;Tim Kraska", "authorids": "~Kapil_Vaidya1;eric_knorr@g.harvard.edu;~Michael_Mitzenmacher1;~Tim_Kraska1", "gender": ";;M;M", "homepage": ";;;", "dblp": ";;74/838;26/6037", "google_scholar": "https://scholar.google.com/citations?hl=en;;e8aRmAsAAAAJ;", "orcid": ";;;", "linkedin": ";;;", "or_profile": "~Kapil_Vaidya1;eric_knorr@g.harvard.edu;~Michael_Mitzenmacher1;~Tim_Kraska1", "aff": "Massachusetts Institute of Technology;;Harvard University;Massachusetts Institute of Technology", "aff_domain": "mit.edu;;harvard.edu;mit.edu", "position": "PhD student;;Full Professor;Associate Professor", "bibtex": "@inproceedings{\nvaidya2021partitioned,\ntitle={Partitioned Learned Bloom Filters},\nauthor={Kapil Vaidya and Eric Knorr and Michael Mitzenmacher and Tim Kraska},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=6BRLOfrMhW}\n}", "github": "", "project": "", "reviewers": "AnonReviewer4;AnonReviewer2;AnonReviewer1", "pdf_size": 0, "rating": "6;7;7", "confidence": "3;3;3", "wc_review": "350;402;217", "wc_reply_reviewers": "0;0;0", "wc_reply_authors": "291;359;166", "reply_reviewers": "0;0;0", "reply_authors": "1;1;1", "rating_avg": [ 6.666666666666667, 0.4714045207910317 ], "confidence_avg": [ 3.0, 0.0 ], "wc_review_avg": [ 323.0, 77.90164739379179 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 272.0, 79.92913528036361 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 7, 0 ], "authors#_avg": [ 4, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 68, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=17513313421763897245&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 3, "pdf": "https://openreview.net/pdf?id=6BRLOfrMhW", "email": "mit.edu;;harvard.edu;mit.edu", "author_num": 4, "aff_unique_index": "0;1;0", "aff_unique_norm": "Massachusetts Institute of Technology;Harvard University", "aff_unique_dep": ";", "aff_unique_url": "https://web.mit.edu;https://www.harvard.edu", "aff_unique_abbr": "MIT;Harvard", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0;0", "aff_country_unique": "United States" }, { "id": "6BWY3yDdDi", "title": "A Truly Constant-time Distribution-aware Negative Sampling", "track": "main", "status": "Reject", "tldr": "", "abstract": "Softmax classifiers with a very large number of classes naturally occur in many applications such as natural language processing and information retrieval. The calculation of full-softmax is very expensive from the computational and energy perspective. There have been a variety of sampling approaches to overcome this challenge, popularly known as negative sampling (NS). 
Ideally, NS should sample negative classes from a distribution that is dependent on the input data, the current parameters, and the correct positive class. Unfortunately, due to the dynamically updated parameters and data samples, there does not exist any sampling scheme that is truly adaptive and also samples the negative classes in constant time every iteration. Therefore, alternative heuristics like random sampling, static frequency-based sampling, or learning-based biased sampling; which primarily trade either the sampling cost or the adaptivity of samples per iteration, are adopted. In this paper, we show a class of distribution where the sampling scheme is truly adaptive and provably generates negative samples in constant time. We demonstrate a negative sampling implementation that is significantly faster, in terms of wall clock time, compared to the most optimized TensorFlow implementations of standard softmax or other sampling approaches on the best available GPUs (V100s).", "keywords": "", "primary_area": "", "supplementary_material": "", "author": "Shabnam Daghaghi;Tharun Medini;Beidi Chen;Mengnan Zhao;Anshumali Shrivastava", "authorids": "~Shabnam_Daghaghi1;~Tharun_Medini1;~Beidi_Chen1;mengnan.zhao@rice.edu;~Anshumali_Shrivastava1", "gender": "F;M;F;;M", "homepage": ";https://tharun24.github.io/;https://www.andrew.cmu.edu/user/beidic/;;https://www.cs.rice.edu/~as143/", "dblp": ";;192/1339;;63/9828", "google_scholar": ";-ZW9lF4AAAAJ;;;https://scholar.google.com.tw/citations?user=SGT23RAAAAAJ", "orcid": ";;;;", "linkedin": ";tharunmedini;;;", "or_profile": "~Shabnam_Daghaghi1;~Tharun_Medini1;~Beidi_Chen1;mengnan.zhao@rice.edu;~Anshumali_Shrivastava1", "aff": "Rice University;Rice University;Stanford University;;Rice University", "aff_domain": "rice.edu;rice.edu;stanford.edu;;rice.edu", "position": "PhD student;PhD student;Postdoc;;Assistant Professor", "bibtex": "@misc{\ndaghaghi2021a,\ntitle={A Truly Constant-time Distribution-aware Negative Sampling},\nauthor={Shabnam Daghaghi and Tharun Medini and Beidi Chen and Mengnan Zhao and Anshumali Shrivastava},\nyear={2021},\nurl={https://openreview.net/forum?id=6BWY3yDdDi}\n}", "github": "", "project": "", "reviewers": "AnonReviewer2;AnonReviewer3;AnonReviewer4;AnonReviewer1", "site": "https://openreview.net/forum?id=6BWY3yDdDi", "pdf_size": 0, "rating": "3;4;5;7", "confidence": "4;4;4;2", "wc_review": "295;593;289;110", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "604;777;428;26", "reply_reviewers": "0;0;0;0", "reply_authors": "1;1;1;1", "rating_avg": [ 4.75, 1.479019945774904 ], "confidence_avg": [ 3.5, 0.8660254037844386 ], "wc_review_avg": [ 321.75, 173.35134121200215 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 458.75, 278.65693513709647 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 9, 0 ], "authors#_avg": [ 5, 0 ], "corr_rating_confidence": -0.8783100656536799, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:yCZr83IP0p0J:scholar.google.com/&scioq=A+Truly+Constant-time+Distribution-aware+Negative+Sampling&hl=en&as_sdt=0,33", "gs_version_total": 0, "aff_unique_index": "0;0;1;0", "aff_unique_norm": "Rice University;Stanford University", "aff_unique_dep": ";", "aff_unique_url": "https://www.rice.edu;https://www.stanford.edu", "aff_unique_abbr": "Rice;Stanford", "aff_campus_unique_index": "1", "aff_campus_unique": ";Stanford", "aff_country_unique_index": "0;0;0;0", "aff_country_unique": "United States" }, { "title": "Graph 
Traversal with Tensor Functionals: A Meta-Algorithm for Scalable Learning", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/2564", "id": "6DOZ8XNNfGN", "poster": "", "openreview": "https://openreview.net/forum?id=6DOZ8XNNfGN", "slides": "https://iclr.cc/virtual/2021/poster/2564", "video": "https://iclr.cc/virtual/2021/poster/2564", "author_site": "Elan Markowitz, Keshav Balasubramanian, Mehrnoosh Mirtaheri, Sami Abu-El-Haija, Bryan Perozzi, Greg Ver Steeg, Aram Galstyan", "tldr": "", "abstract": "Graph Representation Learning (GRL) methods have impacted fields from chemistry to social science. However, their algorithmic implementations are specialized to specific use-cases e.g. \"message passing\" methods are run differently from \"node embedding\" ones. Despite their apparent differences, all these methods utilize the graph structure, and therefore, their learning can be approximated with stochastic graph traversals. We propose Graph Traversal via Tensor Functionals (GTTF), a unifying meta-algorithm framework for easing the implementation of diverse graph algorithms and enabling transparent and efficient scaling to large graphs. GTTF is founded upon a data structure (stored as a sparse tensor) and a stochastic graph traversal algorithm (described using tensor operations). The algorithm is a functional that accept two functions, and can be specialized to obtain a variety of GRL models and objectives, simply by changing those two functions. We show for a wide class of methods, our algorithm learns in an unbiased fashion and, in expectation, approximates the learning as if the specialized implementations were run directly.\nWith these capabilities, we scale otherwise non-scalable methods to set state-of-the-art on large graph datasets while being more efficient than existing GRL libraries -- with only a handful of lines of code for each method specialization.", "keywords": "Graph;Learning;Algorithm;Scale;Message Passing;Node Embeddings", "primary_area": "", "supplementary_material": "", "author": "Elan Sopher Markowitz;Keshav Balasubramanian;Mehrnoosh Mirtaheri;Sami Abu-El-Haija;Bryan Perozzi;Greg Ver Steeg;Aram Galstyan", "authorids": "~Elan_Sopher_Markowitz2;keshavba@usc.edu;mehrnoom@usc.edu;~Sami_Abu-El-Haija1;~Bryan_Perozzi1;~Greg_Ver_Steeg1;~Aram_Galstyan1", "gender": "M;;;M;;M;M", "homepage": "https://elanmarkowitz.github.io/;;;http://www.haija.org;http://www.perozzi.net/;https://profiles.ucr.edu/app/home/profile/gregoryv;http://www.isi.edu/~galstyan", "dblp": "284/9401;;;127/6620;91/10813;82/9058;16/3411", "google_scholar": ";;;t80qlTcAAAAJ;rZgbMs4AAAAJ;goLucoIAAAAJ;rJTwW0MAAAAJ", "orcid": ";;;;;0000-0002-0793-141X;", "linkedin": ";;;samihaija/;;;aram-galstyan-4a01373/", "or_profile": "~Elan_Sopher_Markowitz2;keshavba@usc.edu;mehrnoom@usc.edu;~Sami_Abu-El-Haija1;~Bryan_Perozzi1;~Greg_Ver_Steeg1;~Aram_Galstyan1", "aff": "University of Southern California;;;University of Southern California;Google;USC/ISI;Amazon Alexa", "aff_domain": "usc.edu;;;usc.edu;google.com;isi.edu;amazon.com", "position": "PhD student;;;PhD student;Researcher;Associate Professor;Scholar", "bibtex": "@inproceedings{\nmarkowitz2021graph,\ntitle={Graph Traversal with Tensor Functionals: A Meta-Algorithm for Scalable Learning},\nauthor={Elan Sopher Markowitz and Keshav Balasubramanian and Mehrnoosh Mirtaheri and Sami Abu-El-Haija and Bryan Perozzi and Greg Ver Steeg and Aram Galstyan},\nbooktitle={International Conference on Learning 
Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=6DOZ8XNNfGN}\n}", "github": "[![github](/images/github_icon.svg) isi-usc-edu/gttf](https://github.com/isi-usc-edu/gttf)", "project": "", "reviewers": "AnonReviewer1;AnonReviewer4;AnonReviewer3;AnonReviewer2", "pdf_size": 0, "rating": "7;7;7;7", "confidence": "3;3;2;3", "wc_review": "136;760;241;434", "wc_reply_reviewers": "0;0;45;0", "wc_reply_authors": "195;438;519;307", "reply_reviewers": "0;0;1;0", "reply_authors": "1;1;3;1", "rating_avg": [ 7.0, 0.0 ], "confidence_avg": [ 2.75, 0.4330127018922193 ], "wc_review_avg": [ 392.75, 237.4461781120092 ], "wc_reply_reviewers_avg": [ 11.25, 19.48557158514987 ], "wc_reply_authors_avg": [ 364.75, 123.80301894541991 ], "reply_reviewers_avg": [ 0.25, 0.4330127018922193 ], "reply_authors_avg": [ 1.5, 0.8660254037844386 ], "replies_avg": [ 13, 0 ], "authors#_avg": [ 7, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 26, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=4421735277125867362&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 4, "pdf": "https://openreview.net/pdf?id=6DOZ8XNNfGN", "email": "usc.edu;;;usc.edu;google.com;isi.edu;amazon.com", "author_num": 7, "aff_unique_index": "0;0;1;0;2", "aff_unique_norm": "University of Southern California;Google;Amazon", "aff_unique_dep": ";Google;Amazon Alexa", "aff_unique_url": "https://www.usc.edu;https://www.google.com;https://www.amazon.com/alexa", "aff_unique_abbr": "USC;Google;Amazon Alexa", "aff_campus_unique_index": "0;0;1;2", "aff_campus_unique": "Los Angeles;Mountain View;ISI;", "aff_country_unique_index": "0;0;0;0;0", "aff_country_unique": "United States" }, { "title": "DOP: Off-Policy Multi-Agent Decomposed Policy Gradients", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/2751", "id": "6FqKiVAdI3Y", "poster": "", "openreview": "https://openreview.net/forum?id=6FqKiVAdI3Y", "slides": "https://iclr.cc/virtual/2021/poster/2751", "video": "https://iclr.cc/virtual/2021/poster/2751", "author_site": "Yihan Wang, Beining Han, Tonghan Wang, Heng Dong, Chongjie Zhang", "tldr": "", "abstract": "Multi-agent policy gradient (MAPG) methods recently witness vigorous progress. However, there is a significant performance discrepancy between MAPG methods and state-of-the-art multi-agent value-based approaches. In this paper, we investigate causes that hinder the performance of MAPG algorithms and present a multi-agent decomposed policy gradient method (DOP). This method introduces the idea of value function decomposition into the multi-agent actor-critic framework. Based on this idea, DOP supports efficient off-policy learning and addresses the issue of centralized-decentralized mismatch and credit assignment in both discrete and continuous action spaces. We formally show that DOP critics have sufficient representational capability to guarantee convergence. In addition, empirical evaluations on the StarCraft II micromanagement benchmark and multi-agent particle environments demonstrate that DOP outperforms both state-of-the-art value-based and policy-based multi-agent reinforcement learning algorithms. 
Demonstrative videos are available at https://sites.google.com/view/dop-mapg/.", "keywords": "Multi-Agent Reinforcement Learning;Multi-Agent Policy Gradients", "primary_area": "", "supplementary_material": "/attachment/adce0b66fa512b2b4cff45182a99e3742d6ed7f0.zip", "author": "Yihan Wang;Beining Han;Tonghan Wang;Heng Dong;Chongjie Zhang", "authorids": "~Yihan_Wang1;~Beining_Han1;~Tonghan_Wang1;~Heng_Dong1;~Chongjie_Zhang1", "gender": "M;M;M;M;", "homepage": ";;https://tonghanwang.github.io/;https://drdh.cc;", "dblp": "121/1314;266/7819;175/6039-1.html;387/8933.html;29/6693", "google_scholar": "rYc6BsYAAAAJ;LVjU7xIAAAAJ;-AR1yc4AAAAJ;K26AU1EAAAAJ;LjxqXycAAAAJ", "orcid": ";;;0000-0001-7548-3455;", "linkedin": ";%E8%B4%9D%E5%AE%81-%E9%9F%A9-b79204207/details/experience/;;;", "or_profile": "~Yihan_Wang1;~Beining_Han1;~Tonghan_Wang1;~Heng_Dong1;~Chongjie_Zhang1", "aff": "Tsinghua University;IIIS, Tsinghua University;Tsinghua University;Tsinghua University;Tsinghua University", "aff_domain": "tsinghua.edu.cn;mails.tsinghua.edu.cn;tsinghua.edu.cn;tsinghua.edu.cn;tsinghua.edu.cn", "position": "Undergrad student;Undergrad student;MS student;PhD student;Assistant Professor", "bibtex": "@inproceedings{\nwang2021dop,\ntitle={{\\{}DOP{\\}}: Off-Policy Multi-Agent Decomposed Policy Gradients},\nauthor={Yihan Wang and Beining Han and Tonghan Wang and Heng Dong and Chongjie Zhang},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=6FqKiVAdI3Y}\n}", "github": "", "project": "", "reviewers": "AnonReviewer3;AnonReviewer1;AnonReviewer4;AnonReviewer2", "pdf_size": 0, "rating": "3;7;7;9", "confidence": "5;4;3;4", "wc_review": "485;246;322;143", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "1627;278;285;212", "reply_reviewers": "0;0;0;0", "reply_authors": "3;1;1;1", "rating_avg": [ 6.5, 2.179449471770337 ], "confidence_avg": [ 4.0, 0.7071067811865476 ], "wc_review_avg": [ 299.0, 124.7697880097582 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 600.5, 593.3340121718963 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.5, 0.8660254037844386 ], "replies_avg": [ 11, 0 ], "authors#_avg": [ 5, 0 ], "corr_rating_confidence": -0.6488856845230502, "gs_citation": 158, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=14824920745881368408&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 2, "pdf": "https://openreview.net/pdf?id=6FqKiVAdI3Y", "email": "tsinghua.edu.cn;mails.tsinghua.edu.cn;tsinghua.edu.cn;tsinghua.edu.cn;tsinghua.edu.cn", "author_num": 5, "aff_unique_index": "0;0;0;0;0", "aff_unique_norm": "Tsinghua University", "aff_unique_dep": "", "aff_unique_url": "https://www.tsinghua.edu.cn", "aff_unique_abbr": "THU", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0;0;0;0", "aff_country_unique": "China" }, { "id": "6FsCHsZ66Fp", "title": "Towards certifying $\\ell_\\infty$ robustness using Neural networks with $\\ell_\\infty$-dist Neurons", "track": "main", "status": "Reject", "tldr": "", "abstract": "It is well-known that standard neural networks, even with a high classification accuracy, are vulnerable to small $\\ell_\\infty$ perturbations. Many attempts have been tried to learn a network that can resist such adversarial attacks. However, most previous works either can only provide empirical verification of the defense to a particular attack method or can only develop a theoretical guarantee of the model robustness in limited scenarios. 
In this paper, we develop a theoretically principled neural network that inherently resists $\\ell_\\infty$ perturbations. In particular, we design a novel neuron that uses $\\ell_\\infty$ distance as its basic operation, which we call $\\ell_\\infty$-dist neuron. We show that the $\\ell_\\infty$-dist neuron is naturally a 1-Lipschitz function with respect to the $\\ell_\\infty$ norm, and the neural networks constructed with $\\ell_\\infty$-dist neuron ($\\ell_{\\infty}$-dist Nets) enjoy the same property. This directly provides a theoretical guarantee of the certified robustness based on the margin of the prediction outputs. We further prove that the $\\ell_{\\infty}$-dist Nets have enough expressiveness power to approximate any 1-Lipschitz function, and can generalize well as the robust test error can be upper-bounded by the performance of a large margin classifier on the training data. Preliminary experiments show that even without the help of adversarial training, the learned networks with high classification accuracy are already provably robust.", "keywords": "", "primary_area": "", "supplementary_material": "", "author": "Bohang Zhang;Zhou Lu;Tianle Cai;Di He;Liwei Wang", "authorids": "zhangbohang@pku.edu.cn;~Zhou_Lu1;~Tianle_Cai1;~Di_He1;~Liwei_Wang1", "gender": ";;M;M;M", "homepage": ";https://leozoroaster.github.io/;https://tianle.website;https://dihe-pku.github.io/;http://www.liweiwang-pku.com/", "dblp": ";68/11524;241/9458;74/184;", "google_scholar": ";17_nX_kAAAAJ;CvwLRSMAAAAJ;https://scholar.google.co.jp/citations?user=orVoz4IAAAAJ;VZHxoh8AAAAJ", "orcid": ";;;;", "linkedin": ";;;;", "or_profile": "zhangbohang@pku.edu.cn;~Zhou_Lu1;~Tianle_Cai1;~Di_He1;~Liwei_Wang1", "aff": ";Princeton University;Princeton University;Microsoft;Peking University", "aff_domain": ";princeton.edu;princeton.edu;microsoft.com;pku.edu.cn", "position": ";PhD student;PhD student;Senior Researcher;Full Professor", "bibtex": "@misc{\nzhang2021towards,\ntitle={Towards certifying {\\$}{\\textbackslash}ell{\\_}{\\textbackslash}infty{\\$} robustness using Neural networks with {\\$}{\\textbackslash}ell{\\_}{\\textbackslash}infty{\\$}-dist Neurons},\nauthor={Bohang Zhang and Zhou Lu and Tianle Cai and Di He and Liwei Wang},\nyear={2021},\nurl={https://openreview.net/forum?id=6FsCHsZ66Fp}\n}", "github": "", "project": "", "reviewers": "AnonReviewer2;AnonReviewer1;AnonReviewer3;AnonReviewer4", "site": "https://openreview.net/forum?id=6FsCHsZ66Fp", "pdf_size": 0, "rating": "4;4;5;6", "confidence": "3;5;5;4", "wc_review": "455;322;540;715", "wc_reply_reviewers": "0;243;160;0", "wc_reply_authors": "112;245;367;361", "reply_reviewers": "0;1;1;0", "reply_authors": "1;1;1;1", "rating_avg": [ 4.75, 0.82915619758885 ], "confidence_avg": [ 4.25, 0.82915619758885 ], "wc_review_avg": [ 508.0, 142.54648364656353 ], "wc_reply_reviewers_avg": [ 100.75, 104.93658799484572 ], "wc_reply_authors_avg": [ 271.25, 104.01051629522853 ], "reply_reviewers_avg": [ 0.5, 0.5 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 11, 0 ], "authors#_avg": [ 5, 0 ], "corr_rating_confidence": 0.0909090909090909, "gs_citation": -1, "gs_cited_by_link": "", "gs_version_total": -1, "aff_unique_index": "0;0;1;2", "aff_unique_norm": "Princeton University;Microsoft;Peking University", "aff_unique_dep": ";Microsoft Corporation;", "aff_unique_url": "https://www.princeton.edu;https://www.microsoft.com;http://www.pku.edu.cn", "aff_unique_abbr": "Princeton;Microsoft;Peking U", "aff_campus_unique_index": "", "aff_campus_unique": "", 
"aff_country_unique_index": "0;0;0;1", "aff_country_unique": "United States;China" }, { "id": "6FtFPKw8aLj", "title": "Systematic Analysis of Cluster Similarity Indices: How to Validate Validation Measures", "track": "main", "status": "Reject", "tldr": "", "abstract": "There are many cluster similarity indices used to evaluate clustering algorithms, and choosing the best one for a particular task remains an open problem. We demonstrate that this problem is crucial: there are many disagreements among the indices, these disagreements do affect which algorithms are chosen in applications, and this can lead to degraded performance in real-world systems. We propose a theoretical solution to this problem: we develop a list of desirable properties and theoretically verify which indices satisfy them. This allows for making an informed choice: given a particular application, one can first make a selection of properties that are desirable for a given application and then identify indices satisfying these. We observe that many popular indices have significant drawbacks. Instead, we advocate using other ones that are not so widely adopted but have beneficial properties.", "keywords": "cluster similarity indices;cluster validation;clustering;community detection;constant baseline", "primary_area": "", "supplementary_material": "/attachment/133f3ee2136e771c9c7b5633620e7887ba60d11b.zip", "author": "Martijn G\u00f6sgens;Liudmila Prokhorenkova;Aleksei Tikhonov", "authorids": "~Martijn_G\u00f6sgens1;~Liudmila_Prokhorenkova1;~Aleksei_Tikhonov1", "gender": "M;F;M", "homepage": "https://martijngosgens.nl;;http://altsoph.com", "dblp": "254/0995;45/11468;82/8978", "google_scholar": "iusaOxAAAAAJ;https://scholar.google.ru/citations?user=6JyZlSEAAAAJ;X6vNzpoAAAAJ", "orcid": "0000-0002-7197-7682;;", "linkedin": "martijn-g%C3%B6sgens-175165100/;;altsoph/", "or_profile": "~Martijn_G\u00f6sgens1;~Liudmila_Prokhorenkova1;~Aleksei_Tikhonov1", "aff": "Eindhoven University of Technology;Moscow Institute of Physics and Technology;Yandex", "aff_domain": "tue.nl;mipt.edu;yandex.ru", "position": "PhD student;Researcher;Researcher", "bibtex": "@misc{\ng{\\\"o}sgens2021systematic,\ntitle={Systematic Analysis of Cluster Similarity Indices: How to Validate Validation Measures},\nauthor={Martijn G{\\\"o}sgens and Liudmila Prokhorenkova and Aleksei Tikhonov},\nyear={2021},\nurl={https://openreview.net/forum?id=6FtFPKw8aLj}\n}", "github": "", "project": "", "reviewers": "AnonReviewer2;AnonReviewer1;AnonReviewer4;AnonReviewer3", "site": "https://openreview.net/forum?id=6FtFPKw8aLj", "pdf_size": 0, "rating": "6;7;7;7", "confidence": "3;4;4;3", "wc_review": "429;211;473;782", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "788;306;258;582", "reply_reviewers": "0;0;0;0", "reply_authors": "2;2;2;2", "rating_avg": [ 6.75, 0.4330127018922193 ], "confidence_avg": [ 3.5, 0.5 ], "wc_review_avg": [ 473.75, 203.75153373655866 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 483.5, 214.92963964981655 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 2.0, 0.0 ], "replies_avg": [ 15, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": 0.5773502691896257, "gs_citation": 29, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=2257720371580982697&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 6, "aff_unique_index": "0;1;2", "aff_unique_norm": "Eindhoven University of Technology;Moscow Institute of Physics and Technology;Yandex", "aff_unique_dep": ";;", "aff_unique_url": 
"https://www.tue.nl;https://www.mipt.ru/en;https://yandex.com", "aff_unique_abbr": "TU/e;MIPT;Yandex", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;1;1", "aff_country_unique": "Netherlands;Russian Federation" }, { "id": "6GkL6qM3LV", "title": "N-Bref : A High-fidelity Decompiler Exploiting Programming Structures", "track": "main", "status": "Reject", "tldr": "", "abstract": "Binary decompilation is a powerful technique for analyzing and understanding software, when source code is unavailable. It is a critical problem in the computer security domain. With the success of neural machine translation (NMT), recent efforts on neural-based decompiler show promising results compared to traditional approaches. However, several key challenges remain: (i) Prior neural-based decompilers focus on simplified programs without considering sophisticated yet widely-used data types such as pointers; furthermore, many high-level expressions map to the same low-level code (expression collision), which incurs critical decompiling performance degradation; (ii) State-of-the-art NMT models(e.g., transformer and its variants) mainly deal with sequential data; this is inefficient for decompilation, where the input and output data are highly structured. In this paper, we propose N-Bref, a new framework for neural decompilers that addresses the two aforementioned challenges with two key design principles: (i)N-Bref designs a structural transformer with three key design components for better comprehension of structural data \u2013 an assembly encoder, an abstract syntax tree encoder, and a tree decoder, extending transformer models in the context of decompilation. (ii) N-Bref introduces a program generation tool that can control the complexity of code generation and removes expression collisions. Extensive experiments demonstrate that N-Bref outperforms previous neural-based decompilers by a margin of 6.1%/8.8% accuracy in datatype recovery and source code generation. 
In particular, N-Bref decompiled human-written Leetcode programs with complex library calls and data types with high accuracy.", "keywords": "Programming Language;Reverse engineering;neural machine translation;machine learning for system", "primary_area": "", "supplementary_material": "", "author": "Cheng Fu;Kunlin Yang;Xinyun Chen;Yuandong Tian;Jishen Zhao", "authorids": "~Cheng_Fu1;k6yang@eng.ucsd.edu;~Xinyun_Chen1;~Yuandong_Tian1;~Jishen_Zhao1", "gender": "M;;;M;F", "homepage": "https://chengfu0118.github.io;;;http://yuandong-tian.com;https://cseweb.ucsd.edu/~jzhao/", "dblp": ";;;t/YuandongTian;66/8314.html", "google_scholar": "EIhhyj8AAAAJ;;;0mgEF28AAAAJ;https://scholar.google.com.tw/citations?user=MDuCskIAAAAJ", "orcid": ";;;0000-0003-4202-4847;", "linkedin": ";;;yuandongtian;", "or_profile": "~Cheng_Fu1;k6yang@eng.ucsd.edu;~Xinyun_Chen1;~Yuandong_Tian1;~Jishen_Zhao1", "aff": "University of California, San Diego, University of California, San Diego;;;Meta AI (FAIR);University of California, San Diego", "aff_domain": "eng.ucsd.edu;;;meta.com;ucsd.edu", "position": "PhD student;;;Research Scientist;Associate Professor", "bibtex": "@misc{\nfu2021nbref,\ntitle={N-Bref : A High-fidelity Decompiler Exploiting Programming Structures },\nauthor={Cheng Fu and Kunlin Yang and Xinyun Chen and Yuandong Tian and Jishen Zhao},\nyear={2021},\nurl={https://openreview.net/forum?id=6GkL6qM3LV}\n}", "github": "", "project": "", "reviewers": "AnonReviewer4;AnonReviewer1;AnonReviewer2;AnonReviewer3", "site": "https://openreview.net/forum?id=6GkL6qM3LV", "pdf_size": 0, "rating": "3;4;5;7", "confidence": "4;4;3;3", "wc_review": "1234;314;290;670", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "607;280;382;374", "reply_reviewers": "0;0;0;0", "reply_authors": "1;1;1;1", "rating_avg": [ 4.75, 1.479019945774904 ], "confidence_avg": [ 3.5, 0.5 ], "wc_review_avg": [ 627.0, 381.39087561188455 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 410.75, 120.19437382839514 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 10, 0 ], "authors#_avg": [ 5, 0 ], "corr_rating_confidence": -0.8451542547285166, "gs_citation": 3, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=17530011546463487957&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 0, "aff_unique_index": "0;1;0", "aff_unique_norm": "University of California, San Diego;Meta", "aff_unique_dep": ";Facebook AI Research (FAIR)", "aff_unique_url": "https://www.ucsd.edu;https://ai.facebook.com", "aff_unique_abbr": "UCSD;Meta AI", "aff_campus_unique_index": "0;0", "aff_campus_unique": "San Diego;", "aff_country_unique_index": "0;0;0", "aff_country_unique": "United States" }, { "id": "6HlaJSlQFEj", "title": "Small Input Noise is Enough to Defend Against Query-based Black-box Attacks", "track": "main", "status": "Reject", "tldr": "", "abstract": "While deep neural networks show unprecedented performance in various tasks, the vulnerability to adversarial examples hinders their deployment in safety-critical systems. Many studies have shown that attacks are also possible even in a black-box setting where an adversary cannot access the target model's internal information. Most black-box attacks are based on queries, each of which obtains the target model's output for an input, and many recent studies focus on reducing the number of required queries. In this paper, we pay attention to an implicit assumption of these attacks that the target model's output exactly corresponds to the query input. 
If some randomness is introduced into the model to break this assumption, query-based attacks may have tremendous difficulty in both gradient estimation and local search, which are the core of their attack process. From this motivation, we observe even a small additive input noise can neutralize most query-based attacks and name this simple yet effective approach Small Noise Defense (SND). We analyze how SND can defend against query-based black-box attacks and demonstrate its effectiveness against eight different state-of-the-art attacks with CIFAR-10 and ImageNet datasets. Even with strong defense ability, SND almost maintains the original clean accuracy and computational speed. SND is readily applicable to pre-trained models by adding only one line of code at the inference stage, so we hope that it will be used as a baseline of defense against query-based black-box attacks in the future.", "keywords": "Gaussian noise;input noise;adversarial defense;black-box attack;adversarial attack;query-based attack", "primary_area": "", "supplementary_material": "/attachment/475d397cd84f9f532ab011eead432920fceda885.zip", "author": "Junyoung Byun;Hyojun Go;Changick Kim", "authorids": "~Junyoung_Byun2;gohyojun15@kaist.ac.kr;~Changick_Kim1", "gender": "M;;M", "homepage": "https://cilabs.kaist.ac.kr/members/ph-d/junyoung-byun;;https://cilabs.kaist.ac.kr", "dblp": "236/1961;;40/5999", "google_scholar": "https://scholar.google.co.kr/citations?user=jwAH7WcAAAAJ;;https://scholar.google.co.kr/citations?user=ABH_2lcAAAAJ", "orcid": ";;", "linkedin": "junyoung-byun/;;", "or_profile": "~Junyoung_Byun2;gohyojun15@kaist.ac.kr;~Changick_Kim1", "aff": "Korea Advanced Institute of Science & Technology;;Korea Advanced Institute of Science & Technology", "aff_domain": "kaist.ac.kr;;kaist.ac.kr", "position": "PhD student;;Full Professor", "bibtex": "@misc{\nbyun2021small,\ntitle={Small Input Noise is Enough to Defend Against Query-based Black-box Attacks},\nauthor={Junyoung Byun and Hyojun Go and Changick Kim},\nyear={2021},\nurl={https://openreview.net/forum?id=6HlaJSlQFEj}\n}", "github": "", "project": "", "reviewers": "AnonReviewer3;AnonReviewer1;AnonReviewer4;AnonReviewer2", "site": "https://openreview.net/forum?id=6HlaJSlQFEj", "pdf_size": 0, "rating": "3;4;6;7", "confidence": "5;5;3;3", "wc_review": "239;572;432;389", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "1648;1614;1014;274", "reply_reviewers": "0;0;0;0", "reply_authors": "5;3;3;1", "rating_avg": [ 5.0, 1.5811388300841898 ], "confidence_avg": [ 4.0, 1.0 ], "wc_review_avg": [ 408.0, 118.73710456297981 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 1137.5, 558.6919992267653 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 3.0, 1.4142135623730951 ], "replies_avg": [ 19, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": -0.948683298050514, "gs_citation": 12, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=760595154063901256&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 0, "aff_unique_index": "0;0", "aff_unique_norm": "Korea Advanced Institute of Science and Technology", "aff_unique_dep": "", "aff_unique_url": "https://www.kaist.ac.kr", "aff_unique_abbr": "KAIST", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0", "aff_country_unique": "South Korea" }, { "id": "6IVdytR2W90", "title": "MSFM: Multi-Scale Fusion Module for Object Detection", "track": "main", "status": "Reject", "tldr": "", "abstract": "Feature fusion is beneficial to object detection tasks in two 
ways. On one hand, detail and position information can be combined with semantic information when high and low-resolution features from shallow and deep layers are fused. On the other hand, objects can be detected in different scales, which improves the robustness of the framework. In this work, we present a Multi-Scale Fusion Module (MSFM) that extracts both detail and semantic information from a single input but at different scales within the same layer. Specifically, the input of the module will be resized into different scales on which position and semantic information will be processed, and then they will be rescaled back and combined with the module input. The MSFM is lightweight and can be used as a drop-in layer in many existing object detection frameworks. Experiments show that MSFM can bring +2.5% mAP improvement with only 2.4M extra parameters on Faster R-CNN with ResNet-50 FPN backbone on the COCO Object Detection minival set, outperforming the ResNet-101 FPN backbone without the module, which obtains +2.0% mAP with 19.0M extra parameters. The best resulting model achieves 45.7% mAP on the test-dev set. Code will be available.", "keywords": "Feature Fusion;Object Detection;Multi-Scale", "primary_area": "", "supplementary_material": "", "author": "Xuesong Wang;Caisheng Wang", "authorids": "~Xuesong_Wang2;~Caisheng_Wang1", "gender": "M;", "homepage": ";", "dblp": ";", "google_scholar": "https://scholar.google.com/citations?hl=en;", "orcid": "0000-0001-8074-0436;", "linkedin": "xuesong-wang-david;", "or_profile": "~Xuesong_Wang2;~Caisheng_Wang1", "aff": ";Wayne State University", "aff_domain": ";wayne.edu", "position": ";", "bibtex": "@misc{\nwang2021msfm,\ntitle={{\\{}MSFM{\\}}: Multi-Scale Fusion Module for Object Detection},\nauthor={Xuesong Wang and Caisheng Wang},\nyear={2021},\nurl={https://openreview.net/forum?id=6IVdytR2W90}\n}", "github": "", "project": "", "reviewers": "AnonReviewer3;AnonReviewer4;AnonReviewer1;AnonReviewer2", "site": "https://openreview.net/forum?id=6IVdytR2W90", "pdf_size": 0, "rating": "3;3;3;4", "confidence": "4;4;5;5", "wc_review": "314;130;189;400", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "17;17;17;17", "reply_reviewers": "0;0;0;0", "reply_authors": "1;1;1;1", "rating_avg": [ 3.25, 0.4330127018922193 ], "confidence_avg": [ 4.5, 0.5 ], "wc_review_avg": [ 258.25, 105.40961768263843 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 17.0, 0.0 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 9, 0 ], "authors#_avg": [ 2, 0 ], "corr_rating_confidence": 0.5773502691896257, "gs_citation": 2, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=10223881758388123397&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 0, "aff_unique_index": "0", "aff_unique_norm": "Wayne State University", "aff_unique_dep": "", "aff_unique_url": "https://wayne.edu", "aff_unique_abbr": "WSU", "aff_country_unique_index": "0", "aff_country_unique": "United States" }, { "id": "6KZ_kUVCfTa", "title": "Non-Markovian Predictive Coding For Planning In Latent Space", "track": "main", "status": "Reject", "tldr": "", "abstract": "High-dimensional observations are a major challenge in the application of model-based reinforcement learning (MBRL) to real-world environments. In order to handle high-dimensional sensory inputs, existing MBRL approaches use representation learning to map high-dimensional observations into a lower-dimensional latent space that is more amenable to dynamics estimation and planning.
Crucially, the task-relevance and predictability of the learned representations play critical roles in the success of planning in latent space. In this work, we present Non-Markovian Predictive Coding (NMPC), an information-theoretic approach for planning from high-dimensional observations with two key properties: 1) it formulates a mutual information objective that prioritizes the encoding of task-relevant components of the environment; and 2) it employs a recurrent neural network capable of modeling non-Markovian latent dynamics. To demonstrate NMPC\u2019s ability to prioritize task-relevant information, we evaluate our new model on a challenging modification of standard DMControl tasks where the DMControl background is replaced with natural videos, containing complex but irrelevant information to the planning task. Our experiments show that NMPC is superior to existing methods in the challenging complex-background setting while remaining competitive with current state-of-the-art MBRL models in the standard setting.", "keywords": "representation learning;reinforcement learning;information theory", "primary_area": "", "supplementary_material": "", "author": "Tung Nguyen;Rui Shu;Tuan Pham;Hung Bui;Stefano Ermon", "authorids": "~Tung_Nguyen2;~Rui_Shu1;v.tuanpa36@vinai.io;~Hung_Bui1;~Stefano_Ermon1", "gender": "M;M;;M;M", "homepage": "https://tung-nd.github.io/;http://ruishu.github.io;;https://sites.google.com/site/buihhung/home;http://cs.stanford.edu/~ermon/", "dblp": ";146/0885;;;47/8135", "google_scholar": "https://scholar.google.com.vn/citations?user=F9mgq3sAAAAJ;UB7UZEYAAAAJ;;mDLwSZAAAAAJ;", "orcid": ";;;;", "linkedin": "tung-nguyen-40703616b/;;;;", "or_profile": "~Tung_Nguyen2;~Rui_Shu1;v.tuanpa36@vinai.io;~Hung_Bui1;~Stefano_Ermon1", "aff": "VinAI Research;Stanford University;;VinAI Research;Stanford University", "aff_domain": "vinai.io;stanford.edu;;vinai.io;stanford.edu", "position": "AI Research Resident;PhD student;;Principal Researcher;Assistant Professor", "bibtex": "@misc{\nnguyen2021nonmarkovian,\ntitle={Non-Markovian Predictive Coding For Planning In Latent Space},\nauthor={Tung Nguyen and Rui Shu and Tuan Pham and Hung Bui and Stefano Ermon},\nyear={2021},\nurl={https://openreview.net/forum?id=6KZ_kUVCfTa}\n}", "github": "", "project": "", "reviewers": "AnonReviewer2;AnonReviewer1;AnonReviewer4;AnonReviewer3", "site": "https://openreview.net/forum?id=6KZ_kUVCfTa", "pdf_size": 0, "rating": "5;5;6;6", "confidence": "5;5;4;3", "wc_review": "596;476;398;325", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "850;549;1002;786", "reply_reviewers": "0;0;0;0", "reply_authors": "2;1;2;2", "rating_avg": [ 5.5, 0.5 ], "confidence_avg": [ 4.25, 0.82915619758885 ], "wc_review_avg": [ 448.75, 100.39266656484426 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 796.75, 163.14008550935603 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.75, 0.4330127018922193 ], "replies_avg": [ 13, 0 ], "authors#_avg": [ 5, 0 ], "corr_rating_confidence": -0.9045340337332909, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:OGWJxpJahg4J:scholar.google.com/&scioq=Non-Markovian+Predictive+Coding+For+Planning+In+Latent+Space&hl=en&as_sdt=0,33", "gs_version_total": 0, "aff_unique_index": "0;1;0;1", "aff_unique_norm": "VinAI Research;Stanford University", "aff_unique_dep": ";", "aff_unique_url": "https://www.vinai.io/;https://www.stanford.edu", "aff_unique_abbr": "VinAI;Stanford", "aff_campus_unique_index": "1;1", "aff_campus_unique": ";Stanford", 
"aff_country_unique_index": "0;1;0;1", "aff_country_unique": "Vietnam;United States" }, { "id": "6Lhv4x2_9pw", "title": "Bayesian neural network parameters provide insights into the earthquake rupture physics.", "track": "main", "status": "Reject", "tldr": "", "abstract": "I present a simple but informative approach to gain insight into the Bayesian neural network (BNN) trained parameters. I used 2000 dynamic rupture simulations to train a BNN model to predict if an earthquake can break through a simple 2D fault. In each simulation, fault geometry, stress conditions, and friction parameters vary. The trained BNN parameters show that the network learns the physics of earthquake rupture. Neurons with high positive weights contribute to the earthquake rupture and vice versa. The results show that the stress condition of the fault plays a critical role in determining its strength. The stress is also the top source of uncertainty, followed by the dynamic friction coefficient. When stress and friction drop of a fault have higher value and are combined with higher weighted neurons, the prediction score increases, thus fault likely to be ruptured. Fault's width and height have the least amount of uncertainty, which may not be correct in a real scenario. The study shows that the potentiality of BNN that provides data patterns about rupture physics to make an additional information source for scientists studying the earthquake rupture.", "keywords": "Bayesian neural network;earthquake rupture;simulation;Explainable neural network", "primary_area": "", "supplementary_material": "", "author": "Sabber Ahamed", "authorids": "~Sabber_Ahamed1", "gender": "M", "homepage": "https://github.com/msahamed", "dblp": "", "google_scholar": "https://scholar.google.com/citations?hl=en", "orcid": "", "linkedin": "sabber-ahamed/", "or_profile": "~Sabber_Ahamed1", "aff": "Asurion", "aff_domain": "asurion.com", "position": "Data Scientist", "bibtex": "@misc{\nahamed2021bayesian,\ntitle={Bayesian neural network parameters provide insights into the earthquake rupture physics.},\nauthor={Sabber Ahamed},\nyear={2021},\nurl={https://openreview.net/forum?id=6Lhv4x2_9pw}\n}", "github": "", "project": "", "reviewers": "AnonReviewer4;AnonReviewer1;AnonReviewer2;AnonReviewer3", "site": "https://openreview.net/forum?id=6Lhv4x2_9pw", "pdf_size": 0, "rating": "4;4;4;6", "confidence": "5;4;3;4", "wc_review": "60;306;1069;475", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "0;0;0;0", "reply_reviewers": "0;0;0;0", "reply_authors": "0;0;0;0", "rating_avg": [ 4.5, 0.8660254037844386 ], "confidence_avg": [ 4.0, 0.7071067811865476 ], "wc_review_avg": [ 477.5, 372.02049674715505 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 0, 0 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 0, 0 ], "replies_avg": [ 5, 0 ], "authors#_avg": [ 1, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:X60WcJhQTO8J:scholar.google.com/&scioq=Bayesian+neural+network+parameters+provide+insights+into+the+earthquake+rupture+physics.&hl=en&as_sdt=0,5", "gs_version_total": 0, "aff_unique_index": "0", "aff_unique_norm": "Asurion", "aff_unique_dep": "", "aff_unique_url": "https://www.asurion.com", "aff_unique_abbr": "", "aff_country_unique_index": "0", "aff_country_unique": "United States" }, { "id": "6M4c3WegNtX", "title": "Neural Ensemble Search for Uncertainty Estimation and Dataset Shift", "track": "main", "status": "Reject", "tldr": "", "abstract": "Ensembles of 
neural networks achieve superior performance compared to stand-alone networks not only in terms of predictive performance, but also uncertainty calibration and robustness to dataset shift. Diversity among networks is believed to be key for building strong ensembles, but typical approaches, such as \\emph{deep ensembles}, only ensemble different weight vectors of a fixed architecture. Instead, we propose two methods for constructing ensembles to exploit diversity among networks with \\emph{varying} architectures. We find that the resulting ensembles are indeed more diverse and also exhibit better uncertainty calibration, predictive performance and robustness to dataset shift in comparison with deep ensembles on a variety of classification tasks.", "keywords": "uncertainty estimation;deep ensemble;dataset shift;robustness;uncertainty calibration", "primary_area": "", "supplementary_material": "/attachment/ba56feb788a2a67d1ca4a0776d635333008c6318.zip", "author": "Sheheryar Zaidi;Arber Zela;Thomas Elsken;Chris Holmes;Frank Hutter;Yee Whye Teh", "authorids": "~Sheheryar_Zaidi1;~Arber_Zela1;~Thomas_Elsken1;chris.holmes@stats.ox.ac.uk;~Frank_Hutter1;~Yee_Whye_Teh1", "gender": "M;M;M;;M;M", "homepage": "https://shehzaidi.github.io/;https://ml.informatik.uni-freiburg.de/people/zela/index.html;;;http://ml.informatik.uni-freiburg.de/~hutter/;http://csml.stats.ox.ac.uk/people/teh/", "dblp": ";;;;89/5383;88/2483", "google_scholar": "P0aplJAAAAAJ;hD_6YioAAAAJ;tzDC8FQAAAAJ;;https://scholar.google.de/citations?user=YUrxwrkAAAAJ;https://scholar.google.co.uk/citations?user=y-nUzMwAAAAJ", "orcid": ";;;;0000-0002-2037-3694;", "linkedin": ";https://de.linkedin.com/in/arber-zela-ba85a2145;;;frank-hutter-9190b24b/;", "or_profile": "~Sheheryar_Zaidi1;~Arber_Zela1;~Thomas_Elsken1;chris.holmes@stats.ox.ac.uk;~Frank_Hutter1;~Yee_Whye_Teh1", "aff": "University of Oxford;University of Freiburg;University of Freiburg, Albert-Ludwigs-Universit\u00e4t Freiburg;;Albert-Ludwigs-Universit\u00e4t Freiburg;University of Oxford", "aff_domain": "ox.ac.uk;uni-freiburg.de;cs.uni-freiburg.de;;uni-freiburg.de;ox.ac.uk", "position": "PhD student;PhD student;PhD student;;Full Professor;Full Professor", "bibtex": "@misc{\nzaidi2021neural,\ntitle={Neural Ensemble Search for Uncertainty Estimation and Dataset Shift},\nauthor={Sheheryar Zaidi and Arber Zela and Thomas Elsken and Chris Holmes and Frank Hutter and Yee Whye Teh},\nyear={2021},\nurl={https://openreview.net/forum?id=6M4c3WegNtX}\n}", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer2;AnonReviewer5;AnonReviewer4", "site": "https://openreview.net/forum?id=6M4c3WegNtX", "pdf_size": 0, "rating": "4;4;5;6", "confidence": "5;5;4;4", "wc_review": "472;153;294;684", "wc_reply_reviewers": "0;51;35;0", "wc_reply_authors": "616;675;480;1036", "reply_reviewers": "0;1;1;0", "reply_authors": "1;2;2;2", "rating_avg": [ 4.75, 0.82915619758885 ], "confidence_avg": [ 4.5, 0.5 ], "wc_review_avg": [ 400.75, 198.79810738535716 ], "wc_reply_reviewers_avg": [ 21.5, 22.23173407541571 ], "wc_reply_authors_avg": [ 701.75, 205.5266101992635 ], "reply_reviewers_avg": [ 0.5, 0.5 ], "reply_authors_avg": [ 1.75, 0.4330127018922193 ], "replies_avg": [ 16, 0 ], "authors#_avg": [ 6, 0 ], "corr_rating_confidence": -0.9045340337332909, "gs_citation": 98, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=11225734588910887046&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 10, "aff_unique_index": "0;1;1;2;0", "aff_unique_norm": "University of Oxford;University of 
Freiburg;Albert-Ludwigs-Universit\u00e4t Freiburg", "aff_unique_dep": ";;", "aff_unique_url": "https://www.ox.ac.uk;https://www.uni-freiburg.de;https://www.uni-freiburg.de", "aff_unique_abbr": "Oxford;UoF;Albert-Ludwigs-Universit\u00e4t", "aff_campus_unique_index": "1;1", "aff_campus_unique": ";Freiburg", "aff_country_unique_index": "0;1;1;1;0", "aff_country_unique": "United Kingdom;Germany" }, { "id": "6MaBrlQ5JM", "title": "THE EFFICACY OF L1 REGULARIZATION IN NEURAL NETWORKS", "track": "main", "status": "Reject", "tldr": "", "abstract": "A crucial problem in neural networks is to select the most appropriate number of hidden neurons and obtain tight statistical risk bounds. In this work, we present a new perspective towards the bias-variance tradeoff in neural networks. As an alternative to selecting the number of neurons, we theoretically show that $L_1$ regularization can control the generalization error and sparsify the input dimension. In particular, with an appropriate $L_1$ regularization on the output layer, the network can produce a statistical risk that is near minimax optimal. Moreover, an appropriate $L_1$ regularization on the input layer leads to a risk bound that does not involve the input data dimension. Our analysis is based on a new amalgamation of dimension-based and norm-based complexity analysis to bound the generalization error. A consequent observation from our results is that an excessively large number of neurons do not necessarily inflate generalization errors under a suitable regularization.\n", "keywords": "Model selection;Neural Network;Regularization", "primary_area": "", "supplementary_material": "", "author": "Gen Li;Yuantao Gu;Jie Ding", "authorids": "g-li16@mails.tsinghua.edu.cn;~Yuantao_Gu1;~Jie_Ding2", "gender": ";;M", "homepage": ";;http://jding.org", "dblp": ";;94/1825-2", "google_scholar": ";;ZyqvoqcAAAAJ", "orcid": ";;", "linkedin": ";;", "or_profile": "g-li16@mails.tsinghua.edu.cn;~Yuantao_Gu1;~Jie_Ding2", "aff": ";;University of Minnesota, Minneapolis", "aff_domain": ";;umn.edu", "position": ";;Assistant Professor", "bibtex": "@misc{\nli2021the,\ntitle={{\\{}THE{\\}} {\\{}EFFICACY{\\}} {\\{}OF{\\}} L1 {\\{}REGULARIZATION{\\}} {\\{}IN{\\}} {\\{}NEURAL{\\}} {\\{}NETWORKS{\\}}},\nauthor={Gen Li and Yuantao Gu and Jie Ding},\nyear={2021},\nurl={https://openreview.net/forum?id=6MaBrlQ5JM}\n}", "github": "", "project": "", "reviewers": "AnonReviewer4;AnonReviewer2;AnonReviewer3", "site": "https://openreview.net/forum?id=6MaBrlQ5JM", "pdf_size": 0, "rating": "4;5;5", "confidence": "4;4;3", "wc_review": "428;271;636", "wc_reply_reviewers": "0;0;0", "wc_reply_authors": "51;168;263", "reply_reviewers": "0;0;0", "reply_authors": "1;1;1", "rating_avg": [ 4.666666666666667, 0.4714045207910317 ], "confidence_avg": [ 3.6666666666666665, 0.4714045207910317 ], "wc_review_avg": [ 445.0, 149.4947044770037 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 160.66666666666666, 86.70383818237549 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 8, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": -0.4999999999999999, "gs_citation": 1, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=15705497329409385798&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 0, "aff_unique_index": "0", "aff_unique_norm": "University of Minnesota", "aff_unique_dep": "", "aff_unique_url": "https://www.minnesota.edu", "aff_unique_abbr": "UMN", "aff_campus_unique_index": "0", "aff_campus_unique": "Minneapolis", 
"aff_country_unique_index": "0", "aff_country_unique": "United States" }, { "title": "On the Universality of Rotation Equivariant Point Cloud Networks", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/3011", "id": "6NFBvWlRXaG", "poster": "", "openreview": "https://openreview.net/forum?id=6NFBvWlRXaG", "slides": "https://iclr.cc/virtual/2021/poster/3011", "video": "https://iclr.cc/virtual/2021/poster/3011", "author_site": "Nadav Dym, Haggai Maron", "tldr": "", "abstract": "Learning functions on point clouds has applications in many fields, including computer vision, computer graphics, physics, and chemistry. Recently, there has been a growing interest in neural architectures that are invariant or equivariant to all three shape-preserving transformations of point clouds: translation, rotation, and permutation. In this paper, we present a first study of the approximation power of these architectures. We first derive two sufficient conditions for an equivariant architecture to have the universal approximation property, based on a novel characterization of the space of equivariant polynomials. We then use these conditions to show that two recently suggested models, Tensor field Networks and SE3-Transformers, are universal, and for devising two other novel universal architectures.", "keywords": "3D deep learning;Rotation invariance;Invariant and equivariant deep networks;Universal approximation;Point clouds", "primary_area": "", "supplementary_material": "", "author": "Nadav Dym;Haggai Maron", "authorids": "nadavdym@gmail.com;~Haggai_Maron1", "gender": ";M", "homepage": ";https://haggaim.github.io/", "dblp": ";181/6629", "google_scholar": ";https://scholar.google.co.il/citations?user=4v8uJrIAAAAJ", "orcid": ";", "linkedin": ";", "or_profile": "nadavdym@gmail.com;~Haggai_Maron1", "aff": ";NVIDIA", "aff_domain": ";nvidia.com", "position": ";Research Scientist", "bibtex": "@inproceedings{\ndym2021on,\ntitle={On the Universality of Rotation Equivariant Point Cloud Networks},\nauthor={Nadav Dym and Haggai Maron},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=6NFBvWlRXaG}\n}", "github": "", "project": "", "reviewers": "AnonReviewer3;AnonReviewer2;AnonReviewer4;AnonReviewer1", "pdf_size": 0, "rating": "6;6;8;8", "confidence": "2;2;3;3", "wc_review": "263;277;250;694", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "197;229;92;653", "reply_reviewers": "0;0;0;0", "reply_authors": "1;1;1;1", "rating_avg": [ 7.0, 1.0 ], "confidence_avg": [ 2.5, 0.5 ], "wc_review_avg": [ 371.0, 186.72841240689644 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 292.75, 214.0751912296238 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 11, 0 ], "authors#_avg": [ 2, 0 ], "corr_rating_confidence": 1.0, "gs_citation": 104, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=8682204623303812991&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 5, "pdf": "https://openreview.net/pdf?id=6NFBvWlRXaG", "email": ";nvidia.com", "author_num": 2, "aff_unique_index": "0", "aff_unique_norm": "NVIDIA", "aff_unique_dep": "NVIDIA Corporation", "aff_unique_url": "https://www.nvidia.com", "aff_unique_abbr": "NVIDIA", "aff_country_unique_index": "0", "aff_country_unique": "United States" }, { "id": "6R51jA4fOB", "title": "Few-shot Adaptation of Generative Adversarial Networks", "track": "main", "status": "Withdraw", "tldr": "", "abstract": "This paper proposes a simple 
and effective method, Few-Shot GAN (FSGAN), for adapting GANs in few-shot settings (less than 100 images). FSGAN repurposes component analysis techniques, learning to adapt the singular values of the pre-trained weights while freezing the corresponding singular vectors. This provides a highly expressive parameter space for adaptation while constraining changes to the pretrained weights. We validate our method in a challenging few-shot setting of 5-100 images in the target domain. We show that our method has significant visual quality gains compared with existing GAN adaptation methods. We report extensive qualitative and quantitative results showing the effectiveness of our method. We additionally highlight a problem for few-shot synthesis in the standard quantitative metric used by data-efficient image synthesis works.", "keywords": "GAN;Few-shot;SVD;PCA", "primary_area": "", "supplementary_material": "/attachment/ffcf713160258b5b4ff5e8c3fa1195ab621a61bd.zip", "author": "Esther Robb;Wen-Sheng Chu;Abhishek Kumar;Jia-Bin Huang", "authorids": "~Esther_Robb1;~Wen-Sheng_Chu1;~Abhishek_Kumar1;~Jia-Bin_Huang1", "gender": ";;M;", "homepage": ";http://inductivebias.ml;https://jbhuang0604.github.io/;http://estherrobb.com", "dblp": "35/8617;67/6188-1;51/1815-1.html;", "google_scholar": "R-OrlSgAAAAJ;6vghMS0AAAAJ;pp848fYAAAAJ;", "orcid": ";;;", "linkedin": ";;jia-bin-huang-070a7418/;", "or_profile": "~Wen-Sheng_Chu1;~Abhishek_Kumar1;~Jia-Bin_Huang1;~Esther_Anne_Robb1", "aff": "Google Research;Google DeepMind;Virginia Tech;Virginia Tech", "aff_domain": "google.com;google.com;vt.edu;vt.edu", "position": "Research Scientist;Research Scientist;Assistant Professor;MS student", "bibtex": "", "github": "", "project": "", "reviewers": "AnonReviewer3;AnonReviewer1;AnonReviewer2;AnonReviewer4", "site": "https://openreview.net/forum?id=6R51jA4fOB", "pdf_size": 0, "rating": "3;4;5;7", "confidence": "4;4;4;5", "wc_review": "237;350;662;944", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "0;0;0;0", "reply_reviewers": "0;0;0;0", "reply_authors": "0;0;0;0", "rating_avg": [ 4.75, 1.479019945774904 ], "confidence_avg": [ 4.25, 0.4330127018922193 ], "wc_review_avg": [ 548.25, 276.4673353219147 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 0, 0 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 0, 0 ], "replies_avg": [ 5, 0 ], "authors#_avg": [ 4, 0 ], "corr_rating_confidence": 0.8783100656536799, "gs_citation": 105, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=7186102532561666668&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 4, "aff_unique_index": "0;0;1;1", "aff_unique_norm": "Google;Virginia Tech", "aff_unique_dep": "Google Research;", "aff_unique_url": "https://research.google;https://www.vt.edu", "aff_unique_abbr": "Google Research;VT", "aff_campus_unique_index": "0", "aff_campus_unique": "Mountain View;", "aff_country_unique_index": "0;1;0;0", "aff_country_unique": "United States;United Kingdom" }, { "id": "6SXNhWc5HFe", "title": "Provable Fictitious Play for General Mean-Field Games", "track": "main", "status": "Reject", "tldr": "", "abstract": "We propose a reinforcement learning algorithm for stationary mean-field games, where the goal is to learn a pair of mean-field state and stationary policy that constitutes the Nash equilibrium.
When viewing the mean-field state and the policy as two players, we propose a fictitious play algorithm which alternatively updates the mean-field state and the policy via gradient-descent and proximal policy optimization, respectively. Our algorithm is in stark contrast with previous literature which solves each single-agent reinforcement learning problem induced by the iterates mean-field states to the optimum. Furthermore, we prove that our fictitious play algorithm converges to the Nash equilibrium at a sublinear rate. To the best of our knowledge, this seems the first provably convergent reinforcement learning algorithm for mean-field games based on iterative updates of both mean-field state and policy.", "keywords": "Mean-field games;Fictitious play;Entropy regularization;Nash equilibrium", "primary_area": "", "supplementary_material": "", "author": "Qiaomin Xie;Zhuoran Yang;Zhaoran Wang;Andreea Minca", "authorids": "~Qiaomin_Xie1;~Zhuoran_Yang1;~Zhaoran_Wang1;~Andreea_Minca1", "gender": "F;M;Not Specified;F", "homepage": "https://qiaominxie.github.io/;https://zhuoranyang.github.io/;https://zhaoranwang.github.io/;https://people.orie.cornell.edu/acm299/", "dblp": "37/10269;;117/2756;", "google_scholar": "RVNcy4EAAAAJ;;https://scholar.google.com.tw/citations?user=HSx0BgQAAAAJ;", "orcid": ";;;", "linkedin": ";;;", "or_profile": "~Qiaomin_Xie1;~Zhuoran_Yang1;~Zhaoran_Wang1;~Andreea_Minca1", "aff": "Cornell University;University of California, Berkeley;;", "aff_domain": "cornell.edu;berkeley.edu;;", "position": "Visiting Assistant Professor;Postdoc;;", "bibtex": "@misc{\nxie2021provable,\ntitle={Provable Fictitious Play for General Mean-Field Games},\nauthor={Qiaomin Xie and Zhuoran Yang and Zhaoran Wang and Andreea Minca},\nyear={2021},\nurl={https://openreview.net/forum?id=6SXNhWc5HFe}\n}", "github": "", "project": "", "reviewers": "AnonReviewer4;AnonReviewer3;AnonReviewer2;AnonReviewer1", "site": "https://openreview.net/forum?id=6SXNhWc5HFe", "pdf_size": 0, "rating": "3;5;5;5", "confidence": "5;3;3;5", "wc_review": "350;1141;214;996", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "412;508;142;902", "reply_reviewers": "0;0;0;0", "reply_authors": "1;1;1;2", "rating_avg": [ 4.5, 0.8660254037844386 ], "confidence_avg": [ 4.0, 1.0 ], "wc_review_avg": [ 675.25, 399.48177367684747 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 491.0, 272.6041085530444 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.25, 0.4330127018922193 ], "replies_avg": [ 10, 0 ], "authors#_avg": [ 4, 0 ], "corr_rating_confidence": -0.5773502691896257, "gs_citation": 23, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=6818252811850104743&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 3, "aff_unique_index": "0;1", "aff_unique_norm": "Cornell University;University of California, Berkeley", "aff_unique_dep": ";", "aff_unique_url": "https://www.cornell.edu;https://www.berkeley.edu", "aff_unique_abbr": "Cornell;UC Berkeley", "aff_campus_unique_index": "1", "aff_campus_unique": ";Berkeley", "aff_country_unique_index": "0;0", "aff_country_unique": "United States" }, { "title": "Sharpness-aware Minimization for Efficiently Improving Generalization", "status": "Spotlight", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/2782", "id": "6Tm1mposlrM", "poster": "", "openreview": "https://openreview.net/forum?id=6Tm1mposlrM", "slides": "https://iclr.cc/virtual/2021/poster/2782", "video": "https://iclr.cc/virtual/2021/poster/2782", "author_site": "Pierre Foret, Ariel 
Kleiner, Hossein Mobahi, Behnam Neyshabur", "tldr": "", "abstract": "In today's heavily overparameterized models, the value of the training loss provides few guarantees on model generalization ability. Indeed, optimizing only the training loss value, as is commonly done, can easily lead to suboptimal model quality. Motivated by the connection between geometry of the loss landscape and generalization---including a generalization bound that we prove here---we introduce a novel, effective procedure for instead simultaneously minimizing loss value and loss sharpness. In particular, our procedure, Sharpness-Aware Minimization (SAM), seeks parameters that lie in neighborhoods having uniformly low loss; this formulation results in a min-max optimization problem on which gradient descent can be performed efficiently. We present empirical results showing that SAM improves model generalization across a variety of benchmark datasets (e.g., CIFAR-{10, 100}, ImageNet, finetuning tasks) and models, yielding novel state-of-the-art performance for several. Additionally, we find that SAM natively provides robustness to label noise on par with that provided by state-of-the-art procedures that specifically target learning with noisy labels.", "keywords": "Sharpness Minimization;Generalization;Regularization;Training Method;Deep Learning", "primary_area": "", "supplementary_material": "", "author": "Pierre Foret;Ariel Kleiner;Hossein Mobahi;Behnam Neyshabur", "authorids": "~Pierre_Foret1;akleiner@google.com;~Hossein_Mobahi2;~Behnam_Neyshabur1", "gender": "M;;;M", "homepage": ";;;https://www.neyshabur.net", "dblp": ";;;131/9898", "google_scholar": ";;;e1ucbCYAAAAJ", "orcid": ";;;", "linkedin": ";;;", "or_profile": "~Pierre_Foret1;akleiner@google.com;~Hossein_Mobahi2;~Behnam_Neyshabur1", "aff": "Google;;;Google", "aff_domain": "google.com;;;google.com", "position": "AI Resident;;;Research Scientist", "bibtex": "@inproceedings{\nforet2021sharpnessaware,\ntitle={Sharpness-aware Minimization for Efficiently Improving Generalization},\nauthor={Pierre Foret and Ariel Kleiner and Hossein Mobahi and Behnam Neyshabur},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=6Tm1mposlrM}\n}", "github": "[![github](/images/github_icon.svg) google-research/sam](https://github.com/google-research/sam) + [![Papers with Code](/images/pwc_icon.svg) 12 community implementations](https://paperswithcode.com/paper/?openreview=6Tm1mposlrM)", "project": "", "reviewers": "AnonReviewer1;AnonReviewer3;AnonReviewer2;AnonReviewer4", "pdf_size": 0, "rating": "6;7;8;8", "confidence": "2;4;4;3", "wc_review": "257;722;1598;541", "wc_reply_reviewers": "0;0;556;0", "wc_reply_authors": "753;1403;2414;144", "reply_reviewers": "0;0;2;0", "reply_authors": "1;2;5;1", "rating_avg": [ 7.25, 0.82915619758885 ], "confidence_avg": [ 3.25, 0.82915619758885 ], "wc_review_avg": [ 779.5, 500.7836359147531 ], "wc_reply_reviewers_avg": [ 139.0, 240.75506225207394 ], "wc_reply_authors_avg": [ 1178.5, 840.8479351226356 ], "reply_reviewers_avg": [ 0.5, 0.8660254037844386 ], "reply_authors_avg": [ 2.25, 1.6393596310755 ], "replies_avg": [ 24, 0 ], "authors#_avg": [ 4, 0 ], "corr_rating_confidence": 0.6363636363636364, "gs_citation": 1709, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=10001060203038731755&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 4, "pdf": "https://openreview.net/pdf?id=6Tm1mposlrM", "email": "google.com;;;google.com", "author_num": 4, "aff_unique_index": 
"0;0", "aff_unique_norm": "Google", "aff_unique_dep": "Google", "aff_unique_url": "https://www.google.com", "aff_unique_abbr": "Google", "aff_campus_unique_index": "0;0", "aff_campus_unique": "Mountain View", "aff_country_unique_index": "0;0", "aff_country_unique": "United States" }, { "title": "Parameter Efficient Multimodal Transformers for Video Representation Learning", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/2901", "id": "6UdQLhqJyFD", "poster": "", "openreview": "https://openreview.net/forum?id=6UdQLhqJyFD", "slides": "https://iclr.cc/virtual/2021/poster/2901", "video": "https://iclr.cc/virtual/2021/poster/2901", "author_site": "Sangho Lee, Youngjae Yu, Gunhee Kim, Thomas Breuel, Jan Kautz, Yale Song", "tldr": "", "abstract": "The recent success of Transformers in the language domain has motivated adapting it to a multimodal setting, where a new visual model is trained in tandem with an already pretrained language model. However, due to the excessive memory requirements from Transformers, existing work typically fixes the language model and train only the vision module, which limits its ability to learn cross-modal information in an end-to-end manner. In this work, we focus on reducing the parameters of multimodal Transformers in the context of audio-visual video representation learning. We alleviate the high memory requirement by sharing the parameters of Transformers across layers and modalities; we decompose the Transformer into modality-specific and modality-shared parts so that the model learns the dynamics of each modality both individually and together, and propose a novel parameter sharing scheme based on low-rank approximation. We show that our approach reduces parameters of the Transformers up to 97%, allowing us to train our model end-to-end from scratch. We also propose a negative sampling approach based on an instance similarity measured on the CNN embedding space that our model learns together with the Transformers. 
To demonstrate our approach, we pretrain our model on 30-second clips (480 frames) from Kinetics-700 and transfer it to audio-visual classification tasks.", "keywords": "Self-supervised learning;audio-visual representation learning;video representation learning", "primary_area": "", "supplementary_material": "", "author": "Sangho Lee;Youngjae Yu;Gunhee Kim;Thomas Breuel;Jan Kautz;Yale Song", "authorids": "~Sangho_Lee1;~Youngjae_Yu1;~Gunhee_Kim1;~Thomas_Breuel1;~Jan_Kautz1;~Yale_Song1", "gender": "M;M;M;M;;M", "homepage": "https://sangho-vision.github.io/;https://yj-yu.github.io/home/;http://vision.snu.ac.kr/gunhee/;;http://jankautz.com;https://people.csail.mit.edu/yalesong", "dblp": "17/5702-8;188/6210;45/115;b/ThomasMBreuel;48/6214;31/9606.html", "google_scholar": "Lq8MN6wAAAAJ;https://scholar.google.co.kr/citations?user=WDO24ZYAAAAJ;https://scholar.google.co.kr/citations?user=CiSdOV0AAAAJ;;P9FclNEAAAAJ;dNHNpxoAAAAJ", "orcid": ";;0000-0002-9543-7453;;;", "linkedin": ";;;;;", "or_profile": "~Sangho_Lee1;~Youngjae_Yu1;~Gunhee_Kim1;~Thomas_Breuel1;~Jan_Kautz1;~Yale_Song1", "aff": "Seoul National University;Seoul National University;Seoul National University;NVIDIA;NVIDIA;Microsoft Research", "aff_domain": "snu.ac.kr;snu.ac.kr;snu.ac.kr;nvidia.com;nvidia.com;microsoft.com", "position": "PhD student;Graduate Student;Full Professor;Researcher;VP Research;Researcher", "bibtex": "@inproceedings{\nlee2021parameter,\ntitle={Parameter Efficient Multimodal Transformers for Video Representation Learning},\nauthor={Sangho Lee and Youngjae Yu and Gunhee Kim and Thomas Breuel and Jan Kautz and Yale Song},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=6UdQLhqJyFD}\n}", "github": "", "project": "", "reviewers": "AnonReviewer2;AnonReviewer3;AnonReviewer1;AnonReviewer4", "pdf_size": 0, "rating": "5;6;6;8", "confidence": "5;3;3;5", "wc_review": "432;587;381;770", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "434;309;149;382", "reply_reviewers": "0;0;0;0", "reply_authors": "1;1;1;1", "rating_avg": [ 6.25, 1.0897247358851685 ], "confidence_avg": [ 4.0, 1.0 ], "wc_review_avg": [ 542.5, 151.68140953986418 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 318.5, 107.46278425575991 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 9, 0 ], "authors#_avg": [ 6, 0 ], "corr_rating_confidence": 0.2294157338705618, "gs_citation": 94, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=4468363333457264365&as_sdt=400005&sciodt=0,14&hl=en", "gs_version_total": 6, "pdf": "https://openreview.net/pdf?id=6UdQLhqJyFD", "email": "snu.ac.kr;snu.ac.kr;snu.ac.kr;nvidia.com;nvidia.com;microsoft.com", "author_num": 6, "aff_unique_index": "0;0;0;1;1;2", "aff_unique_norm": "Seoul National University;NVIDIA;Microsoft", "aff_unique_dep": ";NVIDIA Corporation;Microsoft Research", "aff_unique_url": "https://www.snu.ac.kr;https://www.nvidia.com;https://www.microsoft.com/en-us/research", "aff_unique_abbr": "SNU;NVIDIA;MSR", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0;0;1;1;1", "aff_country_unique": "South Korea;United States" }, { "id": "6UurSaf08jx", "title": "Subformer: A Parameter Reduced Transformer", "track": "main", "status": "Withdraw", "tldr": "", "abstract": "The advent of the Transformer can arguably be described as a driving force behind many of the recent advances in natural language processing. 
However, despite their sizeable performance improvements, as recently shown, the model is severely over-parameterized, being parameter inefficient and computationally expensive to train. Inspired by the success of parameter-sharing in pre-trained deep contextualized word representation encoders, we explore parameter-sharing methods in Transformers, with a specific focus on encoder-decoder models for sequence-to-sequence tasks such as Machine Translation. We perform an analysis of different parameter sharing/reduction methods and develop the Subformer, a parameter efficient Transformer-based model which combines the newly proposed Sandwich-style parameter sharing technique and self-attentive embedding factorization (SAFE). Experiments on machine translation, abstractive summarization, and language modeling show that the Subformer can outperform the Transformer even when using significantly fewer parameters. On the WMT'14 English-German test set, we show we can perform equally well, and even sometimes outperform (+0.1 BLEU score) the Transformer-base model while using 40% fewer parameters. We also perform equally well as Transformer-big with 40% fewer parameters, achieve performance within 0.1 BLEU with 70% fewer parameters, and outperform the model by 0.7 BLEU with 12M fewer parameters. We also outperform the standard Transformer-XL model, achieving a significant 3.6 lower perplexity with 37% fewer parameters.", "keywords": "transformers;sequence modeling;machine translation;efficiency", "primary_area": "", "supplementary_material": "", "author": "Machel Reid;Edison Marrese-Taylor;Yutaka Matsuo", "authorids": "~Machel_Reid1;~Edison_Marrese-Taylor2;~Yutaka_Matsuo1", "gender": ";;M", "homepage": "https://machelreid.github.io/;;http://ymatsuo.com", "dblp": "260/6668;;m/YMatsuo.html", "google_scholar": "N8ctPiIAAAAJ;;Dy8iau4AAAAJ", "orcid": ";;", "linkedin": ";;", "or_profile": "~Machel_Reid1;~Edison_Marrese-Taylor2;~Yutaka_Matsuo1", "aff": "Carnegie Mellon University;;The University of Tokyo", "aff_domain": "cmu.edu;;u-tokyo.ac.jp", "position": "Visiting Student;;Associate Professor", "bibtex": "", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer5;AnonReviewer4", "site": "https://openreview.net/forum?id=6UurSaf08jx", "pdf_size": 0, "rating": "4;4;6", "confidence": "4;4;5", "wc_review": "362;298;399", "wc_reply_reviewers": "60;430;0", "wc_reply_authors": "990;1108;615", "reply_reviewers": "1;3;0", "reply_authors": "4;3;1", "rating_avg": [ 4.666666666666667, 0.9428090415820634 ], "confidence_avg": [ 4.333333333333333, 0.4714045207910317 ], "wc_review_avg": [ 353.0, 41.72129751897305 ], "wc_reply_reviewers_avg": [ 163.33333333333334, 190.14614262602214 ], "wc_reply_authors_avg": [ 904.3333333333334, 210.1845749705614 ], "reply_reviewers_avg": [ 1.3333333333333333, 1.247219128924647 ], "reply_authors_avg": [ 2.6666666666666665, 1.247219128924647 ], "replies_avg": [ 18, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": 0.9999999999999997, "gs_citation": 2, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=13162073254694288623&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 0, "aff_unique_index": "0;1", "aff_unique_norm": "Carnegie Mellon University;University of Tokyo", "aff_unique_dep": ";", "aff_unique_url": "https://www.cmu.edu;https://www.u-tokyo.ac.jp", "aff_unique_abbr": "CMU;UTokyo", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;1", "aff_country_unique": "United States;Japan" }, { "id": "6VPl9khIMz", "title": 
"Adaptive Stacked Graph Filter", "track": "main", "status": "Reject", "tldr": "", "abstract": "We study Graph Convolutional Networks (GCN) from the graph signal processing viewpoint by addressing a difference between learning graph filters with fully-connected weights versus trainable polynomial coefficients. We find that by stacking graph filters with learnable polynomial parameters, we can build a highly adaptive and robust vertex classification model. Our treatment here relaxes the low-frequency (or equivalently, high homophily) assumptions in existing vertex classification models, resulting a more ubiquitous solution in terms of spectral properties. Empirically, by using only one hyper-parameter setting, our model achieves strong results on most benchmark datasets across the frequency spectrum.", "keywords": "Graph Convolutional Network;vertex classification;graph signal processing;adaptive graph filter", "primary_area": "", "supplementary_material": "/attachment/deb888f4c450f6dd4fa72597e551f6ce206fbc02.zip", "author": "Hoang NT;Takanori Maehara;Tsuyoshi Murata", "authorids": "~Hoang_NT1;~Takanori_Maehara1;~Tsuyoshi_Murata1", "gender": "M;M;M", "homepage": "https://tmaehara.gitlab.io;https://www.net.comp.isct.ac.jp/murata.html;https://gearons.org/", "dblp": "05/8510;77/1703;241/5325", "google_scholar": "3ei4ZqoAAAAJ;https://scholar.google.co.jp/citations?user=ws2fHhsAAAAJ;iuSBSHsAAAAJ", "orcid": "0000-0002-2101-1484;0000-0002-3818-7830;", "linkedin": ";;https://linkedin.com/in/hoang-nt", "or_profile": "~Takanori_Maehara1;~Tsuyoshi_Murata1;~Hoang_Thai_Nguyen1", "aff": "Meta (aka. Facebook);Tokyo Institute of Technology;RIKEN AIP", "aff_domain": "fb.com;titech.ac.jp;riken.jp", "position": "Software Engineer;Full Professor;Researcher", "bibtex": "@misc{\nnt2021adaptive,\ntitle={Adaptive Stacked Graph Filter},\nauthor={Hoang NT and Takanori Maehara and Tsuyoshi Murata},\nyear={2021},\nurl={https://openreview.net/forum?id=6VPl9khIMz}\n}", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer4;AnonReviewer2;AnonReviewer3", "site": "https://openreview.net/forum?id=6VPl9khIMz", "pdf_size": 0, "rating": "4;5;5;5", "confidence": "5;5;4;5", "wc_review": "924;511;437;467", "wc_reply_reviewers": "219;147;191;141", "wc_reply_authors": "824;461;782;432", "reply_reviewers": "1;1;1;1", "reply_authors": "3;2;2;2", "rating_avg": [ 4.75, 0.4330127018922193 ], "confidence_avg": [ 4.75, 0.4330127018922193 ], "wc_review_avg": [ 584.75, 197.62638361311983 ], "wc_reply_reviewers_avg": [ 174.5, 32.13642792844283 ], "wc_reply_authors_avg": [ 624.75, 179.16106580392963 ], "reply_reviewers_avg": [ 1.0, 0.0 ], "reply_authors_avg": [ 2.25, 0.4330127018922193 ], "replies_avg": [ 18, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": -0.3333333333333333, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:CpTBWM8G7_IJ:scholar.google.com/&scioq=Adaptive+Stacked+Graph+Filter&hl=en&as_sdt=0,5", "gs_version_total": 0, "aff_unique_index": "0;1;2", "aff_unique_norm": "Meta;Tokyo Institute of Technology;RIKEN", "aff_unique_dep": "Meta Platforms, Inc.;;Advanced Institute for Computational Science", "aff_unique_url": "https://meta.com;https://www.titech.ac.jp;https://www.aip.riken.jp", "aff_unique_abbr": "Meta;Titech;RIKEN AIP", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;1;1", "aff_country_unique": "United States;Japan" }, { "id": "6VhmvP7XZue", "title": "Open-world Semi-supervised Learning", "track": "main", "status": "Reject", 
"tldr": "", "abstract": "Supervised and semi-supervised learning methods have been traditionally designed for the closed-world setting which is based on the assumption that unlabeled test data contains only classes previously encountered in the labeled training data. However, the real world is often open and dynamic, and thus novel previously unseen classes may appear in the test data or during the model deployment. Here, we introduce a new open-world semi-supervised learning setting in which the model is required to recognize previously seen classes, as well as to discover novel classes never seen in the labeled dataset. To tackle the problem, we propose ORCA, an approach that jointly learns a feature representation and a classifier on the labeled and unlabeled subsets of the data. The key idea in ORCA is in introducing uncertainty based adaptive margin that effectively circumvents the bias caused by the imbalance of variance between seen and novel classes. We demonstrate that ORCA accurately discovers novel classes and assigns samples to previously seen classes on standard benchmark image classification datasets, including CIFAR and ImageNet. Remarkably, despite solving the harder task ORCA outperforms semi-supervised methods on seen classes, as well as novel class discovery methods on unseen classes, achieving 7% and 151% improvements on seen and unseen classes of the ImageNet dataset.", "keywords": "deep learning;semi-supervised learning;novel class discovery;clustering", "primary_area": "", "supplementary_material": "", "author": "Kaidi Cao;Maria Brbic;Jure Leskovec", "authorids": "~Kaidi_Cao1;mbrbic@cs.stanford.edu;~Jure_Leskovec1", "gender": "M;;", "homepage": "https://ai.stanford.edu/~kaidicao/;;http://cs.stanford.edu/~jure/", "dblp": "203/8207;;l/JureLeskovec", "google_scholar": "https://scholar.google.com.hk/citations?user=4Zw1PJ8AAAAJ;;Q_kKkIUAAAAJ", "orcid": ";;0000-0002-5411-923X", "linkedin": ";;leskovec/", "or_profile": "~Kaidi_Cao1;mbrbic@cs.stanford.edu;~Jure_Leskovec1", "aff": "Stanford University;;", "aff_domain": "stanford.edu;;", "position": "PhD student;;", "bibtex": "@misc{\ncao2021openworld,\ntitle={Open-world Semi-supervised Learning},\nauthor={Kaidi Cao and Maria Brbic and Jure Leskovec},\nyear={2021},\nurl={https://openreview.net/forum?id=6VhmvP7XZue}\n}", "github": "", "project": "", "reviewers": "AnonReviewer3;AnonReviewer2;AnonReviewer4;AnonReviewer1", "site": "https://openreview.net/forum?id=6VhmvP7XZue", "pdf_size": 0, "rating": "6;6;6;6", "confidence": "3;4;4;4", "wc_review": "258;555;182;399", "wc_reply_reviewers": "0;118;0;0", "wc_reply_authors": "386;1042;443;933", "reply_reviewers": "0;1;0;0", "reply_authors": "1;2;1;2", "rating_avg": [ 6.0, 0.0 ], "confidence_avg": [ 3.75, 0.4330127018922193 ], "wc_review_avg": [ 348.5, 142.39469793500038 ], "wc_reply_reviewers_avg": [ 29.5, 51.09549882328188 ], "wc_reply_authors_avg": [ 701.0, 289.7818144742696 ], "reply_reviewers_avg": [ 0.25, 0.4330127018922193 ], "reply_authors_avg": [ 1.5, 0.5 ], "replies_avg": [ 12, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 251, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=13685131570461746231&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 10, "aff_unique_index": "0", "aff_unique_norm": "Stanford University", "aff_unique_dep": "", "aff_unique_url": "https://www.stanford.edu", "aff_unique_abbr": "Stanford", "aff_campus_unique_index": "0", "aff_campus_unique": "Stanford", "aff_country_unique_index": "0", "aff_country_unique": 
"United States" }, { "id": "6X_32jLUaDg", "title": "Exploiting Playbacks in Unsupervised Domain Adaptation for 3D Object Detection", "track": "main", "status": "Reject", "tldr": "", "abstract": "Self-driving cars must detect other vehicles and pedestrians in 3D to plan safe routes and avoid collisions. \nState-of-the-art 3D object detectors, based on deep learning, have shown promising accuracy but are prone to over-fit to domain idiosyncrasies, causing them to fail in new environments---a serious problem if autonomous vehicles are meant to operate freely. In this paper, we propose a novel learning approach that drastically reduces this gap by fine-tuning the detector on pseudo-labels in the target domain, which our method generates while the vehicle is parked, based on replays of previously recorded driving sequences. In these replays, objects are tracked over time, and detections are interpolated and extrapolated---crucially, leveraging future information to catch hard cases. We show, on five autonomous driving datasets, that fine-tuning the detector on these pseudo-labels substantially reduces the domain-gap to new driving environments, yielding drastic improvements in accuracy and detection reliability.", "keywords": "Unsupervised domain adaptation;3D vision;object detection;autonomous driving", "primary_area": "", "supplementary_material": "", "author": "Yurong You;Carlos Andres Diaz-Ruiz;Yan Wang;Wei-Lun Chao;Bharath Hariharan;Mark Campbell;Kilian Q Weinberger", "authorids": "~Yurong_You1;~Carlos_Andres_Diaz-Ruiz1;~Yan_Wang10;~Wei-Lun_Chao1;~Bharath_Hariharan3;~Mark_Campbell1;~Kilian_Q_Weinberger1", "gender": "M;M;M;M;M;M;M", "homepage": "http://yurongyou.com;;https://www.cs.cornell.edu/~yanwang/;https://sites.google.com/view/wei-lun-harry-chao;http://campbell.mae.cornell.edu;http://www.cs.cornell.edu/~kilian/;http://home.bharathh.info", "dblp": "199/1968;249/3072;59/2227;64/8842;;88/4801;05/8412", "google_scholar": "rdwkreIAAAAJ;ud0vmoMAAAAJ;nZsD8XwAAAAJ;PGKakWwAAAAJ;e1iAhHQAAAAJ;jsxk8vsAAAAJ;TpglobcAAAAJ", "orcid": ";;;0000-0003-1269-7231;;0009-0008-9313-7239;", "linkedin": "yurong-you/;carlos-diaz-ruiz/;;;;;", "or_profile": "~Yurong_You1;~Carlos_Andres_Diaz-Ruiz1;~Yan_Wang10;~Wei-Lun_Chao1;~Mark_Campbell1;~Kilian_Q_Weinberger1;~Bharath_Hariharan2", "aff": "NVIDIA;Cornell University;Cornell University;Ohio State University;Cornell University;Cornell University;Cornell University", "aff_domain": "nvidia.com;cornell.edu;cornell.edu;osu.edu;cornell.edu;cornell.edu;cornell.edu", "position": "Intern;PhD student;PhD student;Assistant Professor;Full Professor;Associate Professor;Assistant Professor", "bibtex": "@misc{\nyou2021exploiting,\ntitle={Exploiting Playbacks in Unsupervised Domain Adaptation for 3D Object Detection},\nauthor={Yurong You and Carlos Andres Diaz-Ruiz and Yan Wang and Wei-Lun Chao and Bharath Hariharan and Mark Campbell and Kilian Q Weinberger},\nyear={2021},\nurl={https://openreview.net/forum?id=6X_32jLUaDg}\n}", "github": "", "project": "", "reviewers": "AnonReviewer2;AnonReviewer4;AnonReviewer1;AnonReviewer3", "site": "https://openreview.net/forum?id=6X_32jLUaDg", "pdf_size": 0, "rating": "4;6;6;6", "confidence": "4;4;5;3", "wc_review": "225;201;295;551", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "260;453;96;435", "reply_reviewers": "0;0;0;0", "reply_authors": "1;1;1;1", "rating_avg": [ 5.5, 0.8660254037844386 ], "confidence_avg": [ 4.0, 0.7071067811865476 ], "wc_review_avg": [ 318.0, 138.88484438555562 ], "wc_reply_reviewers_avg": [ 0, 0 ], 
"wc_reply_authors_avg": [ 311.0, 145.2291293095156 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 10, 0 ], "authors#_avg": [ 7, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 10, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=2067943135953406159&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 4, "aff_unique_index": "0;1;1;2;1;1;1", "aff_unique_norm": "NVIDIA;Cornell University;Ohio State University", "aff_unique_dep": "NVIDIA Corporation;;", "aff_unique_url": "https://www.nvidia.com;https://www.cornell.edu;https://www.osu.edu", "aff_unique_abbr": "NVIDIA;Cornell;OSU", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0;0;0;0;0;0", "aff_country_unique": "United States" }, { "title": "FedBN: Federated Learning on Non-IID Features via Local Batch Normalization", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/2846", "id": "6YEQUn0QICG", "poster": "", "openreview": "https://openreview.net/forum?id=6YEQUn0QICG", "slides": "https://iclr.cc/virtual/2021/poster/2846", "video": "https://iclr.cc/virtual/2021/poster/2846", "author_site": "Xiaoxiao Li, Meirui Jiang, Xiaofei Zhang, Michael Kamp, Qi Dou", "tldr": "", "abstract": "The emerging paradigm of federated learning (FL) strives to enable collaborative training of deep models on the network edge without centrally aggregating raw data and hence improving data privacy. In most cases, the assumption of independent and identically distributed samples across local clients does not hold for federated learning setups. Under this setting, neural network training performance may vary significantly according to the data distribution and even hurt training convergence. Most of the previous work has focused on a difference in the distribution of labels or client shifts. Unlike those settings, we address an important problem of FL, e.g., different scanners/sensors in medical imaging, different scenery distribution in autonomous driving (highway vs. city), where local clients store examples with different distributions compared to other clients, which we denote as feature shift non-iid. In this work, we propose an effective method that uses local batch normalization to alleviate the feature shift before averaging models. The resulting scheme, called FedBN, outperforms both classical FedAvg, as well as the state-of-the-art for non-iid data (FedProx) on our extensive experiments. These empirical results are supported by a convergence analysis that shows in a simplified setting that FedBN has a faster convergence rate than FedAvg. 
Code is available at https://github.com/med-air/FedBN.", "keywords": "Federated Learning;Non-IID;Batch Normalization", "primary_area": "", "supplementary_material": "/attachment/39ec285162fc119a75101dd7fd1c74545717f921.zip", "author": "Xiaoxiao Li;Meirui JIANG;Xiaofei Zhang;Michael Kamp;Qi Dou", "authorids": "~Xiaoxiao_Li1;~Meirui_JIANG1;~Xiaofei_Zhang1;~Michael_Kamp1;~Qi_Dou2", "gender": "Unspecified;M;F;M;F", "homepage": "https://xxlya.github.io/;https://meiruijiang.github.io/MeiruiJiang/;;http://michaelkamp.org;https://www.cse.cuhk.edu.hk/~qdou", "dblp": "71/8042;285/5480;;133/7744;165/7846", "google_scholar": "sdENOQ4AAAAJ;https://scholar.google.com/citations?hl=en;;https://scholar.google.de/citations?user=8R5jbvQAAAAJ;https://scholar.google.com.hk/citations?user=iHh7IJQAAAAJ", "orcid": ";0000-0003-4228-8420; 0000-0002-0551-855X;0000-0001-6231-0694;0000-0002-3416-9950", "linkedin": ";;;michael-kamp-29096a95/;", "or_profile": "~Xiaoxiao_Li1;~Meirui_JIANG1;~Xiaofei_Zhang1;~Michael_Kamp1;~Qi_Dou2", "aff": "Princeton University;Department of Computer Science and Engineering, The Chinese University of Hong Kong;Iowa State University;Monash University;The Chinese University of Hong Kong", "aff_domain": "princeton.edu;cse.cuhk.edu.hk;iastate.edu;monash.edu;cuhk.edu.hk", "position": "Postdoc;PhD student;PhD student;Postdoc;Assistant Professor", "bibtex": "@inproceedings{\nli2021fedbn,\ntitle={Fed{BN}: Federated Learning on Non-{IID} Features via Local Batch Normalization},\nauthor={Xiaoxiao Li and Meirui JIANG and Xiaofei Zhang and Michael Kamp and Qi Dou},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=6YEQUn0QICG}\n}", "github": "[![github](/images/github_icon.svg) adap/flower](https://github.com/adap/flower) + [![Papers with Code](/images/pwc_icon.svg) 3 community implementations](https://paperswithcode.com/paper/?openreview=6YEQUn0QICG)", "project": "", "reviewers": "AnonReviewer4;AnonReviewer2;AnonReviewer1;AnonReviewer3", "pdf_size": 0, "rating": "4;5;7;8", "confidence": "4;5;5;4", "wc_review": "422;302;563;1053", "wc_reply_reviewers": "0;0;144;0", "wc_reply_authors": "1433;1666;645;657", "reply_reviewers": "0;0;1;0", "reply_authors": "3;4;2;1", "rating_avg": [ 6.0, 1.5811388300841898 ], "confidence_avg": [ 4.5, 0.5 ], "wc_review_avg": [ 585.0, 285.5547233018568 ], "wc_reply_reviewers_avg": [ 36.0, 62.353829072479584 ], "wc_reply_authors_avg": [ 1100.25, 456.7599889438654 ], "reply_reviewers_avg": [ 0.25, 0.4330127018922193 ], "reply_authors_avg": [ 2.5, 1.118033988749895 ], "replies_avg": [ 17, 0 ], "authors#_avg": [ 5, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 1089, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=10717256024382138504&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 7, "pdf": "https://openreview.net/pdf?id=6YEQUn0QICG", "email": "princeton.edu;cse.cuhk.edu.hk;iastate.edu;monash.edu;cuhk.edu.hk", "author_num": 5, "aff_unique_index": "0;1;2;3;1", "aff_unique_norm": "Princeton University;Chinese University of Hong Kong;Iowa State University;Monash University", "aff_unique_dep": ";Department of Computer Science and Engineering;;", "aff_unique_url": "https://www.princeton.edu;https://www.cuhk.edu.hk;https://www.iastate.edu;https://www.monash.edu", "aff_unique_abbr": "Princeton;CUHK;ISU;Monash", "aff_campus_unique_index": "1;1", "aff_campus_unique": ";Hong Kong SAR", "aff_country_unique_index": "0;1;0;2;1", "aff_country_unique": "United States;China;Australia" }, { "id": 
"6YuRviF_FC-", "title": "ZCal: Machine learning methods for calibrating radio interferometric data", "track": "main", "status": "Reject", "tldr": "", "abstract": "Calibration is the most critical data processing step needed for generating images of high dynamic range \\citep{editioncasa}. With ever-increasing data volumes produced by modern radio telescopes \\cite{aniyan2017classifying}, astronomers are overwhelmed by the amount of data that needs to be manually processed and analyzed using limited computational resources \\citep{yatawatta2020stochastic}. Therefore, intelligent and automated systems are required to overcome these challenges. Traditionally, astronomers use a package such as Common Astronomy Software Applications (CASA) to compute the gain solutions based on regular observations of a known calibrator source \\citep{thompson2017interferometry} \\citep{abebe2015study} \\citep{grobler2016calibration} \\citep{editioncasa}. The traditional approach to calibration is iterative and time-consuming \\citep{jajarmizadeh2017optimal}, thus, the proposal of machine learning techniques. The applications of machine learning have created an opportunity to deal with complex problems currently encountered in radio astronomy data processing \\citep{aniyan2017classifying}. In this work, we propose the use of supervised machine learning models to first generation calibration (1GC), using the KAT-7 telescope environmental and pointing sensor data recorded during observations. Applying machine learning to 1GC, as opposed to calculating the gain solutions in CASA, has shown evidence of reducing computation, as well as accurately predicting the 1GC gain solutions and antenna behaviour. These methods are computationally less expensive, however they have not fully learned to generalise in predicting accurate 1GC solutions by looking at environmental and pointing sensors. We use an ensemble multi-output regression models based on random forest, decision trees, extremely randomized trees and K-nearest neighbor algorithms. The average prediction error obtained during the testing of our models on testing data is $ \\approx 0.01 < rmse < 0.09$ for gain amplitude per antenna, and $0.2 rad < rmse <0.5 rad$ for gain phase. 
This shows that the instrumental parameters used to train our model correlate more strongly with gain amplitude effects than with phase.\n", "keywords": "Radio astronomy;Calibration;Radio interferometry;ska;kat-7;MeerKat", "primary_area": "", "supplementary_material": "", "author": "Simphiwe Zitha;Arun aniyan;Oleg Smirnov;Risuna Nkolele", "authorids": "~Simphiwe_Zitha1;aka.bhagya@gmail.com;osmirnov@gmail.com;risunawisdom@gmail.com", "gender": "M;;;", "homepage": "https://github.com/szitha;;;", "dblp": ";;;", "google_scholar": ";;;", "orcid": ";;;", "linkedin": "simphiwe-zitha-66628382/;;;", "or_profile": "~Simphiwe_Zitha1;aka.bhagya@gmail.com;osmirnov@gmail.com;risunawisdom@gmail.com", "aff": "University of the Witwatersrand;;;", "aff_domain": "wits.ac.za;;;", "position": "PhD student;;;", "bibtex": "@misc{\nzitha2021zcal,\ntitle={{\\{}ZC{\\}}al: Machine learning methods for calibrating radio interferometric data },\nauthor={Simphiwe Zitha and Arun aniyan and Oleg Smirnov and Risuna Nkolele},\nyear={2021},\nurl={https://openreview.net/forum?id=6YuRviF_FC-}\n}", "github": "", "project": "", "reviewers": "AnonReviewer3;AnonReviewer2;AnonReviewer1", "site": "https://openreview.net/forum?id=6YuRviF_FC-", "pdf_size": 0, "rating": "2;3;4", "confidence": "5;4;5", "wc_review": "129;963;139", "wc_reply_reviewers": "0;0;0", "wc_reply_authors": "0;0;0", "reply_reviewers": "0;0;0", "reply_authors": "0;0;0", "rating_avg": [ 3.0, 0.816496580927726 ], "confidence_avg": [ 4.666666666666667, 0.4714045207910317 ], "wc_review_avg": [ 410.3333333333333, 390.8156712427086 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 0, 0 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 0, 0 ], "replies_avg": [ 4, 0 ], "authors#_avg": [ 4, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:UOZ-vRp-b1AJ:scholar.google.com/&scioq=ZCal:+Machine+learning+methods+for+calibrating+radio+interferometric+data&hl=en&as_sdt=0,5", "gs_version_total": 0, "aff_unique_index": "0", "aff_unique_norm": "University of the Witwatersrand", "aff_unique_dep": "", "aff_unique_url": "https://www.wits.ac.za", "aff_unique_abbr": "Wits", "aff_country_unique_index": "0", "aff_country_unique": "South Africa" }, { "id": "6_FjMpi_ebO", "title": "Redesigning the Classification Layer by Randomizing the Class Representation Vectors", "track": "main", "status": "Reject", "tldr": "", "abstract": "Neural image classification models typically consist of two components. The first is an image encoder, which is responsible for encoding a given raw image into a representative vector. The second is the classification component, which is often implemented by projecting the representative vector onto target class vectors. The target class vectors, along with the rest of the model parameters, are estimated so as to minimize the loss function. \n\nIn this paper, we analyze how simple design choices for the classification layer affect the learning dynamics. We show that the standard cross-entropy training implicitly captures visual similarities between different classes, which might deteriorate accuracy or even prevent some models from converging. We propose to draw the class vectors randomly and set them as fixed during training, thus invalidating the visual similarities encoded in these vectors.
We analyze the effects of keeping the class vectors fixed and show that it can increase the inter-class separability, intra-class compactness, and the overall model accuracy, while maintaining the robustness to image corruptions and the generalization of the learned concepts.", "keywords": "", "primary_area": "", "supplementary_material": "/attachment/537a09b32b71063e3fe3a70e524689e47a6afee9.zip", "author": "Gabi Shalev;Gal Lev Shalev;Yossi Keshet", "authorids": "~Gabi_Shalev1;~Gal_Lev_Shalev1;~Yossi_Keshet1", "gender": "M;F;M", "homepage": ";https://www.linkedin.com/in/gal-lev-shalev-631054115/;https://keshet.net.technion.ac.il", "dblp": "215/5204;;45/4451", "google_scholar": ";https://www.linkedin.com/in/gal-lev-shalev-631054115/;https://scholar.google.com.tw/citations?user=GoWgJ1AAAAAJ", "orcid": ";;0000-0003-2332-5783", "linkedin": ";;jkeshet/", "or_profile": "~Gabi_Shalev1;~Gal_Lev_Shalev1;~Yossi_Keshet1", "aff": "Bar Ilan University;Bar Ilan University;Bar-Ilan University", "aff_domain": "biu.ac.il;biu.ac.il;biu.ac.il", "position": "PhD student;PhD student;Associate Professor", "bibtex": "@misc{\nshalev2021redesigning,\ntitle={Redesigning the Classification Layer by Randomizing the Class Representation Vectors},\nauthor={Gabi Shalev and Gal Lev Shalev and Yossi Keshet},\nyear={2021},\nurl={https://openreview.net/forum?id=6_FjMpi_ebO}\n}", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer2;AnonReviewer3;AnonReviewer4", "site": "https://openreview.net/forum?id=6_FjMpi_ebO", "pdf_size": 0, "rating": "4;4;5;5", "confidence": "4;4;4;4", "wc_review": "930;227;236;218", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "772;353;219;375", "reply_reviewers": "0;0;0;0", "reply_authors": "1;1;1;1", "rating_avg": [ 4.5, 0.5 ], "confidence_avg": [ 4.0, 0.0 ], "wc_review_avg": [ 402.75, 304.47444474044124 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 429.75, 206.4211411168924 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 9, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 6, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=16300153570725708923&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 3, "aff_unique_index": "0;0;0", "aff_unique_norm": "Bar-Ilan University", "aff_unique_dep": "", "aff_unique_url": "https://www.biu.ac.il", "aff_unique_abbr": "BIU", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0;0", "aff_country_unique": "Israel" }, { "id": "6c6KZUdm1Nq", "title": "Regression from Upper One-side Labeled Data", "track": "main", "status": "Reject", "tldr": "", "abstract": "We address a regression problem from weakly labeled data that are correctly labeled only above a regression line, i.e., upper one-side labeled data.\nThe label values of the data are the results of sensing the magnitude of some phenomenon.\nIn this case, the labels often contain missing or incomplete observations whose values are lower than those of correct observations and are also usually lower than the regression line. 
It follows that data labeled with lower values than the estimations of a regression function (lower-side data) are mixed with data that should originally be labeled above the regression line (upper-side data).\nWhen such missing label observations are observed in a non-negligible amount, we thus should assume our lower-side data to be unlabeled data that are a mix of original upper- and lower-side data.\nWe formulate a regression problem from these upper-side labeled and lower-side unlabeled data. We then derive a learning algorithm in an unbiased and consistent manner to ordinary regression that is learned from data labeled correctly in both upper- and lower-side cases. Our key idea is that we can derive a gradient that requires only upper-side data and unlabeled data as the equivalent expression of that for ordinary regression. We additionally found that a specific class of losses enables us to learn unbiased solutions practically. In numerical experiments on synthetic and real-world datasets, we demonstrate the advantages of our algorithm.", "keywords": "regression;weakly-supervised learning;healthcare", "primary_area": "", "supplementary_material": "/attachment/d33fafb6f2f21ecd74b155d32af45041c192bf6a.zip", "author": "Takayuki Katsuki", "authorids": "~Takayuki_Katsuki2", "gender": "", "homepage": "https://research.ibm.com/people/takayuki-katsuki", "dblp": "01/10264", "google_scholar": "bZZ0I4UAAAAJ", "orcid": "0000-0002-3670-1138", "linkedin": "", "or_profile": "~Takayuki_Katsuki2", "aff": "International Business Machines", "aff_domain": "ibm.com", "position": "Research staff member", "bibtex": "@misc{\nkatsuki2021regression,\ntitle={Regression from Upper One-side Labeled Data},\nauthor={Takayuki Katsuki},\nyear={2021},\nurl={https://openreview.net/forum?id=6c6KZUdm1Nq}\n}", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer2;AnonReviewer5", "site": "https://openreview.net/forum?id=6c6KZUdm1Nq", "pdf_size": 0, "rating": "4;5;5", "confidence": "4;4;5", "wc_review": "350;411;159", "wc_reply_reviewers": "0;0;218", "wc_reply_authors": "136;78;341", "reply_reviewers": "0;0;3", "reply_authors": "1;1;2", "rating_avg": [ 4.666666666666667, 0.4714045207910317 ], "confidence_avg": [ 4.333333333333333, 0.4714045207910317 ], "wc_review_avg": [ 306.6666666666667, 107.34471989291737 ], "wc_reply_reviewers_avg": [ 72.66666666666667, 102.76618553244491 ], "wc_reply_authors_avg": [ 185.0, 112.82139277046116 ], "reply_reviewers_avg": [ 1.0, 1.4142135623730951 ], "reply_authors_avg": [ 1.3333333333333333, 0.4714045207910317 ], "replies_avg": [ 11, 0 ], "authors#_avg": [ 1, 0 ], "corr_rating_confidence": 0.4999999999999999, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:rvm30dswR2sJ:scholar.google.com/&scioq=Regression+from+Upper+One-side+Labeled+Data&hl=en&as_sdt=0,5", "gs_version_total": 0, "aff_unique_index": "0", "aff_unique_norm": "International Business Machines Corporation", "aff_unique_dep": "", "aff_unique_url": "https://www.ibm.com", "aff_unique_abbr": "IBM", "aff_country_unique_index": "0", "aff_country_unique": "United States" }, { "id": "6deUA11mOJ5", "title": "A Large-scale Study on Training Sample Memorization in Generative Modeling", "track": "main", "status": "Reject", "tldr": "", "abstract": "Many recent developments on generative models for natural images have relied on heuristically-motivated metrics that can be easily gamed by memorizing a small sample from the true distribution or training a model directly to improve the 
metric.\nIn this work, we critically evaluate the gameability of the benchmarking procedure by running a competition which ultimately resulted in participants attempting to cheat. Our competition received over 11000 submitted models which allowed us to investigate memorization-aware metrics for measuring generative model performance. Specifically, we propose the Memorization-Informed Frechet Inception Distance (MiFID) and discuss ways to ensure that winning submissions were based on genuine improvements in perceptual quality. We evaluate the effectiveness of our benchmark by manually inspecting the code for the 1000 top-performing models and labeling different forms of memorization that were intentionally or unintentionally used. To facilitate future work on benchmarking generative models, we release generated images and our labels for these models as well as code to compute the MiFID metric.", "keywords": "GAN;generative adversarial networks;generative modeling;memorization", "primary_area": "", "supplementary_material": "", "author": "Ching-Yuan Bai;Hsuan-Tien Lin;Colin Raffel;Wendy Kan", "authorids": "b05502055@csie.ntu.edu.tw;~Hsuan-Tien_Lin1;~Colin_Raffel1;~Wendy_Kan1", "gender": ";M;;F", "homepage": ";http://www.csie.ntu.edu.tw/~htlin;http://colinraffel.com;", "dblp": ";10/3718;149/0082;", "google_scholar": ";https://scholar.google.com.tw/citations?user=yAr4UPUAAAAJ;I66ZBYwAAAAJ;ZOdYiPEAAAAJ", "orcid": ";;;", "linkedin": ";;;wendykan/", "or_profile": "b05502055@csie.ntu.edu.tw;~Hsuan-Tien_Lin1;~Colin_Raffel1;~Wendy_Kan1", "aff": ";National Taiwan University;Google;Kaggle", "aff_domain": ";ntu.edu.tw;google.com;kaggle.com", "position": ";Full Professor;Research Scientist;Data Scientist", "bibtex": "@misc{\nbai2021a,\ntitle={A Large-scale Study on Training Sample Memorization in Generative Modeling},\nauthor={Ching-Yuan Bai and Hsuan-Tien Lin and Colin Raffel and Wendy Kan},\nyear={2021},\nurl={https://openreview.net/forum?id=6deUA11mOJ5}\n}", "github": "", "project": "", "reviewers": "AnonReviewer2;AnonReviewer3;AnonReviewer4", "site": "https://openreview.net/forum?id=6deUA11mOJ5", "pdf_size": 0, "rating": "3;4;5", "confidence": "4;4;4", "wc_review": "149;1235;620", "wc_reply_reviewers": "0;0;0", "wc_reply_authors": "623;1839;204", "reply_reviewers": "0;0;0", "reply_authors": "1;4;1", "rating_avg": [ 4.0, 0.816496580927726 ], "confidence_avg": [ 4.0, 0.0 ], "wc_review_avg": [ 668.0, 444.65492238363896 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 888.6666666666666, 693.4168218963893 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 2.0, 1.4142135623730951 ], "replies_avg": [ 10, 0 ], "authors#_avg": [ 4, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 1, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=14760605307709909206&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 0, "aff_unique_index": "0;1;2", "aff_unique_norm": "National Taiwan University;Google;Kaggle", "aff_unique_dep": ";Google;", "aff_unique_url": "https://www.ntu.edu.tw;https://www.google.com;https://www.kaggle.com", "aff_unique_abbr": "NTU;Google;Kaggle", "aff_campus_unique_index": "0;1", "aff_campus_unique": "Taiwan;Mountain View;", "aff_country_unique_index": "0;1;1", "aff_country_unique": "China;United States" }, { "id": "6fb4mex_pUT", "title": "An Algorithm for Out-Of-Distribution Attack to Neural Network Encoder", "track": "main", "status": "Reject", "tldr": "", "abstract": "Deep neural networks (DNNs), especially convolutional neural networks, have achieved superior 
performance on image classification tasks. However, such performance is only guaranteed if the input to a trained model is similar to the training samples, i.e., the input follows the probability distribution of the training set. Out-Of-Distribution (OOD) samples do not follow the distribution of the training set, and therefore the predicted class labels on OOD samples become meaningless. Classification-based methods have been proposed for OOD detection; however, in this study we show that this type of method has no theoretical guarantee and is practically breakable by our OOD Attack algorithm because of dimensionality reduction in the DNN models. We also show that Glow likelihood-based OOD detection is breakable as well. ", "keywords": "Out-Of-Distribution;DNN;image classification", "primary_area": "", "supplementary_material": "", "author": "Liang Liang;Linhai Ma;Linchen Qian;Jiasong Chen", "authorids": "~Liang_Liang2;~Linhai_Ma1;lxq93@miami.edu;jasonchen@miami.edu", "gender": ";M;;", "homepage": ";https://sarielma.github.io/;;", "dblp": ";226/9775;;", "google_scholar": ";https://scholar.google.com.hk/citations?view_op=list_works;;", "orcid": ";;;", "linkedin": ";;;", "or_profile": "~Liang_Liang2;~Linhai_Ma1;lxq93@miami.edu;jasonchen@miami.edu", "aff": "University of Miami;University of Miami;;", "aff_domain": "miami.edu;miami.edu;;", "position": "Assistant Professor;PhD student;;", "bibtex": "@misc{\nliang2021an,\ntitle={An Algorithm for Out-Of-Distribution Attack to Neural Network Encoder },\nauthor={Liang Liang and Linhai Ma and Linchen Qian and Jiasong Chen},\nyear={2021},\nurl={https://openreview.net/forum?id=6fb4mex_pUT}\n}", "github": "", "project": "", "reviewers": "AnonReviewer3;AnonReviewer2;AnonReviewer4;AnonReviewer1", "site": "https://openreview.net/forum?id=6fb4mex_pUT", "pdf_size": 0, "rating": "3;3;4;4", "confidence": "4;5;4;4", "wc_review": "821;443;400;1647", "wc_reply_reviewers": "619;0;0;0", "wc_reply_authors": "1918;334;580;736", "reply_reviewers": "2;0;0;0", "reply_authors": "3;1;1;1", "rating_avg": [ 3.5, 0.5 ], "confidence_avg": [ 4.25, 0.4330127018922193 ], "wc_review_avg": [ 827.75, 500.55438016263525 ], "wc_reply_reviewers_avg": [ 154.75, 268.03486247128376 ], "wc_reply_authors_avg": [ 892.0, 609.4505722369945 ], "reply_reviewers_avg": [ 0.5, 0.8660254037844386 ], "reply_authors_avg": [ 1.5, 0.8660254037844386 ], "replies_avg": [ 31, 0 ], "authors#_avg": [ 4, 0 ], "corr_rating_confidence": -0.5773502691896257, "gs_citation": 5, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=1015255546462191299&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 5, "aff_unique_index": "0;0", "aff_unique_norm": "University of Miami", "aff_unique_dep": "", "aff_unique_url": "https://www.miami.edu", "aff_unique_abbr": "UM", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0", "aff_country_unique": "United States" }, { "id": "6gZJ6f6pU6h", "title": "Multi-EPL: Accurate Multi-source Domain Adaptation", "track": "main", "status": "Reject", "tldr": "", "abstract": "Given multiple source datasets with labels, how can we train a target model with no labeled data? Multi-source domain adaptation (MSDA) aims to train a model using multiple source datasets different from a target dataset in the absence of target data labels. MSDA is a crucial problem applicable to many practical cases where labels for the target data are unavailable due to privacy issues.
Existing MSDA frameworks are limited since they align data without considering conditional distributions p(x|y) of each domain. They also do not fully utilize the target data without labels, and rely on limited feature extraction with a single extractor. In this paper, we propose Multi-EPL, a novel method for multi-source domain adaptation. Multi-EPL exploits label-wise moment matching to align conditional distributions p(x|y), uses pseudolabels for the unavailable target labels, and introduces an ensemble of multiple feature extractors for accurate domain adaptation. Extensive experiments show that Multi-EPL provides the state-of-the-art performance for multi-source domain adaptation tasks in both of image domains and text domains.", "keywords": "Multi-Source Domain Adaptation;Label-wise Moment Matching;Pseudolabel;Ensemble of Feature Representation", "primary_area": "", "supplementary_material": "/attachment/c2381300b0dc54cc54a75b7c4d8b9d220e4c26f2.zip", "author": "Seongmin Lee;Hyunsik Jeon;U Kang", "authorids": "~Seongmin_Lee2;jeon185@gmail.com;~U_Kang1", "gender": "F;;M", "homepage": "http://www.seongmin.xyz;;http://datalab.snu.ac.kr/~ukang", "dblp": "317/5565;;13/7122", "google_scholar": "EA4jKm4AAAAJ;;https://scholar.google.com/citations?hl=en", "orcid": ";;0000-0002-8774-6950", "linkedin": "seongmin-lee-8b8a97209/;;", "or_profile": "~Seongmin_Lee2;jeon185@gmail.com;~U_Kang1", "aff": "Georgia Institute of Technology;;Seoul National University", "aff_domain": "gatech.edu;;snu.ac.kr", "position": "PhD student;;Full Professor", "bibtex": "@misc{\nlee2021multiepl,\ntitle={Multi-{\\{}EPL{\\}}: Accurate Multi-source Domain Adaptation},\nauthor={Seongmin Lee and Hyunsik Jeon and U Kang},\nyear={2021},\nurl={https://openreview.net/forum?id=6gZJ6f6pU6h}\n}", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer3;AnonReviewer2;AnonReviewer4", "site": "https://openreview.net/forum?id=6gZJ6f6pU6h", "pdf_size": 0, "rating": "4;4;4;5", "confidence": "5;4;5;3", "wc_review": "467;368;243;396", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "454;278;245;385", "reply_reviewers": "0;0;0;0", "reply_authors": "1;1;1;1", "rating_avg": [ 4.25, 0.4330127018922193 ], "confidence_avg": [ 4.25, 0.82915619758885 ], "wc_review_avg": [ 368.5, 80.94596963407135 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 340.5, 83.5 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 9, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": -0.8703882797784891, "gs_citation": 6, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=4735881747480260249&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 12, "aff_unique_index": "0;1", "aff_unique_norm": "Georgia Institute of Technology;Seoul National University", "aff_unique_dep": ";", "aff_unique_url": "https://www.gatech.edu;https://www.snu.ac.kr", "aff_unique_abbr": "Georgia Tech;SNU", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;1", "aff_country_unique": "United States;South Korea" }, { "id": "6htjOqus6C3", "title": "DynamicVAE: Decoupling Reconstruction Error and Disentangled Representation Learning", "track": "main", "status": "Reject", "tldr": "", "abstract": "This paper challenges the common assumption that the weight $\\beta$, in $\\beta$-VAE, should be larger than $1$ in order to effectively disentangle latent factors. 
We demonstrate that $\\beta$-VAE, with $\\beta < 1$, can not only attain good disentanglement but also significantly improve reconstruction accuracy via dynamic control. The paper \\textit{removes the inherent trade-off} between reconstruction accuracy and disentanglement for $\\beta$-VAE. Existing methods, such as $\\beta$-VAE and FactorVAE, assign a large weight to the KL-divergence term in the objective function, leading to high reconstruction errors for the sake of better disentanglement. To mitigate this problem, a ControlVAE has recently been developed that dynamically tunes the KL-divergence weight in an attempt to \\textit{control the trade-off} to a more favorable point. However, ControlVAE fails to eliminate the conflict between the need for a large $\\beta$ (for disentanglement) and the need for a small $\\beta$ (for smaller reconstruction error). Instead, we propose DynamicVAE that maintains a different $\\beta$ at different stages of training, thereby \\textit{decoupling disentanglement and reconstruction accuracy}. In order to evolve the weight, $\\beta$, along a trajectory that enables such decoupling, DynamicVAE leverages a modified incremental PI (proportional-integral) controller, a variant of the proportional-integral-derivative (PID) controller algorithm, and employs a moving average as well as a hybrid annealing method to evolve the value of KL-divergence smoothly in a tightly controlled fashion. We theoretically prove the stability of the proposed approach. Evaluation results on three benchmark datasets demonstrate that DynamicVAE significantly improves the reconstruction accuracy while achieving disentanglement comparable to the best of existing methods. The results verify that our method can separate disentangled representation learning and reconstruction, removing the inherent tension between the two.
", "keywords": "disentangled representation learning;dynamic learning;Variational Autoencoder;PID contoller", "primary_area": "", "supplementary_material": "/attachment/ac01d67aa70ec745ae3c655492ff37bf8aa029db.zip", "author": "Huajie Shao;Haohong Lin;Qinmin Yang;Shuochao Yao;Han Zhao;Tarek Abdelzaher", "authorids": "~Huajie_Shao1;lhh2017@zju.edu.cn;qmyang@zju.edu.cn;~Shuochao_Yao1;~Han_Zhao1;~Tarek_Abdelzaher1", "gender": "M;;;;M;M", "homepage": "https://huajieshao.github.io/;;;https://yscacaca.github.io/;https://hanzhaoml.github.io/;http://abdelzaher.cs.illinois.edu/", "dblp": "179/4173;;;148/1920;03/3520-2;a/TarekFAbdelzaher", "google_scholar": "5-D7ZLsAAAAJ;;;https://scholar.google.com/citations?hl=en;x942ipYAAAAJ;https://scholar.google.com.tw/citations?user=cA28Zs0AAAAJ", "orcid": "0000-0001-7627-5615;;;;0000-0002-8579-1600;0000-0003-3883-7220", "linkedin": "huajie-shao-508465113/;;;;;tarek-abdelzaher-0216071/", "or_profile": "~Huajie_Shao1;lhh2017@zju.edu.cn;qmyang@zju.edu.cn;~Shuochao_Yao1;~Han_Zhao1;~Tarek_Abdelzaher1", "aff": "University of Illinois, Urbana Champaign;;;George Mason University;University of Illinois, Urbana Champaign;University of Illinois, Urbana Champaign", "aff_domain": "illinois.edu;;;gmu.edu;illinois.edu;illinois.edu", "position": "PhD student;;;Assistant Professor;Assistant Professor;Full Professor", "bibtex": "@misc{\nshao2021dynamicvae,\ntitle={Dynamic{\\{}VAE{\\}}: Decoupling Reconstruction Error and Disentangled Representation Learning},\nauthor={Huajie Shao and Haohong Lin and Qinmin Yang and Shuochao Yao and Han Zhao and Tarek Abdelzaher},\nyear={2021},\nurl={https://openreview.net/forum?id=6htjOqus6C3}\n}", "github": "", "project": "", "reviewers": "AnonReviewer4;AnonReviewer3;AnonReviewer1;AnonReviewer2", "site": "https://openreview.net/forum?id=6htjOqus6C3", "pdf_size": 0, "rating": "4;4;4;4", "confidence": "4;5;4;5", "wc_review": "734;358;294;646", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "0;0;0;0", "reply_reviewers": "0;0;0;0", "reply_authors": "0;0;0;0", "rating_avg": [ 4.0, 0.0 ], "confidence_avg": [ 4.5, 0.5 ], "wc_review_avg": [ 508.0, 186.02150413325873 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 0, 0 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 0, 0 ], "replies_avg": [ 5, 0 ], "authors#_avg": [ 6, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 6, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=4756585003560361894&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 3, "aff_unique_index": "0;1;0;0", "aff_unique_norm": "University of Illinois Urbana-Champaign;George Mason University", "aff_unique_dep": ";", "aff_unique_url": "https://illinois.edu;https://www.gmu.edu", "aff_unique_abbr": "UIUC;GMU", "aff_campus_unique_index": "0;0;0", "aff_campus_unique": "Urbana-Champaign;", "aff_country_unique_index": "0;0;0;0", "aff_country_unique": "United States" }, { "title": "Private Post-GAN Boosting", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/3202", "id": "6isfR3JCbi", "poster": "", "openreview": "https://openreview.net/forum?id=6isfR3JCbi", "slides": "https://iclr.cc/virtual/2021/poster/3202", "video": "https://iclr.cc/virtual/2021/poster/3202", "author_site": "Marcel Neunhoeffer, Steven Wu, Cynthia Dwork", "tldr": "", "abstract": "Differentially private GANs have proven to be a promising approach for generating realistic synthetic data without compromising the privacy of individuals. 
Due to the privacy-protective noise introduced in the training, the convergence of GANs becomes even more elusive, which often leads to poor utility in the output generator at the end of training. We propose Private post-GAN boosting (Private PGB), a differentially private method that combines samples produced by the sequence of generators obtained during GAN training to create a high-quality synthetic dataset. To that end, our method leverages the Private Multiplicative Weights method (Hardt and Rothblum, 2010) to reweight generated samples. We evaluate Private PGB on two dimensional toy data, MNIST images, US Census data and a standard machine learning prediction task. Our experiments show that Private PGB improves upon a standard private GAN approach across a collection of quality measures. We also provide a non-private variant of PGB that improves the data quality of standard GAN training.", "keywords": "", "primary_area": "", "supplementary_material": "", "author": "Marcel Neunhoeffer;Steven Wu;Cynthia Dwork", "authorids": "~Marcel_Neunhoeffer1;~Steven_Wu1;~Cynthia_Dwork2", "gender": "M;F;M", "homepage": "https://www.marcel-neunhoeffer.com/;https://dwork.seas.harvard.edu/;https://zstevenwu.com/", "dblp": "263/4219;;137/8350", "google_scholar": "Q491RXUAAAAJ;;MbF6rTEAAAAJ", "orcid": "0000-0002-9137-5785;;", "linkedin": "marcel-neunhoeffer/;;zstevenwu/", "or_profile": "~Marcel_Neunhoeffer1;~Cynthia_Dwork2;~Zhiwei_Steven_Wu1", "aff": "University of Mannheim;Harvard University;Carnegie Mellon University", "aff_domain": "uni-mannheim.de;fas.harvard.edu;cmu.edu", "position": "PhD student;Full Professor;Assistant Professor", "bibtex": "@inproceedings{\nneunhoeffer2021private,\ntitle={Private Post-{\\{}GAN{\\}} Boosting},\nauthor={Marcel Neunhoeffer and Steven Wu and Cynthia Dwork},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=6isfR3JCbi}\n}", "github": "[![github](/images/github_icon.svg) mneunhoe/post-gan-boosting](https://github.com/mneunhoe/post-gan-boosting)", "project": "", "reviewers": "AnonReviewer2;AnonReviewer1;AnonReviewer3", "pdf_size": 0, "rating": "6;7;8", "confidence": "2;3;4", "wc_review": "245;162;107", "wc_reply_reviewers": "0;0;0", "wc_reply_authors": "453;200;8", "reply_reviewers": "0;0;0", "reply_authors": "1;1;1", "rating_avg": [ 7.0, 0.816496580927726 ], "confidence_avg": [ 3.0, 0.816496580927726 ], "wc_review_avg": [ 171.33333333333334, 56.72350091060632 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 220.33333333333334, 182.2385494040404 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 8, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": 1.0, "gs_citation": 39, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=937740189813979153&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 7, "pdf": "https://openreview.net/pdf?id=6isfR3JCbi", "email": "uni-mannheim.de;fas.harvard.edu;cmu.edu", "author_num": 3, "aff_unique_index": "0;1;2", "aff_unique_norm": "University of Mannheim;Harvard University;Carnegie Mellon University", "aff_unique_dep": ";;", "aff_unique_url": "https://www.uni-mannheim.de;https://www.harvard.edu;https://www.cmu.edu", "aff_unique_abbr": "UM;Harvard;CMU", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;1;1", "aff_country_unique": "Germany;United States" }, { "id": "6jlNy83JUQ_", "title": "Low Complexity Approximate Bayesian Logistic Regression for Sparse Online 
Learning", "track": "main", "status": "Reject", "tldr": "", "abstract": "Theoretical results show that Bayesian methods can achieve lower bounds on regret for online logistic regression. In practice, however, such techniques may not be feasible especially for very large feature sets. Various approximations that, for huge sparse feature sets, diminish the theoretical advantages, must be used. Often, they apply stochastic gradient methods with hyper-parameters that must be tuned on some surrogate loss, defeating theoretical advantages of Bayesian methods. The surrogate loss, defined to approximate the mixture, requires techniques as Monte Carlo sampling, increasing computations per example. We propose low complexity analytical approximations for sparse online logistic and probit regressions. Unlike variational inference and other methods, our methods use analytical closed forms, substantially lowering computations. Unlike dense solutions, \nas Gaussian Mixtures, our methods allow for sparse problems with huge feature sets without increasing complexity. With the analytical closed forms, there is also no need for applying stochastic gradient methods on surrogate losses, and for tuning and balancing learning and regularization hyper-parameters. Empirical results top the performance of the more computationally involved methods. Like such methods, our methods still reveal per feature and per example uncertainty measures.\n", "keywords": "Bayesian methods;logistic regression;regret;online learning;MDL.", "primary_area": "", "supplementary_material": "", "author": "Gil I. Shamir;Wojciech Szpankowski", "authorids": "~Gil_I._Shamir1;~Wojciech_Szpankowski1", "gender": ";", "homepage": ";", "dblp": "22/4711;s/WSzpankowski", "google_scholar": ";", "orcid": ";", "linkedin": ";", "or_profile": "~Gil_I._Shamir1;~Wojciech_Szpankowski1", "aff": "Google;Purdue University", "aff_domain": "google.com;ecn.purdue.edu", "position": "Google;", "bibtex": "@misc{\nshamir2021low,\ntitle={Low Complexity Approximate Bayesian Logistic Regression for Sparse Online Learning},\nauthor={Gil I. 
Shamir and Wojciech Szpankowski},\nyear={2021},\nurl={https://openreview.net/forum?id=6jlNy83JUQ_}\n}", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer3;AnonReviewer4;AnonReviewer2", "site": "https://openreview.net/forum?id=6jlNy83JUQ_", "pdf_size": 0, "rating": "4;4;4;6", "confidence": "4;4;4;3", "wc_review": "1112;757;337;470", "wc_reply_reviewers": "0;143;0;0", "wc_reply_authors": "2188;1790;836;678", "reply_reviewers": "0;1;0;0", "reply_authors": "3;4;2;1", "rating_avg": [ 4.5, 0.8660254037844386 ], "confidence_avg": [ 3.75, 0.4330127018922193 ], "wc_review_avg": [ 669.0, 297.4130124927287 ], "wc_reply_reviewers_avg": [ 35.75, 61.92081637058736 ], "wc_reply_authors_avg": [ 1373.0, 634.3319320355865 ], "reply_reviewers_avg": [ 0.25, 0.4330127018922193 ], "reply_authors_avg": [ 2.5, 1.118033988749895 ], "replies_avg": [ 17, 0 ], "authors#_avg": [ 2, 0 ], "corr_rating_confidence": -1.0, "gs_citation": 8, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=9099143422639392875&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 6, "aff_unique_index": "0;1", "aff_unique_norm": "Google;Purdue University", "aff_unique_dep": "Google;", "aff_unique_url": "https://www.google.com;https://www.purdue.edu", "aff_unique_abbr": "Google;Purdue", "aff_campus_unique_index": "0", "aff_campus_unique": "Mountain View;", "aff_country_unique_index": "0;0", "aff_country_unique": "United States" }, { "title": "Practical Massively Parallel Monte-Carlo Tree Search Applied to Molecular Design", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/2528", "id": "6k7VdojAIK", "poster": "", "openreview": "https://openreview.net/forum?id=6k7VdojAIK", "slides": "https://iclr.cc/virtual/2021/poster/2528", "video": "https://iclr.cc/virtual/2021/poster/2528", "author_site": "Xiufeng Yang, Tanuj Aasawat, Kazuki Yoshizoe", "tldr": "", "abstract": "It is common practice to use large computational resources to train neural networks, known from many examples, such as reinforcement learning applications. However, while massively parallel computing is often used for training models, it is rarely used to search solutions for combinatorial optimization problems. This paper proposes a novel massively parallel Monte-Carlo Tree Search (MP-MCTS) algorithm that works efficiently for a 1,000 worker scale on a distributed memory environment using multiple compute nodes and applies it to molecular design. This paper is the first work that applies distributed MCTS to a real-world and non-game problem. Existing works on large-scale parallel MCTS show efficient scalability in terms of the number of rollouts up to 100 workers. Still, they suffer from the degradation in the quality of the solutions. MP-MCTS maintains the search quality at a larger scale. By running MP-MCTS on 256 CPU cores for only 10 minutes, we obtained candidate molecules with similar scores to non-parallel MCTS running for 42 hours. Moreover, our results based on parallel MCTS (combined with a simple RNN model) significantly outperform existing state-of-the-art work. 
Our method is generic and is expected to speed up other applications of MCTS.", "keywords": "parallel Monte Carlo Tree Search (MCTS);Upper Confidence bound applied to Trees (UCT);molecular design", "primary_area": "", "supplementary_material": "/attachment/00f6b8f83cf71baf6b7c60f6e77ac731838a6667.zip", "author": "Xiufeng Yang;Tanuj Aasawat;Kazuki Yoshizoe", "authorids": "~Xiufeng_Yang1;~Tanuj_Aasawat1;~Kazuki_Yoshizoe2", "gender": "M;M;M", "homepage": ";http://ece.ubc.ca/~taasawat;https://www.researchgate.net/profile/Kazuki_Yoshizoe", "dblp": ";;19/5077", "google_scholar": "v1fcn24AAAAJ;;IxrcTTAAAAAJ", "orcid": ";;", "linkedin": ";;", "or_profile": "~Xiufeng_Yang1;~Tanuj_Aasawat1;~Kazuki_Yoshizoe2", "aff": "RIKEN;;RIKEN", "aff_domain": "riken.jp;;riken.jp", "position": "Postdoc;;Unit Leader", "bibtex": "@inproceedings{\nyang2021practical,\ntitle={Practical Massively Parallel Monte-Carlo Tree Search Applied to Molecular Design},\nauthor={Xiufeng Yang and Tanuj Aasawat and Kazuki Yoshizoe},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=6k7VdojAIK}\n}", "github": "", "project": "", "reviewers": "AnonReviewer3;AnonReviewer4;AnonReviewer1;AnonReviewer5;AnonReviewer2", "pdf_size": 0, "rating": "3;5;7;7;8", "confidence": "3;2;3;4;3", "wc_review": "475;559;597;209;537", "wc_reply_reviewers": "0;0;0;36;0", "wc_reply_authors": "790;316;385;315;710", "reply_reviewers": "0;0;0;1;0", "reply_authors": "1;1;1;1;1", "rating_avg": [ 6.0, 1.7888543819998317 ], "confidence_avg": [ 3.0, 0.6324555320336759 ], "wc_review_avg": [ 475.4, 138.95265380697126 ], "wc_reply_reviewers_avg": [ 7.2, 14.4 ], "wc_reply_authors_avg": [ 503.2, 204.67281206843276 ], "reply_reviewers_avg": [ 0.2, 0.4 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 12, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": 0.35355339059327373, "gs_citation": 30, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=14118216710260770584&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 4, "pdf": "https://openreview.net/pdf?id=6k7VdojAIK", "email": "riken.jp;;riken.jp", "author_num": 3, "aff_unique_index": "0;0", "aff_unique_norm": "RIKEN", "aff_unique_dep": "", "aff_unique_url": "https://www.riken.jp", "aff_unique_abbr": "RIKEN", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0", "aff_country_unique": "Japan" }, { "id": "6lH8nkwKRXV", "title": "Graph Structural Aggregation for Explainable Learning", "track": "main", "status": "Reject", "tldr": "", "abstract": "Graph neural networks have proven to be very efficient to solve several tasks in graphs such as node classification or link prediction. These algorithms that operate by propagating information from vertices to their neighbors allow one to build node embeddings that contain local information. In order to use graph neural networks for graph classification, node embeddings must be aggregated to obtain a graph representation able to discriminate among different graphs (of possibly various sizes). Moreover, in analogy to neural networks for image classification, there is a need for explainability regarding the features that are selected in the graph classification process. To this end, we introduce StructAgg, a simple yet effective aggregation process based on the identification of structural roles for nodes in graphs that we use to create an end-to-end model. Through extensive experiments we show that this architecture can compete with state-of-the-art methods. 
We show how this aggregation step allows us to cluster together nodes that have comparable structural roles and how these roles provide explainability to this neural network model.\n", "keywords": "graph;deep;learning", "primary_area": "", "supplementary_material": "/attachment/54c13e2cb59f07c9db422d500049d6f8cd73de9e.zip", "author": "Alexis Galland;marc lelarge", "authorids": "~Alexis_Galland1;~marc_lelarge1", "gender": "M;M", "homepage": ";http://www.di.ens.fr/~lelarge/", "dblp": ";21/462", "google_scholar": "0wkUfbEAAAAJ;cLGOIdMAAAAJ", "orcid": ";", "linkedin": ";", "or_profile": "~Alexis_Galland1;~marc_lelarge1", "aff": ";INRIA", "aff_domain": ";inria.fr", "position": ";Researcher", "bibtex": "@misc{\ngalland2021graph,\ntitle={Graph Structural Aggregation for Explainable Learning},\nauthor={Alexis Galland and marc lelarge},\nyear={2021},\nurl={https://openreview.net/forum?id=6lH8nkwKRXV}\n}", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer2;AnonReviewer4;AnonReviewer3", "site": "https://openreview.net/forum?id=6lH8nkwKRXV", "pdf_size": 0, "rating": "3;4;6;7", "confidence": "5;4;5;5", "wc_review": "455;628;632;250", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "0;0;0;0", "reply_reviewers": "0;0;0;0", "reply_authors": "0;0;0;0", "rating_avg": [ 5.0, 1.5811388300841898 ], "confidence_avg": [ 4.75, 0.4330127018922193 ], "wc_review_avg": [ 491.25, 156.54611940255816 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 0, 0 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 0, 0 ], "replies_avg": [ 5, 0 ], "authors#_avg": [ 2, 0 ], "corr_rating_confidence": 0.36514837167011077, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:BkAOYK9_pQsJ:scholar.google.com/&scioq=Graph+Structural+Aggregation+for+Explainable+Learning&hl=en&as_sdt=0,5", "gs_version_total": 0, "aff_unique_index": "0", "aff_unique_norm": "INRIA", "aff_unique_dep": "", "aff_unique_url": "https://www.inria.fr", "aff_unique_abbr": "INRIA", "aff_country_unique_index": "0", "aff_country_unique": "France" }, { "title": "A Good Image Generator Is What You Need for High-Resolution Video Synthesis", "status": "Spotlight", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/2810", "id": "6puCSjH3hwA", "poster": "", "openreview": "https://openreview.net/forum?id=6puCSjH3hwA", "slides": "https://iclr.cc/virtual/2021/poster/2810", "video": "https://iclr.cc/virtual/2021/poster/2810", "author_site": "Yu Tian, Jian Ren, Menglei Chai, Kyle Olszewski, Xi Peng, Dimitris Metaxas, Sergey Tulyakov", "tldr": "", "abstract": "Image and video synthesis are closely related areas aiming at generating content from noise. While rapid progress has been demonstrated in improving image-based models to handle large resolutions, high-quality renderings, and wide variations in image content, achieving comparable video generation results remains problematic. We present a framework that leverages contemporary image generators to render high-resolution videos. We frame the video synthesis problem as discovering a trajectory in the latent space of a pre-trained and fixed image generator. Not only does such a framework render high-resolution videos, but it also is an order of magnitude more computationally efficient. We introduce a motion generator that discovers the desired trajectory, in which content and motion are disentangled. With such a representation, our framework allows for a broad range of applications, including content and motion manipulation. 
Furthermore, we introduce a new task, which we call cross-domain video synthesis, in which the image and motion generators are trained on disjoint datasets belonging to different domains. This allows for generating moving objects for which the desired video data is not available. Extensive experiments on various datasets demonstrate the advantages of our methods over existing video generation techniques. Code will be released at https://github.com/snap-research/MoCoGAN-HD.", "keywords": "high-resolution video generation;contrastive learning;cross-domain video generation", "primary_area": "", "supplementary_material": "/attachment/2ffa08f1858993328f862cac747f90ca575c8693.zip", "author": "Yu Tian;Jian Ren;Menglei Chai;Kyle Olszewski;Xi Peng;Dimitris N. Metaxas;Sergey Tulyakov", "authorids": "~Yu_Tian2;~Jian_Ren2;~Menglei_Chai1;~Kyle_Olszewski1;~Xi_Peng1;~Dimitris_N._Metaxas1;~Sergey_Tulyakov1", "gender": "M;M;M;M;Not Specified;M;M", "homepage": ";https://alanspike.github.io/;http://www.mlchai.com;https://kyleolsz.github.io/;https://deep-real.github.io/dr_xipeng.html;http://www.stulyakov.com/;https://www.cs.rutgers.edu/~dnm/", "dblp": ";59/2180-5;117/6261;165/9717;149/7762-5;40/6115;m/DNMetaxas", "google_scholar": "DxPjkDoAAAAJ;https://scholar.google.co.jp/citations?user=vDALiU4AAAAJ;https://scholar.google.co.jp/citations?user=6Lnb1Z4AAAAJ;FWDVqjgAAAAJ;DWw4v0kAAAAJ;mgzXR0sAAAAJ;https://scholar.google.com.tw/citations?user=a7VNhCIAAAAJ", "orcid": ";;;0000-0001-8775-6879;0000-0002-7772-001X;;", "linkedin": ";;;kyle-olszewski-2623ab1b;xi-peng-74b540b6/;sergeytulyakov/;dimitris-metaxas-1bb74914/", "or_profile": "~Yu_Tian2;~Jian_Ren2;~Menglei_Chai1;~Kyle_Olszewski1;~Xi_Peng1;~Sergey_Tulyakov1;~Dimitris_Metaxas1", "aff": "Rutgers University;Snap Inc.;Snap Inc.;Snap Inc.;University of Delaware;;Rutgers University", "aff_domain": "rutgers.edu;snapchat.com;snap.com;snap.com;udel.edu;;cs.rutgers.edu", "position": "PhD student;Research Scientist;Researcher;Researcher;Assistant Professor;;Full Professor", "bibtex": "@inproceedings{\ntian2021a,\ntitle={A Good Image Generator Is What You Need for High-Resolution Video Synthesis},\nauthor={Yu Tian and Jian Ren and Menglei Chai and Kyle Olszewski and Xi Peng and Dimitris N. 
Metaxas and Sergey Tulyakov},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=6puCSjH3hwA}\n}", "github": "[![github](/images/github_icon.svg) snap-research/MoCoGAN-HD](https://github.com/snap-research/MoCoGAN-HD)", "project": "", "reviewers": "AnonReviewer1;AnonReviewer4;AnonReviewer3;AnonReviewer2", "pdf_size": 0, "rating": "6;6;8;8", "confidence": "3;2;5;5", "wc_review": "386;334;792;401", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "657;676;1103;689", "reply_reviewers": "0;0;0;0", "reply_authors": "1;1;2;1", "rating_avg": [ 7.0, 1.0 ], "confidence_avg": [ 3.75, 1.299038105676658 ], "wc_review_avg": [ 478.25, 182.84197411972997 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 781.25, 186.11068615208532 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.25, 0.4330127018922193 ], "replies_avg": [ 10, 0 ], "authors#_avg": [ 7, 0 ], "corr_rating_confidence": 0.9622504486493763, "gs_citation": 211, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=10838620537951090836&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 8, "pdf": "https://openreview.net/pdf?id=6puCSjH3hwA", "email": "rutgers.edu;snapchat.com;snap.com;snap.com;udel.edu;;cs.rutgers.edu", "author_num": 7, "aff_unique_index": "0;1;1;1;2;0", "aff_unique_norm": "Rutgers University;Snap Inc.;University of Delaware", "aff_unique_dep": ";;", "aff_unique_url": "https://www.rutgers.edu;https://www.snapinc.com;https://www.udel.edu", "aff_unique_abbr": "Rutgers;Snap;UD", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0;0;0;0;0", "aff_country_unique": "United States" }, { "title": "Debiasing Concept-based Explanations with Causal Analysis", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/2732", "id": "6puUoArESGp", "poster": "", "openreview": "https://openreview.net/forum?id=6puUoArESGp", "slides": "https://iclr.cc/virtual/2021/poster/2732", "video": "https://iclr.cc/virtual/2021/poster/2732", "author_site": "Mohammad Taha Bahadori, David Heckerman", "tldr": "", "abstract": "The concept-based explanation approach is a popular model interpretability tool because it expresses the reasons for a model's predictions in terms of concepts that are meaningful for the domain experts. In this work, we study the problem of the concepts being correlated with confounding information in the features. We propose a new causal prior graph for modeling the impacts of unobserved variables and a method to remove the impact of confounding information and noise using a two-stage regression technique borrowed from the instrumental variable literature. We also model the completeness of the concepts set and show that our debiasing method works when the concepts are not complete.
Our synthetic and real-world experiments demonstrate the success of our method in removing biases and improving the ranking of the concepts in terms of their contribution to the explanation of the predictions.", "keywords": "Interpretability;Concept-based Explanation", "primary_area": "", "supplementary_material": "", "author": "Mohammad Taha Bahadori;David Heckerman", "authorids": "~Mohammad_Taha_Bahadori1;~David_Heckerman1", "gender": "M;M", "homepage": "http://faculty.washington.edu/bahadori/;http://web.cs.ucla.edu/~eli/", "dblp": "28/10813.html;h/DavidHeckerman", "google_scholar": "tlZvhyoAAAAJ;", "orcid": ";", "linkedin": "tahabahadori/;", "or_profile": "~Mohammad_Taha_Bahadori1;~David_Heckerman1", "aff": "Amazon;University of California - Los Angeles", "aff_domain": "amazon.com;", "position": "Scientist;Full Professor", "bibtex": "@inproceedings{\nbahadori2021debiasing,\ntitle={Debiasing Concept-based Explanations with Causal Analysis},\nauthor={Mohammad Taha Bahadori and David Heckerman},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=6puUoArESGp}\n}", "github": "", "project": "", "reviewers": "AnonReviewer2;AnonReviewer5;AnonReviewer1;AnonReviewer3", "pdf_size": 0, "rating": "4;5;6;7", "confidence": "3;4;4;3", "wc_review": "226;334;873;226", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "224;415;223;62", "reply_reviewers": "0;0;0;0", "reply_authors": "1;2;1;1", "rating_avg": [ 5.5, 1.118033988749895 ], "confidence_avg": [ 3.5, 0.5 ], "wc_review_avg": [ 414.75, 268.2194763621762 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 231.0, 125.02999640086374 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.25, 0.4330127018922193 ], "replies_avg": [ 10, 0 ], "authors#_avg": [ 2, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 52, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=12706989521778677314&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 7, "pdf": "https://openreview.net/pdf?id=6puUoArESGp", "email": "amazon.com;", "author_num": 2, "aff_unique_index": "0;1", "aff_unique_norm": "Amazon;University of California, Los Angeles", "aff_unique_dep": "Amazon.com, Inc.;", "aff_unique_url": "https://www.amazon.com;https://www.ucla.edu", "aff_unique_abbr": "Amazon;UCLA", "aff_campus_unique_index": "1", "aff_campus_unique": ";Los Angeles", "aff_country_unique_index": "0;0", "aff_country_unique": "United States" }, { "id": "6s480DdlRQQ", "title": "Dynamic Backdoor Attacks Against Deep Neural Networks", "track": "main", "status": "Reject", "tldr": "", "abstract": "Current Deep Neural Network (DNN) backdooring attacks rely on adding static triggers (with fixed patterns and locations) on model inputs that are prone to detection. In this paper, we propose the first class of dynamic backdooring techniques: Random Backdoor, Backdoor Generating Network (BaN), and conditional Backdoor Generating Network (c-BaN). Triggers generated by our techniques have random patterns and locations. In particular, BaN and c-BaN based on a novel generative network are the first two schemes that algorithmically generate triggers. Moreover, c-BaN is the first conditional backdooring technique that given a target label, it can generate a target-specific trigger. Both BaN and c-BaN are essentially a general framework which renders the adversary the flexibility for further customizing backdoor attacks. 
We extensively evaluate our techniques on three benchmark datasets and show that our techniques achieve almost perfect attack performance on backdoored data with a negligible utility loss. More importantly, our techniques can bypass state-of-the-art defense mechanisms.", "keywords": "Backdoor attack;Deep Neural Networks security", "primary_area": "", "supplementary_material": "", "author": "Ahmed Salem;Rui Wen;Michael Backes;Shiqing Ma;Yang Zhang", "authorids": "~Ahmed_Salem2;rui.wen@cispa.saarland;~Michael_Backes1;shiqing.ma@rutgers.edu;~Yang_Zhang15", "gender": ";;;;M", "homepage": ";;;;https://yangzhangalmo.github.io/", "dblp": ";;;;06/6785-16", "google_scholar": ";;;;Xeb2888AAAAJ", "orcid": ";;;;0000-0003-3612-7348", "linkedin": ";;;;", "or_profile": "~Ahmed_Salem2;rui.wen@cispa.saarland;~Michael_Backes1;shiqing.ma@rutgers.edu;~Yang_Zhang15", "aff": ";;;;CISPA Helmholtz Center for Information Security", "aff_domain": ";;;;cispa.de", "position": ";;;;Assistant Professor", "bibtex": "@misc{\nsalem2021dynamic,\ntitle={Dynamic Backdoor Attacks Against Deep Neural Networks},\nauthor={Ahmed Salem and Rui Wen and Michael Backes and Shiqing Ma and Yang Zhang},\nyear={2021},\nurl={https://openreview.net/forum?id=6s480DdlRQQ}\n}", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer5;AnonReviewer2", "site": "https://openreview.net/forum?id=6s480DdlRQQ", "pdf_size": 0, "rating": "5;5;6", "confidence": "4;4;3", "wc_review": "726;403;552", "wc_reply_reviewers": "148;0;0", "wc_reply_authors": "543;180;91", "reply_reviewers": "1;0;0", "reply_authors": "2;1;1", "rating_avg": [ 5.333333333333333, 0.4714045207910317 ], "confidence_avg": [ 3.6666666666666665, 0.4714045207910317 ], "wc_review_avg": [ 560.3333333333334, 131.9957911786921 ], "wc_reply_reviewers_avg": [ 49.333333333333336, 69.76786907707269 ], "wc_reply_authors_avg": [ 271.3333333333333, 195.50333898825247 ], "reply_reviewers_avg": [ 0.3333333333333333, 0.4714045207910317 ], "reply_authors_avg": [ 1.3333333333333333, 0.4714045207910317 ], "replies_avg": [ 9, 0 ], "authors#_avg": [ 5, 0 ], "corr_rating_confidence": -0.9999999999999997, "gs_citation": 1, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=9713490879427026693&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 0, "aff_unique_index": "0", "aff_unique_norm": "CISPA Helmholtz Center for Information Security", "aff_unique_dep": "", "aff_unique_url": "https://www.cispa.de/", "aff_unique_abbr": "CISPA", "aff_country_unique_index": "0", "aff_country_unique": "Germany" }, { "title": "DDPNOpt: Differential Dynamic Programming Neural Optimizer", "status": "Spotlight", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/2702", "id": "6s7ME_X5_Un", "poster": "", "openreview": "https://openreview.net/forum?id=6s7ME_X5_Un", "slides": "https://iclr.cc/virtual/2021/poster/2702", "video": "https://iclr.cc/virtual/2021/poster/2702", "author_site": "Guan-Horng Liu, Tianrong Chen, Evangelos Theodorou", "tldr": "", "abstract": "Interpretation of Deep Neural Networks (DNNs) training as an optimal control problem with nonlinear dynamical systems has received considerable attention recently, yet the algorithmic development remains relatively limited. In this work, we make an attempt along this line by reformulating the training procedure from the trajectory optimization perspective. 
We first show that most widely-used algorithms for training DNNs can be linked to Differential Dynamic Programming (DDP), a celebrated second-order method rooted in Approximate Dynamic Programming. In this vein, we propose a new class of optimizers, the DDP Neural Optimizer (DDPNOpt), for training feedforward and convolutional networks. DDPNOpt features layer-wise feedback policies which improve convergence and reduce sensitivity to hyper-parameters compared to existing methods. It outperforms other optimal-control inspired training methods in both convergence and complexity, and is competitive against state-of-the-art first and second order methods. We also observe that DDPNOpt has a surprising benefit in preventing gradient vanishing. Our work opens up new avenues for principled algorithmic design built upon optimal control theory.", "keywords": "deep learning training;optimal control;trajectory optimization;differential dynamic programming", "primary_area": "", "supplementary_material": "", "author": "Guan-Horng Liu;Tianrong Chen;Evangelos Theodorou", "authorids": "~Guan-Horng_Liu1;~Tianrong_Chen1;~Evangelos_Theodorou1", "gender": ";M;M", "homepage": "https://ghliu.github.io;https://tianrongchen.github.io/;", "dblp": "143/6907;227/7295;155/9964", "google_scholar": "2Dt0VJ4AAAAJ;r9D3Fg50gMoC;", "orcid": ";;", "linkedin": ";tianrong-chen-757b3216a/;", "or_profile": "~Guan-Horng_Liu1;~Tianrong_Chen1;~Evangelos_Theodorou1", "aff": "Georgia Institute of Technology;Georgia Institute of Technology;Georgia Institute of Technology", "aff_domain": "gatech.edu;gatech.edu;gatech.edu", "position": "PhD student;PhD student;Assistant Professor", "bibtex": "@inproceedings{\nliu2021ddpnopt,\ntitle={{\\{}DDPNO{\\}}pt: Differential Dynamic Programming Neural Optimizer},\nauthor={Guan-Horng Liu and Tianrong Chen and Evangelos Theodorou},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=6s7ME_X5_Un}\n}", "github": "", "project": "", "reviewers": "AnonReviewer4;AnonReviewer3;AnonReviewer2;AnonReviewer1", "pdf_size": 0, "rating": "7;7;7;8", "confidence": "3;4;3;2", "wc_review": "565;351;1258;447", "wc_reply_reviewers": "0;27;43;12", "wc_reply_authors": "684;470;2213;745", "reply_reviewers": "0;1;1;1", "reply_authors": "1;1;3;1", "rating_avg": [ 7.25, 0.4330127018922193 ], "confidence_avg": [ 3.0, 0.7071067811865476 ], "wc_review_avg": [ 655.25, 356.1561279832203 ], "wc_reply_reviewers_avg": [ 20.5, 16.132265804901678 ], "wc_reply_authors_avg": [ 1028.0, 691.7394740796567 ], "reply_reviewers_avg": [ 0.75, 0.4330127018922193 ], "reply_authors_avg": [ 1.5, 0.8660254037844386 ], "replies_avg": [ 15, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": -0.816496580927726, "gs_citation": 9, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=14099754634482845412&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 8, "pdf": "https://openreview.net/pdf?id=6s7ME_X5_Un", "email": "gatech.edu;gatech.edu;gatech.edu", "author_num": 3, "aff_unique_index": "0;0;0", "aff_unique_norm": "Georgia Institute of Technology", "aff_unique_dep": "", "aff_unique_url": "https://www.gatech.edu", "aff_unique_abbr": "Georgia Tech", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0;0", "aff_country_unique": "United States" }, { "title": "Greedy-GQ with Variance Reduction: Finite-time Analysis and Improved Complexity", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/2968", "id": "6t_dLShIUyZ",
"poster": "", "openreview": "https://openreview.net/forum?id=6t_dLShIUyZ", "slides": "https://iclr.cc/virtual/2021/poster/2968", "video": "https://iclr.cc/virtual/2021/poster/2968", "author_site": "Shaocong Ma, Ziyi Chen, Yi Zhou, Shaofeng Zou", "tldr": "", "abstract": "Greedy-GQ is a value-based reinforcement learning (RL) algorithm for optimal control. Recently, the finite-time analysis of Greedy-GQ has been developed under linear function approximation and Markovian sampling, and the algorithm is shown to achieve an $\\epsilon$-stationary point with a sample complexity in the order of $\\mathcal{O}(\\epsilon^{-3})$. Such a high sample complexity is due to the large variance induced by the Markovian samples. In this paper, we propose a variance-reduced Greedy-GQ (VR-Greedy-GQ) algorithm for off-policy optimal control. In particular, the algorithm applies the SVRG-based variance reduction scheme to reduce the stochastic variance of the two time-scale updates. We study the finite-time convergence of VR-Greedy-GQ under linear function approximation and Markovian sampling and show that the algorithm achieves a much smaller bias and variance error than the original Greedy-GQ. In particular, we prove that VR-Greedy-GQ achieves an improved sample complexity that is in the order of $\\mathcal{O}(\\epsilon^{-2})$. We further compare the performance of VR-Greedy-GQ with that of Greedy-GQ in various RL experiments to corroborate our theoretical findings.", "keywords": "Optimization;Reinforcement Learning;Machine Learning", "primary_area": "", "supplementary_material": "/attachment/a31551141141459ca929b31cca4b76a14e9a8489.zip", "author": "Shaocong Ma;Ziyi Chen;Yi Zhou;Shaofeng Zou", "authorids": "~Shaocong_Ma1;~Ziyi_Chen2;~Yi_Zhou2;~Shaofeng_Zou1", "gender": "M;M;M;", "homepage": "https://mshaocong.github.io/;;https://sites.google.com/site/yizhouhomepage/home;", "dblp": "270/3742;37/1439-2;;", "google_scholar": ";zjSBVOIAAAAJ;4fK8bYIAAAAJ;", "orcid": ";;;", "linkedin": ";ziyi-chen-84616184/;;", "or_profile": "~Shaocong_Ma1;~Ziyi_Chen2;~Yi_Zhou2;~Shaofeng_Zou1", "aff": "University of Utah;University of Utah;University of Utah;", "aff_domain": "utah.edu;utah.edu;utah.edu;", "position": "PhD student;PhD student;Assistant Professor;", "bibtex": "@inproceedings{\nma2021greedygq,\ntitle={Greedy-{\\{}GQ{\\}} with Variance Reduction: Finite-time Analysis and Improved Complexity},\nauthor={Shaocong Ma and Ziyi Chen and Yi Zhou and Shaofeng Zou},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=6t_dLShIUyZ}\n}", "github": "", "project": "", "reviewers": "AnonReviewer2;AnonReviewer5;AnonReviewer4;AnonReviewer1;AnonReviewer3", "pdf_size": 0, "rating": "3;5;6;8;8", "confidence": "5;3;4;5;3", "wc_review": "452;383;270;430;503", "wc_reply_reviewers": "0;0;0;0;11", "wc_reply_authors": "561;586;572;413;501", "reply_reviewers": "0;0;0;0;1", "reply_authors": "1;1;2;1;1", "rating_avg": [ 6.0, 1.8973665961010275 ], "confidence_avg": [ 4.0, 0.8944271909999159 ], "wc_review_avg": [ 407.6, 78.8837118802101 ], "wc_reply_reviewers_avg": [ 2.2, 4.4 ], "wc_reply_authors_avg": [ 526.6, 63.770212482004474 ], "reply_reviewers_avg": [ 0.2, 0.4 ], "reply_authors_avg": [ 1.2, 0.4 ], "replies_avg": [ 13, 0 ], "authors#_avg": [ 4, 0 ], "corr_rating_confidence": -0.23570226039551587, "gs_citation": 19, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=18438586002466946605&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 6, "pdf": 
"https://openreview.net/pdf?id=6t_dLShIUyZ", "email": "utah.edu;utah.edu;utah.edu;", "author_num": 4, "aff_unique_index": "0;0;0", "aff_unique_norm": "University of Utah", "aff_unique_dep": "", "aff_unique_url": "https://www.utah.edu", "aff_unique_abbr": "Utah", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0;0", "aff_country_unique": "United States" }, { "title": "Domain Generalization with MixStyle", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/2738", "id": "6xHJ37MVxxp", "poster": "", "openreview": "https://openreview.net/forum?id=6xHJ37MVxxp", "slides": "https://iclr.cc/virtual/2021/poster/2738", "video": "https://iclr.cc/virtual/2021/poster/2738", "author_site": "Kaiyang Zhou, Yongxin Yang, Yu Qiao, Tao Xiang", "tldr": "", "abstract": "Though convolutional neural networks (CNNs) have demonstrated remarkable ability in learning discriminative features, they often generalize poorly to unseen domains. Domain generalization aims to address this problem by learning from a set of source domains a model that is generalizable to any unseen domain. In this paper, a novel approach is proposed based on probabilistically mixing instance-level feature statistics of training samples across source domains. Our method, termed MixStyle, is motivated by the observation that visual domain is closely related to image style (e.g., photo vs.~sketch images). Such style information is captured by the bottom layers of a CNN where our proposed style-mixing takes place. Mixing styles of training instances results in novel domains being synthesized implicitly, which increase the domain diversity of the source domains, and hence the generalizability of the trained model. MixStyle fits into mini-batch training perfectly and is extremely easy to implement. 
The effectiveness of MixStyle is demonstrated on a wide range of tasks including category classification, instance retrieval and reinforcement learning.", "keywords": "Domain Generalization;Style Mixing", "primary_area": "", "supplementary_material": "", "author": "Kaiyang Zhou;Yongxin Yang;Yu Qiao;Tao Xiang", "authorids": "~Kaiyang_Zhou1;~Yongxin_Yang1;~Yu_Qiao1;~Tao_Xiang1", "gender": "M;;;M", "homepage": "https://kaiyangzhou.github.io/;;;https://www.surrey.ac.uk/people/tao-xiang", "dblp": "203/3155;;;22/4460-2.html", "google_scholar": "https://scholar.google.co.uk/citations?user=gRIejugAAAAJ;;;MeS5d4gAAAAJ", "orcid": ";;;0000-0002-2530-1059", "linkedin": ";;;", "or_profile": "~Kaiyang_Zhou1;~Yongxin_Yang1;~Yu_Qiao1;~Tao_Xiang1", "aff": ";;;University of Surrey", "aff_domain": ";;;surrey.ac.uk", "position": ";;;Full Professor", "bibtex": "@inproceedings{\nzhou2021domain,\ntitle={Domain Generalization with MixStyle},\nauthor={Kaiyang Zhou and Yongxin Yang and Yu Qiao and Tao Xiang},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=6xHJ37MVxxp}\n}", "github": "[![github](/images/github_icon.svg) KaiyangZhou/Dassl.pytorch](https://github.com/KaiyangZhou/Dassl.pytorch) + [![Papers with Code](/images/pwc_icon.svg) 2 community implementations](https://paperswithcode.com/paper/?openreview=6xHJ37MVxxp)", "project": "", "reviewers": "AnonReviewer1;AnonReviewer4;AnonReviewer3", "pdf_size": 0, "rating": "6;7;7", "confidence": "4;4;4", "wc_review": "307;504;338", "wc_reply_reviewers": "0;0;0", "wc_reply_authors": "485;337;334", "reply_reviewers": "0;0;0", "reply_authors": "1;1;1", "rating_avg": [ 6.666666666666667, 0.4714045207910317 ], "confidence_avg": [ 4.0, 0.0 ], "wc_review_avg": [ 383.0, 86.49084729996964 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 385.3333333333333, 70.48561713017928 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 8, 0 ], "authors#_avg": [ 4, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 1006, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=4489212027125038279&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 7, "pdf": "https://openreview.net/pdf?id=6xHJ37MVxxp", "email": ";;;surrey.ac.uk", "author_num": 4, "aff_unique_index": "0", "aff_unique_norm": "University of Surrey", "aff_unique_dep": "", "aff_unique_url": "https://www.surrey.ac.uk", "aff_unique_abbr": "Surrey", "aff_country_unique_index": "0", "aff_country_unique": "United Kingdom" }, { "id": "6y3-wzlGHkb", "title": "Non-robust Features through the Lens of Universal Perturbations", "track": "main", "status": "Reject", "tldr": "", "abstract": "Recent work ties adversarial examples to existence of non-robust features: features which are susceptible to small perturbations and believed to be unintelligible to humans, but still useful for prediction. We study universal adversarial perturbations and demonstrate that the above picture is more nuanced. Specifically, even though universal perturbations---similarly to standard adversarial perturbations---do leverage non-robust features, these features tend to be fundamentally different from the ``standard'' ones and, in particular, non-trivially human-aligned. Namely, universal perturbations have more human-aligned locality and spatial invariance properties. However, we also show that these human-aligned non-robust features have much less predictive signal than general non-robust features. 
Our findings thus take a step towards improving our understanding of these previously unintelligible features.", "keywords": "adversarial examples;robustness;non-robust features", "primary_area": "", "supplementary_material": "", "author": "Sung Min Park;Kuo-An Wei;Kai Yuanqing Xiao;Jerry Li;Aleksander Madry", "authorids": "~Sung_Min_Park2;kuoanwei@mit.edu;~Kai_Yuanqing_Xiao1;~Jerry_Li1;~Aleksander_Madry1", "gender": ";;;M;M", "homepage": "https://sungminpark.com;;https://kaixiao.github.io/;https://jerryzli.github.io/;https://people.csail.mit.edu/madry/", "dblp": "28/157;;;;67/2454", "google_scholar": ";;xblGvQgAAAAJ;4zybTq4AAAAJ;SupjsEUAAAAJ", "orcid": ";;0000-0002-9496-3072;;", "linkedin": ";;kaixiao/;;", "or_profile": "~Sung_Min_Park2;kuoanwei@mit.edu;~Kai_Yuanqing_Xiao1;~Jerry_Li1;~Aleksander_Madry1", "aff": "Massachusetts Institute of Technology;;Massachusetts Institute of Technology;Microsoft;Massachusetts Institute of Technology", "aff_domain": "mit.edu;;mit.edu;microsoft.com;mit.edu", "position": "PhD student;;PhD student;Senior Researcher;Professor", "bibtex": "@misc{\npark2021nonrobust,\ntitle={Non-robust Features through the Lens of Universal Perturbations},\nauthor={Sung Min Park and Kuo-An Wei and Kai Yuanqing Xiao and Jerry Li and Aleksander Madry},\nyear={2021},\nurl={https://openreview.net/forum?id=6y3-wzlGHkb}\n}", "github": "", "project": "", "reviewers": "AnonReviewer2;AnonReviewer3;AnonReviewer4;AnonReviewer1", "site": "https://openreview.net/forum?id=6y3-wzlGHkb", "pdf_size": 0, "rating": "5;5;6;7", "confidence": "4;2;4;4", "wc_review": "233;292;277;539", "wc_reply_reviewers": "0;0;86;126", "wc_reply_authors": "642;100;335;774", "reply_reviewers": "0;0;1;1", "reply_authors": "1;1;2;2", "rating_avg": [ 5.75, 0.82915619758885 ], "confidence_avg": [ 3.5, 0.8660254037844386 ], "wc_review_avg": [ 335.25, 119.61683618955988 ], "wc_reply_reviewers_avg": [ 53.0, 54.85435260760991 ], "wc_reply_authors_avg": [ 462.75, 263.1134498652625 ], "reply_reviewers_avg": [ 0.5, 0.5 ], "reply_authors_avg": [ 1.5, 0.5 ], "replies_avg": [ 14, 0 ], "authors#_avg": [ 5, 0 ], "corr_rating_confidence": 0.5222329678670935, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:d5SIcex5JiAJ:scholar.google.com/&scioq=Non-robust+Features+through+the+Lens+of+Universal+Perturbations&hl=en&as_sdt=0,5", "gs_version_total": 0, "aff_unique_index": "0;0;1;0", "aff_unique_norm": "Massachusetts Institute of Technology;Microsoft", "aff_unique_dep": ";Microsoft Corporation", "aff_unique_url": "https://web.mit.edu;https://www.microsoft.com", "aff_unique_abbr": "MIT;Microsoft", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0;0;0", "aff_country_unique": "United States" }, { "title": "A Block Minifloat Representation for Training Deep Neural Networks", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/2795", "id": "6zaTwpNSsQ2", "poster": "", "openreview": "https://openreview.net/forum?id=6zaTwpNSsQ2", "slides": "https://iclr.cc/virtual/2021/poster/2795", "video": "https://iclr.cc/virtual/2021/poster/2795", "author_site": "Sean Fox, Seyedramin Rasoulinezhad, Julian Faraone, david boland, Philip Leong", "tldr": "", "abstract": "Training Deep Neural Networks (DNN) with high efficiency can be difficult to achieve with native floating-point representations and commercially available hardware. Specialized arithmetic with custom acceleration offers perhaps the most promising alternative. 
Ongoing research is trending towards narrow floating-point representations, called minifloats, that pack more operations for a given silicon area and consume less power. In this paper, we introduce Block Minifloat (BM), a new spectrum of minifloat formats capable of training DNNs end-to-end with only 4-8 bit weight, activation and gradient tensors. While standard floating-point representations have two degrees of freedom, via the exponent and mantissa, BM exposes the exponent bias as an additional field for optimization. Crucially, this enables training with fewer exponent bits, yielding dense integer-like hardware for fused multiply-add (FMA) operations. For ResNet trained on ImageNet, 6-bit BM achieves almost no degradation in floating-point accuracy with FMA units that are $4.1\\times(23.9\\times)$ smaller and consume $2.3\\times(16.1\\times)$ less energy than FP8 (FP32). Furthermore, our 8-bit BM format matches floating-point accuracy while delivering a higher computational density and faster expected training times.", "keywords": "", "primary_area": "", "supplementary_material": "", "author": "Sean Fox;Seyedramin Rasoulinezhad;Julian Faraone;david boland;Philip Leong", "authorids": "~Sean_Fox2;seyedramin.rasoulinezhad@sydney.edu.au;~Julian_Faraone1;david.boland@sydney.edu.au;~Philip_Leong1", "gender": ";;M;;", "homepage": ";;;;", "dblp": ";;206/6357;;", "google_scholar": "lO8YJfkAAAAJ;;EXWzHR0AAAAJ;;", "orcid": ";;;;", "linkedin": "sean-fox-92707176/;;https://au.linkedin.com/in/julian-faraone-b4587393;;", "or_profile": "~Sean_Fox2;seyedramin.rasoulinezhad@sydney.edu.au;~Julian_Faraone1;david.boland@sydney.edu.au;~Philip_Leong1", "aff": ";;University of Sydney;;", "aff_domain": ";;sydney.edu.au;;", "position": ";;PhD student;;", "bibtex": "@inproceedings{\nfox2021a,\ntitle={A Block Minifloat Representation for Training Deep Neural Networks},\nauthor={Sean Fox and Seyedramin Rasoulinezhad and Julian Faraone and david boland and Philip Leong},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=6zaTwpNSsQ2}\n}", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer4;AnonReviewer2", "pdf_size": 0, "rating": "6;7;7", "confidence": "4;3;5", "wc_review": "369;501;652", "wc_reply_reviewers": "0;0;0", "wc_reply_authors": "615;461;432", "reply_reviewers": "0;0;0", "reply_authors": "1;1;1", "rating_avg": [ 6.666666666666667, 0.4714045207910317 ], "confidence_avg": [ 4.0, 0.816496580927726 ], "wc_review_avg": [ 507.3333333333333, 115.6210284603204 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 502.6666666666667, 80.30912498312726 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 7, 0 ], "authors#_avg": [ 5, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 54, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=16976294661452027229&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 2, "pdf": "https://openreview.net/pdf?id=6zaTwpNSsQ2", "email": ";;sydney.edu.au;;", "author_num": 5, "aff_unique_index": "0", "aff_unique_norm": "University of Sydney", "aff_unique_dep": "", "aff_unique_url": "https://www.sydney.edu.au", "aff_unique_abbr": "USYD", "aff_country_unique_index": "0", "aff_country_unique": "Australia" }, { "title": "Individually Fair Rankings", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/2627", "id": "71zCSP_HuBN", "poster": "", "openreview": "https://openreview.net/forum?id=71zCSP_HuBN", "slides": 
"https://iclr.cc/virtual/2021/poster/2627", "video": "https://iclr.cc/virtual/2021/poster/2627", "author_site": "Amanda Bower, Hamid Eftekhari, Mikhail Yurochkin, Yuekai Sun", "tldr": "", "abstract": "We develop an algorithm to train individually fair learning-to-rank (LTR) models. The proposed approach ensures items from minority groups appear alongside similar items from majority groups. This notion of fair ranking is based on the definition of individual fairness from supervised learning and is more nuanced than prior fair LTR approaches that simply ensure the ranking model provides underrepresented items with a basic level of exposure. The crux of our method is an optimal transport-based regularizer that enforces individual fairness and an efficient algorithm for optimizing the regularizer. We show that our approach leads to certifiably individually fair LTR models and demonstrate the efficacy of our method on ranking tasks subject to demographic biases.", "keywords": "algorithmic fairness;learning to rank;optimal transport", "primary_area": "", "supplementary_material": "/attachment/5827240815e92c318cf0d70ef1cb3471ea7d42fa.zip", "author": "Amanda Bower;Hamid Eftekhari;Mikhail Yurochkin;Yuekai Sun", "authorids": "~Amanda_Bower1;hamidef@umich.edu;~Mikhail_Yurochkin1;~Yuekai_Sun1", "gender": "F;;M;", "homepage": "http://amandarg.github.io;;https://moonfolk.github.io/;https://yuekai.github.io/", "dblp": "https://dblp.uni-trier.de/pers/hd/b/Bower:Amanda;;191/6719;", "google_scholar": "J3r0-xIAAAAJ;;QjBF9sUAAAAJ;6T1XtW8AAAAJ", "orcid": ";;;", "linkedin": "amanda-ruth-garcia-bower/;;mikhail-yurochkin-a45659114/;", "or_profile": "~Amanda_Bower1;hamidef@umich.edu;~Mikhail_Yurochkin1;~Yuekai_Sun1", "aff": "Twitter;;IBM Research;University of Michigan - Ann Arbor", "aff_domain": "twitter.com;;ibm.com;umich.edu", "position": "Researcher;;Researcher;Assistant \u2192 Associate Professor of Statistics", "bibtex": "@inproceedings{\nbower2021individually,\ntitle={Individually Fair Rankings},\nauthor={Amanda Bower and Hamid Eftekhari and Mikhail Yurochkin and Yuekai Sun},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=71zCSP_HuBN}\n}", "github": "", "project": "", "reviewers": "AnonReviewer3;AnonReviewer1;AnonReviewer4;AnonReviewer2", "pdf_size": 0, "rating": "4;5;7;7", "confidence": "4;3;3;2", "wc_review": "532;357;388;519", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "914;805;211;456", "reply_reviewers": "0;0;0;0", "reply_authors": "2;2;1;1", "rating_avg": [ 5.75, 1.299038105676658 ], "confidence_avg": [ 3.0, 0.7071067811865476 ], "wc_review_avg": [ 449.0, 77.41769823496433 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 596.5, 279.56618178885657 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.5, 0.5 ], "replies_avg": [ 13, 0 ], "authors#_avg": [ 4, 0 ], "corr_rating_confidence": -0.816496580927726, "gs_citation": 35, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=17150896318776407988&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 6, "pdf": "https://openreview.net/pdf?id=71zCSP_HuBN", "email": "twitter.com;;ibm.com;umich.edu", "author_num": 4, "aff_unique_index": "0;1;2", "aff_unique_norm": "Twitter, Inc.;IBM;University of Michigan", "aff_unique_dep": ";IBM Research;", "aff_unique_url": "https://twitter.com;https://www.ibm.com/research;https://www.umich.edu", "aff_unique_abbr": "Twitter;IBM;UM", "aff_campus_unique_index": "1", "aff_campus_unique": ";Ann Arbor", 
"aff_country_unique_index": "0;0;0", "aff_country_unique": "United States" }, { "title": "Net-DNF: Effective Deep Modeling of Tabular Data", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/2539", "id": "73WTGs96kho", "poster": "", "openreview": "https://openreview.net/forum?id=73WTGs96kho", "slides": "https://iclr.cc/virtual/2021/poster/2539", "video": "https://iclr.cc/virtual/2021/poster/2539", "author_site": "Liran Katzir, Gal Elidan, Ran El-Yaniv", "tldr": "", "abstract": "A challenging open question in deep learning is how to handle tabular data. Unlike domains such as image and natural language processing, where deep architectures prevail, there is still no widely accepted neural architecture that dominates tabular data. As a step toward bridging this gap, we present Net-DNF a novel generic architecture whose inductive bias elicits models whose structure corresponds to logical Boolean formulas in disjunctive normal form (DNF) over affine soft-threshold decision terms. Net-DNFs also promote localized decisions that are taken over small subsets of the features. We present an extensive experiments showing that Net-DNFs significantly and consistently outperform fully connected networks over tabular data. With relatively few hyperparameters, Net-DNFs open the door to practical end-to-end handling of tabular data using neural networks. We present ablation studies, which justify the design choices of Net-DNF including the inductive bias elements, namely, Boolean formulation, locality, and feature selection. \n", "keywords": "Neural Networks;Architectures;Tabular Data;Predictive Modeling", "primary_area": "", "supplementary_material": "/attachment/8723f7c3a479e47f623f0d387da7816b88adb8c4.zip", "author": "Liran Katzir;Gal Elidan;Ran El-Yaniv", "authorids": "~Liran_Katzir1;~Gal_Elidan1;~Ran_El-Yaniv1", "gender": "M;M;M", "homepage": ";;http://www.cs.technion.ac.il/~rani/", "dblp": "20/3424-1;;04/1896", "google_scholar": ";;https://scholar.google.com.tw/citations?user=D9eVSd8AAAAJ", "orcid": ";;", "linkedin": ";;", "or_profile": "~Liran_Katzir1;~Gal_Elidan1;~Ran_El-Yaniv1", "aff": ";;Deci", "aff_domain": ";;deci.ai", "position": ";;Chief Scientist", "bibtex": "@inproceedings{\nkatzir2021netdnf,\ntitle={Net-{\\{}DNF{\\}}: Effective Deep Modeling of Tabular Data},\nauthor={Liran Katzir and Gal Elidan and Ran El-Yaniv},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=73WTGs96kho}\n}", "github": "", "project": "", "reviewers": "AnonReviewer3;AnonReviewer4;AnonReviewer2", "pdf_size": 0, "rating": "6;6;7", "confidence": "4;3;2", "wc_review": "388;250;409", "wc_reply_reviewers": "0;0;0", "wc_reply_authors": "666;449;674", "reply_reviewers": "0;0;0", "reply_authors": "1;1;1", "rating_avg": [ 6.333333333333333, 0.4714045207910317 ], "confidence_avg": [ 3.0, 0.816496580927726 ], "wc_review_avg": [ 349.0, 70.52659073002182 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 596.3333333333334, 104.23157977418467 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 7, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": -0.8660254037844385, "gs_citation": 119, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=3110919410651230026&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 2, "pdf": "https://openreview.net/pdf?id=73WTGs96kho", "email": ";;deci.ai", "author_num": 3, "aff_unique_index": "0", "aff_unique_norm": "Deci", "aff_unique_dep": 
"", "aff_unique_url": "https://www.deci.ai", "aff_unique_abbr": "Deci", "aff_country_unique_index": "0", "aff_country_unique": "Israel" }, { "id": "76M3pxkqRl", "title": "Status-Quo Policy Gradient in Multi-agent Reinforcement Learning", "track": "main", "status": "Reject", "tldr": "", "abstract": "Individual rationality, which involves maximizing expected individual return, does not always lead to optimal individual or group outcomes in multi-agent problems. For instance, in social dilemma situations, Reinforcement Learning (RL) agents trained to maximize individual rewards converge to mutual defection that is individually and socially sub-optimal. In contrast, humans evolve individual and socially optimal strategies in such social dilemmas. Inspired by ideas from human psychology that attribute this behavior in humans to the status-quo bias, we present a status-quo loss (SQLoss) and the corresponding policy gradient algorithm that incorporates this bias in an RL agent. We demonstrate that agents trained with SQLoss evolve individually as well as socially optimal behavior in several social dilemma matrix games. To apply SQLoss to games where cooperation and defection are determined by a sequence of non-trivial actions, we present GameDistill, an algorithm that reduces a multi-step game with visual input to a matrix game. We empirically show how agents trained with SQLoss on a GameDistill reduced version of the Coin Game evolve optimal policies. ", "keywords": "multi-agent rl;reinforcement learning;social dilemma;policy gradient;game theory", "primary_area": "", "supplementary_material": "/attachment/3577c7a2007e50c8ebb1101d0e2c02f50d26e18a.zip", "author": "Pinkesh Badjatiya;Mausoom Sarkar;Abhishek Sinha;Nikaash Puri;Jayakumar Subramanian;Siddharth Singh;Balaji Krishnamurthy", "authorids": "~Pinkesh_Badjatiya1;~Mausoom_Sarkar1;~Abhishek_Sinha1;~Nikaash_Puri1;~Jayakumar_Subramanian1;siddharth9820@gmail.com;~Balaji_Krishnamurthy1", "gender": "M;M;M;M;M;;M", "homepage": "http://pinkeshbadjatiya.github.io/;;https://a7b23.github.io/;;;;", "dblp": "198/5418;43/6264;47/9175;;202/5957;;79/1076", "google_scholar": "https://scholar.google.co.in/citations?user=9ICSXBsAAAAJ;N6J7J4IAAAAJ;https://scholar.google.com/citations?hl=en;;LewRar8AAAAJ;;n8iUBg8AAAAJ", "orcid": ";;;;0000-0003-4621-2677;;0000-0002-0366-2427", "linkedin": ";;abhisheksinha94/;;;;balaji-krishnamurthy-4241695/", "or_profile": "~Pinkesh_Badjatiya1;~Mausoom_Sarkar1;~Abhishek_Sinha1;~Nikaash_Puri1;~Jayakumar_Subramanian1;siddharth9820@gmail.com;~Balaji_Krishnamurthy1", "aff": "Adobe;Adobe;Stanford University;;Adobe Systems;;Adobe Systems", "aff_domain": "adobe.com;adobe.com;stanford.edu;;adobe.com;;adobe.com", "position": "Machine Learning Researcher and Engineer 2;Principal Researcher;MS student;;Senior Research Scientist;;Principal Scientist", "bibtex": "@misc{\nbadjatiya2021statusquo,\ntitle={Status-Quo Policy Gradient in Multi-agent Reinforcement Learning},\nauthor={Pinkesh Badjatiya and Mausoom Sarkar and Abhishek Sinha and Nikaash Puri and Jayakumar Subramanian and Siddharth Singh and Balaji Krishnamurthy},\nyear={2021},\nurl={https://openreview.net/forum?id=76M3pxkqRl}\n}", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer2;AnonReviewer3;AnonReviewer4", "site": "https://openreview.net/forum?id=76M3pxkqRl", "pdf_size": 0, "rating": "4;5;6;7", "confidence": "4;3;5;2", "wc_review": "451;995;732;292", "wc_reply_reviewers": "149;495;0;0", "wc_reply_authors": "1503;1640;421;95", "reply_reviewers": "1;1;0;0", 
"reply_authors": "2;3;1;1", "rating_avg": [ 5.5, 1.118033988749895 ], "confidence_avg": [ 3.5, 1.118033988749895 ], "wc_review_avg": [ 617.5, 268.9279643324584 ], "wc_reply_reviewers_avg": [ 161.0, 202.2016320408913 ], "wc_reply_authors_avg": [ 914.75, 668.5440804464579 ], "reply_reviewers_avg": [ 0.5, 0.5 ], "reply_authors_avg": [ 1.75, 0.82915619758885 ], "replies_avg": [ 14, 0 ], "authors#_avg": [ 7, 0 ], "corr_rating_confidence": -0.39999999999999997, "gs_citation": 1, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=2064012885745122570&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 9, "aff_unique_index": "0;0;1;0;0", "aff_unique_norm": "Adobe;Stanford University", "aff_unique_dep": "Adobe Inc.;", "aff_unique_url": "https://www.adobe.com;https://www.stanford.edu", "aff_unique_abbr": "Adobe;Stanford", "aff_campus_unique_index": "1", "aff_campus_unique": ";Stanford", "aff_country_unique_index": "0;0;0;0;0", "aff_country_unique": "United States" }, { "id": "784_F-WCW46", "title": "Rethinking Sampling in 3D Point Cloud Generative Adversarial Networks", "track": "main", "status": "Reject", "tldr": "", "abstract": "In this paper, we examine the long-neglected yet important effects of point sam- pling patterns in point cloud GANs. Through extensive experiments, we show that sampling-insensitive discriminators (e.g. PointNet-Max) produce shape point clouds with point clustering artifacts while sampling-oversensitive discriminators (e.g. PointNet++, DGCNN, PointConv, KPConv) fail to guide valid shape generation. We propose the concept of sampling spectrum to depict the different sampling sensitivities of discriminators. We further study how different evaluation metrics weigh the sampling pattern against the geometry and propose several perceptual metrics forming a sampling spectrum of metrics. Guided by the proposed sampling spectrum, we discover a middle-point sampling-aware baseline discriminator, PointNet-Mix, which improves all existing point cloud generators by a large margin on sampling-related metrics. We point out that, given that recent research has been focused on the generator design, the discriminator design needs more attention. Our work provides both suggestions and tools for building future discriminators. 
We will release the code to facilitate future research.", "keywords": "3D point cloud;GAN;sampling pattern;evaluation metrics;discriminator", "primary_area": "", "supplementary_material": "/attachment/62cbd74795b34d2ca06a2c50efcab139588f0833.zip", "author": "He Wang;Zetian Jiang;Li Yi;Kaichun Mo;Hao Su;Leonidas Guibas", "authorids": "~He_Wang5;~Zetian_Jiang1;~Li_Yi2;~Kaichun_Mo1;~Hao_Su1;~Leonidas_Guibas1", "gender": "M;M;M;M;M;M", "homepage": "https://hughw19.github.io;http://thinklab.sjtu.edu.cn/member.html;https://ericyi.github.io/;https://cs.stanford.edu/~kaichun/;http://ai.ucsd.edu/~haosu;http://geometry.stanford.edu/", "dblp": "01/6368-10;;26/4239-1;172/1283;09/4945-1;g/LeonidasJGuibas", "google_scholar": "roCAWkoAAAAJ;;UyZL660AAAAJ;pL7JsOsAAAAJ;1P8Zu04AAAAJ;https://scholar.google.com.tw/citations?user=5JlEyTAAAAAJ", "orcid": ";;;;;", "linkedin": ";;;;;", "or_profile": "~He_Wang5;~Zetian_Jiang1;~Li_Yi2;~Kaichun_Mo1;~Hao_Su1;~Leonidas_Guibas1", "aff": "Stanford University;Shanghai Jiaotong University;Google;Stanford University;University of California, San Diego;Stanford University", "aff_domain": "stanford.edu;sjtu.edu.cn;google.com;stanford.edu;ucsd.edu;stanford.edu", "position": "PhD student;PhD student;Researcher;PhD student;Assistant Professor;Full Professor", "bibtex": "@misc{\nwang2021rethinking,\ntitle={Rethinking Sampling in 3D Point Cloud Generative Adversarial Networks},\nauthor={He Wang and Zetian Jiang and Li Yi and Kaichun Mo and Hao Su and Leonidas Guibas},\nyear={2021},\nurl={https://openreview.net/forum?id=784_F-WCW46}\n}", "github": "", "project": "", "reviewers": "AnonReviewer3;AnonReviewer5;AnonReviewer2;AnonReviewer4;AnonReviewer1", "site": "https://openreview.net/forum?id=784_F-WCW46", "pdf_size": 0, "rating": "4;5;6;6;7", "confidence": "4;3;4;5;5", "wc_review": "484;421;304;544;876", "wc_reply_reviewers": "0;0;0;0;359", "wc_reply_authors": "1408;483;528;1032;653", "reply_reviewers": "0;0;0;0;1", "reply_authors": "2;1;1;2;2", "rating_avg": [ 5.6, 1.0198039027185568 ], "confidence_avg": [ 4.2, 0.7483314773547882 ], "wc_review_avg": [ 525.8, 192.30018200719417 ], "wc_reply_reviewers_avg": [ 71.8, 143.6 ], "wc_reply_authors_avg": [ 820.8, 351.39914627101757 ], "reply_reviewers_avg": [ 0.2, 0.4 ], "reply_authors_avg": [ 1.6, 0.4898979485566356 ], "replies_avg": [ 15, 0 ], "authors#_avg": [ 6, 0 ], "corr_rating_confidence": 0.6289709020331511, "gs_citation": 24, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=2669991396906652084&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 6, "aff_unique_index": "0;1;2;0;3;0", "aff_unique_norm": "Stanford University;Shanghai Jiao Tong University;Google;University of California, San Diego", "aff_unique_dep": ";;Google;", "aff_unique_url": "https://www.stanford.edu;https://www.sjtu.edu.cn;https://www.google.com;https://www.ucsd.edu", "aff_unique_abbr": "Stanford;SJTU;Google;UCSD", "aff_campus_unique_index": "0;2;0;3;0", "aff_campus_unique": "Stanford;;Mountain View;San Diego", "aff_country_unique_index": "0;1;0;0;0;0", "aff_country_unique": "United States;China" }, { "id": "78SlGFxtlM", "title": "Robust Meta-learning with Noise via Eigen-Reptile", "track": "main", "status": "Reject", "tldr": "", "abstract": "Recent years have seen a surge of interest in meta-learning techniques for tackling the few-shot learning (FSL) problem. However, the meta-learner's initial model is prone to meta-overfit, as there are only a few available samples with sampling noise. 
Besides, when handling data sampled with label noise for FSL, the meta-learner can be extremely sensitive to label noise. We address these two challenges of FSL with sampling and label noise. In particular, we first cast the meta-overfitting problem (overfitting on sampling and label noise) as a gradient noise problem, since the few available samples cause the meta-learner to overfit on existing examples (clean or corrupted) of an individual task at every gradient step. We present Eigen-Reptile (ER), which updates the meta-parameters with the main direction of historical task-specific parameters to alleviate gradient noise. Specifically, the main direction is computed by a special mechanism that accommodates the large size of the parameters. Furthermore, to obtain a more accurate main direction for Eigen-Reptile in the presence of label noise, we propose Introspective Self-paced Learning (ISPL), which constructs a plurality of prior models to determine which samples should be abandoned. We have proved the effectiveness of Eigen-Reptile and ISPL, respectively, theoretically and experimentally. Moreover, our experiments on different tasks demonstrate that the proposed methods outperform or achieve highly competitive performance compared with the state-of-the-art methods with or without noisy labels.", "keywords": "meta-learning;few-shot learning;generalization", "primary_area": "", "supplementary_material": "/attachment/fec1aaabee2e89559bddd5afbb33f27f42a1be2c.zip", "author": "Dong Chen;Lingfei Wu;Siliang Tang;Fangli Xu;Juncheng Li;Chang Zong;Chilie Tan;Yueting Zhuang", "authorids": "~Dong_Chen5;~Lingfei_Wu1;~Siliang_Tang1;lili@yixue.us;junchengli@zju.edu.cn;zongchang@zju.edu.cn;chilie.tan@tongdun.net;~Yueting_Zhuang1", "gender": "M;;M;;;;;M", "homepage": "https://anfeather.github.io;https://sites.google.com/view/teddy-lfwu/;https://person.zju.edu.cn/en/siliang;;;;;https://person.zju.edu.cn/yzhuang", "dblp": ";27/9060;44/5693;;;;;", "google_scholar": "yD-kDHEAAAAJ;https://scholar.google.com/citations?hl=en;8e7H3PcAAAAJ;;;;;1RD7UJAAAAAJ", "orcid": "0000-0002-4859-1757;;0000-0002-7356-9711;;;;;", "linkedin": ";;siliang-tang-4734272a/;;;;;", "or_profile": "~Dong_Chen5;~Lingfei_Wu1;~Siliang_Tang1;lili@yixue.us;junchengli@zju.edu.cn;zongchang@zju.edu.cn;chilie.tan@tongdun.net;~Yueting_Zhuang1", "aff": "Zhejiang University;International Business Machines;Zhejiang University;;;;;Zhejiang University", "aff_domain": "zju.edu;ibm.com;zju.edu.cn;;;;;zju.edu.cn", "position": "PhD student;Research Staff Member;Associate Professor;;;;;Full Professor", "bibtex": "@misc{\nchen2021robust,\ntitle={Robust Meta-learning with Noise via Eigen-Reptile},\nauthor={Dong Chen and Lingfei Wu and Siliang Tang and Fangli Xu and Juncheng Li and Chang Zong and Chilie Tan and Yueting Zhuang},\nyear={2021},\nurl={https://openreview.net/forum?id=78SlGFxtlM}\n}", "github": "", "project": "", "reviewers": "AnonReviewer3;AnonReviewer4;AnonReviewer2;AnonReviewer1", "site": "https://openreview.net/forum?id=78SlGFxtlM", "pdf_size": 0, "rating": "4;5;5;6", "confidence": "3;5;4;4", "wc_review": "287;1005;704;183", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "392;1635;819;302", "reply_reviewers": "0;0;0;0", "reply_authors": "1;3;2;1", "rating_avg": [ 5.0, 0.7071067811865476 ], "confidence_avg": [ 4.0, 0.7071067811865476 ], "wc_review_avg": [ 544.75, 329.5788031715632 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 787.0, 527.109571151957 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.75, 0.82915619758885 ], "replies_avg": [
12, 0 ], "authors#_avg": [ 8, 0 ], "corr_rating_confidence": 0.5, "gs_citation": 2, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=9718823806704661852&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 0, "aff_unique_index": "0;1;0;0", "aff_unique_norm": "Zhejiang University;International Business Machines Corporation", "aff_unique_dep": ";", "aff_unique_url": "https://www.zju.edu.cn;https://www.ibm.com", "aff_unique_abbr": "ZJU;IBM", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;1;0;0", "aff_country_unique": "China;United States" }, { "id": "7AQUzh5ntX_", "title": "Reinforcement Learning for Sparse-Reward Object-Interaction Tasks in First-person Simulated 3D Environments", "track": "main", "status": "Withdraw", "tldr": "", "abstract": "First-person object-interaction tasks in high-fidelity, 3D, simulated environments such as the AI2Thor virtual home-environment pose significant sample-efficiency challenges for reinforcement learning (RL) agents learning from sparse task rewards. To alleviate these challenges, prior work has provided extensive supervision via a combination of reward-shaping, ground-truth object-information, and expert demonstrations. In this work, we show that one can learn object-interaction tasks from scratch without supervision by learning an attentive object-model as an auxiliary task during task learning with an object-centric relational RL agent. Our key insight is that learning an object-model that incorporates object-relationships into forward prediction provides a dense learning signal for unsupervised representation learning of both objects and their relationships. This, in turn, enables faster policy learning for an object-centric relational RL agent. We demonstrate our agent by introducing a set of challenging object-interaction tasks in the AI2Thor environment where learning with our attentive object-model is key to strong performance. Specifically, by comparing our agent and relational RL agents with alternative auxiliary tasks with a relational RL agent equipped with ground-truth object-information, we find that learning with our object-model best closes the performance gap in terms of both learning speed and maximum success rate. 
Additionally, we find that incorporating object-relationships into an object-model's forward predictions is key to learning representations that capture object-category and object-state.", "keywords": "object-centric;representation learning;reinforcement learning;sparse reward", "primary_area": "", "supplementary_material": "/attachment/639169bca46b9fa97a0ade27538c76cabfaa1aa0.zip", "author": "Wilka Torrico Carvalho;Anthony Liang;Kimin Lee;Sungryull Sohn;Honglak Lee;Richard Lewis;Satinder Singh", "authorids": "~Wilka_Torrico_Carvalho1;aliangdw@umich.edu;~Kimin_Lee1;~Sungryull_Sohn1;~Honglak_Lee2;~Richard_Lewis1;~Satinder_Singh2", "gender": "M;;M;M;;M;", "homepage": "https://wcarvalho.github.io/;;https://sites.google.com/view/kiminlee;;;;", "dblp": "230/3919;;183/6849;172/9884;;12/590;", "google_scholar": "tvJTXwoAAAAJ;;92M8xv4AAAAJ;https://scholar.google.com/citations?hl=en;;;", "orcid": ";;;;;;", "linkedin": "wilkacarvalho;;;;;;", "or_profile": "~Wilka_Torrico_Carvalho1;aliangdw@umich.edu;~Kimin_Lee1;~Sungryull_Sohn1;~Honglak_Lee2;~Richard_Lewis1;~Satinder_Singh2", "aff": "Google DeepMind;;University of California, Berkeley;University of Michigan;;University of Michigan - Ann Arbor;", "aff_domain": "deepmind.com;;berkeley.edu;umich.edu;;umich.edu;", "position": "Research Scientist Intern;;Postdoc;PhD student;;Full Professor;", "bibtex": "", "github": "", "project": "", "reviewers": "AnonReviewer2;AnonReviewer1;AnonReviewer3;AnonReviewer4", "site": "https://openreview.net/forum?id=7AQUzh5ntX_", "pdf_size": 0, "rating": "3;4;4;6", "confidence": "4;3;4;3", "wc_review": "330;1075;425;461", "wc_reply_reviewers": "0;48;0;0", "wc_reply_authors": "191;1216;284;563", "reply_reviewers": "0;1;0;0", "reply_authors": "1;2;1;1", "rating_avg": [ 4.25, 1.0897247358851685 ], "confidence_avg": [ 3.5, 0.5 ], "wc_review_avg": [ 572.75, 293.89655918366924 ], "wc_reply_reviewers_avg": [ 12.0, 20.784609690826528 ], "wc_reply_authors_avg": [ 563.5, 400.8219679608392 ], "reply_reviewers_avg": [ 0.25, 0.4330127018922193 ], "reply_authors_avg": [ 1.25, 0.4330127018922193 ], "replies_avg": [ 12, 0 ], "authors#_avg": [ 7, 0 ], "corr_rating_confidence": -0.6882472016116854, "gs_citation": 12, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=1743364810387561395&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 7, "aff_unique_index": "0;1;2;2", "aff_unique_norm": "Google;University of California, Berkeley;University of Michigan", "aff_unique_dep": "Google DeepMind;;", "aff_unique_url": "https://deepmind.com;https://www.berkeley.edu;https://www.umich.edu", "aff_unique_abbr": "DeepMind;UC Berkeley;UM", "aff_campus_unique_index": "1;2", "aff_campus_unique": ";Berkeley;Ann Arbor", "aff_country_unique_index": "0;1;1;1", "aff_country_unique": "United Kingdom;United States" }, { "title": "SALD: Sign Agnostic Learning with Derivatives", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/3221", "id": "7EDgLu9reQD", "poster": "", "openreview": "https://openreview.net/forum?id=7EDgLu9reQD", "slides": "https://iclr.cc/virtual/2021/poster/3221", "video": "https://iclr.cc/virtual/2021/poster/3221", "author_site": "Matan Atzmon, Yaron Lipman", "tldr": "", "abstract": "Learning 3D geometry directly from raw data, such as point clouds, triangle soups, or unoriented meshes is still a challenging task that feeds many downstream computer vision and graphics applications. \n\nIn this paper, we introduce SALD: a method for learning implicit neural representations of shapes directly from raw data. 
We generalize sign agnostic learning (SAL) to include derivatives: given an unsigned distance function to the input raw data, we advocate a novel sign agnostic regression loss, incorporating both pointwise values and gradients of the unsigned distance function. Optimizing this loss leads to a signed implicit function solution, the zero level set of which is a high quality and valid manifold approximation to the input 3D data. The motivation behind SALD is that incorporating derivatives in a regression loss leads to a lower sample complexity, and consequently better fitting. In addition, we provide empirical evidence, as well as theoretical motivation in 2D that SAL enjoys a minimal surface property, favoring minimal area solutions. More importantly, we are able to show that this property still holds for SALD, i.e., with derivatives included.\n\nWe demonstrate the efficacy of SALD for shape space learning on two challenging datasets: ShapeNet that contains inconsistent orientation and non-manifold meshes, and D-Faust that contains raw 3D scans (triangle soups). On both these datasets, we present state-of-the-art results.", "keywords": "implicit neural representations;3D shapes learning;sign agnostic learning", "primary_area": "", "supplementary_material": "", "author": "Matan Atzmon;Yaron Lipman", "authorids": "~Matan_Atzmon1;~Yaron_Lipman1", "gender": "M;", "homepage": "https://matanatz.github.io/;", "dblp": "217/2968;", "google_scholar": "BXNft08AAAAJ;", "orcid": ";", "linkedin": ";", "or_profile": "~Matan_Atzmon1;~Yaron_Lipman1", "aff": "Weizmann Institute;", "aff_domain": "weizmann.ac.il;", "position": "PhD student;", "bibtex": "@inproceedings{\natzmon2021sald,\ntitle={{\\{}SALD{\\}}: Sign Agnostic Learning with Derivatives},\nauthor={Matan Atzmon and Yaron Lipman},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=7EDgLu9reQD}\n}", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer4;AnonReviewer3;AnonReviewer5", "pdf_size": 0, "rating": "6;7;8;8", "confidence": "5;4;4;3", "wc_review": "375;716;588;279", "wc_reply_reviewers": "0;39;0;0", "wc_reply_authors": "648;323;667;144", "reply_reviewers": "0;1;0;0", "reply_authors": "1;2;1;1", "rating_avg": [ 7.25, 0.82915619758885 ], "confidence_avg": [ 4.0, 0.7071067811865476 ], "wc_review_avg": [ 489.5, 172.06466807569763 ], "wc_reply_reviewers_avg": [ 9.75, 16.887495373796554 ], "wc_reply_authors_avg": [ 445.5, 221.3464479046366 ], "reply_reviewers_avg": [ 0.25, 0.4330127018922193 ], "reply_authors_avg": [ 1.25, 0.4330127018922193 ], "replies_avg": [ 12, 0 ], "authors#_avg": [ 2, 0 ], "corr_rating_confidence": -0.8528028654224417, "gs_citation": 165, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=14621796360268225520&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 4, "pdf": "https://openreview.net/pdf?id=7EDgLu9reQD", "email": "weizmann.ac.il;", "author_num": 2, "aff_unique_index": "0", "aff_unique_norm": "Weizmann Institute of Science", "aff_unique_dep": "", "aff_unique_url": "https://www.weizmann.org.il", "aff_unique_abbr": "Weizmann", "aff_country_unique_index": "0", "aff_country_unique": "Israel" }, { "title": "On Data-Augmentation and Consistency-Based Semi-Supervised Learning", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/3176", "id": "7FNqrcPtieT", "poster": "", "openreview": "https://openreview.net/forum?id=7FNqrcPtieT", "slides": "https://iclr.cc/virtual/2021/poster/3176", "video": 
"https://iclr.cc/virtual/2021/poster/3176", "author_site": "Atin Ghosh, alexandre thiery", "tldr": "", "abstract": "Recently proposed consistency-based Semi-Supervised Learning (SSL) methods such as the Pi-model, temporal ensembling, the mean teacher, or the virtual adversarial training, achieve the state of the art results in several SSL tasks. These methods can typically reach performances that are comparable to their fully supervised counterparts while using only a fraction of labelled examples. Despite these methodological advances, the understanding of these methods is still relatively limited. To make progress, we analyse (variations of) the Pi-model in settings where analytically tractable results can be obtained. We establish links with Manifold Tangent Classifiers and demonstrate that the quality of the perturbations is key to obtaining reasonable SSL performances. Furthermore, we propose a simple extension of the Hidden Manifold Model that naturally incorporates data-augmentation schemes and offers a tractable framework for understanding SSL methods.", "keywords": "Semi-Supervised Learning;Regularization;Data augmentation", "primary_area": "", "supplementary_material": "", "author": "Atin Ghosh;Alexandre H. Thiery", "authorids": "~Atin_Ghosh1;~Alexandre_H._Thiery1", "gender": "M;M", "homepage": "https://www.linkedin.com/in/atin-ghosh-3a8b5831/;http://www.normalesup.org/~athiery/", "dblp": ";203/7143", "google_scholar": ";https://scholar.google.com.sg/citations?user=szBOsCgAAAAJ", "orcid": ";", "linkedin": ";alexandre-thiery-2981686/", "or_profile": "~Atin_Ghosh1;~Alexandre_Hoang_THIERY1", "aff": "National University of Singapore;National University of Singapore", "aff_domain": "nus.edu.sg;nus.edu.sg", "position": "PhD student;Associate Professor", "bibtex": "@inproceedings{\nghosh2021on,\ntitle={On Data-Augmentation and Consistency-Based Semi-Supervised Learning},\nauthor={Atin Ghosh and Alexandre H. 
Thiery},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=7FNqrcPtieT}\n}", "github": "", "project": "", "reviewers": "AnonReviewer3;AnonReviewer4;AnonReviewer1", "pdf_size": 0, "rating": "6;6;6", "confidence": "2;3;4", "wc_review": "230;855;419", "wc_reply_reviewers": "0;0;0", "wc_reply_authors": "226;784;322", "reply_reviewers": "0;0;0", "reply_authors": "1;1;1", "rating_avg": [ 6.0, 0.0 ], "confidence_avg": [ 3.0, 0.816496580927726 ], "wc_review_avg": [ 501.3333333333333, 261.7127347982049 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 444.0, 243.58981916328113 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 7, 0 ], "authors#_avg": [ 2, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 31, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=14251319210408685280&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 6, "pdf": "https://openreview.net/pdf?id=7FNqrcPtieT", "email": "nus.edu.sg;nus.edu.sg", "author_num": 2, "aff_unique_index": "0;0", "aff_unique_norm": "National University of Singapore", "aff_unique_dep": "", "aff_unique_url": "https://www.nus.edu.sg", "aff_unique_abbr": "NUS", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0", "aff_country_unique": "Singapore" }, { "title": "ANOCE: Analysis of Causal Effects with Multiple Mediators via Constrained Structural Learning", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/3019", "id": "7I12hXRi8F", "poster": "", "openreview": "https://openreview.net/forum?id=7I12hXRi8F", "slides": "https://iclr.cc/virtual/2021/poster/3019", "video": "https://iclr.cc/virtual/2021/poster/3019", "author_site": "Hengrui Cai, Rui Song, Wenbin Lu", "tldr": "", "abstract": "In the era of the causal revolution, identifying the causal effect of an exposure on the outcome of interest is an important problem in many areas, such as epidemics, medicine, genetics, and economics. Under a general causal graph, the exposure may have a direct effect on the outcome and also an indirect effect regulated by a set of mediators. An analysis of causal effects that interprets the causal mechanism contributed through mediators is hence challenging but in demand. To the best of our knowledge, there are no feasible algorithms that give an exact decomposition of the indirect effect at the level of individual mediators, due to common interaction among mediators in the complex graph. In this paper, we establish a new statistical framework to comprehensively characterize causal effects with multiple mediators, namely, ANalysis Of Causal Effects (ANOCE), with a newly introduced definition of the mediator effect, under the linear structural equation model. We further propose a constrained causal structure learning method by incorporating a novel identification constraint that specifies the temporal causal relationship of variables. The proposed algorithm is applied to investigate the causal effects of the 2020 Hubei lockdowns on reducing the spread of the coronavirus in major Chinese cities outside Hubei.
", "keywords": "Causal network;Constrained optimization;COVID-19;Individual mediation effects;Structure learning", "primary_area": "", "supplementary_material": "/attachment/13883bcb1aa8fd0f6a80af5e3a5e62d67ee13d45.zip", "author": "Hengrui Cai;Rui Song;Wenbin Lu", "authorids": "~Hengrui_Cai1;~Rui_Song2;wlu4@ncsu.edu", "gender": "F;;", "homepage": "https://hengruicai.github.io/;https://song-ray.github.io/;", "dblp": "277/5831;01/2743-6.html;", "google_scholar": ";;", "orcid": ";0000-0003-1875-2115;", "linkedin": "hengrui-cai-b1a6a5b9/;;", "or_profile": "~Hengrui_Cai1;~Rui_Song2;wlu4@ncsu.edu", "aff": "North Carolina State University;North Carolina State University;", "aff_domain": "ncsu.edu;ncsu.edu;", "position": "PhD student;Full Professor;", "bibtex": "@inproceedings{\ncai2021anoce,\ntitle={{\\{}ANOCE{\\}}: Analysis of Causal Effects with Multiple Mediators via Constrained Structural Learning},\nauthor={Hengrui Cai and Rui Song and Wenbin Lu},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=7I12hXRi8F}\n}", "github": "", "project": "", "reviewers": "AnonReviewer3;AnonReviewer1;AnonReviewer4;AnonReviewer2", "pdf_size": 0, "rating": "5;6;6;8", "confidence": "4;2;3;4", "wc_review": "860;234;378;149", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "0;0;0;0", "reply_reviewers": "0;0;0;0", "reply_authors": "0;0;0;0", "rating_avg": [ 6.25, 1.0897247358851685 ], "confidence_avg": [ 3.25, 0.82915619758885 ], "wc_review_avg": [ 405.25, 275.0139769175378 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 0, 0 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 0, 0 ], "replies_avg": [ 5, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": 0.20751433915982243, "gs_citation": 13, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=2683053315455345350&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 7, "pdf": "https://openreview.net/pdf?id=7I12hXRi8F", "email": "ncsu.edu;ncsu.edu;", "author_num": 3, "aff_unique_index": "0;0", "aff_unique_norm": "North Carolina State University", "aff_unique_dep": "", "aff_unique_url": "https://www.ncsu.edu", "aff_unique_abbr": "NCSU", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0", "aff_country_unique": "United States" }, { "id": "7IDIy7Jb00l", "title": "Offline Meta Learning of Exploration", "track": "main", "status": "Reject", "tldr": "", "abstract": "Consider the following problem: given the complete training histories of $N$ conventional RL agents, trained on $N$ different tasks, design a meta-agent that can quickly maximize reward in a new, unseen task from the same task distribution. In particular, while each conventional RL agent explored and exploited its own different task, the meta-agent must identify regularities in the data that lead to effective exploration/exploitation in the unseen task. This meta-learning problem is an instance of a setting we term Offline Meta Reinforcement Learning (OMRL). To solve our challenge, we take a Bayesian RL (BRL) view, and seek to learn a Bayes-optimal policy from the offline data. We extend the recently proposed VariBAD BRL algorithm to the off-policy setting, and demonstrate learning of approximately Bayes-optimal exploration strategies from offline data using deep neural networks. For the particular problem described above, our method learns effective exploration behavior that is qualitatively different from the exploration used by any RL agent in the data. 
Furthermore, we find that when applied to the online meta-RL setting (agent simultaneously collects data and improves its meta-RL policy), our method is significantly more sample efficient than the state-of-the-art VariBAD.", "keywords": "Meta-RL;Offline RL;Bayesian RL", "primary_area": "", "supplementary_material": "", "author": "Ron Dorfman;Aviv Tamar", "authorids": "~Ron_Dorfman2;~Aviv_Tamar2", "gender": "M;M", "homepage": ";https://avivt.github.io/avivt/", "dblp": "271/8319;49/10622", "google_scholar": "baGUoIEAAAAJ;https://scholar.google.co.il/citations?user=kppa2vgAAAAJ", "orcid": ";", "linkedin": "ron-dorfman-756b9a13a/;", "or_profile": "~Ron_Dorfman2;~Aviv_Tamar2", "aff": "Technion, Technion;Technion, Technion", "aff_domain": "technion.ac.il;technion.ac.il", "position": "PhD student;Assistant Professor", "bibtex": "@misc{\ndorfman2021offline,\ntitle={Offline Meta Learning of Exploration},\nauthor={Ron Dorfman and Aviv Tamar},\nyear={2021},\nurl={https://openreview.net/forum?id=7IDIy7Jb00l}\n}", "github": "", "project": "", "reviewers": "AnonReviewer2;AnonReviewer1;AnonReviewer4;AnonReviewer3", "site": "https://openreview.net/forum?id=7IDIy7Jb00l", "pdf_size": 0, "rating": "5;6;6;7", "confidence": "3;3;4;4", "wc_review": "528;236;713;827", "wc_reply_reviewers": "347;0;0;0", "wc_reply_authors": "1051;356;402;108", "reply_reviewers": "2;0;0;0", "reply_authors": "3;1;1;1", "rating_avg": [ 6.0, 0.7071067811865476 ], "confidence_avg": [ 3.5, 0.5 ], "wc_review_avg": [ 576.0, 223.42448388661435 ], "wc_reply_reviewers_avg": [ 86.75, 150.25540755660012 ], "wc_reply_authors_avg": [ 479.25, 348.5264516503733 ], "reply_reviewers_avg": [ 0.5, 0.8660254037844386 ], "reply_authors_avg": [ 1.5, 0.8660254037844386 ], "replies_avg": [ 15, 0 ], "authors#_avg": [ 2, 0 ], "corr_rating_confidence": 0.7071067811865476, "gs_citation": 43, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=4349092782616653428&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 5, "aff_unique_index": "0;0", "aff_unique_norm": "Technion - Israel Institute of Technology", "aff_unique_dep": "", "aff_unique_url": "https://www.technion.ac.il/en/", "aff_unique_abbr": "Technion", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0", "aff_country_unique": "Israel" }, { "id": "7IElVSrNm54", "title": "Zero-shot Fairness with Invisible Demographics", "track": "main", "status": "Reject", "tldr": "", "abstract": "In a statistical notion of algorithmic fairness, we partition individuals into groups based on some key demographic factors such as race and gender, and require that some statistics of a classifier be approximately equalized across those groups. Current approaches require complete annotations for demographic factors, or focus on an abstract worst-off group rather than demographic groups. In this paper, we consider the setting where the demographic factors are only partially available. For example, we have training examples for white-skinned and dark-skinned males, and white-skinned females, but we have zero examples for dark-skinned females. We could also have zero examples for females regardless of their skin colors. Without additional knowledge, it is impossible to directly control the discrepancy of the classifier's statistics for those invisible groups. We develop a disentanglement algorithm that splits a representation of data into a component that captures the demographic factors and another component that is invariant to them based on a context dataset. 
The context dataset is much like the deployment dataset, it is unlabeled but it contains individuals from all demographics including the invisible. We cluster the context set, equalize the cluster size to form a \"perfect batch\", and use it as a supervision signal for the disentanglement. We propose a new discriminator loss based on a learnable attention mechanism to distinguish a perfect batch from a non-perfect one. We evaluate our approach on standard classification benchmarks and show that it is indeed possible to protect invisible demographics.", "keywords": "fairness;missing data;adversary;classification;disentanglement", "primary_area": "", "supplementary_material": "", "author": "Thomas Kehrenberg;Viktoriia Sharmanska;Myles Scott Bartlett;Novi Quadrianto", "authorids": "~Thomas_Kehrenberg1;~Viktoriia_Sharmanska1;~Myles_Scott_Bartlett1;~Novi_Quadrianto1", "gender": "M;F;M;M", "homepage": ";https://www.imperial.ac.uk/people/sharmanska.v;;http://www.sussex.ac.uk/profiles/335583", "dblp": ";119/1466;;http://dblp.uni-trier.de/pers/hd/q/Quadrianto:Novi", "google_scholar": "vQ_8c2cAAAAJ;https://scholar.google.co.uk/citations?user=8TDBdicAAAAJ;;I-rLzGcAAAAJ", "orcid": ";;0000-0002-1318-1395;", "linkedin": ";viktoriiasharmanska;;", "or_profile": "~Thomas_Kehrenberg1;~Viktoriia_Sharmanska1;~Myles_Scott_Bartlett1;~Novi_Quadrianto1", "aff": "University of Sussex;University of Sussex;University of Sussex;Monash Indonesia", "aff_domain": "sussex.ac.uk;sussex.ac.uk;sussex.ac.uk;monash.edu", "position": "PhD student;Lecturer;PhD student;Full Professor", "bibtex": "@misc{\nkehrenberg2021zeroshot,\ntitle={Zero-shot Fairness with Invisible Demographics},\nauthor={Thomas Kehrenberg and Viktoriia Sharmanska and Myles Scott Bartlett and Novi Quadrianto},\nyear={2021},\nurl={https://openreview.net/forum?id=7IElVSrNm54}\n}", "github": "", "project": "", "reviewers": "AnonReviewer3;AnonReviewer2;AnonReviewer4;AnonReviewer1", "site": "https://openreview.net/forum?id=7IElVSrNm54", "pdf_size": 0, "rating": "4;5;5;6", "confidence": "3;4;3;5", "wc_review": "527;697;753;1253", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "587;506;731;536", "reply_reviewers": "0;0;0;0", "reply_authors": "1;1;1;1", "rating_avg": [ 5.0, 0.7071067811865476 ], "confidence_avg": [ 3.75, 0.82915619758885 ], "wc_review_avg": [ 807.5, 270.338214094863 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 590.0, 86.40312494348801 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 10, 0 ], "authors#_avg": [ 4, 0 ], "corr_rating_confidence": 0.8528028654224418, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:NfpHybgzlgIJ:scholar.google.com/&scioq=Zero-shot+Fairness+with+Invisible+Demographics&hl=en&as_sdt=0,33", "gs_version_total": 0, "aff_unique_index": "0;0;0;1", "aff_unique_norm": "University of Sussex;Monash University", "aff_unique_dep": ";", "aff_unique_url": "https://www.sussex.ac.uk;https://www.monash.edu.id", "aff_unique_abbr": "Sussex;Monash", "aff_campus_unique_index": "1", "aff_campus_unique": ";Indonesia", "aff_country_unique_index": "0;0;0;1", "aff_country_unique": "United Kingdom;Indonesia" }, { "id": "7JSTDTZtn7-", "title": "Byzantine-Robust Learning on Heterogeneous Datasets via Resampling", "track": "main", "status": "Reject", "tldr": "", "abstract": "In Byzantine-robust distributed optimization, a central server wants to train a machine learning model over data distributed across multiple workers. 
However, a fraction of these workers may deviate from the prescribed algorithm and send arbitrary messages to the server. While this problem has received significant attention recently, most current defenses assume that the workers have identical data distribution. For realistic cases when the data across workers are heterogeneous (non-iid), we design new attacks that circumvent these defenses leading to significant loss of performance. We then propose a universal resampling scheme that addresses data heterogeneity at a negligible computational cost. We theoretically and experimentally validate our approach, showing that combining resampling with existing robust algorithms is effective against challenging attacks.\n", "keywords": "Byzantine robustness;distributed training;heterogeneous dataset", "primary_area": "", "supplementary_material": "/attachment/665bd99b62cea566046deb150999a8df1083a468.zip", "author": "Lie He;Sai Praneeth Karimireddy;Martin Jaggi", "authorids": "~Lie_He1;~Sai_Praneeth_Karimireddy1;~Martin_Jaggi1", "gender": "M;M;M", "homepage": "https://liehe.github.io/;https://spkreddy.org;https://mlo.epfl.ch", "dblp": "225/5245;217/3342;17/4402", "google_scholar": "rIAYxaMAAAAJ;wKJeOQoAAAAJ;https://scholar.google.ch/citations?user=r1TJBr8AAAAJ", "orcid": ";;0000-0003-1579-5558", "linkedin": ";;", "or_profile": "~Lie_He1;~Sai_Praneeth_Karimireddy1;~Martin_Jaggi1", "aff": "EPFL - EPF Lausanne;Swiss Federal Institute of Technology Lausanne;EPFL", "aff_domain": "epfl.ch;epfl.ch;epfl.ch", "position": "PhD student;PhD student;Assistant Professor", "bibtex": "@misc{\nhe2021byzantinerobust,\ntitle={Byzantine-Robust Learning on Heterogeneous Datasets via Resampling},\nauthor={Lie He and Sai Praneeth Karimireddy and Martin Jaggi},\nyear={2021},\nurl={https://openreview.net/forum?id=7JSTDTZtn7-}\n}", "github": "", "project": "", "reviewers": "AnonReviewer4;AnonReviewer2;AnonReviewer1", "site": "https://openreview.net/forum?id=7JSTDTZtn7-", "pdf_size": 0, "rating": "5;6;7", "confidence": "3;4;3", "wc_review": "264;623;199", "wc_reply_reviewers": "0;53;0", "wc_reply_authors": "378;398;84", "reply_reviewers": "0;1;0", "reply_authors": "1;3;1", "rating_avg": [ 6.0, 0.816496580927726 ], "confidence_avg": [ 3.3333333333333335, 0.4714045207910317 ], "wc_review_avg": [ 362.0, 186.45285373698806 ], "wc_reply_reviewers_avg": [ 17.666666666666668, 24.984439601924677 ], "wc_reply_authors_avg": [ 286.6666666666667, 143.53938677434692 ], "reply_reviewers_avg": [ 0.3333333333333333, 0.4714045207910317 ], "reply_authors_avg": [ 1.6666666666666667, 0.9428090415820634 ], "replies_avg": [ 11, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 35, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=15774920083213249913&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 0, "aff_unique_index": "0;1;0", "aff_unique_norm": "EPFL;Swiss Federal Institute of Technology Lausanne", "aff_unique_dep": ";", "aff_unique_url": "https://www.epfl.ch;https://www.epfl.ch", "aff_unique_abbr": "EPFL;EPFL", "aff_campus_unique_index": "0;0", "aff_campus_unique": "Lausanne;", "aff_country_unique_index": "0;0;0", "aff_country_unique": "Switzerland" }, { "id": "7K0UUL9y9lE", "title": "You Only Sample (Almost) Once: Linear Cost Self-Attention Via Bernoulli Sampling", "track": "main", "status": "Reject", "tldr": "", "abstract": "Transformer-based models have come to dominate the landscape in a wide range of natural language processing (NLP) applications. 
The heart of the transformer model is the self-attention mechanism, which captures the interactions of token pairs in the input sequences and consequently, depends quadratically on the input sequence length. It is known that training such models on longer sequences is quite expensive, and often, prohibitively so. We show that a Bernoulli sampling attention mechanism based on Locality Sensitive Hashing (LSH), decreases the quadratic complexity to linear. We bypass the quadratic cost by considering self-attention as a sum of individual tokens associated with Bernoulli random variables that can, in principle, be sampled at once by a single hash (although in practice, this number may be a small constant). This leads to an efficient sampling scheme to estimate self-attention which relies on specific modifications of LSH (based on feasibility of deployment on GPU architectures). We evaluate our proposed algorithm on the GLUE benchmark with standard 512 sequence length and our method achieves comparable or even slightly better performance than a standard pretrained Transformer. To evaluate whether our method can indeed handle longer sequences, we conduct experiments on long sequence (4096) language model pretraining and achieve consistent results as standard self-attention, while observing sizable inference speed-ups and memory savings.", "keywords": "self-attention;efficient;linear complexity;language model;transformer;BERT", "primary_area": "", "supplementary_material": "", "author": "Zhanpeng Zeng;Yunyang Xiong;Sathya N. Ravi;Shailesh Acharya;Glenn Fung;Vikas Singh", "authorids": "~Zhanpeng_Zeng1;~Yunyang_Xiong2;~Sathya_N._Ravi1;sachary1@amfam.com;~Glenn_Fung2;~Vikas_Singh1", "gender": "M;M;M;;M;M", "homepage": ";;http://sathyaravi.com;;https://www.ai-ml-amfam.com/;http://vsingh-www.cs.wisc.edu/", "dblp": "284/9150;140/7645;159/2123;;https://dblp.uni-trier.de/pers/f/Fung:Glenn.html;", "google_scholar": "P9ctuRUAAAAJ;k5FaRwcAAAAJ;FW-0thoAAAAJ;;AWAcQaAAAAAJ;d32BmwcAAAAJ", "orcid": ";;0000-0003-3881-6323;;;", "linkedin": ";;sathya-narayanan-ravi-74a5a128/;;glenn-fung/;", "or_profile": "~Zhanpeng_Zeng1;~Yunyang_Xiong2;~Sathya_N._Ravi1;sachary1@amfam.com;~Glenn_Fung2;~Vikas_Singh1", "aff": "University of Wisconsin, Madison;University of Wisconsin, Madison;University of Illinois, Chicago;;;University of Wisconsin, Madison", "aff_domain": "wisc.edu;wisc.edu;uic.edu;;;wisc.edu", "position": "PhD student;PhD student;Assistant Professor;;;Professor", "bibtex": "@misc{\nzeng2021you,\ntitle={You Only Sample (Almost) Once: Linear Cost Self-Attention Via Bernoulli Sampling},\nauthor={Zhanpeng Zeng and Yunyang Xiong and Sathya N. 
Ravi and Shailesh Acharya and Glenn Fung and Vikas Singh},\nyear={2021},\nurl={https://openreview.net/forum?id=7K0UUL9y9lE}\n}", "github": "", "project": "", "reviewers": "AnonReviewer3;AnonReviewer4;AnonReviewer1;AnonReviewer2", "site": "https://openreview.net/forum?id=7K0UUL9y9lE", "pdf_size": 0, "rating": "2;5;6;6", "confidence": "4;4;5;3", "wc_review": "378;247;323;198", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "855;881;904;706", "reply_reviewers": "0;0;0;0", "reply_authors": "2;2;2;2", "rating_avg": [ 4.75, 1.6393596310755 ], "confidence_avg": [ 4.0, 0.7071067811865476 ], "wc_review_avg": [ 286.5, 69.09594778277523 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 836.5, 77.31267683892467 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 2.0, 0.0 ], "replies_avg": [ 13, 0 ], "authors#_avg": [ 6, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 27, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=11877607783928250360&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 5, "aff_unique_index": "0;0;1;0", "aff_unique_norm": "University of Wisconsin;University of Illinois at Chicago", "aff_unique_dep": ";", "aff_unique_url": "https://www.wisc.edu;https://www.uic.edu", "aff_unique_abbr": "UW;UIC", "aff_campus_unique_index": "0;0;1;0", "aff_campus_unique": "Madison;Chicago", "aff_country_unique_index": "0;0;0;0", "aff_country_unique": "United States" }, { "id": "7MjfPd-Irao", "title": "Impact-driven Exploration with Contrastive Unsupervised Representations", "track": "main", "status": "Reject", "tldr": "", "abstract": "Procedurally-generated sparse reward environments pose significant challenges for many RL algorithms. The recently proposed impact-driven exploration method (RIDE) by Raileanu & Rockt\u00e4schel (2020), which rewards actions that lead to large changes (measured by $\\ell_2$-distance) in the observation embedding, achieves state-of-the-art performance on such procedurally-generated MiniGrid tasks. Yet, the definition of \"impact\" in RIDE is not conceptually clear because its learned embedding space is not inherently equipped with any similarity measure, let alone $\\ell_2$-distance. We resolve this issue in RIDE via contrastive learning. That is, we train the embedding with respect to cosine similarity, where we define two observations to be similar if the agent can reach one observation from the other within a few steps, and define impact in terms of this similarity measure. Experimental results show that our method performs similarly to RIDE on the MiniGrid benchmarks while learning a conceptually clear embedding space equipped with the cosine similarity measure. Our modification of RIDE also provides a new perspective which connects RIDE and episodic curiosity (Savinov et al., 2019), a different exploration method which rewards the agent for visiting states that are unfamiliar to the agent's episodic memory. 
By incorporating episodic memory into our method, we outperform RIDE on the MiniGrid benchmarks.", "keywords": "reinforcement learning;exploration;curiosity;episodic memory", "primary_area": "", "supplementary_material": "", "author": "Min Jae Song;Dan Kushnir", "authorids": "~Min_Jae_Song1;~Dan_Kushnir1", "gender": "M;M", "homepage": "https://mjsong32.github.io/;", "dblp": "169/9994;87/231", "google_scholar": "6TIktJgAAAAJ;", "orcid": ";", "linkedin": ";", "or_profile": "~Min_Jae_Song1;~Dan_Kushnir1", "aff": "New York University;Nokia networks GmbH", "aff_domain": "nyu.edu;nokia-bell-labs.com", "position": "PhD student;Researcher", "bibtex": "@misc{\nsong2021impactdriven,\ntitle={Impact-driven Exploration with Contrastive Unsupervised Representations},\nauthor={Min Jae Song and Dan Kushnir},\nyear={2021},\nurl={https://openreview.net/forum?id=7MjfPd-Irao}\n}", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer3;AnonReviewer2;AnonReviewer4", "site": "https://openreview.net/forum?id=7MjfPd-Irao", "pdf_size": 0, "rating": "4;4;4;7", "confidence": "4;4;3;3", "wc_review": "512;645;603;870", "wc_reply_reviewers": "0;633;212;0", "wc_reply_authors": "926;1292;1657;1094", "reply_reviewers": "0;1;1;0", "reply_authors": "2;2;3;2", "rating_avg": [ 4.75, 1.299038105676658 ], "confidence_avg": [ 3.5, 0.5 ], "wc_review_avg": [ 657.5, 131.76968543637037 ], "wc_reply_reviewers_avg": [ 211.25, 258.42153064324964 ], "wc_reply_authors_avg": [ 1242.25, 272.2520661078626 ], "reply_reviewers_avg": [ 0.5, 0.5 ], "reply_authors_avg": [ 2.25, 0.4330127018922193 ], "replies_avg": [ 17, 0 ], "authors#_avg": [ 2, 0 ], "corr_rating_confidence": -0.5773502691896257, "gs_citation": 1, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=1097334777115033886&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 0, "aff_unique_index": "0;1", "aff_unique_norm": "New York University;Nokia Networks", "aff_unique_dep": ";", "aff_unique_url": "https://www.nyu.edu;https://networks.nokia.com", "aff_unique_abbr": "NYU;Nokia", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;1", "aff_country_unique": "United States;Germany" }, { "id": "7ODIasgLJlU", "title": "Deep Q-Learning with Low Switching Cost", "track": "main", "status": "Reject", "tldr": "", "abstract": "We initiate the study on deep reinforcement learning problems that require low switching cost, i.e., small number of policy switches during training. Such a requirement is ubiquitous in many applications, such as medical domains, recommendation systems, education, robotics, dialogue agents, etc, where the deployed policy that actually interacts with the environment cannot change frequently. Our paper investigates different policy switching criteria based on deep Q-networks and further proposes an adaptive approach based on the feature distance between the deployed Q-network and the underlying learning Q-network. Through extensive experiments on a medical treatment environment and a collection of the Atari games, we find our feature-switching criterion substantially decreases the switching cost while maintains a similar sample efficiency to the case without the low-switching-cost constraint. 
We also complement this empirical finding with a theoretical justification from a representation learning perspective.", "keywords": "deep Q-network;DQN;switching cost;deep Q-learning", "primary_area": "", "supplementary_material": "/attachment/05ea8287ef5a76378ef5bd667148706272f3fdd3.zip", "author": "Shusheng Xu;Simon Shaolei Du;Yi Wu", "authorids": "~Shusheng_Xu1;~Simon_Shaolei_Du1;~Yi_Wu1", "gender": "M;M;M", "homepage": ";http://simonshaoleidu.com;https://jxwuyi.weebly.com", "dblp": "121/0926;176/5602;", "google_scholar": "2J051LYAAAAJ;OttawxUAAAAJ;dusV5HMAAAAJ", "orcid": ";;", "linkedin": ";;", "or_profile": "~Shusheng_Xu1;~Simon_Shaolei_Du1;~Yi_Wu1", "aff": "Tsinghua University;Meta Facebook;Tsinghua University", "aff_domain": "tsinghua.edu.cn;fb.com;tsinghua.edu.cn", "position": "PhD student;Visiting Professor;Assistant Professor", "bibtex": "@misc{\nxu2021deep,\ntitle={Deep Q-Learning with Low Switching Cost},\nauthor={Shusheng Xu and Simon Shaolei Du and Yi Wu},\nyear={2021},\nurl={https://openreview.net/forum?id=7ODIasgLJlU}\n}", "github": "", "project": "", "reviewers": "AnonReviewer5;AnonReviewer1;AnonReviewer2;AnonReviewer3", "site": "https://openreview.net/forum?id=7ODIasgLJlU", "pdf_size": 0, "rating": "4;5;5;5", "confidence": "5;4;3;3", "wc_review": "123;299;372;375", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "0;0;0;0", "reply_reviewers": "0;0;0;0", "reply_authors": "0;0;0;0", "rating_avg": [ 4.75, 0.4330127018922193 ], "confidence_avg": [ 3.75, 0.82915619758885 ], "wc_review_avg": [ 292.25, 102.34592077850489 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 0, 0 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 0, 0 ], "replies_avg": [ 6, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": -0.8703882797784891, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:_mQtPhu4tvUJ:scholar.google.com/&scioq=Deep+Q-Learning+with+Low+Switching+Cost&hl=en&as_sdt=0,5", "gs_version_total": 0, "aff_unique_index": "0;1;0", "aff_unique_norm": "Tsinghua University;Meta", "aff_unique_dep": ";Meta Platforms, Inc.", "aff_unique_url": "https://www.tsinghua.edu.cn;https://meta.com", "aff_unique_abbr": "THU;Meta", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;1;0", "aff_country_unique": "China;United States" }, { "title": "Optimal Regularization can Mitigate Double Descent", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/2540", "id": "7R7fAoUygoa", "poster": "", "openreview": "https://openreview.net/forum?id=7R7fAoUygoa", "slides": "https://iclr.cc/virtual/2021/poster/2540", "video": "https://iclr.cc/virtual/2021/poster/2540", "author_site": "Preetum Nakkiran, Prayaag Venkat, Sham M Kakade, Tengyu Ma", "tldr": "", "abstract": "Recent empirical and theoretical studies have shown that many learning algorithms -- from linear regression to neural networks -- can have test performance that is non-monotonic in quantities such the sample size and model size. This striking phenomenon, often referred to as \"double descent\", has raised questions of if we need to re-think our current understanding of generalization. In this work, we study whether the double-descent phenomenon can be avoided by using optimal regularization. 
Theoretically, we prove that for certain linear regression models with isotropic data distribution, optimally-tuned $\\ell_2$ regularization achieves monotonic test performance as we grow either the sample size or the model size.\nWe also demonstrate empirically that optimally-tuned $\\ell_2$ regularization can mitigate double descent for more general models, including neural networks.\nOur results suggest that it may also be informative to study the test risk scalings of various algorithms in the context of appropriately tuned regularization.", "keywords": "double descent;generalization;regularization;regression;monotonicity", "primary_area": "", "supplementary_material": "", "author": "Preetum Nakkiran;Prayaag Venkat;Sham M. Kakade;Tengyu Ma", "authorids": "~Preetum_Nakkiran1;pvenkat@g.harvard.edu;~Sham_M._Kakade1;~Tengyu_Ma1", "gender": ";;M;M", "homepage": "http://preetum.nakkiran.org;;https://shamulent.github.io;http://ai.stanford.edu/~tengyuma/", "dblp": "151/6343;;s/SMKakade;54/9061", "google_scholar": "zithBbUAAAAJ;;https://scholar.google.com.tw/citations?user=wb-DKCIAAAAJ;i38QlUwAAAAJ", "orcid": ";;;", "linkedin": ";;;", "or_profile": "~Preetum_Nakkiran1;pvenkat@g.harvard.edu;~Sham_M._Kakade1;~Tengyu_Ma1", "aff": "Harvard University;;;Facebook AI Research", "aff_domain": "harvard.edu;;;fb.com", "position": "PhD student;;;Visiting Scientist", "bibtex": "@inproceedings{\nnakkiran2021optimal,\ntitle={Optimal Regularization can Mitigate Double Descent},\nauthor={Preetum Nakkiran and Prayaag Venkat and Sham M. Kakade and Tengyu Ma},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=7R7fAoUygoa}\n}", "github": "", "project": "", "reviewers": "AnonReviewer3;AnonReviewer1;AnonReviewer4;AnonReviewer2", "pdf_size": 0, "rating": "6;7;7;7", "confidence": "3;3;4;3", "wc_review": "334;286;404;525", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "430;101;351;289", "reply_reviewers": "0;0;0;0", "reply_authors": "1;1;1;1", "rating_avg": [ 6.75, 0.4330127018922193 ], "confidence_avg": [ 3.25, 0.4330127018922193 ], "wc_review_avg": [ 387.25, 89.9204509552749 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 292.75, 121.46270003585462 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 12, 0 ], "authors#_avg": [ 4, 0 ], "corr_rating_confidence": 0.3333333333333333, "gs_citation": 152, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=15740064171488920158&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 5, "pdf": "https://openreview.net/pdf?id=7R7fAoUygoa", "email": "harvard.edu;;;fb.com", "author_num": 4, "aff_unique_index": "0;1", "aff_unique_norm": "Harvard University;Meta", "aff_unique_dep": ";Facebook AI Research", "aff_unique_url": "https://www.harvard.edu;https://research.facebook.com", "aff_unique_abbr": "Harvard;FAIR", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0", "aff_country_unique": "United States" }, { "id": "7TBP8k7TLFA", "title": "Universal Approximation Theorem for Equivariant Maps by Group CNNs", "track": "main", "status": "Reject", "tldr": "", "abstract": "Group symmetry is inherent in a wide variety of data distributions. Data processing that preserves symmetry is described as an equivariant map and often effective in achieving high performance. Convolutional neural networks (CNNs) have been known as models with equivariance and shown to approximate equivariant maps for some specific groups. 
However, universal approximation theorems for CNNs have been separately derived with individual techniques according to each group and setting. This paper provides a unified method to obtain universal approximation theorems for equivariant maps by CNNs in various settings. As its significant advantage, we can handle non-linear equivariant maps between infinite-dimensional spaces for non-compact groups.", "keywords": "Universal Approximation Theorem;CNN;Deep Learning;Symmetry", "primary_area": "", "supplementary_material": "/attachment/51ba4ea25257c84a1e4736f051679a2d3de00006.zip", "author": "Wataru Kumagai;Akiyoshi Sannai", "authorids": "~Wataru_Kumagai2;~Akiyoshi_Sannai1", "gender": "M;M", "homepage": "https://sites.google.com/site/watarukumagaiswebpage/;https://sites.google.com/view/akiyoshisannai/%E3%83%9B%E3%83%BC%E3%83%A0", "dblp": ";220/5533", "google_scholar": "https://scholar.google.co.jp/citations?user=rd5MEO8AAAAJ;https://scholar.google.com/citations?hl=ja", "orcid": ";", "linkedin": ";", "or_profile": "~Wataru_Kumagai2;~Akiyoshi_Sannai1", "aff": "Omron Sinic X;RIKEN", "aff_domain": "sinicx.com;riken.jp", "position": "Researcher;Researcher", "bibtex": "@misc{\nkumagai2021universal,\ntitle={Universal Approximation Theorem for Equivariant Maps by Group {\\{}CNN{\\}}s},\nauthor={Wataru Kumagai and Akiyoshi Sannai},\nyear={2021},\nurl={https://openreview.net/forum?id=7TBP8k7TLFA}\n}", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer4;AnonReviewer2", "site": "https://openreview.net/forum?id=7TBP8k7TLFA", "pdf_size": 0, "rating": "5;5;7", "confidence": "3;3;3", "wc_review": "542;955;375", "wc_reply_reviewers": "0;0;0", "wc_reply_authors": "783;1662;197", "reply_reviewers": "0;0;0", "reply_authors": "1;3;1", "rating_avg": [ 5.666666666666667, 0.9428090415820634 ], "confidence_avg": [ 3.0, 0.0 ], "wc_review_avg": [ 624.0, 243.7799554242856 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 880.6666666666666, 602.0577676233033 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.6666666666666667, 0.9428090415820634 ], "replies_avg": [ 10, 0 ], "authors#_avg": [ 2, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 20, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=10936423900069358066&as_sdt=5,33&sciodt=0,33&hl=en&oe=ASCII", "gs_version_total": 3, "aff_unique_index": "0;1", "aff_unique_norm": "OMRON Corporation;RIKEN", "aff_unique_dep": "Sinic X Division;", "aff_unique_url": "https://www.omron.com;https://www.riken.jp", "aff_unique_abbr": "Omron;RIKEN", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0", "aff_country_unique": "Japan" }, { "id": "7UyqgFhPqAd", "title": "Connection- and Node-Sparse Deep Learning: Statistical Guarantees", "track": "main", "status": "Reject", "tldr": "", "abstract": "Neural networks are becoming increasingly popular in applications, but a comprehensive mathematical understanding of their potentials and limitations is still missing. In this paper, we study the prediction accuracies of neural networks from a statistical point of view. In particular, we establish statistical prediction guarantees for deep learning with different types of sparsity-inducing regularization. Our bounds feature a mild dependence on network widths and depths, and, therefore, support the current trend toward wide and deep networks. 
The tools that we use in our derivations are uncommon in deep learning and, hence, might be of additional interest.", "keywords": "", "primary_area": "", "supplementary_material": "", "author": "Johannes Lederer", "authorids": "~Johannes_Lederer1", "gender": "", "homepage": "", "dblp": "", "google_scholar": "", "orcid": "", "linkedin": "", "or_profile": "", "aff": "", "aff_domain": "", "position": "", "bibtex": "@misc{\nlederer2021connection,\ntitle={Connection- and Node-Sparse Deep Learning: Statistical Guarantees},\nauthor={Johannes Lederer},\nyear={2021},\nurl={https://openreview.net/forum?id=7UyqgFhPqAd}\n}", "github": "", "project": "", "reviewers": "AnonReviewer4;AnonReviewer2;AnonReviewer3", "site": "https://openreview.net/forum?id=7UyqgFhPqAd", "pdf_size": 0, "rating": "4;5;6", "confidence": "4;3;3", "wc_review": "359;310;217", "wc_reply_reviewers": "0;0;0", "wc_reply_authors": "0;0;0", "reply_reviewers": "0;0;0", "reply_authors": "0;0;0", "rating_avg": [ 5.0, 0.816496580927726 ], "confidence_avg": [ 3.3333333333333335, 0.4714045207910317 ], "wc_review_avg": [ 295.3333333333333, 58.891614192703386 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 0, 0 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 0, 0 ], "replies_avg": [ 4, 0 ], "authors#_avg": [ 1, 0 ], "corr_rating_confidence": -0.8660254037844385, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:fFQtW900DooJ:scholar.google.com/&scioq=Connection-+and+Node-Sparse+Deep+Learning:+Statistical+Guarantees&hl=en&as_sdt=0,5", "gs_version_total": 0 }, { "id": "7WwYBADS3E_", "title": "Learning Lagrangian Fluid Dynamics with Graph Neural Networks", "track": "main", "status": "Reject", "tldr": "", "abstract": "We present a data-driven model for fluid simulation under Lagrangian representation. Our model uses graphs to describe the fluid field, where physical quantities are encoded as node and edge features. Instead of directly predicting the acceleration or position correction given the current state, we decompose the simulation scheme into separate parts - advection, collision, and pressure projection. For these different reasoning tasks, we propose two kinds of graph neural network structures, node-focused networks, and edge-focused networks. By introducing physics prior knowledge, our model can be efficient in terms of training and inference. Our tests show that the learned model can produce accurate results and remain stable in scenarios with a large amount of particles and different geometries. Unlike many previous works, further tests demonstrate that our model is able to retain many important physical properties of incompressible fluids, such as minor divergence and reasonable pressure distribution. 
Additionally, our model can adopt a range of time step sizes different from ones using in the training set, which indicates its robust generalization capability.", "keywords": "particle hydrodynamics;graph neural networks;Lagrangian fluids", "primary_area": "", "supplementary_material": "", "author": "Zijie Li;Amir Barati Farimani", "authorids": "~Zijie_Li2;barati@cmu.edu", "gender": ";", "homepage": ";", "dblp": ";", "google_scholar": ";", "orcid": ";", "linkedin": ";", "or_profile": "~Zijie_Li2;barati@cmu.edu", "aff": ";", "aff_domain": ";", "position": ";", "bibtex": "@misc{\nli2021learning,\ntitle={Learning Lagrangian Fluid Dynamics with Graph Neural Networks},\nauthor={Zijie Li and Amir Barati Farimani},\nyear={2021},\nurl={https://openreview.net/forum?id=7WwYBADS3E_}\n}", "github": "", "project": "", "reviewers": "AnonReviewer3;AnonReviewer1;AnonReviewer2;AnonReviewer4", "site": "https://openreview.net/forum?id=7WwYBADS3E_", "pdf_size": 0, "rating": "4;4;4;5", "confidence": "4;4;4;4", "wc_review": "443;1120;1371;742", "wc_reply_reviewers": "0;0;1020;241", "wc_reply_authors": "447;1135;1767;1120", "reply_reviewers": "0;0;2;1", "reply_authors": "1;2;3;2", "rating_avg": [ 4.25, 0.4330127018922193 ], "confidence_avg": [ 4.0, 0.0 ], "wc_review_avg": [ 919.0, 354.474963854995 ], "wc_reply_reviewers_avg": [ 315.25, 418.6140077684931 ], "wc_reply_authors_avg": [ 1117.25, 466.83314738780064 ], "reply_reviewers_avg": [ 0.75, 0.82915619758885 ], "reply_authors_avg": [ 2.0, 0.7071067811865476 ], "replies_avg": [ 17, 0 ], "authors#_avg": [ 2, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 3, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=13384487668204787923&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 0 }, { "id": "7YctWnyhjpL", "title": "Multi-Task Learning by a Top-Down Control Network", "track": "main", "status": "Reject", "tldr": "", "abstract": "As the range of tasks performed by a general vision system expands, executing multiple tasks accurately and efficiently in a single network has become an important and still open problem. Recent computer vision approaches address this problem by branching networks, or by a channel-wise modulation of the network feature-maps with task specific vectors. We present a novel architecture that uses a dedicated top-down control network to modify the activation of all the units in the main recognition network in a manner that depends on the selected task, image content, and spatial location. We show the effectiveness of our scheme by achieving significantly better results than alternative state-of-the-art approaches on four datasets. We further demonstrate our advantages in terms of task selectivity, scaling the number of tasks and interpretability. 
\nCode is supplied in the supplementary materials and will be publicly available.", "keywords": "multi task learning;computer vision", "primary_area": "", "supplementary_material": "/attachment/e0e5898604fed68e05c19105e20a905c5638593e.zip", "author": "Hila Levi;Shimon Ullman", "authorids": "~Hila_Levi1;~Shimon_Ullman1", "gender": "F;M", "homepage": ";http://www.weizmann.ac.il/math/shimon/", "dblp": "226/2659;93/2158", "google_scholar": ";XOfA8ckAAAAJ", "orcid": ";0000-0003-4331-298X", "linkedin": ";", "or_profile": "~Hila_Levi1;~Shimon_Ullman1", "aff": "Weizmann Institute;Weizmann Institute of Science", "aff_domain": "weizmann.ac.il;weizmann.ac.il", "position": "PhD student;Emeritus", "bibtex": "@misc{\nlevi2021multitask,\ntitle={Multi-Task Learning by a Top-Down Control Network},\nauthor={Hila Levi and Shimon Ullman},\nyear={2021},\nurl={https://openreview.net/forum?id=7YctWnyhjpL}\n}", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer4;AnonReviewer3", "site": "https://openreview.net/forum?id=7YctWnyhjpL", "pdf_size": 0, "rating": "5;5;7", "confidence": "5;3;3", "wc_review": "243;248;248", "wc_reply_reviewers": "0;0;0", "wc_reply_authors": "277;476;376", "reply_reviewers": "0;0;0", "reply_authors": "1;1;1", "rating_avg": [ 5.666666666666667, 0.9428090415820634 ], "confidence_avg": [ 3.6666666666666665, 0.9428090415820634 ], "wc_review_avg": [ 246.33333333333334, 2.3570226039551585 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 376.3333333333333, 81.24175171808041 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 9, 0 ], "authors#_avg": [ 2, 0 ], "corr_rating_confidence": -0.5, "gs_citation": 8, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=9003912532157944068&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 6, "aff_unique_index": "0;0", "aff_unique_norm": "Weizmann Institute of Science", "aff_unique_dep": "", "aff_unique_url": "https://www.weizmann.org.il", "aff_unique_abbr": "Weizmann", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0", "aff_country_unique": "Israel" }, { "id": "7Yhok3vJpU", "title": "High-Likelihood Area Matters --- Rewarding Correct,Rare Predictions Under Imbalanced Distributions", "track": "main", "status": "Reject", "tldr": "", "abstract": "Learning from natural datasets poses significant challenges for traditional classification methods based on the cross-entropy objective due to imbalanced class distributions. It is intuitive to assume that the examples from rare classes are harder to learn so that the classifier is uncertain of the prediction, which establishes the low-likelihood area. Based on this, existing approaches drive the classifier actively to correctly predict those incorrect, rare examples. However, this assumption is one-sided and could be misleading. We find in practice that the high-likelihood area contains correct predictions for rare class examples and it plays a vital role in learning imbalanced class distributions. In light of this finding, we propose the Eureka Loss, which rewards the classifier when examples belong to rare classes in the high-likelihood area are correctly predicted. Experiments on the large-scale long-tailed iNaturalist 2018 classification dataset and the ImageNet-LT benchmark both validate the proposed approach. 
We further analyze the influence of the Eureka Loss in detail on diverse data distributions.", "keywords": "classification;imbalance;long-tailed;likelihood;focal loss", "primary_area": "", "supplementary_material": "/attachment/3b8719a7b3e04b1b92a18a36ef61bf5fdfd678f1.zip", "author": "guangxiang zhao;Lei Li;Xuancheng Ren;Xu Sun;Bin He", "authorids": "~guangxiang_zhao2;lilei@stu.pku.edu.cn;~Xuancheng_Ren1;~Xu_Sun1;hebin.nlp@huawei.com", "gender": "M;;;M;", "homepage": "https://zhaoguangxiang.github.io/;;;https://xusun.org/;", "dblp": "225/5457;;;37/1971-1;", "google_scholar": "https://scholar.google.co.jp/citations?hl=CN;;;https://scholar.google.com/citations?hl=en;", "orcid": "0000-0002-3046-512X;;;;", "linkedin": "guangxiang-zhao-11990913b/;;;;", "or_profile": "~guangxiang_zhao2;lilei@stu.pku.edu.cn;~Xuancheng_Ren1;~Xu_Sun1;hebin.nlp@huawei.com", "aff": "Peking University;;;Peking University;", "aff_domain": "pku.edu.cn;;;pku.edu.cn;", "position": "Ph.D student;;;Associate Professor;", "bibtex": "@misc{\nzhao2021highlikelihood,\ntitle={High-Likelihood Area Matters --- Rewarding Correct,Rare Predictions Under Imbalanced Distributions},\nauthor={guangxiang zhao and Lei Li and Xuancheng Ren and Xu Sun and Bin He},\nyear={2021},\nurl={https://openreview.net/forum?id=7Yhok3vJpU}\n}", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer4;AnonReviewer2;AnonReviewer3", "site": "https://openreview.net/forum?id=7Yhok3vJpU", "pdf_size": 0, "rating": "4;5;5;5", "confidence": "3;5;3;3", "wc_review": "174;261;256;694", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "392;458;227;868", "reply_reviewers": "0;0;0;0", "reply_authors": "1;1;1;2", "rating_avg": [ 4.75, 0.4330127018922193 ], "confidence_avg": [ 3.5, 0.8660254037844386 ], "wc_review_avg": [ 346.25, 203.7233111354712 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 486.25, 235.91563640420276 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.25, 0.4330127018922193 ], "replies_avg": [ 11, 0 ], "authors#_avg": [ 5, 0 ], "corr_rating_confidence": 0.3333333333333333, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:5Qvh3lk1z7AJ:scholar.google.com/&scioq=High-Likelihood+Area+Matters+---+Rewarding+Correct,Rare+Predictions+Under+Imbalanced+Distributions&hl=en&as_sdt=0,33", "gs_version_total": 0, "aff_unique_index": "0;0", "aff_unique_norm": "Peking University", "aff_unique_dep": "", "aff_unique_url": "http://www.pku.edu.cn", "aff_unique_abbr": "Peking U", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0", "aff_country_unique": "China" }, { "id": "7Z29QbHxIL", "title": "FTSO: Effective NAS via First Topology Second Operator", "track": "main", "status": "Reject", "tldr": "", "abstract": "Existing one-shot neural architecture search (NAS) methods generally contain a giant supernet, which leads to heavy computational cost. Our method, named FTSO, separates the whole architecture search into two sub-steps. In the first step, we only search for the topology, and in the second step, we only search for the operators. FTSO not only reduces NAS\u2019s search time from days to 0.68 seconds, but also significantly improves the accuracy. Specifically, our experiments on ImageNet show that within merely 18 seconds, FTSO can achieve 76.4% testing accuracy, 1.5% higher than the baseline, PC-DARTS. 
In addition, FTSO can reach 97.77% testing accuracy, 0.27% higher than the baseline, with 99.8% of search time saved on CIFAR10.", "keywords": "Neural Architecture Search;DARTS", "primary_area": "", "supplementary_material": "/attachment/82b9237bb29fa642fb044c00e6a19d573516fdef.zip", "author": "Likang Wang;Lei Chen", "authorids": "~Likang_Wang1;~Lei_Chen7", "gender": ";M", "homepage": "https://github.com/;http://www.cs.ust.hk/~leichen/", "dblp": "210/6148;c/LeiChen0002", "google_scholar": ";gtglwgYAAAAJ", "orcid": ";0000-0002-8257-5806", "linkedin": ";", "or_profile": "~Likang_Wang1;~Lei_Chen7", "aff": "HKUST;Hong Kong University of Science and Technology", "aff_domain": "hkust.edu;hkust.edu", "position": "PhD student;Full Professor", "bibtex": "@misc{\nwang2021ftso,\ntitle={{\\{}FTSO{\\}}: Effective {\\{}NAS{\\}} via First Topology Second Operator},\nauthor={Likang Wang and Lei Chen},\nyear={2021},\nurl={https://openreview.net/forum?id=7Z29QbHxIL}\n}", "github": "", "project": "", "reviewers": "AnonReviewer4;AnonReviewer3;AnonReviewer1", "site": "https://openreview.net/forum?id=7Z29QbHxIL", "pdf_size": 0, "rating": "3;4;5", "confidence": "5;4;3", "wc_review": "301;600;590", "wc_reply_reviewers": "0;0;0", "wc_reply_authors": "277;1501;1399", "reply_reviewers": "0;0;0", "reply_authors": "1;3;2", "rating_avg": [ 4.0, 0.816496580927726 ], "confidence_avg": [ 4.0, 0.816496580927726 ], "wc_review_avg": [ 497.0, 138.6530442026668 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 1059.0, 554.5232186302031 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 2.0, 0.816496580927726 ], "replies_avg": [ 10, 0 ], "authors#_avg": [ 2, 0 ], "corr_rating_confidence": -1.0, "gs_citation": 9, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=10980357696203120309&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 7, "aff_unique_index": "0;0", "aff_unique_norm": "Hong Kong University of Science and Technology", "aff_unique_dep": "", "aff_unique_url": "https://www.ust.hk", "aff_unique_abbr": "HKUST", "aff_campus_unique_index": "0;0", "aff_campus_unique": "Hong Kong SAR", "aff_country_unique_index": "0;0", "aff_country_unique": "China" }, { "id": "7ZJPhriEdRQ", "title": "AR-ELBO: Preventing Posterior Collapse Induced by Oversmoothing in Gaussian VAE", "track": "main", "status": "Reject", "tldr": "", "abstract": "Variational autoencoders (VAEs) often suffer from posterior collapse, which is a phenomenon that the learned latent space becomes uninformative. This is related to local optima introduced by a fixed hyperparameter resembling the data variance in the objective function. We suggest that this variance parameter regularizes the VAE and affects its smoothness, which is the magnitude of its gradient. An inappropriate choice of this parameter causes oversmoothness and leads to posterior collapse. This is shown theoretically by analysis on the linear approximated objective function and empirically in general cases. We propose AR-ELBO, which stands for adaptively regularized ELBO~(Evidence Lower BOund). It controls the strength of regularization by adapting the variance parameter, and thus avoids oversmoothing the model. 
Generation models trained by proposed objectives show improved Fr\u00e9chet inception distance~(FID) of images generated from the MNIST and CelebA datasets.", "keywords": "Generative model;variational autoencoders;posterior collapse;regularization", "primary_area": "", "supplementary_material": "", "author": "Yuhta Takida;Wei-Hsiang Liao;Toshimitsu Uesaka;Shusuke Takahashi;Yuki Mitsufuji", "authorids": "~Yuhta_Takida1;weihsiang.liao@sony.com;toshimitsu.uesaka@sony.com;shusuke.takahashi@sony.com;~Yuki_Mitsufuji1", "gender": "M;;;;M", "homepage": ";;;;https://www.yukimitsufuji.com/", "dblp": "225/9928;;;;136/5043", "google_scholar": "https://scholar.google.co.jp/citations?user=ahqdEYUAAAAJ;;;;https://scholar.google.com/citations?hl=en", "orcid": ";;;;0000-0002-6806-6140", "linkedin": ";;;;mittu1204", "or_profile": "~Yuhta_Takida1;weihsiang.liao@sony.com;toshimitsu.uesaka@sony.com;shusuke.takahashi@sony.com;~Yuki_Mitsufuji1", "aff": "Sony Corporation;;;;", "aff_domain": "sony.com;;;;", "position": "Engineer;;;;", "bibtex": "@misc{\ntakida2021arelbo,\ntitle={{\\{}AR{\\}}-{\\{}ELBO{\\}}: Preventing Posterior Collapse Induced by Oversmoothing in Gaussian {\\{}VAE{\\}}},\nauthor={Yuhta Takida and Wei-Hsiang Liao and Toshimitsu Uesaka and Shusuke Takahashi and Yuki Mitsufuji},\nyear={2021},\nurl={https://openreview.net/forum?id=7ZJPhriEdRQ}\n}", "github": "", "project": "", "reviewers": "AnonReviewer4;AnonReviewer3;AnonReviewer1;AnonReviewer2", "site": "https://openreview.net/forum?id=7ZJPhriEdRQ", "pdf_size": 0, "rating": "4;6;6;7", "confidence": "3;4;4;4", "wc_review": "712;240;339;356", "wc_reply_reviewers": "0;0;21;0", "wc_reply_authors": "328;178;566;295", "reply_reviewers": "0;0;1;0", "reply_authors": "1;1;1;1", "rating_avg": [ 5.75, 1.0897247358851685 ], "confidence_avg": [ 3.75, 0.4330127018922193 ], "wc_review_avg": [ 411.75, 178.91950005519243 ], "wc_reply_reviewers_avg": [ 5.25, 9.093266739736606 ], "wc_reply_authors_avg": [ 341.75, 140.9581054781881 ], "reply_reviewers_avg": [ 0.25, 0.4330127018922193 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 11, 0 ], "authors#_avg": [ 5, 0 ], "corr_rating_confidence": 0.9271726499455306, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:NzIxVA3dZgwJ:scholar.google.com/&scioq=AR-ELBO:+Preventing+Posterior+Collapse+Induced+by+Oversmoothing+in+Gaussian+VAE&hl=en&as_sdt=0,33", "gs_version_total": 0, "aff_unique_index": "0", "aff_unique_norm": "Sony Corporation", "aff_unique_dep": "", "aff_unique_url": "https://www.sony.com", "aff_unique_abbr": "Sony", "aff_country_unique_index": "0", "aff_country_unique": "Japan" }, { "title": "Monte-Carlo Planning and Learning with Language Action Value Estimates", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/3128", "id": "7_G8JySGecm", "poster": "", "openreview": "https://openreview.net/forum?id=7_G8JySGecm", "slides": "https://iclr.cc/virtual/2021/poster/3128", "video": "https://iclr.cc/virtual/2021/poster/3128", "author_site": "Youngsoo Jang, Seokin Seo, Jongmin Lee, Kee-Eung Kim", "tldr": "", "abstract": "Interactive Fiction (IF) games provide a useful testbed for language-based reinforcement learning agents, posing significant challenges of natural language understanding, commonsense reasoning, and non-myopic planning in the combinatorial search space. Agents based on standard planning algorithms struggle to play IF games due to the massive search space of language actions. 
Thus, language-grounded planning is a key ability of such agents, since inferring the consequence of language action based on semantic understanding can drastically improve search. In this paper, we introduce Monte-Carlo planning with Language Action Value Estimates (MC-LAVE) that combines a Monte-Carlo tree search with language-driven exploration. MC-LAVE invests more search effort into semantically promising language actions using locally optimistic language value estimates, yielding a significant reduction in the effective search space of language actions. We then present a reinforcement learning approach via MC-LAVE, which alternates between MC-LAVE planning and supervised learning of the self-generated language actions. In the experiments, we demonstrate that our method achieves new high scores in various IF games.", "keywords": "natural language processing;Monte-Carlo tree search;reinforcement learning;interactive fiction", "primary_area": "", "supplementary_material": "", "author": "Youngsoo Jang;Seokin Seo;Jongmin Lee;Kee-Eung Kim", "authorids": "~Youngsoo_Jang2;~Seokin_Seo1;~Jongmin_Lee1;~Kee-Eung_Kim4", "gender": ";;M;M", "homepage": "http://www.ysjang.me;https://sites.google.com/view/siseo0;https://www.jmlee.kr;http://ailab.kaist.ac.kr", "dblp": "195/0471;231/7699;68/222-4.html;35/6703", "google_scholar": "6EoBBggAAAAJ;https://scholar.google.com/citations?hl=en;https://scholar.google.co.kr/citations?user=rFcK8EEAAAAJ;https://scholar.google.com/citations?hl=ko", "orcid": ";;;", "linkedin": ";seokin-seo-026ab4150/;jmlee123/;", "or_profile": "~Youngsoo_Jang2;~Seokin_Seo1;~Jongmin_Lee1;~Kee-Eung_Kim2", "aff": "Korea Advanced Institute of Science & Technology;Korea Advanced Institute of Science & Technology;Korea Advanced Institute of Science & Technology;Korea Advanced Institute of Science & Technology", "aff_domain": "kaist.ac.kr;kaist.ac.kr;kaist.ac.kr;kaist.ac.kr", "position": "PhD student;PhD student;PhD student;Full Professor", "bibtex": "@inproceedings{\njang2021montecarlo,\ntitle={Monte-Carlo Planning and Learning with Language Action Value Estimates},\nauthor={Youngsoo Jang and Seokin Seo and Jongmin Lee and Kee-Eung Kim},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=7_G8JySGecm}\n}", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer3;AnonReviewer4;AnonReviewer5", "pdf_size": 0, "rating": "4;6;7;7", "confidence": "4;4;4;4", "wc_review": "674;210;435;654", "wc_reply_reviewers": "327;12;0;102", "wc_reply_authors": "801;316;512;789", "reply_reviewers": "1;1;0;1", "reply_authors": "2;1;1;2", "rating_avg": [ 6.0, 1.224744871391589 ], "confidence_avg": [ 4.0, 0.0 ], "wc_review_avg": [ 493.25, 188.5038129587834 ], "wc_reply_reviewers_avg": [ 110.25, 131.20284867334246 ], "wc_reply_authors_avg": [ 604.5, 202.75662751190157 ], "reply_reviewers_avg": [ 0.75, 0.4330127018922193 ], "reply_authors_avg": [ 1.5, 0.5 ], "replies_avg": [ 15, 0 ], "authors#_avg": [ 4, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 14, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=7320603322907339308&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 3, "pdf": "https://openreview.net/pdf?id=7_G8JySGecm", "email": "kaist.ac.kr;kaist.ac.kr;kaist.ac.kr;kaist.ac.kr", "author_num": 4, "aff_unique_index": "0;0;0;0", "aff_unique_norm": "Korea Advanced Institute of Science and Technology", "aff_unique_dep": "", "aff_unique_url": "https://www.kaist.ac.kr", "aff_unique_abbr": "KAIST", 
"aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0;0;0", "aff_country_unique": "South Korea" }, { "id": "7_MJnN-U9hm", "title": "PHEW: Paths with Higher Edge-Weights give ''winning tickets'' without training data", "track": "main", "status": "Withdraw", "tldr": "", "abstract": "Sparse neural networks have generated substantial interest recently because they can be more efficient in learning and inference, without any significant drop in performance. The \"lottery ticket hypothesis\" has showed the existence of such sparse subnetworks at initialization. Given a fully-connected initialized architecture, our aim is to find such \"winning ticket\" networks, without any training data. We first show the advantages of forming input-output paths, over pruning individual connections, to avoid bottlenecks in gradient propagation. Then, we show that Paths with Higher Edge-Weights (PHEW) at initialization have higher loss gradient magnitude, resulting in more efficient training. Selecting such paths can be performed without any data. We empirically validate the effectiveness of the proposed approach against pruning-before-training methods on CIFAR10, CIFAR100 and Tiny-ImageNet for VGG-Net and ResNet. PHEW achieves significant improvements on the current state-of-the-art methods at 10%, 5% and 2% network density. We also evaluate the structural similarity relationship between PHEW networks and pruned networks constructed through Iterated Magnitude Pruning (IMP), concluding that the former belong in the family of winning tickets networks.", "keywords": "Sparse Neural Networks;Pruning", "primary_area": "", "supplementary_material": "/attachment/68d8b6fa63f8fa4d8e14a6b38507377000589e28.zip", "author": "Shreyas Malakarjun Patil;Constantine Dovrolis", "authorids": "~Shreyas_Malakarjun_Patil1;~Constantine_Dovrolis1", "gender": "M;M", "homepage": ";http://www.cc.gatech.edu/~dovrolis/", "dblp": "206/6500;d/ConstantinosDovrolis", "google_scholar": "https://scholar.google.co.in/citations?user=iKVRiDsAAAAJ;https://scholar.google.com/citations?hl=en", "orcid": ";", "linkedin": "shreyas-malakarjun-patil;", "or_profile": "~Shreyas_Malakarjun_Patil1;~Constantine_Dovrolis1", "aff": "Georgia Institute of Technology;College of Computing, Georgia Institute of Technology", "aff_domain": "gatech.edu;cc.gatech.edu", "position": "PhD student;Full Professor", "bibtex": "", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer2;AnonReviewer5;AnonReviewer4;AnonReviewer3", "site": "https://openreview.net/forum?id=7_MJnN-U9hm", "pdf_size": 0, "rating": "3;5;5;5;7", "confidence": "4;5;5;5;4", "wc_review": "3423;388;723;228;622", "wc_reply_reviewers": "0;0;0;0;0", "wc_reply_authors": "0;0;0;0;0", "reply_reviewers": "0;0;0;0;0", "reply_authors": "0;0;0;0;0", "rating_avg": [ 5.0, 1.2649110640673518 ], "confidence_avg": [ 4.6, 0.4898979485566356 ], "wc_review_avg": [ 1076.8, 1185.8818490895287 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 0, 0 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 0, 0 ], "replies_avg": [ 6, 0 ], "authors#_avg": [ 2, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:u4g6DnzXwkYJ:scholar.google.com/&scioq=PHEW:+Paths+with+Higher+Edge-Weights+give+%27%27winning+tickets%27%27+without+training+data&hl=en&as_sdt=0,5", "gs_version_total": 0, "aff_unique_index": "0;0", "aff_unique_norm": "Georgia Institute of Technology", "aff_unique_dep": "", "aff_unique_url": 
"https://www.gatech.edu", "aff_unique_abbr": "Georgia Tech", "aff_campus_unique_index": "1", "aff_campus_unique": ";Atlanta", "aff_country_unique_index": "0;0", "aff_country_unique": "United States" }, { "title": "A Learning Theoretic Perspective on Local Explainability", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/2706", "id": "7aL-OtQrBWD", "poster": "", "openreview": "https://openreview.net/forum?id=7aL-OtQrBWD", "slides": "https://iclr.cc/virtual/2021/poster/2706", "video": "https://iclr.cc/virtual/2021/poster/2706", "author_site": "Jeffrey Li, Vaishnavh Nagarajan, Gregory Plumb, Ameet Talwalkar", "tldr": "", "abstract": "In this paper, we explore connections between interpretable machine learning and learning theory through the lens of local approximation explanations. First, we tackle the traditional problem of performance generalization and bound the test-time predictive accuracy of a model using a notion of how locally explainable it is. Second, we explore the novel problem of explanation generalization which is an important concern for a growing class of finite sample-based local approximation explanations. Finally, we validate our theoretical results empirically and show that they reflect what can be seen in practice.", "keywords": "Interpretability;Learning Theory;Local Explanations;Generalization", "primary_area": "", "supplementary_material": "/attachment/7063b19c3915db57adaddf3841e3343cc963dc7f.zip", "author": "Jeffrey Li;Vaishnavh Nagarajan;Gregory Plumb;Ameet Talwalkar", "authorids": "~Jeffrey_Li1;~Vaishnavh_Nagarajan3;~Gregory_Plumb2;~Ameet_Talwalkar1", "gender": "M;;M;M", "homepage": ";https://gdplumb.github.io;http://www.cs.cmu.edu/~atalwalk/;https://vaishnavh.github.io/", "dblp": ";;56/5528;161/0079", "google_scholar": "JDS2BnIAAAAJ;_f4rfHYAAAAJ;https://scholar.google.com.tw/citations?user=TW7U1W0AAAAJ;https://scholar.google.nl/citations?user=LrsjJfwAAAAJ", "orcid": ";;;", "linkedin": "jeffrey-li-a78684111/;;;", "or_profile": "~Jeffrey_Li1;~Gregory_Plumb2;~Ameet_Talwalkar1;~Vaishnavh_Nagarajan1", "aff": "Department of Computer Science, University of Washington;Carnegie Mellon University;Carnegie Mellon University;School of Computer Science, Carnegie Mellon University", "aff_domain": "cs.washington.edu;cmu.edu;cmu.edu;cs.cmu.edu", "position": "PhD student;PhD student;Associate Professor;PhD student", "bibtex": "@inproceedings{\nli2021a,\ntitle={A Learning Theoretic Perspective on Local Explainability},\nauthor={Jeffrey Li and Vaishnavh Nagarajan and Gregory Plumb and Ameet Talwalkar},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=7aL-OtQrBWD}\n}", "github": "", "project": "", "reviewers": "AnonReviewer3;AnonReviewer2;AnonReviewer1", "pdf_size": 0, "rating": "5;7;7", "confidence": "3;3;4", "wc_review": "213;556;587", "wc_reply_reviewers": "0;62;0", "wc_reply_authors": "224;761;817", "reply_reviewers": "0;1;0", "reply_authors": "1;1;1", "rating_avg": [ 6.333333333333333, 0.9428090415820634 ], "confidence_avg": [ 3.3333333333333335, 0.4714045207910317 ], "wc_review_avg": [ 452.0, 169.47172822233998 ], "wc_reply_reviewers_avg": [ 20.666666666666668, 29.227080289043965 ], "wc_reply_authors_avg": [ 600.6666666666666, 267.3229424414514 ], "reply_reviewers_avg": [ 0.3333333333333333, 0.4714045207910317 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 8, 0 ], "authors#_avg": [ 4, 0 ], "corr_rating_confidence": 0.49999999999999983, "gs_citation": 20, 
"gs_cited_by_link": "https://scholar.google.com/scholar?cites=10742482946384056752&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 4, "pdf": "https://openreview.net/pdf?id=7aL-OtQrBWD", "email": "cs.washington.edu;cmu.edu;cmu.edu;cs.cmu.edu", "author_num": 4, "aff_unique_index": "0;1;1;1", "aff_unique_norm": "University of Washington;Carnegie Mellon University", "aff_unique_dep": "Department of Computer Science;", "aff_unique_url": "https://www.washington.edu;https://www.cmu.edu", "aff_unique_abbr": "UW;CMU", "aff_campus_unique_index": "0;2", "aff_campus_unique": "Seattle;;Pittsburgh", "aff_country_unique_index": "0;0;0;0", "aff_country_unique": "United States" }, { "title": "Do not Let Privacy Overbill Utility: Gradient Embedding Perturbation for Private Learning", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/2834", "id": "7aogOj_VYO0", "poster": "", "openreview": "https://openreview.net/forum?id=7aogOj_VYO0", "slides": "https://iclr.cc/virtual/2021/poster/2834", "video": "https://iclr.cc/virtual/2021/poster/2834", "author_site": "Da Yu, Huishuai Zhang, Wei Chen, Tie-Yan Liu", "tldr": "", "abstract": "The privacy leakage of the model about the training data can be bounded in the differential privacy mechanism. However, for meaningful privacy parameters, a differentially private model degrades the utility drastically when the model comprises a large number of trainable parameters. In this paper, we propose an algorithm \\emph{Gradient Embedding Perturbation (GEP)} towards training differentially private deep models with decent accuracy. Specifically, in each gradient descent step, GEP first projects individual private gradient into a non-sensitive anchor subspace, producing a low-dimensional gradient embedding and a small-norm residual gradient. Then, GEP perturbs the low-dimensional embedding and the residual gradient separately according to the privacy budget. Such a decomposition permits a small perturbation variance, which greatly helps to break the dimensional barrier of private learning. With GEP, we achieve decent accuracy with low computational cost and modest privacy guarantee for deep models. 
Especially, with privacy bound $\\epsilon=8$, we achieve $74.9\\%$ test accuracy on CIFAR10 and $95.1\\%$ test accuracy on SVHN, significantly improving over existing results.", "keywords": "privacy preserving machine learning;differentially private deep learning;gradient redundancy", "primary_area": "", "supplementary_material": "/attachment/a704a237827eca58f295c5d793a24711c745fbde.zip", "author": "Da Yu;Huishuai Zhang;Wei Chen;Tie-Yan Liu", "authorids": "~Da_Yu1;~Huishuai_Zhang3;~Wei_Chen1;~Tie-Yan_Liu1", "gender": "M;F;M;M", "homepage": ";https://weichen-cas.github.io/;http://member.acm.org/~tieyanliu;https://huishuai-git.github.io", "dblp": "48/8545;;l/TieYanLiu;144/7537", "google_scholar": "FcRGdiwAAAAJ;https://scholar.google.com/citations?hl=en;Nh832fgAAAAJ;w1srHyIAAAAJ", "orcid": ";;0000-0002-0476-8020;", "linkedin": ";;;", "or_profile": "~Da_Yu1;~Wei_Chen1;~Tie-Yan_Liu1;~Huishuai_Zhang2", "aff": "Microsoft;;Microsoft;Microsoft Research Asia", "aff_domain": "microsoft.com;;microsoft.com;microsoft.com", "position": "Research intern;;Distinguished Scientist;Researcher", "bibtex": "@inproceedings{\nyu2021do,\ntitle={Do not Let Privacy Overbill Utility: Gradient Embedding Perturbation for Private Learning},\nauthor={Da Yu and Huishuai Zhang and Wei Chen and Tie-Yan Liu},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=7aogOj_VYO0}\n}", "github": "[![github](/images/github_icon.svg) dayu11/Differentially-Private-Deep-Learning](https://github.com/dayu11/Differentially-Private-Deep-Learning) + [![Papers with Code](/images/pwc_icon.svg) 1 community implementation](https://paperswithcode.com/paper/?openreview=7aogOj_VYO0)", "project": "", "reviewers": "AnonReviewer3;AnonReviewer2;AnonReviewer1;AnonReviewer4", "pdf_size": 0, "rating": "5;6;7;9", "confidence": "5;4;3;4", "wc_review": "242;322;165;638", "wc_reply_reviewers": "0;37;18;131", "wc_reply_authors": "944;698;387;738", "reply_reviewers": "0;1;1;1", "reply_authors": "2;2;3;2", "rating_avg": [ 6.75, 1.479019945774904 ], "confidence_avg": [ 4.0, 0.7071067811865476 ], "wc_review_avg": [ 341.75, 179.8226556916564 ], "wc_reply_reviewers_avg": [ 46.5, 50.50990001969911 ], "wc_reply_authors_avg": [ 691.75, 199.17376207723748 ], "reply_reviewers_avg": [ 0.75, 0.4330127018922193 ], "reply_authors_avg": [ 2.25, 0.4330127018922193 ], "replies_avg": [ 18, 0 ], "authors#_avg": [ 4, 0 ], "corr_rating_confidence": -0.47809144373375745, "gs_citation": 132, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=6640760596984073226&as_sdt=400005&sciodt=0,14&hl=en", "gs_version_total": 7, "pdf": "https://openreview.net/pdf?id=7aogOj_VYO0", "email": "microsoft.com;;microsoft.com;microsoft.com", "author_num": 4, "aff_unique_index": "0;0;0", "aff_unique_norm": "Microsoft", "aff_unique_dep": "Microsoft Corporation", "aff_unique_url": "https://www.microsoft.com", "aff_unique_abbr": "Microsoft", "aff_campus_unique_index": "1", "aff_campus_unique": ";Asia", "aff_country_unique_index": "0;0;1", "aff_country_unique": "United States;China" }, { "id": "7apQQsbahFz", "title": "Intention Propagation for Multi-agent Reinforcement Learning", "track": "main", "status": "Reject", "tldr": "", "abstract": "A hallmark of an AI agent is to mimic human beings to understand and interact with others. In this paper, we propose a \\emph{collaborative} multi-agent reinforcement learning algorithm to learn a \\emph{joint} policy through the interactions over agents. 
To make a joint decision over the group, each agent makes an initial decision and tells its policy to its neighbors. Then each agent modifies its own policy properly based on received messages and spreads out its plan. As this intention propagation procedure goes on, we prove that it converges to a mean-field approximation of the joint policy with the framework of neural embedded probabilistic inference. We evaluate our algorithm on several large scale challenging tasks and demonstrate that it outperforms previous state-of-the-arts.", "keywords": "", "primary_area": "", "supplementary_material": "/attachment/36bc034c05ec2d2ee50111f30ffc9e95d3b500fe.zip", "author": "Chao Qu;Hui Li;Chang Liu;Junwu Xiong;james zhang;Wei Chu;Weiqiang Wang;Yuan Qi;Le Song", "authorids": "~Chao_Qu2;~Hui_Li2;~Chang_Liu7;junwu.xjw@antgroup.com;james.z@antgroup.com;~Wei_Chu1;~Weiqiang_Wang4;alan.qi@gmail.com;~Le_Song1", "gender": ";;M;;;M;M;;M", "homepage": ";;https://only-changer.github.io/;;;http://weichu.github.io;https://www.linkedin.com/in/weiqiang-wang-489b925/;;http://www.cc.gatech.edu/~lsong", "dblp": ";;52/5716;;;;;;94/3481", "google_scholar": ";;BTu8eaQAAAAJ;;;3J4zb7gAAAAJ;;;Xl4E0CsAAAAJ", "orcid": ";;;;;;0000-0002-6159-619X;;", "linkedin": ";;;;;;weiqiang-wang-489b925/;;", "or_profile": "~Chao_Qu2;~Hui_Li2;~Chang_Liu7;junwu.xjw@antgroup.com;james.z@antgroup.com;~Wei_Chu1;~Weiqiang_Wang4;alan.qi@gmail.com;~Le_Song1", "aff": ";;Shanghai Jiaotong University;;;Ant Group;Ant Group;;College of Computing, Georgia Institute of Technology", "aff_domain": ";;sjtu.edu.cn;;;antgroup.com;antgroup.com;;cc.gatech.edu", "position": ";;PhD student;;;Researcher;Researcher;;Associate Professor", "bibtex": "@misc{\nqu2021intention,\ntitle={Intention Propagation for Multi-agent Reinforcement Learning},\nauthor={Chao Qu and Hui Li and Chang Liu and Junwu Xiong and james zhang and Wei Chu and Weiqiang Wang and Yuan Qi and Le Song},\nyear={2021},\nurl={https://openreview.net/forum?id=7apQQsbahFz}\n}", "github": "", "project": "", "reviewers": "AnonReviewer5;AnonReviewer2;AnonReviewer1;AnonReviewer4", "site": "https://openreview.net/forum?id=7apQQsbahFz", "pdf_size": 0, "rating": "5;6;6;7", "confidence": "4;2;4;4", "wc_review": "231;562;322;476", "wc_reply_reviewers": "122;474;0;0", "wc_reply_authors": "1188;899;386;419", "reply_reviewers": "1;2;0;0", "reply_authors": "3;2;1;1", "rating_avg": [ 6.0, 0.7071067811865476 ], "confidence_avg": [ 3.5, 0.8660254037844386 ], "wc_review_avg": [ 397.75, 129.0782224079647 ], "wc_reply_reviewers_avg": [ 149.0, 194.1365498817778 ], "wc_reply_authors_avg": [ 723.0, 336.59545451476316 ], "reply_reviewers_avg": [ 0.75, 0.82915619758885 ], "reply_authors_avg": [ 1.75, 0.82915619758885 ], "replies_avg": [ 15, 0 ], "authors#_avg": [ 9, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 13, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=18289578787555283449&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 0, "aff_unique_index": "0;1;1;2", "aff_unique_norm": "Shanghai Jiao Tong University;Ant Group;Georgia Institute of Technology", "aff_unique_dep": ";;College of Computing", "aff_unique_url": "https://www.sjtu.edu.cn;https://www.antgroup.com;https://www.gatech.edu", "aff_unique_abbr": "SJTU;Ant Group;Georgia Tech", "aff_campus_unique_index": "1", "aff_campus_unique": ";Atlanta", "aff_country_unique_index": "0;0;0;1", "aff_country_unique": "China;United States" }, { "id": "7dmdzJz42Ro", "title": "A Simple Framework for Uncertainty in Contrastive Learning", "track": "main", "status": 
"Withdraw", "tldr": "", "abstract": "Contrastive approaches to representation learning have recently shown great promise. In contrast to generative approaches, these contrastive models learn a deterministic encoder with no notion of uncertainty or confidence. In this paper, we introduce a simple approach based on \"contrasting distributions\" that learns to assign uncertainty for pretrained contrastive representations. In particular, we train a deep network from a representation to a distribution in representation space, whose variance can be used as a measure of confidence. In our experiments, we show that this deep uncertainty model can be used (1) to visually interpret model behavior, (2) to detect new noise in the input to deployed models, (3) to detect anomalies, where we outperform 10 baseline methods across 11 tasks with improvements of up to 14% absolute, and (4) to classify out-of-distribution examples where our fully unsupervised model is competitive with supervised methods.", "keywords": "uncertainty;contrastive learning;unsupervised learning;anomaly detection;out of distribution;corruption", "primary_area": "", "supplementary_material": "/attachment/e720d8f87d7be0e5761d3f4f2ced8e4c219c1dd1.zip", "author": "Mike Wu;Noah Goodman", "authorids": "~Mike_Wu1;~Noah_Goodman1", "gender": "M;", "homepage": "https://www.mikehwu.com/;https://cocolab.stanford.edu/", "dblp": "77/1432;96/1216", "google_scholar": "yVmdPsPEIFIC;OUpIbcQAAAAJ", "orcid": ";", "linkedin": ";", "or_profile": "~Mike_Wu1;~Noah_Goodman1", "aff": "Stanford University;Stanford University", "aff_domain": "stanford.edu;stanford.edu", "position": "PhD student;Full Professor", "bibtex": "", "github": "", "project": "", "reviewers": "AnonReviewer3;AnonReviewer1;AnonReviewer4;AnonReviewer2", "site": "https://openreview.net/forum?id=7dmdzJz42Ro", "pdf_size": 0, "rating": "3;4;5;5", "confidence": "3;4;3;5", "wc_review": "1134;774;378;337", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "0;0;0;0", "reply_reviewers": "0;0;0;0", "reply_authors": "0;0;0;0", "rating_avg": [ 4.25, 0.82915619758885 ], "confidence_avg": [ 3.75, 0.82915619758885 ], "wc_review_avg": [ 655.75, 324.5969616308816 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 0, 0 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 0, 0 ], "replies_avg": [ 5, 0 ], "authors#_avg": [ 2, 0 ], "corr_rating_confidence": 0.4545454545454545, "gs_citation": 20, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=1944390063375319850&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 3, "aff_unique_index": "0;0", "aff_unique_norm": "Stanford University", "aff_unique_dep": "", "aff_unique_url": "https://www.stanford.edu", "aff_unique_abbr": "Stanford", "aff_campus_unique_index": "0;0", "aff_campus_unique": "Stanford", "aff_country_unique_index": "0;0", "aff_country_unique": "United States" }, { "title": "Bypassing the Ambient Dimension: Private SGD with Gradient Subspace Identification", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/3266", "id": "7dpmlkBuJFC", "poster": "", "openreview": "https://openreview.net/forum?id=7dpmlkBuJFC", "slides": "https://iclr.cc/virtual/2021/poster/3266", "video": "https://iclr.cc/virtual/2021/poster/3266", "author_site": "Yingxue Zhou, Steven Wu, Arindam Banerjee", "tldr": "", "abstract": "Differentially private SGD (DP-SGD) is one of the most popular methods for solving differentially private empirical risk minimization (ERM). 
Due to its noisy perturbation on each gradient update, the error rate of DP-SGD scales with the ambient dimension $p$, the number of parameters in the model. Such dependence can be problematic for over-parameterized models where $p \\gg n$, the number of training samples. Existing lower bounds on private ERM show that such dependence on $p$ is inevitable in the worst case. In this paper, we circumvent the dependence on the ambient dimension by leveraging a low-dimensional structure of gradient space in deep networks---that is, the stochastic gradients for deep nets usually stay in a low dimensional subspace in the training process. We propose Projected DP-SGD that performs noise reduction by projecting the noisy gradients to a low-dimensional subspace, which is given by the top gradient eigenspace on a small public dataset. We provide a general sample complexity analysis on the public dataset for the gradient subspace identification problem and demonstrate that under certain low-dimensional assumptions the public sample complexity only grows logarithmically in $p$. Finally, we provide a theoretical analysis and empirical evaluations to show that our method can substantially improve the accuracy of DP-SGD in the high privacy regime (corresponding to low privacy loss $\\epsilon$).\n\n", "keywords": "", "primary_area": "", "supplementary_material": "", "author": "Yingxue Zhou;Steven Wu;Arindam Banerjee", "authorids": "~Yingxue_Zhou1;~Steven_Wu1;~Arindam_Banerjee1", "gender": "F;M;M", "homepage": "https://sites.google.com/umn.edu/zhou0877/home;http://umn.edu/home/baner029/;https://zstevenwu.com/", "dblp": ";82/4807;137/8350", "google_scholar": "EEm_z9YAAAAJ;https://scholar.google.com.tw/citations?user=RY7cuPAAAAAJ;MbF6rTEAAAAJ", "orcid": ";;", "linkedin": ";;zstevenwu/", "or_profile": "~Yingxue_Zhou1;~Arindam_Banerjee1;~Zhiwei_Steven_Wu1", "aff": "University of Minnesota, Minneapolis;University of Minnesota, Minneapolis;Carnegie Mellon University", "aff_domain": "umn.edu;umn.edu;cmu.edu", "position": "PhD student;Full Professor;Assistant Professor", "bibtex": "@inproceedings{\nzhou2021bypassing,\ntitle={Bypassing the Ambient Dimension: Private {\\{}SGD{\\}} with Gradient Subspace Identification},\nauthor={Yingxue Zhou and Steven Wu and Arindam Banerjee},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=7dpmlkBuJFC}\n}", "github": "", "project": "", "reviewers": "AnonReviewer4;AnonReviewer2;AnonReviewer3", "pdf_size": 0, "rating": "6;6;7", "confidence": "3;3;3", "wc_review": "276;569;544", "wc_reply_reviewers": "0;0;0", "wc_reply_authors": "741;816;458", "reply_reviewers": "0;0;0", "reply_authors": "1;1;1", "rating_avg": [ 6.333333333333333, 0.4714045207910317 ], "confidence_avg": [ 3.0, 0.0 ], "wc_review_avg": [ 463.0, 132.62227062852855 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 671.6666666666666, 154.15648615034732 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 8, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 128, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=4692140054354270782&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 4, "pdf": "https://openreview.net/pdf?id=7dpmlkBuJFC", "email": "umn.edu;umn.edu;cmu.edu", "author_num": 3, "aff_unique_index": "0;0;1", "aff_unique_norm": "University of Minnesota;Carnegie Mellon University", "aff_unique_dep": ";", "aff_unique_url": 
"https://www.minnesota.edu;https://www.cmu.edu", "aff_unique_abbr": "UMN;CMU", "aff_campus_unique_index": "0;0", "aff_campus_unique": "Minneapolis;", "aff_country_unique_index": "0;0;0", "aff_country_unique": "United States" }, { "id": "7eD88byszZ", "title": "A Unified Spectral Sparsification Framework for Directed Graphs", "track": "main", "status": "Reject", "tldr": "", "abstract": "Recent spectral graph sparsification research allows constructing nearly-linear-sized subgraphs that can well preserve the spectral (structural) properties of the original graph, such as the first few eigenvalues and eigenvectors of the graph Laplacian, leading to the development of a variety of nearly-linear time numerical and graph algorithms. However, there is not a unified approach that allows for truly scalable spectral sparsification of both directed and undirected graphs. For the first time, we prove the existence of linear-sized spectral sparsifiers for general directed\ngraphs and introduce a practically-efficient and unified spectral graph sparsification approach that allows sparsifying real-world, large-scale directed and undirected graphs with guaranteed preservation of the original graph spectra. By exploiting a highly-scalable (nearly-linear complexity) spectral matrix perturbation analysis framework for constructing nearly-linear sized (directed) subgraphs, it enables us to well preserve the key eigenvalues and eigenvectors of the original (directed) graph\nLaplacians. The proposed method has been validated using various kinds of directed graphs obtained from public domain sparse matrix collections, showing promising results for solving directed graph Laplacians, spectral embedding, and partitioning of general directed graphs, as well as approximately computing (personalized) PageRank vectors.", "keywords": "Spectral Graph Theory;Spectral Sparsification;Directed Graphs;Laplacian Solver;PageRank Vectors", "primary_area": "", "supplementary_material": "/attachment/e51b49c6f0a96a75d62126267a7552a235bbe8f6.zip", "author": "ying zhang;Zhiqiang Zhao;Zhuo Feng", "authorids": "yzhan232@stevens.edu;~Zhiqiang_Zhao1;~Zhuo_Feng3", "gender": ";M;M", "homepage": ";;https://web.stevens.edu/facultyprofile/?id=2371", "dblp": ";;81/4441.html", "google_scholar": ";;", "orcid": ";0000-0001-7239-6604;", "linkedin": ";;", "or_profile": "yzhan232@stevens.edu;~Zhiqiang_Zhao1;~Zhuo_Feng3", "aff": ";Stevens Institute of Technology;", "aff_domain": ";stevens.edu;", "position": ";Postdoc;", "bibtex": "@misc{\nzhang2021a,\ntitle={A Unified Spectral Sparsification Framework for Directed Graphs},\nauthor={ying zhang and Zhiqiang Zhao and Zhuo Feng},\nyear={2021},\nurl={https://openreview.net/forum?id=7eD88byszZ}\n}", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer2;AnonReviewer3;AnonReviewer4", "site": "https://openreview.net/forum?id=7eD88byszZ", "pdf_size": 0, "rating": "3;4;5;7", "confidence": "3;2;4;4", "wc_review": "549;233;139;337", "wc_reply_reviewers": "168;0;0;0", "wc_reply_authors": "1201;360;243;95", "reply_reviewers": "1;0;0;0", "reply_authors": "3;1;1;1", "rating_avg": [ 4.75, 1.479019945774904 ], "confidence_avg": [ 3.25, 0.82915619758885 ], "wc_review_avg": [ 314.5, 152.42949189707352 ], "wc_reply_reviewers_avg": [ 42.0, 72.74613391789285 ], "wc_reply_authors_avg": [ 474.75, 429.6873136363232 ], "reply_reviewers_avg": [ 0.25, 0.4330127018922193 ], "reply_authors_avg": [ 1.5, 0.8660254037844386 ], "replies_avg": [ 14, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": 
0.6625413488689132, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:kCyk4xMJoQAJ:scholar.google.com/&scioq=A+Unified+Spectral+Sparsification+Framework+for+Directed+Graphs&hl=en&as_sdt=0,5", "gs_version_total": 0, "aff_unique_index": "0", "aff_unique_norm": "Stevens Institute of Technology", "aff_unique_dep": "", "aff_unique_url": "https://www.stevens.edu", "aff_unique_abbr": "SIT", "aff_country_unique_index": "0", "aff_country_unique": "United States" }, { "id": "7ehDLD1yoE0", "title": "STRATA: Simple, Gradient-free Attacks for Models of Code", "track": "main", "status": "Reject", "tldr": "", "abstract": "Adversarial examples are imperceptible perturbations in the input to a neural model that result in misclassification. Generating adversarial examples for source code poses an additional challenge compared to the domains of images and natural language, because source code perturbations must adhere to strict semantic guidelines so the resulting programs retain the functional meaning of the code. We propose a simple and efficient gradient-free method for generating state-of-the-art adversarial examples on models of code that can be applied in a white-box or black-box setting. Our method generates untargeted and targeted attacks, and empirically outperforms competing gradient-based methods with less information and less computational effort.", "keywords": "Deep Learning;Models of Code;Black-box Adversarial Attacks;Adversarial Robustness", "primary_area": "", "supplementary_material": "", "author": "Jacob M. Springer;Bryn Marie Reinstadler;Una-May O'Reilly", "authorids": "~Jacob_M._Springer1;~Bryn_Marie_Reinstadler1;~Una-May_O'Reilly1", "gender": "F;F;M", "homepage": ";https://alfagroup.csail.mit.edu/unamay;https://sprin.xyz", "dblp": ";o/UnaMayOReilly;", "google_scholar": ";https://scholar.google.com/citations?hl=en;niZiN38AAAAJ", "orcid": ";0000-0001-6923-8445;", "linkedin": "bryn-reinstadler/;;", "or_profile": "~Bryn_Marie_Reinstadler1;~Una-May_O'Reilly1;~Jacob_Mitchell_Springer1", "aff": "Massachusetts Institute of Technology;Massachusetts Institute of Technology;Los Alamos National Laboratory", "aff_domain": "mit.edu;mit.edu;lanl.gov", "position": "PhD student;Principal Researcher;Researcher", "bibtex": "@misc{\nspringer2021strata,\ntitle={{\\{}STRATA{\\}}: Simple, Gradient-free Attacks for Models of Code},\nauthor={Jacob M. 
Springer and Bryn Marie Reinstadler and Una-May O'Reilly},\nyear={2021},\nurl={https://openreview.net/forum?id=7ehDLD1yoE0}\n}", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer3;AnonReviewer2;AnonReviewer4", "site": "https://openreview.net/forum?id=7ehDLD1yoE0", "pdf_size": 0, "rating": "4;4;4;5", "confidence": "4;4;5;3", "wc_review": "519;430;174;289", "wc_reply_reviewers": "0;38;0;0", "wc_reply_authors": "311;445;489;333", "reply_reviewers": "0;1;0;0", "reply_authors": "1;1;1;1", "rating_avg": [ 4.25, 0.4330127018922193 ], "confidence_avg": [ 4.0, 0.7071067811865476 ], "wc_review_avg": [ 353.0, 131.92990563174067 ], "wc_reply_reviewers_avg": [ 9.5, 16.454482671904334 ], "wc_reply_authors_avg": [ 394.5, 74.55702515524611 ], "reply_reviewers_avg": [ 0.25, 0.4330127018922193 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 12, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": -0.816496580927726, "gs_citation": 6, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=1219305469740476054&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 3, "aff_unique_index": "0;0;1", "aff_unique_norm": "Massachusetts Institute of Technology;Los Alamos National Laboratory", "aff_unique_dep": ";", "aff_unique_url": "https://web.mit.edu;https://www.lanl.gov", "aff_unique_abbr": "MIT;LANL", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0;0", "aff_country_unique": "United States" }, { "id": "7hMenh--8g", "title": "Uncertainty Weighted Offline Reinforcement Learning", "track": "main", "status": "Reject", "tldr": "", "abstract": "Offline Reinforcement Learning promises to learn effective policies from previously-collected, static datasets without the need for exploration. However, existing Q-learning and actor-critic based off-policy RL algorithms fail when bootstrapping from out-of-distribution (OOD) actions or states. We hypothesize that a key missing ingredient from the existing methods is a proper treatment of uncertainty in the offline setting. We propose Uncertainty Weighted Actor-Critic (UWAC), an algorithm that models the epistemic uncertainty to detect OOD state-action pairs and down-weights their contribution in the training objectives accordingly. Implementation-wise, we adopt a practical and effective dropout-based uncertainty estimation method that introduces very little overhead over existing RL algorithms. Empirically, we observe that UWAC substantially improves model stability during training. In addition, UWAC out-performs existing offline RL methods on a variety of competitive tasks, and achieves significant performance gains over the state-of-the-art baseline on datasets with sparse demonstrations collected from human experts.", "keywords": "reinforcement learning;offline;batch reinforcement learning;off-policy;uncertainty estimation;dropout;actor-critic;bootstrap error", "primary_area": "", "supplementary_material": "/attachment/e754165a040f3dd47068c7b84be2b14e9d0929fd.zip", "author": "Yue Wu;Shuangfei Zhai;Nitish Srivastava;Joshua M. 
Susskind;Jian Zhang;Ruslan Salakhutdinov;Hanlin Goh", "authorids": "~Yue_Wu17;~Shuangfei_Zhai3;~Nitish_Srivastava1;~Joshua_M._Susskind1;~Jian_Zhang23;~Ruslan_Salakhutdinov1;~Hanlin_Goh2", "gender": "M;M;M;M;M;M;M", "homepage": "https://www.yuewu.ml;http://cs.binghamton.edu/~szhai2;http://www.cs.toronto.edu/~nitish;http://www.apple.com;;;https://www.cs.cmu.edu/~rsalakhu/", "dblp": "41/5979;;00/11304.html;132/7797;;96/4057;", "google_scholar": "LcrSIhgAAAAJ;G6vdBYsAAAAJ;https://scholar.google.ca/citations?user=s1PgoeUAAAAJ;Sv2TGqsAAAAJ;;;", "orcid": ";;;;;;", "linkedin": ";;;joshua-susskind-8ab2ab5/;jianzhangpurdue/;;", "or_profile": "~Yue_Wu17;~Shuangfei_Zhai3;~Nitish_Srivastava1;~Joshua_M._Susskind1;~Jian_Zhang23;~Hanlin_Goh2;~Russ_Salakhutdinov1", "aff": "Apple;Apple;Apple Inc;Apple;Apple;Apple;School of Computer Science, Carnegie Mellon University", "aff_domain": "apple.com;apple.com;apple.com;apple.com;apple.com;apple.com;cs.cmu.edu", "position": "Intern;Research Scientist;Researcher;Researcher;AIML;Research Scientist;Full Professor", "bibtex": "@misc{\nwu2021uncertainty,\ntitle={Uncertainty Weighted Offline Reinforcement Learning},\nauthor={Yue Wu and Shuangfei Zhai and Nitish Srivastava and Joshua M. Susskind and Jian Zhang and Ruslan Salakhutdinov and Hanlin Goh},\nyear={2021},\nurl={https://openreview.net/forum?id=7hMenh--8g}\n}", "github": "", "project": "", "reviewers": "AnonReviewer5;AnonReviewer3;AnonReviewer4;AnonReviewer1;AnonReviewer2", "site": "https://openreview.net/forum?id=7hMenh--8g", "pdf_size": 0, "rating": "4;5;6;7;8", "confidence": "4;4;4;4;4", "wc_review": "760;647;315;319;208", "wc_reply_reviewers": "0;0;0;215;0", "wc_reply_authors": "693;585;253;479;41", "reply_reviewers": "0;0;0;1;0", "reply_authors": "1;1;1;2;1", "rating_avg": [ 6.0, 1.4142135623730951 ], "confidence_avg": [ 4.0, 0.0 ], "wc_review_avg": [ 449.8, 213.9433569896481 ], "wc_reply_reviewers_avg": [ 43.0, 86.0 ], "wc_reply_authors_avg": [ 410.2, 235.06799016454795 ], "reply_reviewers_avg": [ 0.2, 0.4 ], "reply_authors_avg": [ 1.2, 0.4 ], "replies_avg": [ 14, 0 ], "authors#_avg": [ 7, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 1, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=8912890377668370803&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 0, "aff_unique_index": "0;0;0;0;0;0;1", "aff_unique_norm": "Apple;Carnegie Mellon University", "aff_unique_dep": "Apple Inc.;School of Computer Science", "aff_unique_url": "https://www.apple.com;https://www.cmu.edu", "aff_unique_abbr": "Apple;CMU", "aff_campus_unique_index": "1", "aff_campus_unique": ";Pittsburgh", "aff_country_unique_index": "0;0;0;0;0;0;0", "aff_country_unique": "United States" }, { "id": "7nfCtKep-v", "title": "EXPLORING VULNERABILITIES OF BERT-BASED APIS", "track": "main", "status": "Reject", "tldr": "", "abstract": "Natural language processing (NLP) tasks, ranging from text classification to text\ngeneration, have been revolutionised by pretrained BERT models. This allows\ncorporations to easily build powerful APIs by encapsulating fine-tuned BERT\nmodels. These BERT-based APIs are often designed to not only provide reliable\nservice but also protect intellectual properties or privacy-sensitive information of\nthe training data. However, a series of privacy and robustness issues may still exist\nwhen a fine-tuned BERT model is deployed as a service. In this work, we first\npresent an effective model extraction attack, where the adversary can practically\nsteal a BERT-based API (the target/victim model). 
We then demonstrate: (1)\nhow the extracted model can be further exploited to develop effective attribute\ninference attack to expose sensitive information of the training data of the victim\nmodel; (2) how the extracted model can lead to highly transferable adversarial\nattacks against the victim model. Extensive experiments on multiple benchmark\ndatasets under various realistic settings validate the potential privacy and adversarial\nvulnerabilities of BERT-based APIs.", "keywords": "BERT-based models;vulnerabilities;attribute inference;transferability", "primary_area": "", "supplementary_material": "", "author": "Xuanli He;Lingjuan Lyu;Lichao Sun;Xiaojun Chang;Jun Zhao", "authorids": "~Xuanli_He2;~Lingjuan_Lyu1;~Lichao_Sun1;~Xiaojun_Chang3;~Jun_Zhao1", "gender": "M;F;M;;M", "homepage": ";https://sites.google.com/view/lingjuan-lyu;https://lichao-sun.github.io/;https://www.xiaojun.ai;https://personal.ntu.edu.sg/JunZhao", "dblp": "182/1859;178/9876;121/0780-1.html;;47/2026-7", "google_scholar": "TU8t0iAAAAAJ;;WhGUE7AAAAAJ;;C5_aIrgAAAAJ", "orcid": ";;;;0000-0002-3004-7091", "linkedin": ";;lichao-sun-b273a290/;;junzhaocmu/", "or_profile": "~Xuanli_He2;~Lingjuan_Lyu1;~Lichao_Sun1;~Xiaojun_Chang3;~Jun_Zhao1", "aff": "Monash University;Sony;Lehigh University;Monash University;Nanyang Technological University (NTU), Singapore", "aff_domain": "monash.edu.au;sony.com;lehigh.edu;monash.edu;ntu.edu.sg", "position": "PhD student;scientist;Assistant Professor;Senior Lecturer;Assistant Professor", "bibtex": "@misc{\nhe2021exploring,\ntitle={{\\{}EXPLORING{\\}} {\\{}VULNERABILITIES{\\}} {\\{}OF{\\}} {\\{}BERT{\\}}-{\\{}BASED{\\}} {\\{}APIS{\\}}},\nauthor={Xuanli He and Lingjuan Lyu and Lichao Sun and Xiaojun Chang and Jun Zhao},\nyear={2021},\nurl={https://openreview.net/forum?id=7nfCtKep-v}\n}", "github": "", "project": "", "reviewers": "AnonReviewer3;AnonReviewer2;AnonReviewer1;AnonReviewer4", "site": "https://openreview.net/forum?id=7nfCtKep-v", "pdf_size": 0, "rating": "4;6;6;6", "confidence": "3;5;3;4", "wc_review": "499;1340;399;357", "wc_reply_reviewers": "0;795;0;147", "wc_reply_authors": "749;2308;705;252", "reply_reviewers": "0;2;0;1", "reply_authors": "1;4;1;1", "rating_avg": [ 5.5, 0.8660254037844386 ], "confidence_avg": [ 3.75, 0.82915619758885 ], "wc_review_avg": [ 648.75, 402.4129564266041 ], "wc_reply_reviewers_avg": [ 235.5, 328.55478995138697 ], "wc_reply_authors_avg": [ 1003.5, 777.8729009291942 ], "reply_reviewers_avg": [ 0.75, 0.82915619758885 ], "reply_authors_avg": [ 1.75, 1.299038105676658 ], "replies_avg": [ 15, 0 ], "authors#_avg": [ 5, 0 ], "corr_rating_confidence": 0.5222329678670935, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:k7eri5AVCuoJ:scholar.google.com/&scioq=EXPLORING+VULNERABILITIES+OF+BERT-BASED+APIS&hl=en&as_sdt=0,14", "gs_version_total": 0, "aff_unique_index": "0;1;2;0;3", "aff_unique_norm": "Monash University;Sony Corporation;Lehigh University;Nanyang Technological University", "aff_unique_dep": ";;;", "aff_unique_url": "https://www.monash.edu;https://www.sony.com;https://www.lehigh.edu;https://www.ntu.edu.sg", "aff_unique_abbr": "Monash;Sony;Lehigh;NTU", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;1;2;0;3", "aff_country_unique": "Australia;Japan;United States;Singapore" }, { "title": "Class Normalization for (Continual)? 
Generalized Zero-Shot Learning", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/3288", "id": "7pgFL2Dkyyy", "poster": "", "openreview": "https://openreview.net/forum?id=7pgFL2Dkyyy", "slides": "https://iclr.cc/virtual/2021/poster/3288", "video": "https://iclr.cc/virtual/2021/poster/3288", "author_site": "Ivan Skorokhodov, Mohamed Elhoseiny", "tldr": "", "abstract": "Normalization techniques have proved to be a crucial ingredient of successful training in a traditional supervised learning regime. However, in the zero-shot learning (ZSL) world, these ideas have received only marginal attention. This work studies normalization in ZSL scenario from both theoretical and practical perspectives. First, we give a theoretical explanation to two popular tricks used in zero-shot learning: normalize+scale and attributes normalization and show that they help training by preserving variance during a forward pass. Next, we demonstrate that they are insufficient to normalize a deep ZSL model and propose Class Normalization (CN): a normalization scheme, which alleviates this issue both provably and in practice. Third, we show that ZSL models typically have more irregular loss surface compared to traditional classifiers and that the proposed method partially remedies this problem. Then, we test our approach on 4 standard ZSL datasets and outperform sophisticated modern SotA with a simple MLP optimized without any bells and whistles and having ~50 times faster training speed. Finally, we generalize ZSL to a broader problem \u2014 continual ZSL, and introduce some principled metrics and rigorous baselines for this new setup. The source code is available at https://github.com/universome/class-norm.", "keywords": "zero-shot learning;normalization;continual learning;initialization", "primary_area": "", "supplementary_material": "", "author": "Ivan Skorokhodov;Mohamed Elhoseiny", "authorids": "~Ivan_Skorokhodov1;~Mohamed_Elhoseiny1", "gender": "M;M", "homepage": "https://universome.github.io/;http://www.mohamed-elhoseiny.com", "dblp": "223/0010;125/2894", "google_scholar": "https://scholar.google.com/citations?hl=en;iRBUTOAAAAAJ", "orcid": "0000-0002-7611-9310;0000-0001-9659-1551", "linkedin": "ivan-skorokhodov;mohamed-elhoseiny-8a836215/", "or_profile": "~Ivan_Skorokhodov1;~Mohamed_Elhoseiny1", "aff": "YSDA;KAUST", "aff_domain": "yandexdataschool.com;kaust.edu.sa", "position": "MS student;Associate Professor", "bibtex": "@inproceedings{\nskorokhodov2021class,\ntitle={Class Normalization for (Continual)? 
Generalized Zero-Shot Learning},\nauthor={Ivan Skorokhodov and Mohamed Elhoseiny},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=7pgFL2Dkyyy}\n}", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer3;AnonReviewer2;AnonReviewer4", "pdf_size": 0, "rating": "3;7;7;8", "confidence": "4;5;5;5", "wc_review": "1240;518;740;304", "wc_reply_reviewers": "291;262;207;39", "wc_reply_authors": "5212;934;2029;419", "reply_reviewers": "1;1;1;1", "reply_authors": "8;2;5;1", "rating_avg": [ 6.25, 1.920286436967152 ], "confidence_avg": [ 4.75, 0.4330127018922193 ], "wc_review_avg": [ 700.5, 347.54100477497616 ], "wc_reply_reviewers_avg": [ 199.75, 97.58938210686652 ], "wc_reply_authors_avg": [ 2148.5, 1861.8198758204296 ], "reply_reviewers_avg": [ 1.0, 0.0 ], "reply_authors_avg": [ 4.0, 2.7386127875258306 ], "replies_avg": [ 26, 0 ], "authors#_avg": [ 2, 0 ], "corr_rating_confidence": 0.9771398364036774, "gs_citation": 69, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=12819058346113139372&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 5, "pdf": "https://openreview.net/pdf?id=7pgFL2Dkyyy", "email": "yandexdataschool.com;kaust.edu.sa", "author_num": 2, "aff_unique_index": "0;1", "aff_unique_norm": "Yandex School of Data Analysis;King Abdullah University of Science and Technology", "aff_unique_dep": ";", "aff_unique_url": "https://ysda.yandex.ru;https://www.kaust.edu.sa", "aff_unique_abbr": "YSDA;KAUST", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;1", "aff_country_unique": "Russian Federation;Saudi Arabia" }, { "id": "7qmQNB6Wn_B", "title": "Diversity Actor-Critic: Sample-Aware Entropy Regularization for Sample-Efficient Exploration", "track": "main", "status": "Reject", "tldr": "", "abstract": "Policy entropy regularization is commonly used for better exploration in deep reinforcement learning (RL). However, policy entropy regularization is sample-inefficient in off-policy learning since it does not take the distribution of previous samples stored in the replay buffer into account. In order to take advantage of the previous sample distribution from the replay buffer for sample-efficient exploration, we propose sample-aware entropy regularization which maximizes the entropy of weighted sum of the policy action distribution and the sample action distribution from the replay buffer. We formulate the problem of sample-aware entropy regularized policy iteration, prove its convergence, and provide a practical algorithm named diversity actor-critic (DAC) which is a generalization of soft actor-critic (SAC). 
Numerical results show that DAC significantly outperforms SAC baselines and other state-of-the-art RL algorithms.", "keywords": "Reinforcement Learning;Entropy Regularization;Exploration", "primary_area": "", "supplementary_material": "/attachment/075eba0e4565263e43fe22edbb425de13ccff42a.zip", "author": "Seungyul Han;Youngchul Sung", "authorids": "~Seungyul_Han1;~Youngchul_Sung1", "gender": "M;M", "homepage": "https://mllab.unist.ac.kr;https://sites.google.com/view/youngchulsung", "dblp": "183/6417;17/6798", "google_scholar": "https://scholar.google.com/citations?hl=ko;-9D2k3UAAAAJ", "orcid": ";0000-0003-4536-6690", "linkedin": ";", "or_profile": "~Seungyul_Han1;~Youngchul_Sung1", "aff": "Korea Advanced Institute of Science & Technology;Korea Advanced Institute of Science & Technology", "aff_domain": "kaist.ac.kr;kaist.ac.kr", "position": "PhD student;Full Professor", "bibtex": "@misc{\nhan2021diversity,\ntitle={Diversity Actor-Critic: Sample-Aware Entropy Regularization for Sample-Efficient Exploration},\nauthor={Seungyul Han and Youngchul Sung},\nyear={2021},\nurl={https://openreview.net/forum?id=7qmQNB6Wn_B}\n}", "github": "", "project": "", "reviewers": "AnonReviewer3;AnonReviewer2;AnonReviewer1;AnonReviewer4", "site": "https://openreview.net/forum?id=7qmQNB6Wn_B", "pdf_size": 0, "rating": "5;5;5;6", "confidence": "4;4;4;3", "wc_review": "819;366;918;405", "wc_reply_reviewers": "478;246;111;0", "wc_reply_authors": "3095;1977;1489;26", "reply_reviewers": "2;1;1;0", "reply_authors": "4;3;3;1", "rating_avg": [ 5.25, 0.4330127018922193 ], "confidence_avg": [ 3.75, 0.4330127018922193 ], "wc_review_avg": [ 627.0, 244.41256105200486 ], "wc_reply_reviewers_avg": [ 208.75, 178.19564388615115 ], "wc_reply_authors_avg": [ 1646.75, 1102.0672336568218 ], "reply_reviewers_avg": [ 1.0, 0.7071067811865476 ], "reply_authors_avg": [ 2.75, 1.0897247358851685 ], "replies_avg": [ 21, 0 ], "authors#_avg": [ 2, 0 ], "corr_rating_confidence": -1.0, "gs_citation": 33, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=1891726031922597340&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 7, "aff_unique_index": "0;0", "aff_unique_norm": "Korea Advanced Institute of Science and Technology", "aff_unique_dep": "", "aff_unique_url": "https://www.kaist.ac.kr", "aff_unique_abbr": "KAIST", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0", "aff_country_unique": "South Korea" }, { "title": "Neural Networks for Learning Counterfactual G-Invariances from Single Environments", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/3239", "id": "7t1FcJUWhi3", "poster": "", "openreview": "https://openreview.net/forum?id=7t1FcJUWhi3", "slides": "https://iclr.cc/virtual/2021/poster/3239", "video": "https://iclr.cc/virtual/2021/poster/3239", "author_site": "S Chandra Mouli, Bruno Ribeiro", "tldr": "", "abstract": "Despite \u2014or maybe because of\u2014 their astonishing capacity to fit data, neural networks are believed to have difficulties extrapolating beyond training data distribution. This work shows that, for extrapolations based on finite transformation groups, a model\u2019s inability to extrapolate is unrelated to its capacity. Rather, the shortcoming is inherited from a learning hypothesis: Examples not explicitly observed with infinitely many training examples have underspecified outcomes in the learner\u2019s model. 
In order to endow neural networks with the ability to extrapolate over group transformations, we introduce a learning framework counterfactually-guided by the learning hypothesis that any group invariance to (known) transformation groups is mandatory even without evidence, unless the learner deems it inconsistent with the training data. Unlike existing invariance-driven methods for (counterfactual) extrapolations, this framework allows extrapolations from a single environment. Finally, we introduce sequence and image extrapolation tasks that validate our framework and showcase the shortcomings of traditional approaches.", "keywords": "Extrapolation;G-invariance regularization;Counterfactual inference;Invariant subspaces", "primary_area": "", "supplementary_material": "", "author": "S Chandra Mouli;Bruno Ribeiro", "authorids": "~S_Chandra_Mouli1;~Bruno_Ribeiro1", "gender": "M;M", "homepage": "https://www.cs.purdue.edu/homes/chandr/;https://www.cs.purdue.edu/homes/ribeirob/", "dblp": "167/6021;15/606", "google_scholar": "https://scholar.google.com/citations?hl=en;KIEleCsAAAAJ", "orcid": ";0000-0002-3527-6192", "linkedin": ";", "or_profile": "~S_Chandra_Mouli1;~Bruno_Ribeiro1", "aff": "Purdue University;Purdue University", "aff_domain": "purdue.edu;purdue.edu", "position": "PhD student;Assistant Professor", "bibtex": "@inproceedings{\nmouli2021neural,\ntitle={Neural Networks for Learning Counterfactual G-Invariances from Single Environments},\nauthor={S Chandra Mouli and Bruno Ribeiro},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=7t1FcJUWhi3}\n}", "github": "", "project": "", "reviewers": "AnonReviewer4;AnonReviewer3;AnonReviewer2", "pdf_size": 0, "rating": "5;7;7", "confidence": "4;3;2", "wc_review": "1053;1214;526", "wc_reply_reviewers": "703;283;0", "wc_reply_authors": "3699;2322;697", "reply_reviewers": "2;4;0", "reply_authors": "7;6;1", "rating_avg": [ 6.333333333333333, 0.9428090415820634 ], "confidence_avg": [ 3.0, 0.816496580927726 ], "wc_review_avg": [ 931.0, 293.8242104842055 ], "wc_reply_reviewers_avg": [ 328.6666666666667, 288.8094335178283 ], "wc_reply_authors_avg": [ 2239.3333333333335, 1226.954585775511 ], "reply_reviewers_avg": [ 2.0, 1.632993161855452 ], "reply_authors_avg": [ 4.666666666666667, 2.6246692913372702 ], "replies_avg": [ 25, 0 ], "authors#_avg": [ 2, 0 ], "corr_rating_confidence": -0.8660254037844385, "gs_citation": 12, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=11398104939483895599&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 7, "pdf": "https://openreview.net/pdf?id=7t1FcJUWhi3", "email": "purdue.edu;purdue.edu", "author_num": 2, "aff_unique_index": "0;0", "aff_unique_norm": "Purdue University", "aff_unique_dep": "", "aff_unique_url": "https://www.purdue.edu", "aff_unique_abbr": "Purdue", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0", "aff_country_unique": "United States" }, { "title": "Are Neural Nets Modular? 
Inspecting Functional Modularity Through Differentiable Weight Masks", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/3200", "id": "7uVcpu-gMD", "poster": "", "openreview": "https://openreview.net/forum?id=7uVcpu-gMD", "slides": "https://iclr.cc/virtual/2021/poster/3200", "video": "https://iclr.cc/virtual/2021/poster/3200", "author_site": "Robert Csordas, Sjoerd van Steenkiste, J\u00fcrgen Schmidhuber", "tldr": "", "abstract": "Neural networks (NNs) whose subnetworks implement reusable functions are expected to offer numerous advantages, including compositionality through efficient recombination of functional building blocks, interpretability, preventing catastrophic interference, etc. Understanding if and how NNs are modular could provide insights into how to improve them. Current inspection methods, however, fail to link modules to their functionality. In this paper, we present a novel method based on learning binary weight masks to identify individual weights and subnets responsible for specific functions. Using this powerful tool, we contribute an extensive study of emerging modularity in NNs that covers several standard architectures and datasets. We demonstrate how common NNs fail to reuse submodules and offer new insights into the related issue of systematic generalization on language tasks.", "keywords": "modularity;systematic generalization;compositionality", "primary_area": "", "supplementary_material": "/attachment/3a6bf28669b154c67882adb9eb58639387181087.zip", "author": "R\u00f3bert Csord\u00e1s;Sjoerd van Steenkiste;J\u00fcrgen Schmidhuber", "authorids": "~R\u00f3bert_Csord\u00e1s1;~Sjoerd_van_Steenkiste1;~J\u00fcrgen_Schmidhuber1", "gender": "M;M;M", "homepage": "https://robertcsordas.github.io/;http://www.sjoerdvansteenkiste.com/;http://people.idsia.ch/~juergen/", "dblp": "166/4773.html;183/9326;s/JurgenSchmidhuber", "google_scholar": "av1lplwAAAAJ;i-AStBYAAAAJ;https://scholar.google.ch/citations?user=gLnCTgIAAAAJ", "orcid": ";;", "linkedin": "robertcsordas/;;", "or_profile": "~R\u00f3bert_Csord\u00e1s1;~Sjoerd_van_Steenkiste1;~J\u00fcrgen_Schmidhuber1", "aff": "IDSIA;Dalle Molle Institute for Artificial Intelligence Research (IDSIA);IDSIA", "aff_domain": "idsia.ch;idsia.ch;idsia.ch", "position": "PhD student;Postdoc;Scientific Director", "bibtex": "@inproceedings{\ncsord{\\'a}s2021are,\ntitle={Are Neural Nets Modular? 
Inspecting Functional Modularity Through Differentiable Weight Masks},\nauthor={R{\\'o}bert Csord{\\'a}s and Sjoerd van Steenkiste and J{\\\"u}rgen Schmidhuber},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=7uVcpu-gMD}\n}", "github": "[![github](/images/github_icon.svg) RobertCsordas/modules](https://github.com/RobertCsordas/modules)", "project": "", "reviewers": "AnonReviewer1;AnonReviewer3;AnonReviewer2;AnonReviewer4", "pdf_size": 0, "rating": "6;6;6;8", "confidence": "5;3;4;3", "wc_review": "672;350;608;1229", "wc_reply_reviewers": "860;0;0;388", "wc_reply_authors": "1870;1076;1393;1454", "reply_reviewers": "2;0;0;1", "reply_authors": "4;3;2;3", "rating_avg": [ 6.5, 0.8660254037844386 ], "confidence_avg": [ 3.75, 0.82915619758885 ], "wc_review_avg": [ 714.75, 320.4367137205099 ], "wc_reply_reviewers_avg": [ 312.0, 353.82481541011225 ], "wc_reply_authors_avg": [ 1448.25, 282.63437069825744 ], "reply_reviewers_avg": [ 0.75, 0.82915619758885 ], "reply_authors_avg": [ 3.0, 0.7071067811865476 ], "replies_avg": [ 23, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": -0.5222329678670935, "gs_citation": 110, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=5376725240371408845&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 6, "pdf": "https://openreview.net/pdf?id=7uVcpu-gMD", "email": "idsia.ch;idsia.ch;idsia.ch", "author_num": 3, "aff_unique_index": "0;1;0", "aff_unique_norm": "Institute of Digital Technologies;Dalle Molle Institute for Artificial Intelligence Research", "aff_unique_dep": ";Artificial Intelligence Research", "aff_unique_url": "https://www.idsia.ch;https://www.idsia.ch/", "aff_unique_abbr": "IDSIA;IDSIA", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0;0", "aff_country_unique": "Switzerland" }, { "title": "Nearest Neighbor Machine Translation", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/2532", "id": "7wCBOfJ8hJM", "poster": "", "openreview": "https://openreview.net/forum?id=7wCBOfJ8hJM", "slides": "https://iclr.cc/virtual/2021/poster/2532", "video": "https://iclr.cc/virtual/2021/poster/2532", "author_site": "Urvashi Khandelwal, Angela Fan, Dan Jurafsky, Luke Zettlemoyer, Mike Lewis", "tldr": "", "abstract": "We introduce $k$-nearest-neighbor machine translation ($k$NN-MT), which predicts tokens with a nearest-neighbor classifier over a large datastore of cached examples, using representations from a neural translation model for similarity search. This approach requires no additional training and scales to give the decoder direct access to billions of examples at test time, resulting in a highly expressive model that consistently improves performance across many settings. Simply adding nearest-neighbor search improves a state-of-the-art German-English translation model by 1.5 BLEU. $k$NN-MT allows a single model to be adapted to diverse domains by using a domain-specific datastore, improving results by an average of 9.2 BLEU over zero-shot transfer, and achieving new state-of-the-art results---without training on these domains. A massively multilingual model can also be specialized for particular language pairs, with improvements of 3 BLEU for translating from English into German and Chinese. 
Qualitatively, $k$NN-MT is easily interpretable; it combines source and target context to retrieve highly relevant examples.", "keywords": "nearest neighbors;machine translation", "primary_area": "", "supplementary_material": "", "author": "Urvashi Khandelwal;Angela Fan;Dan Jurafsky;Luke Zettlemoyer;Mike Lewis", "authorids": "~Urvashi_Khandelwal1;~Angela_Fan2;~Dan_Jurafsky1;~Luke_Zettlemoyer1;~Mike_Lewis1", "gender": "F;;M;M;M", "homepage": ";;http://web.stanford.edu/~jurafsky/;https://www.cs.washington.edu/people/faculty/lsz/;", "dblp": "135/6699;192/1872;31/985;21/6793;19/6214", "google_scholar": "2ITGSdgAAAAJ;TLZR9zgAAAAJ;uZg9l58AAAAJ;https://scholar.google.com.tw/citations?user=UjpbO6IAAAAJ;SnQnQicAAAAJ", "orcid": ";;;;", "linkedin": ";;;luke-zettlemoyer-a0109b226/;", "or_profile": "~Urvashi_Khandelwal1;~Angela_Fan2;~Dan_Jurafsky1;~Luke_Zettlemoyer1;~Mike_Lewis1", "aff": "Stanford University;Meta Facebook;Stanford University;Meta;Facebook AI Research", "aff_domain": "stanford.edu;facebook.com;stanford.edu;meta.com;fb.com", "position": "PhD student;Research Engineer;Full Professor;Researcher;Research Scientist", "bibtex": "@inproceedings{\nkhandelwal2021nearest,\ntitle={Nearest Neighbor Machine Translation},\nauthor={Urvashi Khandelwal and Angela Fan and Dan Jurafsky and Luke Zettlemoyer and Mike Lewis},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=7wCBOfJ8hJM}\n}", "github": "[![github](/images/github_icon.svg) urvashik/knnlm](https://github.com/urvashik/knnlm) + [![Papers with Code](/images/pwc_icon.svg) 4 community implementations](https://paperswithcode.com/paper/?openreview=7wCBOfJ8hJM)", "project": "", "reviewers": "AnonReviewer2;AnonReviewer1;AnonReviewer3;AnonReviewer4", "pdf_size": 0, "rating": "4;4;6;8", "confidence": "5;3;5;4", "wc_review": "481;220;323;311", "wc_reply_reviewers": "263;0;0;0", "wc_reply_authors": "352;21;190;108", "reply_reviewers": "1;0;0;0", "reply_authors": "2;1;1;1", "rating_avg": [ 5.5, 1.6583123951777 ], "confidence_avg": [ 4.25, 0.82915619758885 ], "wc_review_avg": [ 333.75, 93.88124147027456 ], "wc_reply_reviewers_avg": [ 65.75, 113.88234059765368 ], "wc_reply_authors_avg": [ 167.75, 122.01306282525654 ], "reply_reviewers_avg": [ 0.25, 0.4330127018922193 ], "reply_authors_avg": [ 1.25, 0.4330127018922193 ], "replies_avg": [ 12, 0 ], "authors#_avg": [ 5, 0 ], "corr_rating_confidence": 0.0909090909090909, "gs_citation": 329, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=6208883901750253359&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 6, "pdf": "https://openreview.net/pdf?id=7wCBOfJ8hJM", "email": "stanford.edu;facebook.com;stanford.edu;meta.com;fb.com", "author_num": 5, "aff_unique_index": "0;1;0;1;1", "aff_unique_norm": "Stanford University;Meta", "aff_unique_dep": ";Meta Platforms, Inc.", "aff_unique_url": "https://www.stanford.edu;https://meta.com", "aff_unique_abbr": "Stanford;Meta", "aff_campus_unique_index": "0;0", "aff_campus_unique": "Stanford;", "aff_country_unique_index": "0;0;0;0;0", "aff_country_unique": "United States" }, { "id": "7xArdn_FKtV", "title": "Heterogeneous Model Transfer between Different Neural Networks", "track": "main", "status": "Withdraw", "tldr": "", "abstract": "We propose an effective heterogeneous model transfer (HMT) method that can transfer the knowledge from one pretrained neural network to another neural network. 
Most existing deep learning methods depend heavily on a pretraining-finetuning strategy, i.e., pretraining a deep model on a large task-related (source) dataset and finetuning it on a small target dataset. Pretraining provides a universal feature representation for the target learning task and thus reduces overfitting on a small target dataset. However, it is often assumed that the pretrained model and the target model share an identical backbone, which significantly limits the scalability of pretrained deep models. This paper relaxes this limitation and generalizes to heterogeneous model transfer between two different neural networks. Specifically, we select the longest chain from the source model and transfer it to the longest chain of the target model. Motivated by one-shot neural architecture search methods, the longest chain inherits merits from the source model and also serves as a weight-sharing path of the target model, thus providing a good initialization. With the longest chains, the layer-to-layer weight transfer is then transformed by bilinear interpolation and cyclic stack. HMT opens a new window for the pretraining-finetuning strategy and significantly improves the reuse efficiency of pretrained models without re-pretraining on the large source dataset. Experiments on several datasets show the effectiveness of HMT. Anonymous code is at: https://anonymous.4open.science/r/6ab184dc-3c64-4fdd-ba6d-1e5097623dfd/", "keywords": "Heterogeneous model transfer;pretraining-finetuning", "primary_area": "", "supplementary_material": "", "author": "Guangcong Wang;Jianhuang Lai;Wenqi Liang;Guangrun Wang", "authorids": "~Guangcong_Wang1;~Jianhuang_Lai1;liangwq8@mail2.sysu.edu.cn;~Guangrun_Wang1", "gender": "M;M;;M", "homepage": "https://wanggcong.github.io/;https://cse.sysu.edu.cn/content/2498;;https://wanggrun.github.io", "dblp": "211/7260;78/1117;;165/1374.html", "google_scholar": "dk8EnkoAAAAJ;;;nuHIZx0AAAAJ", "orcid": "0000-0002-6627-814X;0000-0003-3883-2024;;", "linkedin": ";;;", "or_profile": "~Guangcong_Wang1;~Jianhuang_Lai1;liangwq8@mail2.sysu.edu.cn;~Guangrun_Wang1", "aff": "Nanyang Technological University;SUN YAT-SEN UNIVERSITY;;University of Oxford", "aff_domain": "ntu.edu.sg;sysu.edu.cn;;ox.ac.uk", "position": "Postdoc;Full Professor;;Researcher", "bibtex": "", "github": "", "project": "", "reviewers": "AnonReviewer2;AnonReviewer4;AnonReviewer1;AnonReviewer3", "site": "https://openreview.net/forum?id=7xArdn_FKtV", "pdf_size": 0, "rating": "3;4;5;5", "confidence": "5;4;5;5", "wc_review": "193;360;360;376", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "0;0;0;0", "reply_reviewers": "0;0;0;0", "reply_authors": "0;0;0;0", "rating_avg": [ 4.25, 0.82915619758885 ], "confidence_avg": [ 4.75, 0.4330127018922193 ], "wc_review_avg": [ 322.25, 74.9078600682198 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 0, 0 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 0, 0 ], "replies_avg": [ 5, 0 ], "authors#_avg": [ 4, 0 ], "corr_rating_confidence": 0.17407765595569782, "gs_citation": 3, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=9236356333350488237&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 0, "aff_unique_index": "0;1;2", "aff_unique_norm": "Nanyang Technological University;Sun Yat-sen University;University of Oxford", "aff_unique_dep": ";;", "aff_unique_url": "https://www.ntu.edu.sg;http://www.sysu.edu.cn;https://www.ox.ac.uk", "aff_unique_abbr": "NTU;SYSU;Oxford", "aff_campus_unique_index": "", "aff_campus_unique": "",
"aff_country_unique_index": "0;1;2", "aff_country_unique": "Singapore;China;United Kingdom" }, { "id": "8-sxWOto_iI", "title": "Introducing Sample Robustness", "track": "main", "status": "Reject", "tldr": "", "abstract": "Choosing the right data and model for a pre-defined task is one of the critical competencies in machine learning. Investigating what features of a dataset and its underlying distribution a model decodes may enlighten the mysterious \"black box\" and guide us to a deeper and more profound understanding of the ongoing processes. Furthermore, it will help to improve the quality of models which directly depend on data or learn from it through training. In this work, we introduce the dataset-dependent concept of sample robustness, which is based on a point-wise Lipschitz constant of the label map. For a particular sample, it measures how small of a perturbation is required to cause a label-change relative to the magnitude of the label map. We introduce theory to motivate the concept and to analyse the effects of having similar robustness distributions for the training- and test data. Afterwards, we conduct various experiments using different datasets and (non-)deterministic models. In some cases, we can boost performance by choosing specifically tailored training(sub)sets and hyperparameters depending on the robustness distribution of the test(sub)sets.", "keywords": "", "primary_area": "", "supplementary_material": "", "author": "Monty Maximilian Z\u00fchlke", "authorids": "~Monty_Maximilian_Z\u00fchlke1", "gender": "M", "homepage": "", "dblp": "375/6617", "google_scholar": "Fv4Gf-wAAAAJ", "orcid": "", "linkedin": "", "or_profile": "~Monty_Maximilian_Z\u00fchlke1", "aff": "Leibniz Universit\u00e4t Hannover", "aff_domain": "uni-hannover.de", "position": "PhD student", "bibtex": "@misc{\nz{\\\"u}hlke2021introducing,\ntitle={Introducing Sample Robustness},\nauthor={Monty Maximilian Z{\\\"u}hlke},\nyear={2021},\nurl={https://openreview.net/forum?id=8-sxWOto_iI}\n}", "github": "", "project": "", "reviewers": "AnonReviewer4;AnonReviewer3;AnonReviewer1;AnonReviewer2", "site": "https://openreview.net/forum?id=8-sxWOto_iI", "pdf_size": 0, "rating": "2;4;4;5", "confidence": "4;3;4;3", "wc_review": "291;385;409;242", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "227;233;399;265", "reply_reviewers": "0;0;0;0", "reply_authors": "1;1;1;1", "rating_avg": [ 3.75, 1.0897247358851685 ], "confidence_avg": [ 3.5, 0.5 ], "wc_review_avg": [ 331.75, 68.04180700128414 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 281.0, 69.6419413859206 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 10, 0 ], "authors#_avg": [ 1, 0 ], "corr_rating_confidence": -0.6882472016116854, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:2-fPwsUAC_AJ:scholar.google.com/&scioq=Introducing+Sample+Robustness&hl=en&as_sdt=0,5", "gs_version_total": 0, "aff_unique_index": "0", "aff_unique_norm": "Leibniz Universit\u00e4t Hannover", "aff_unique_dep": "", "aff_unique_url": "https://www.leibniz.uni-hannover.de/", "aff_unique_abbr": "LUH", "aff_country_unique_index": "0", "aff_country_unique": "Germany" }, { "title": "Noise against noise: stochastic label noise helps combat inherent label noise", "status": "Spotlight", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/2606", "id": "80FMcTSZ6J0", "poster": "", "openreview": "https://openreview.net/forum?id=80FMcTSZ6J0", "slides": "https://iclr.cc/virtual/2021/poster/2606", 
"video": "https://iclr.cc/virtual/2021/poster/2606", "author_site": "Pengfei Chen, Guangyong Chen, Junjie Ye, jingwei zhao, Pheng-Ann Heng", "tldr": "", "abstract": "The noise in stochastic gradient descent (SGD) provides a crucial implicit regularization effect, previously studied in optimization by analyzing the dynamics of parameter updates. In this paper, we are interested in learning with noisy labels, where we have a collection of samples with potential mislabeling. We show that a previously rarely discussed SGD noise, induced by stochastic label noise (SLN), mitigates the effects of inherent label noise. In contrast, the common SGD noise directly applied to model parameters does not. We formalize the differences and connections of SGD noise variants, showing that SLN induces SGD noise dependent on the sharpness of output landscape and the confidence of output probability, which may help escape from sharp minima and prevent overconfidence. SLN not only improves generalization in its simplest form but also boosts popular robust training methods, including sample selection and label correction. Specifically, we present an enhanced algorithm by applying SLN to label correction. Our code is released.", "keywords": "Noisy Labels;Robust Learning;SGD noise;Regularization", "primary_area": "", "supplementary_material": "/attachment/a58f0d8e9866d6236fbf23e91a23f210accdaec5.zip", "author": "Pengfei Chen;Guangyong Chen;Junjie Ye;jingwei zhao;Pheng-Ann Heng", "authorids": "~Pengfei_Chen1;gy.chen@siat.ac.cn;kourenmu@gmail.com;~jingwei_zhao1;~Pheng-Ann_Heng1", "gender": "M;;;M;M", "homepage": ";;;;http://www.cse.cuhk.edu.hk/~pheng", "dblp": ";;;;52/2889", "google_scholar": "Xvj_5xYAAAAJ;;;;https://scholar.google.com/citations?sortby=pubdate", "orcid": ";;;;", "linkedin": ";;;;", "or_profile": "~Pengfei_Chen1;gy.chen@siat.ac.cn;kourenmu@gmail.com;~jingwei_zhao1;~Pheng-Ann_Heng1", "aff": "The Chinese University of Hong Kong;;;;The Chinese University of Hong Kong", "aff_domain": "cuhk.edu.hk;;;;cuhk.edu.hk", "position": "PhD student;;;;Full Professor", "bibtex": "@inproceedings{\nchen2021noise,\ntitle={Noise against noise: stochastic label noise helps combat inherent label noise},\nauthor={Pengfei Chen and Guangyong Chen and Junjie Ye and jingwei zhao and Pheng-Ann Heng},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=80FMcTSZ6J0}\n}", "github": "", "project": "", "reviewers": "AnonReviewer2;AnonReviewer3;AnonReviewer1;AnonReviewer5", "pdf_size": 0, "rating": "5;6;7;7", "confidence": "4;5;4;3", "wc_review": "382;340;1071;390", "wc_reply_reviewers": "159;0;0;121", "wc_reply_authors": "443;370;582;355", "reply_reviewers": "1;0;0;2", "reply_authors": "3;1;1;2", "rating_avg": [ 6.25, 0.82915619758885 ], "confidence_avg": [ 4.0, 0.7071067811865476 ], "wc_review_avg": [ 545.75, 303.8473095158159 ], "wc_reply_reviewers_avg": [ 70.0, 71.27762622310033 ], "wc_reply_authors_avg": [ 437.5, 89.8234379212909 ], "reply_reviewers_avg": [ 0.75, 0.82915619758885 ], "reply_authors_avg": [ 1.75, 0.82915619758885 ], "replies_avg": [ 17, 0 ], "authors#_avg": [ 5, 0 ], "corr_rating_confidence": -0.42640143271122083, "gs_citation": 48, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=2403051965179627923&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 3, "pdf": "https://openreview.net/pdf?id=80FMcTSZ6J0", "email": "cuhk.edu.hk;;;;cuhk.edu.hk", "author_num": 5, "aff_unique_index": "0;0", "aff_unique_norm": "Chinese University of Hong 
Kong", "aff_unique_dep": "", "aff_unique_url": "https://www.cuhk.edu.hk", "aff_unique_abbr": "CUHK", "aff_campus_unique_index": "0;0", "aff_campus_unique": "Hong Kong SAR", "aff_country_unique_index": "0;0", "aff_country_unique": "China" }, { "title": "Linear Convergent Decentralized Optimization with Compression", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/2693", "id": "84gjULz1t5", "poster": "", "openreview": "https://openreview.net/forum?id=84gjULz1t5", "slides": "https://iclr.cc/virtual/2021/poster/2693", "video": "https://iclr.cc/virtual/2021/poster/2693", "author_site": "Xiaorui Liu, Yao Li, Rongrong Wang, Jiliang Tang, Ming Yan", "tldr": "", "abstract": "Communication compression has become a key strategy to speed up distributed optimization. However, existing decentralized algorithms with compression mainly focus on compressing DGD-type algorithms. They are unsatisfactory in terms of convergence rate, stability, and the capability to handle heterogeneous data. Motivated by primal-dual algorithms, this paper proposes the first \\underline{L}in\\underline{EA}r convergent \\underline{D}ecentralized algorithm with compression, LEAD. Our theory describes the coupled dynamics of the inexact primal and dual update as well as compression error, and we provide the first consensus error bound in such settings without assuming bounded gradients. Experiments on convex problems validate our theoretical analysis, and empirical study on deep neural nets shows that LEAD is applicable to non-convex problems.", "keywords": "Decentralized Optimization;Communication Compression;Linear Convergence;Heterogeneous data", "primary_area": "", "supplementary_material": "", "author": "Xiaorui Liu;Yao Li;Rongrong Wang;Jiliang Tang;Ming Yan", "authorids": "~Xiaorui_Liu1;liyao6@msu.edu;wangron6@msu.edu;~Jiliang_Tang1;myan@msu.edu", "gender": "M;;;M;", "homepage": "https://sites.google.com/ncsu.edu/xiaorui/;;;https://www.cse.msu.edu/~tangjili/;", "dblp": "172/0995;;;64/10812;", "google_scholar": "NhvN1KoAAAAJ;;;WtzKMWAAAAAJ;", "orcid": "0000-0001-8217-5688;;;0000-0001-7125-3898;", "linkedin": ";;;;", "or_profile": "~Xiaorui_Liu1;liyao6@msu.edu;wangron6@msu.edu;~Jiliang_Tang1;myan@msu.edu", "aff": "Michigan State University;;;Michigan State University;", "aff_domain": "msu.edu;;;msu.edu;", "position": "PhD student;;;Assistant Professor;", "bibtex": "@inproceedings{\nliu2021linear,\ntitle={Linear Convergent Decentralized Optimization with Compression},\nauthor={Xiaorui Liu and Yao Li and Rongrong Wang and Jiliang Tang and Ming Yan},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=84gjULz1t5}\n}", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer2;AnonReviewer4", "pdf_size": 0, "rating": "7;7;7", "confidence": "4;4;4", "wc_review": "635;283;386", "wc_reply_reviewers": "51;38;0", "wc_reply_authors": "422;525;204", "reply_reviewers": "1;1;0", "reply_authors": "1;1;1", "rating_avg": [ 7.0, 0.0 ], "confidence_avg": [ 4.0, 0.0 ], "wc_review_avg": [ 434.6666666666667, 147.76633205466285 ], "wc_reply_reviewers_avg": [ 29.666666666666668, 21.638443156156644 ], "wc_reply_authors_avg": [ 383.6666666666667, 133.82160596190073 ], "reply_reviewers_avg": [ 0.6666666666666666, 0.4714045207910317 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 9, 0 ], "authors#_avg": [ 5, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 61, "gs_cited_by_link": 
"https://scholar.google.com/scholar?cites=11729413520005201494&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 15, "pdf": "https://openreview.net/pdf?id=84gjULz1t5", "email": "msu.edu;;;msu.edu;", "author_num": 5, "aff_unique_index": "0;0", "aff_unique_norm": "Michigan State University", "aff_unique_dep": "", "aff_unique_url": "https://www.msu.edu", "aff_unique_abbr": "MSU", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0", "aff_country_unique": "United States" }, { "id": "85d8bg9RvDT", "title": "Deep Retrieval: An End-to-End Structure Model for Large-Scale Recommendations", "track": "main", "status": "Reject", "tldr": "", "abstract": "One of the core problems in large-scale recommendations is to retrieve top relevant candidates accurately and efficiently, preferably in sub-linear time. Previous approaches are mostly based on a two-step procedure: first learn an inner-product model and then use maximum inner product search (MIPS) algorithms to search top candidates, leading to potential loss of retrieval accuracy. In this paper, we present Deep Retrieval (DR), an end-to-end learnable structure model for large-scale recommendations. DR encodes all candidates into a discrete latent space. Those latent codes for the candidates are model parameters and to be learnt together with other neural network parameters to maximize the same objective function. With the model learnt, a beam search over the latent codes is performed to retrieve the top candidates. Empirically, we showed that DR, with sub-linear computational complexity, can achieve almost the same accuracy as the brute-force baseline.", "keywords": "Large-scale recommendation system;End-to-end training", "primary_area": "", "supplementary_material": "/attachment/55736f81424c35929e923459e9d127fe1e9bb045.zip", "author": "Weihao Gao;Xiangjun Fan;Jiankai Sun;Kai Jia;Wenzhi Xiao;Chong Wang;Xiaobing Liu", "authorids": "~Weihao_Gao1;xiangjun.fan@bytedance.com;jiankai.sun@bytedance.com;jiakai@bytedance.com;xiaowenzhi@bytedance.com;~Chong_Wang8;~Xiaobing_Liu1", "gender": "M;;;;;;M", "homepage": "https://wgao9.github.io/;;;;;;", "dblp": "https://dblp.uni-trier.de/pers/hd/g/Gao:Weihao;;;;;;", "google_scholar": "E__5Lr0AAAAJ;;;;;;1ypDmDwAAAAJ", "orcid": ";;;;;;", "linkedin": "weihao-gao-6517b3ab/;;;;;;", "or_profile": "~Weihao_Gao1;xiangjun.fan@bytedance.com;jiankai.sun@bytedance.com;jiakai@bytedance.com;xiaowenzhi@bytedance.com;~Chong_Wang8;~Xiaobing_Liu1", "aff": ";;;;;;ByteDance Inc.", "aff_domain": ";;;;;;bytedance.com", "position": ";;;;;;Researcher", "bibtex": "@misc{\ngao2021deep,\ntitle={Deep Retrieval: An End-to-End Structure Model for Large-Scale Recommendations},\nauthor={Weihao Gao and Xiangjun Fan and Jiankai Sun and Kai Jia and Wenzhi Xiao and Chong Wang and Xiaobing Liu},\nyear={2021},\nurl={https://openreview.net/forum?id=85d8bg9RvDT}\n}", "github": "", "project": "", "reviewers": "AnonReviewer3;AnonReviewer4;AnonReviewer1;AnonReviewer2", "site": "https://openreview.net/forum?id=85d8bg9RvDT", "pdf_size": 0, "rating": "3;4;4;5", "confidence": "5;4;3;3", "wc_review": "1009;456;393;453", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "0;0;0;0", "reply_reviewers": "0;0;0;0", "reply_authors": "0;0;0;0", "rating_avg": [ 4.0, 0.7071067811865476 ], "confidence_avg": [ 3.75, 0.82915619758885 ], "wc_review_avg": [ 577.75, 250.24725273217285 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 0, 0 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 0, 0 ], "replies_avg": [ 5, 
0 ], "authors#_avg": [ 7, 0 ], "corr_rating_confidence": -0.8528028654224418, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:uz6ep_nDV6cJ:scholar.google.com/&scioq=Deep+Retrieval:+An+End-to-End+Structure+Model+for+Large-Scale+Recommendations&hl=en&as_sdt=0,5", "gs_version_total": 0, "aff_unique_index": "0", "aff_unique_norm": "ByteDance", "aff_unique_dep": "", "aff_unique_url": "https://www.bytedance.com", "aff_unique_abbr": "ByteDance", "aff_country_unique_index": "0", "aff_country_unique": "China" }, { "id": "86PW5gch8VZ", "title": "DQSGD: DYNAMIC QUANTIZED STOCHASTIC GRADIENT DESCENT FOR COMMUNICATION-EFFICIENT DISTRIBUTED LEARNING", "track": "main", "status": "Reject", "tldr": "", "abstract": "Gradient quantization is widely adopted to mitigate communication costs in distributed learning systems. Existing gradient quantization algorithms often rely on design heuristics and/or empirical evidence to tune the quantization strategy for different learning problems. To the best of our knowledge, there is no theoretical framework characterizing the trade-off between communication cost and model accuracy under dynamic gradient quantization strategies. This paper addresses this issue by proposing a novel dynamic quantized SGD (DQSGD) framework, which enables us to optimize the quantization strategy for each gradient descent step by exploring the trade-off between communication cost and modeling error. In particular, we derive an upper bound, tight in some cases, of the modeling error for arbitrary dynamic quantization strategy. By minimizing this upper bound, we obtain an enhanced quantization algorithm with significantly improved modeling error under given communication overhead constraints. Besides, we show that our quantization scheme achieves a strengthened communication cost and model accuracy trade-off in a wide range of optimization models. Finally, through extensive experiments on large-scale computer vision and natural language processing tasks on CIFAR-10, CIFAR-100, and AG-News datasets, respectively. 
we demonstrate that our quantization scheme significantly outperforms the state-of-the-art gradient quantization methods in terms of communication costs.", "keywords": "Distributed Learning;Communication;Gradient Quantization", "primary_area": "", "supplementary_material": "/attachment/104113cd6f85bf99c5394aba9aee2fe453baa16a.zip", "author": "Guangfeng Yan;Shao-Lun Huang;Tian Lan;Linqi Song", "authorids": "~Guangfeng_Yan1;~Shao-Lun_Huang1;~Tian_Lan4;~Linqi_Song1", "gender": "M;;M;M", "homepage": ";;https://www2.seas.gwu.edu/~tlan/;https://sites.google.com/site/aisquaredlab/", "dblp": ";;;137/7963.html", "google_scholar": "Htbmp-MAAAAJ;;;UcGN3MoAAAAJ", "orcid": ";;;0000-0003-2756-4984", "linkedin": ";;;", "or_profile": "~Guangfeng_Yan1;~Shao-Lun_Huang1;~Tian_Lan4;~Linqi_Song1", "aff": "City University of Hong Kong;Tsinghua University;George Washington University;City University of Hong Kong", "aff_domain": "cityu.edu.hk;tsinghua.edu.cn;gwu.edu;cityu.edu.hk", "position": "PhD student;Assistant Professor;Full Professor;Assistant Professor", "bibtex": "@misc{\nyan2021dqsgd,\ntitle={{\\{}DQSGD{\\}}: {\\{}DYNAMIC{\\}} {\\{}QUANTIZED{\\}} {\\{}STOCHASTIC{\\}} {\\{}GRADIENT{\\}} {\\{}DESCENT{\\}} {\\{}FOR{\\}} {\\{}COMMUNICATION{\\}}-{\\{}EFFICIENT{\\}} {\\{}DISTRIBUTED{\\}} {\\{}LEARNING{\\}}},\nauthor={Guangfeng Yan and Shao-Lun Huang and Tian Lan and Linqi Song},\nyear={2021},\nurl={https://openreview.net/forum?id=86PW5gch8VZ}\n}", "github": "", "project": "", "reviewers": "AnonReviewer4;AnonReviewer3;AnonReviewer1;AnonReviewer2", "site": "https://openreview.net/forum?id=86PW5gch8VZ", "pdf_size": 0, "rating": "2;2;4;4", "confidence": "4;5;5;4", "wc_review": "234;521;436;367", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "200;276;135;331", "reply_reviewers": "0;0;0;0", "reply_authors": "1;1;1;1", "rating_avg": [ 3.0, 1.0 ], "confidence_avg": [ 4.5, 0.5 ], "wc_review_avg": [ 389.5, 105.04879818446283 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 235.5, 74.36565067287451 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 9, 0 ], "authors#_avg": [ 4, 0 ], "corr_rating_confidence": 0.0, "gs_citation": -1, "gs_cited_by_link": "", "gs_version_total": -1, "aff_unique_index": "0;1;2;0", "aff_unique_norm": "City University of Hong Kong;Tsinghua University;George Washington University", "aff_unique_dep": ";;", "aff_unique_url": "https://www.cityu.edu.hk;https://www.tsinghua.edu.cn;https://www.gwu.edu", "aff_unique_abbr": "CityU;THU;GWU", "aff_campus_unique_index": "0;0", "aff_campus_unique": "Hong Kong SAR;", "aff_country_unique_index": "0;0;1;0", "aff_country_unique": "China;United States" }, { "id": "86t2GlfzFo", "title": "Deep Curvature Suite", "track": "main", "status": "Reject", "tldr": "", "abstract": "The curvature of the loss, provides rich information on the geometry underlying neural networks, with applications in second order optimisation and Bayesian deep learning. However, accessing curvature information is still a daunting engineering challenge, inaccessible to most practitioners. We hence provide a software package the \\textbf{Deep Curvature Suite}, which allows easy curvature evaluation for large modern neural networks. Beyond the calculation of a highly accurate moment matched approximation of the Hessian spectrum using Lanczos, our package provides: extensive \\emph{loss surface visualisation}, the calculation of the \\emph{Hessian variance} and \\emph{stochastic second order optimisers}. 
We further address and disprove many common misconceptions in the literature about the Lanczos algorithm, namely that it learns eigenvalues from the top down. We prove using high dimensional concentration inequalities that for specific matrices a single random vector is sufficient for accurate spectral estimation, informing our spectral visualisation method. We showcase our package's practical utility on a series of examples based on realistic modern neural networks such as the VGG-$16$ and Preactivated ResNets on the CIFAR-$10$/$100$ datasets. We further detail $3$ specific potential use cases enabled by our software: research in stochastic second order optimisation for deep learning, learning rate scheduling using known optimality formulae for convex surfaces and empirical verification of deep learning theory based on comparing empirical and theoretically implied spectra.", "keywords": "Hessian computation;Deep Learning;Loss Curvature;Lanczos", "primary_area": "", "supplementary_material": "/attachment/ab11d8d035b3858dfff1c165d43dd5c6f92814db.zip", "author": "Diego Granziol;Xingchen Wan;Timur Garipov", "authorids": "~Diego_Granziol1;~Xingchen_Wan1;~Timur_Garipov1", "gender": "M;M;M", "homepage": "https://xingchen.one;https://timgaripov.github.io/;", "dblp": "255/7214;190/7045;", "google_scholar": "6KkohssAAAAJ;gWQzBQMAAAAJ;https://scholar.google.co.uk/citations?user=-MuqKlIAAAAJ", "orcid": "0000-0003-0074-0597;;0000-0003-3169-2081", "linkedin": ";timur-garipov-5a133a24b/;", "or_profile": "~Xingchen_Wan1;~Timur_Garipov1;~Diego_Marco_Granziol1", "aff": "University of Oxford;Massachusetts Institute of Technology;", "aff_domain": "robots.ox.ac.uk;mit.edu;", "position": "PhD student;PhD student;", "bibtex": "@misc{\ngranziol2021deep,\ntitle={Deep Curvature Suite},\nauthor={Diego Granziol and Xingchen Wan and Timur Garipov},\nyear={2021},\nurl={https://openreview.net/forum?id=86t2GlfzFo}\n}", "github": "", "project": "", "reviewers": "AnonReviewer2;AnonReviewer5;AnonReviewer4;AnonReviewer1", "site": "https://openreview.net/forum?id=86t2GlfzFo", "pdf_size": 0, "rating": "3;4;6;7", "confidence": "4;4;3;4", "wc_review": "512;396;338;233", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "0;0;0;0", "reply_reviewers": "0;0;0;0", "reply_authors": "0;0;0;0", "rating_avg": [ 5.0, 1.5811388300841898 ], "confidence_avg": [ 3.75, 0.4330127018922193 ], "wc_review_avg": [ 369.75, 100.7878340872548 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 0, 0 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 0, 0 ], "replies_avg": [ 5, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": -0.36514837167011077, "gs_citation": 6, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=8649967998279151091&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 3, "aff_unique_index": "0;1", "aff_unique_norm": "University of Oxford;Massachusetts Institute of Technology", "aff_unique_dep": ";", "aff_unique_url": "https://www.ox.ac.uk;https://web.mit.edu", "aff_unique_abbr": "Oxford;MIT", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;1", "aff_country_unique": "United Kingdom;United States" }, { "id": "87Ti3dufEv", "title": "A Half-Space Stochastic Projected Gradient Method for Group Sparsity Regularization", "track": "main", "status": "Reject", "tldr": "", "abstract": "Optimizing with group sparsity is significant in enhancing model interpretability in machine learning applications, e.g., feature selection, compressed sensing and model compression.
However, for large-scale stochastic training problems, effective group-sparsity exploration is typically hard to achieve. In particular, the state-of-the-art stochastic optimization algorithms usually generate merely dense solutions. To overcome this shortcoming, we propose a stochastic method, the Half-space Stochastic Projected Gradient method (HSPG), to search for solutions of high group sparsity while maintaining convergence. Initialized by a simple Prox-SG Step, the HSPG method relies on a novel Half-Space Step to substantially boost the sparsity level. Numerically, HSPG demonstrates its superiority in deep neural networks, e.g., VGG16, ResNet18 and MobileNetV1, by computing solutions of higher group sparsity, competitive objective values and generalization accuracy.", "keywords": "Group Sparsity;Stochastic Learning;Half-Space Projection;Group-Sparsity Identification", "primary_area": "", "supplementary_material": "/attachment/3a6a97a8bc4e09a7e22b15c3aed2a43162aa7744.zip", "author": "Tianyi Chen;Guanyi Wang;Tianyu DING;Bo Ji;Sheng Yi;Zhihui Zhu", "authorids": "tiachen@microsoft.com;~Guanyi_Wang1;~Tianyu_DING2;~Bo_Ji2;shengyi@microsoft.com;~Zhihui_Zhu1", "gender": ";M;M;;;M", "homepage": ";https://sites.google.com/view/guanyiwang;https://www.tianyuding.com;;;https://zhihuizhu.github.io/", "dblp": ";;134/4796;;;71/8081", "google_scholar": ";EmqEodUAAAAJ;Qi7zTOcAAAAJ;;;gmSwszcAAAAJ", "orcid": ";;0000-0001-8445-4330;;;", "linkedin": ";;tianyuding/;;;", "or_profile": "tiachen@microsoft.com;~Guanyi_Wang1;~Tianyu_DING2;~Bo_Ji2;shengyi@microsoft.com;~Zhihui_Zhu1", "aff": ";Georgia Institute of Technology;Johns Hopkins University;;;University of Denver", "aff_domain": ";gatech.edu;jhu.edu;;;du.edu", "position": ";PhD student;PhD student;;;Assistant Professor", "bibtex": "@misc{\nchen2021a,\ntitle={A Half-Space Stochastic Projected Gradient Method for Group Sparsity Regularization},\nauthor={Tianyi Chen and Guanyi Wang and Tianyu DING and Bo Ji and Sheng Yi and Zhihui Zhu},\nyear={2021},\nurl={https://openreview.net/forum?id=87Ti3dufEv}\n}", "github": "", "project": "", "reviewers": "AnonReviewer3;AnonReviewer1;AnonReviewer4;AnonReviewer2", "site": "https://openreview.net/forum?id=87Ti3dufEv", "pdf_size": 0, "rating": "5;5;5;6", "confidence": "3;2;3;3", "wc_review": "226;176;238;180", "wc_reply_reviewers": "0;0;0;28", "wc_reply_authors": "1259;433;763;405", "reply_reviewers": "0;0;0;1", "reply_authors": "2;1;2;1", "rating_avg": [ 5.25, 0.4330127018922193 ], "confidence_avg": [ 2.75, 0.4330127018922193 ], "wc_review_avg": [ 205.0, 27.367864366808018 ], "wc_reply_reviewers_avg": [ 7.0, 12.12435565298214 ], "wc_reply_authors_avg": [ 715.0, 344.1889016223504 ], "reply_reviewers_avg": [ 0.25, 0.4330127018922193 ], "reply_authors_avg": [ 1.5, 0.5 ], "replies_avg": [ 12, 0 ], "authors#_avg": [ 6, 0 ], "corr_rating_confidence": 0.3333333333333333, "gs_citation": 3, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=16679928658593363552&as_sdt=400005&sciodt=0,14&hl=en", "gs_version_total": 2, "aff_unique_index": "0;1;2", "aff_unique_norm": "Georgia Institute of Technology;Johns Hopkins University;University of Denver", "aff_unique_dep": ";;", "aff_unique_url": "https://www.gatech.edu;https://www.jhu.edu;https://www.du.edu", "aff_unique_abbr": "Georgia Tech;JHU;DU", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0;0", "aff_country_unique": "United States" }, { "title": "CPT: Efficient Deep Neural Network Training via Cyclic Precision", "status": "Spotlight",
"track": "main", "site": "https://iclr.cc/virtual/2021/poster/2552", "id": "87ZwsaQNHPZ", "poster": "", "openreview": "https://openreview.net/forum?id=87ZwsaQNHPZ", "slides": "https://iclr.cc/virtual/2021/poster/2552", "video": "https://iclr.cc/virtual/2021/poster/2552", "author_site": "Yonggan Fu, Han Guo, Meng Li, Xin Yang, Yining Ding, Vikas Chandra, Yingyan Lin", "tldr": "", "abstract": "Low-precision deep neural network (DNN) training has gained tremendous attention as reducing precision is one of the most effective knobs for boosting DNNs' training time/energy efficiency. In this paper, we attempt to explore low-precision training from a new perspective as inspired by recent findings in understanding DNN training: we conjecture that DNNs' precision might have a similar effect as the learning rate during DNN training, and advocate dynamic precision along the training trajectory for further boosting the time/energy efficiency of DNN training. Specifically, we propose Cyclic Precision Training (CPT) to cyclically vary the precision between two boundary values which can be identified using a simple precision range test within the first few training epochs. Extensive simulations and ablation studies on five datasets and eleven models demonstrate that CPT's effectiveness is consistent across various models/tasks (including classification and language modeling). Furthermore, through experiments and visualization we show that CPT helps to (1) converge to a wider minima with a lower generalization error and (2) reduce training variance which we believe opens up a new design knob for simultaneously improving the optimization and efficiency of DNN training.", "keywords": "Efficient training;low precision training", "primary_area": "", "supplementary_material": "", "author": "Yonggan Fu;Han Guo;Meng Li;Xin Yang;Yining Ding;Vikas Chandra;Yingyan Lin", "authorids": "~Yonggan_Fu1;hg31@rice.edu;meng.li@fb.com;xy33@rice.edu;yd31@rice.edu;~Vikas_Chandra2;~Yingyan_Lin1", "gender": "M;;;;;M;F", "homepage": "https://www.yongganfu.com/;;;;;https://v-chandra.github.io/;https://eiclab.scs.gatech.edu/", "dblp": "244/8166;;;;;57/5163;120/6981", "google_scholar": "https://scholar.google.com/citations?hl=en;;;;;p-h_BvcAAAAJ;dio8IesAAAAJ", "orcid": ";;;;;;", "linkedin": "yonggan-fu-b211831b0;;;;;vchandra/;yingyan-celine-lin-a281211a/", "or_profile": "~Yonggan_Fu1;hg31@rice.edu;meng.li@fb.com;xy33@rice.edu;yd31@rice.edu;~Vikas_Chandra2;~Yingyan_Lin1", "aff": "Rice University;;;;;Meta;Rice University", "aff_domain": "rice.edu;;;;;meta.com;rice.edu", "position": "PhD student;;;;;Director, AI;Assistant Professor", "bibtex": "@inproceedings{\nfu2021cpt,\ntitle={{CPT}: Efficient Deep Neural Network Training via Cyclic Precision},\nauthor={Yonggan Fu and Han Guo and Meng Li and Xin Yang and Yining Ding and Vikas Chandra and Yingyan Lin},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=87ZwsaQNHPZ}\n}", "github": "[![github](/images/github_icon.svg) RICE-EIC/CPT](https://github.com/RICE-EIC/CPT)", "project": "", "reviewers": "AnonReviewer4;AnonReviewer1;AnonReviewer2;AnonReviewer3", "pdf_size": 0, "rating": "7;7;7;7", "confidence": "3;3;5;5", "wc_review": "259;353;397;379", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "1042;991;649;818", "reply_reviewers": "0;0;0;0", "reply_authors": "2;2;1;1", "rating_avg": [ 7.0, 0.0 ], "confidence_avg": [ 4.0, 1.0 ], "wc_review_avg": [ 347.0, 53.16013544000805 ], "wc_reply_reviewers_avg": [ 0, 0 ], 
"wc_reply_authors_avg": [ 875.0, 154.65283702538406 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.5, 0.5 ], "replies_avg": [ 11, 0 ], "authors#_avg": [ 7, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 47, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=3211001313795403006&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 5, "pdf": "https://openreview.net/pdf?id=87ZwsaQNHPZ", "email": "rice.edu;;;;;meta.com;rice.edu", "author_num": 7, "aff_unique_index": "0;1;0", "aff_unique_norm": "Rice University;Meta", "aff_unique_dep": ";Meta Platforms, Inc.", "aff_unique_url": "https://www.rice.edu;https://meta.com", "aff_unique_abbr": "Rice;Meta", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0;0", "aff_country_unique": "United States" }, { "id": "88_MfcJoJlS", "title": "Guided Exploration with Proximal Policy Optimization using a Single Demonstration", "track": "main", "status": "Reject", "tldr": "", "abstract": " Solving sparse reward tasks through exploration is one of the major challenges in deep reinforcement learning, especially in three-dimensional, partially-observable environments. Critically, the algorithm proposed in this article uses a single human demonstration to solve hard-exploration problems. We train an agent on a combination of demonstrations and own experience to solve problems with variable initial conditions. We adapt this idea and integrate it with the proximal policy optimization (PPO). The agent is able to increase its performance and to tackle harder problems by replaying its own past trajectories prioritizing them based on the obtained reward and the maximum value of the trajectory.\nWe compare variations of this algorithm to different imitation learning algorithms on a set of hard-exploration tasks in the Animal-AI Olympics environment.\nTo the best of our knowledge, learning a task in a three-dimensional environment with comparable difficulty has never been considered before using only one human demonstration.", "keywords": "PPO;sparse rewards;single demonstration;3D environment", "primary_area": "", "supplementary_material": "", "author": "Gabriele Libardi;Gianni De Fabritiis", "authorids": "~Gabriele_Libardi1;~Gianni_De_Fabritiis1", "gender": ";M", "homepage": ";https://www.compscience.org", "dblp": ";29/605", "google_scholar": ";-_kX4kMAAAAJ", "orcid": ";", "linkedin": ";gdefabritiis/", "or_profile": "~Gabriele_Libardi1;~Gianni_De_Fabritiis1", "aff": ";Universitat Pompeu Fabra", "aff_domain": ";upf.edu", "position": ";Full Professor", "bibtex": "@misc{\nlibardi2021guided,\ntitle={Guided Exploration with Proximal Policy Optimization using a Single Demonstration},\nauthor={Gabriele Libardi and Gianni De Fabritiis},\nyear={2021},\nurl={https://openreview.net/forum?id=88_MfcJoJlS}\n}", "github": "", "project": "", "reviewers": "AnonReviewer2;AnonReviewer3;AnonReviewer1", "site": "https://openreview.net/forum?id=88_MfcJoJlS", "pdf_size": 0, "rating": "4;6;6", "confidence": "5;3;4", "wc_review": "285;438;549", "wc_reply_reviewers": "0;0;89", "wc_reply_authors": "578;799;1303", "reply_reviewers": "0;0;1", "reply_authors": "1;1;2", "rating_avg": [ 5.333333333333333, 0.9428090415820634 ], "confidence_avg": [ 4.0, 0.816496580927726 ], "wc_review_avg": [ 424.0, 108.23123393919151 ], "wc_reply_reviewers_avg": [ 29.666666666666668, 41.95500235040182 ], "wc_reply_authors_avg": [ 893.3333333333334, 303.403288636685 ], "reply_reviewers_avg": [ 0.3333333333333333, 0.4714045207910317 ], "reply_authors_avg": [ 
1.3333333333333333, 0.4714045207910317 ], "replies_avg": [ 9, 0 ], "authors#_avg": [ 2, 0 ], "corr_rating_confidence": -0.8660254037844385, "gs_citation": 25, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=1058578842192260735&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 6, "aff_unique_index": "0", "aff_unique_norm": "Universitat Pompeu Fabra", "aff_unique_dep": "", "aff_unique_url": "https://www.upf.edu/", "aff_unique_abbr": "UPF", "aff_country_unique_index": "0", "aff_country_unique": "Spain" }, { "id": "8CCwiOHx_17", "title": "Adversarial Environment Generation for Learning to Navigate the Web", "track": "main", "status": "Reject", "tldr": "", "abstract": "Learning to autonomously navigate the web is a difficult sequential decision making task. The state and action spaces are large and combinatorial in nature, and successful navigation may require traversing several partially-observed pages. One of the bottlenecks of training web navigation agents is providing a learnable curriculum of training environments that can cover the large variety of real-world websites. Therefore, we propose using Adversarial Environment Generation (AEG) to generate challenging web environments in which to train reinforcement learning (RL) agents. We introduce a new benchmarking environment, gMiniWoB, which enables an RL adversary to use compositional primitives to learn to generate complex websites. To train the adversary, we present a new decoder-like architecture that can directly control the difficulty of the environment, and a new training technique Flexible b-PAIRED. Flexible b-PAIRED jointly trains the adversary and a population of navigator agents and incentivizes the adversary to generate \u201djust-the-right-challenge\u201d environments by simultaneously learning two policies encoded in the adversary\u2019s architecture. First, for its environment complexity choice (difficulty budget), the adversary is rewarded with the performance of the best-performing agent in the population. Second, for selecting the design elements the adversary learns to maximize the regret using the difference in capabilities of navigator agents in population (flexible regret). The results show that the navigator agent trained with Flexible b-PAIRED generalizes to new environments, significantly outperforms competitive automatic curriculum generation baselines\u2014including a state-of-the-art RL web navigation approach and prior methods for minimax regret AEG\u2014on a set of challenging unseen test environments that are order of magnitude more complex than the previous benchmarks. 
The navigator agent achieves more than 75% success rate on all tasks, yielding 4x higher success rate that the strongest baseline.", "keywords": "Web Navigation;Adversarial Environment Generation;Web Environment Design;Minimax Regret Adversary;Auto Curriculum", "primary_area": "", "supplementary_material": "", "author": "Izzeddin Gur;Natasha Jaques;Kevin Malta;Manoj Tiwari;Honglak Lee;Aleksandra Faust", "authorids": "~Izzeddin_Gur1;~Natasha_Jaques1;kmalta@google.com;mjtiwari@google.com;~Honglak_Lee2;~Aleksandra_Faust1", "gender": ";F;;;;F", "homepage": ";https://natashajaques.ai/;;;;http://www.afaust.info", "dblp": "188/9027;145/7732;;;;135/8420", "google_scholar": "qS_ugJAAAAAJ;8iCb2TwAAAAJ;;;;RK72t68AAAAJ", "orcid": ";;;;;0000-0002-3268-8685", "linkedin": ";natashajaques;;;;aleksandrafaust", "or_profile": "~Izzeddin_Gur1;~Natasha_Jaques1;kmalta@google.com;mjtiwari@google.com;~Honglak_Lee2;~Aleksandra_Faust1", "aff": "Google;University of California, Berkeley;;;;Google Brain", "aff_domain": "google.com;berkeley.edu;;;;google.com", "position": "Research Scientist;Postdoc;;;;Principal Researcher", "bibtex": "@misc{\ngur2021adversarial,\ntitle={Adversarial Environment Generation for Learning to Navigate the Web},\nauthor={Izzeddin Gur and Natasha Jaques and Kevin Malta and Manoj Tiwari and Honglak Lee and Aleksandra Faust},\nyear={2021},\nurl={https://openreview.net/forum?id=8CCwiOHx_17}\n}", "github": "", "project": "", "reviewers": "AnonReviewer3;AnonReviewer4;AnonReviewer1;AnonReviewer2", "site": "https://openreview.net/forum?id=8CCwiOHx_17", "pdf_size": 0, "rating": "4;5;6;7", "confidence": "3;3;1;3", "wc_review": "246;451;254;283", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "890;892;247;209", "reply_reviewers": "0;0;0;0", "reply_authors": "2;2;1;1", "rating_avg": [ 5.5, 1.118033988749895 ], "confidence_avg": [ 2.5, 0.8660254037844386 ], "wc_review_avg": [ 308.5, 83.41612553937038 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 559.5, 331.7728891877695 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.5, 0.5 ], "replies_avg": [ 11, 0 ], "authors#_avg": [ 6, 0 ], "corr_rating_confidence": -0.2581988897471611, "gs_citation": 22, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=680200578998634814&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 3, "aff_unique_index": "0;1;0", "aff_unique_norm": "Google;University of California, Berkeley", "aff_unique_dep": "Google;", "aff_unique_url": "https://www.google.com;https://www.berkeley.edu", "aff_unique_abbr": "Google;UC Berkeley", "aff_campus_unique_index": "0;1;0", "aff_campus_unique": "Mountain View;Berkeley", "aff_country_unique_index": "0;0;0", "aff_country_unique": "United States" }, { "id": "8CjVaaSSVxg", "title": "Learning Predictive Communication by Imagination in Networked System Control", "track": "main", "status": "Reject", "tldr": "", "abstract": "Dealing with multi-agent control in networked systems is one of the biggest challenges in Reinforcement Learning (RL) and limited success has been presented compared to recent deep reinforcement learning in single-agent domain. However, obstacles remain in addressing the delayed global information where each agent learns a decentralized control policy based on local observations and messages from connected neighbors. This paper first considers delayed global information sharing by combining the delayed global information and latent imagination of farsighted states in differentiable communication. 
Our model allows an agent to imagine its future states and communicate that with its neighbors. The predictive message sent to the connected neighbors reduces the delay in global information. On the tasks of networked multi-agent traffic control, experimental results show that our model helps stabilize the training of each local agent and outperforms existing algorithms for networked system control.", "keywords": "Reinforcement Learning;Multi-agent Reinforcement Learning;Networked System Control", "primary_area": "", "supplementary_material": "", "author": "Yali Du;Yifan Zhao;Meng Fang;Jun Wang;Gangyan Xu;Haifeng Zhang", "authorids": "~Yali_Du1;zhaoyifan@stu.hit.edu.cn;~Meng_Fang1;~Jun_Wang2;gangyan@hit.edu.cn;haifeng.zhang@ia.ac.cn", "gender": ";;M;M;;", "homepage": ";;;http://www0.cs.ucl.ac.uk/staff/jun.wang/;;", "dblp": ";;67/463;w/JunWang12;;", "google_scholar": ";;IcNYP1oAAAAJ;https://scholar.google.co.uk/citations?user=wIE1tY4AAAAJ;;", "orcid": ";;;;;", "linkedin": ";;;;;", "or_profile": "~Yali_Du1;zhaoyifan@stu.hit.edu.cn;~Meng_Fang1;~Jun_Wang2;gangyan@hit.edu.cn;haifeng.zhang@ia.ac.cn", "aff": ";;Eindhoven University of Technology;University College London;;", "aff_domain": ";;tue.nl;ucl.ac.uk;;", "position": ";;Assistant Professor;Professor;;", "bibtex": "@misc{\ndu2021learning,\ntitle={Learning Predictive Communication by Imagination in Networked System Control},\nauthor={Yali Du and Yifan Zhao and Meng Fang and Jun Wang and Gangyan Xu and Haifeng Zhang},\nyear={2021},\nurl={https://openreview.net/forum?id=8CjVaaSSVxg}\n}", "github": "", "project": "", "reviewers": "AnonReviewer2;AnonReviewer1;AnonReviewer3", "site": "https://openreview.net/forum?id=8CjVaaSSVxg", "pdf_size": 0, "rating": "4;4;5", "confidence": "2;4;4", "wc_review": "189;353;807", "wc_reply_reviewers": "0;0;551", "wc_reply_authors": "266;331;661", "reply_reviewers": "0;0;1", "reply_authors": "1;1;2", "rating_avg": [ 4.333333333333333, 0.4714045207910317 ], "confidence_avg": [ 3.3333333333333335, 0.9428090415820634 ], "wc_review_avg": [ 449.6666666666667, 261.3928503655412 ], "wc_reply_reviewers_avg": [ 183.66666666666666, 259.74389095585843 ], "wc_reply_authors_avg": [ 419.3333333333333, 172.93222821543577 ], "reply_reviewers_avg": [ 0.3333333333333333, 0.4714045207910317 ], "reply_authors_avg": [ 1.3333333333333333, 0.4714045207910317 ], "replies_avg": [ 9, 0 ], "authors#_avg": [ 6, 0 ], "corr_rating_confidence": 0.4999999999999999, "gs_citation": 1, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=13591542415576331535&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 0, "aff_unique_index": "0;1", "aff_unique_norm": "Eindhoven University of Technology;University College London", "aff_unique_dep": ";", "aff_unique_url": "https://www.tue.nl;https://www.ucl.ac.uk", "aff_unique_abbr": "TU/e;UCL", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;1", "aff_country_unique": "Netherlands;United Kingdom" }, { "title": "Combining Label Propagation and Simple Models out-performs Graph Neural Networks", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/3274", "id": "8E1-f3VhX1o", "poster": "", "openreview": "https://openreview.net/forum?id=8E1-f3VhX1o", "slides": "https://iclr.cc/virtual/2021/poster/3274", "video": "https://iclr.cc/virtual/2021/poster/3274", "author_site": "Qian Huang, Horace He, Abhay Singh, Ser-Nam Lim, Austin Benson", "tldr": "", "abstract": "Graph Neural Networks (GNNs) are a predominant technique for learning over 
graphs. However, there is relatively little understanding of why GNNs are successful in practice and whether they are necessary for good performance. Here, we show that for many standard transductive node classification benchmarks, we can exceed or match the performance of state-of-the-art GNNs by combining shallow models that ignore the graph structure with two simple post-processing steps that exploit correlation in the label structure: (i) an \u201cerror correlation\u201d that spreads residual errors in training data to correct errors in test data and (ii) a \u201cprediction correlation\u201d that smooths the predictions on the test data. We call this overall procedure Correct and Smooth (C&S), and the post-processing steps are implemented via simple modifications to standard label propagation techniques that have long been used in graph-based semi-supervised learning. Our approach exceeds or nearly matches the performance of state-of-the-art GNNs on a wide variety of benchmarks, with just a small fraction of the parameters and orders of magnitude faster runtime. For instance, we exceed the best-known GNN performance on the OGB-Products dataset with 137 times fewer parameters and greater than 100 times less training time. The performance of our methods highlights how directly incorporating label information into the learning algorithm (as is common in traditional methods) yields easy and substantial performance gains. We can also incorporate our techniques into big GNN models, providing modest gains in some cases.", "keywords": "graphs;graph neural networks;label propagation;simple;residual", "primary_area": "", "supplementary_material": "/attachment/7b64325976716ec7699819e1af80b5849afd2b4e.zip", "author": "Qian Huang;Horace He;Abhay Singh;Ser-Nam Lim;Austin Benson", "authorids": "qh53@cornell.edu;~Horace_He1;as2626@cornell.edu;~Ser-Nam_Lim3;~Austin_Benson1", "gender": ";;;;M", "homepage": ";http://horace.io/;;;https://www.cs.cornell.edu/~arb/", "dblp": ";230/4428;;;https://dblp.uni-trier.de/pers/b/Benson:Austin_R=.html", "google_scholar": ";exzHWOwAAAAJ;;;BzOqNoQAAAAJ", "orcid": ";;;;", "linkedin": ";;;;", "or_profile": "qh53@cornell.edu;~Horace_He1;as2626@cornell.edu;~Ser-Nam_Lim3;~Austin_Benson1", "aff": ";Meta;;;Cornell University", "aff_domain": ";meta.com;;;cornell.edu", "position": ";Researcher;;;Assistant Professor", "bibtex": "@inproceedings{\nhuang2021combining,\ntitle={Combining Label Propagation and Simple Models out-performs Graph Neural Networks},\nauthor={Qian Huang and Horace He and Abhay Singh and Ser-Nam Lim and Austin Benson},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=8E1-f3VhX1o}\n}", "github": "[![github](/images/github_icon.svg) CUAI/CorrectAndSmooth](https://github.com/CUAI/CorrectAndSmooth) + [![Papers with Code](/images/pwc_icon.svg) 6 community implementations](https://paperswithcode.com/paper/?openreview=8E1-f3VhX1o)", "project": "", "reviewers": "AnonReviewer1;AnonReviewer3;AnonReviewer2;AnonReviewer4", "pdf_size": 0, "rating": "6;6;7;7", "confidence": "3;4;4;4", "wc_review": "843;272;629;522", "wc_reply_reviewers": "213;0;125;0", "wc_reply_authors": "754;553;1472;285", "reply_reviewers": "2;0;1;0", "reply_authors": "3;1;3;1", "rating_avg": [ 6.5, 0.5 ], "confidence_avg": [ 3.75, 0.4330127018922193 ], "wc_review_avg": [ 566.5, 205.59000462084725 ], "wc_reply_reviewers_avg": [ 84.5, 90.04582166874819 ], "wc_reply_authors_avg": [ 766.0, 440.2584468241353 ], "reply_reviewers_avg": [ 
0.75, 0.82915619758885 ], "reply_authors_avg": [ 2.0, 1.0 ], "replies_avg": [ 17, 0 ], "authors#_avg": [ 5, 0 ], "corr_rating_confidence": 0.5773502691896257, "gs_citation": 373, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=3392954372444403130&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 5, "pdf": "https://openreview.net/pdf?id=8E1-f3VhX1o", "email": ";meta.com;;;cornell.edu", "author_num": 5, "aff_unique_index": "0;1", "aff_unique_norm": "Meta;Cornell University", "aff_unique_dep": "Meta Platforms, Inc.;", "aff_unique_url": "https://meta.com;https://www.cornell.edu", "aff_unique_abbr": "Meta;Cornell", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0", "aff_country_unique": "United States" }, { "id": "8EGmvcCVrmZ", "title": "Deep Learning is Singular, and That's Good", "track": "main", "status": "Reject", "tldr": "", "abstract": "In singular models, the optimal set of parameters forms an analytic set with singularities and classical statistical inference cannot be applied to such models. This is significant for deep learning as neural networks are singular and thus ``dividing\" by the determinant of the Hessian or employing the Laplace approximation are not appropriate. Despite its potential for addressing fundamental issues in deep learning, singular learning theory appears to have made little inroads into the developing canon of deep learning theory. Via a mix of theory and experiment, we present an invitation to singular learning theory as a vehicle for understanding deep learning and suggest important future work to make singular learning theory directly applicable to how deep learning is performed in practice. ", "keywords": "deep learning theory;effective degrees of freedom;generalisation;posterior predictive distribution;real log canonical threshold;singular learning theory", "primary_area": "", "supplementary_material": "/attachment/cdea96d33b68519a8fb04c435d1de90312a35fae.zip", "author": "Daniel Murfet;Susan Wei;Mingming Gong;Hui Li;Jesse Gell-Redman;Thomas Quella", "authorids": "~Daniel_Murfet1;~Susan_Wei1;~Mingming_Gong1;huli2@student.unimelb.edu.au;j.gell@unimelb.edu.au;thomas.quella@unimelb.edu.au", "gender": "M;F;M;;;", "homepage": "http://therisingsea.org;https://www.suswei.com/;https://mingming-gong.github.io/;;;", "dblp": ";203/8878;98/8479;;;", "google_scholar": ";Udv9jsIAAAAJ;https://scholar.google.com.au/citations?user=6BmiCJIAAAAJ;;;", "orcid": ";0000-0002-6842-2352;0000-0001-7147-5589;;;", "linkedin": ";;;;;", "or_profile": "~Daniel_Murfet1;~Susan_Wei1;~Mingming_Gong1;huli2@student.unimelb.edu.au;j.gell@unimelb.edu.au;thomas.quella@unimelb.edu.au", "aff": "The University of Melbourne;The University of Melbourne;University of Melbourne;;;", "aff_domain": "unimelb.edu.au;unimelb.edu.au;unimelb.edu.au;;;", "position": "Assistant Professor;Assistant Professor;Assistant Professor;;;", "bibtex": "@misc{\nmurfet2021deep,\ntitle={Deep Learning is Singular, and That's Good},\nauthor={Daniel Murfet and Susan Wei and Mingming Gong and Hui Li and Jesse Gell-Redman and Thomas Quella},\nyear={2021},\nurl={https://openreview.net/forum?id=8EGmvcCVrmZ}\n}", "github": "", "project": "", "reviewers": "AnonReviewer4;AnonReviewer2;AnonReviewer1;AnonReviewer3", "site": "https://openreview.net/forum?id=8EGmvcCVrmZ", "pdf_size": 0, "rating": "4;4;4;5", "confidence": "1;3;5;3", "wc_review": "159;385;351;230", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "181;239;507;362", "reply_reviewers": "0;0;0;0", "reply_authors": 
"1;1;1;1", "rating_avg": [ 4.25, 0.4330127018922193 ], "confidence_avg": [ 3.0, 1.4142135623730951 ], "wc_review_avg": [ 281.25, 91.10536482556886 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 322.25, 125.0947141169442 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 9, 0 ], "authors#_avg": [ 6, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 37, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=4421459091196164202&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 8, "aff_unique_index": "0;0;0", "aff_unique_norm": "University of Melbourne", "aff_unique_dep": "", "aff_unique_url": "https://www.unimelb.edu.au", "aff_unique_abbr": "UniMelb", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0;0", "aff_country_unique": "Australia" }, { "id": "8FRw857AYba", "title": "Sample efficient Quality Diversity for neural continuous control", "track": "main", "status": "Reject", "tldr": "", "abstract": "We propose a novel Deep Neuroevolution algorithm, QD-RL, that combines the strengths of off-policy reinforcement learning (RL) algorithms and Quality Diversity (QD) approaches to solve continuous control problems with neural controllers. The QD part contributes structural biases by decoupling the search for diversity from the search for high return, resulting in efficient management of the exploration-exploitation trade-off. The RL part contributes sample efficiency by relying on off-policy gradient-based updates of the agents. More precisely, we train a population of off-policy deep RL agents to simultaneously maximize diversity within the population and the return of each individual agent. QD-RL selects agents interchangeably from a Pareto front or from a Map-Elites grid, resulting in stable and efficient population updates. 
Our experiments in the Ant-Maze and Ant-Trap environments show that QD-RL can solve challenging exploration and control problems with deceptive rewards while being two orders of magnitude more sample efficient than the evolutionary counterpart.", "keywords": "Deep Neuroevolution;Quality Diversity;Reinforcement Learning", "primary_area": "", "supplementary_material": "", "author": "Thomas PIERROT;Valentin Mac\u00e9;Geoffrey Cideron;Nicolas Perrin;Karim Beguir;Olivier Sigaud", "authorids": "~Thomas_PIERROT1;~Valentin_Mac\u00e91;~Geoffrey_Cideron1;~Nicolas_Perrin1;kb@instadeep.com;~Olivier_Sigaud1", "gender": "M;M;M;M;;M", "homepage": ";;;https://perrin-isir.github.io/;;http://people.isir.upmc.fr/sigaud", "dblp": "228/7739;;;37/1452.html;;50/5522", "google_scholar": "https://scholar.google.fr/citations?user=0zBiyNUAAAAJ;bzIEjccAAAAJ;https://scholar.google.com/citations?hl=en;_UceE6YAAAAJ;;https://scholar.google.fr/citations?user=elLfDv0AAAAJ", "orcid": "0000-0002-5227-6194;;;0000-0001-8626-1938;;0000-0002-8544-0229", "linkedin": "thomas-pierrot-120a43128/;valentinmace/;;nicolas-perrin-gilbert-2815a4179/;;", "or_profile": "~Thomas_PIERROT1;~Valentin_Mac\u00e91;~Geoffrey_Cideron1;~Nicolas_Perrin1;kb@instadeep.com;~Olivier_Sigaud1", "aff": "Universit\u00e9 Pierre et Marie Curie - Paris 6, Computer Science Lab - Pierre and Marie Curie University, Paris, France;;Google;ISIR, UMR 7222;;Sorbonne Universit\u00e9", "aff_domain": "isir.upmc.fr;;google.com;sorbonne-universite.fr;;upmc.fr", "position": "PhD student;;Research Engineer;Research scientist;;Full Professor", "bibtex": "@misc{\npierrot2021sample,\ntitle={Sample efficient Quality Diversity for neural continuous control},\nauthor={Thomas PIERROT and Valentin Mac{\\'e} and Geoffrey Cideron and Nicolas Perrin and Karim Beguir and Olivier Sigaud},\nyear={2021},\nurl={https://openreview.net/forum?id=8FRw857AYba}\n}", "github": "", "project": "", "reviewers": "AnonReviewer3;AnonReviewer4;AnonReviewer1;AnonReviewer2", "site": "https://openreview.net/forum?id=8FRw857AYba", "pdf_size": 0, "rating": "3;6;6;6", "confidence": "5;3;4;3", "wc_review": "113;1408;698;591", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "213;912;952;1133", "reply_reviewers": "0;0;0;0", "reply_authors": "1;2;2;2", "rating_avg": [ 5.25, 1.299038105676658 ], "confidence_avg": [ 3.75, 0.82915619758885 ], "wc_review_avg": [ 702.5, 463.05858160712233 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 802.5, 350.38585873291174 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.75, 0.4330127018922193 ], "replies_avg": [ 13, 0 ], "authors#_avg": [ 6, 0 ], "corr_rating_confidence": -0.8703882797784892, "gs_citation": 3, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=17309196502543671854&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 0, "aff_unique_index": "0;1;2;3", "aff_unique_norm": "Universit\u00e9 Pierre et Marie Curie - Paris 6;Google;Institut des Sciences de l'Ing\u00e9nierie de Robotique;Sorbonne Universit\u00e9", "aff_unique_dep": "Computer Science Lab;Google;UMR 7222;", "aff_unique_url": "https://www.upmc.fr;https://www.google.com;https://www.isir.upmc.fr;https://www.sorbonne-universite.fr", "aff_unique_abbr": "UPMC;Google;ISIR;Sorbonne U", "aff_campus_unique_index": "0;1", "aff_campus_unique": "Paris;Mountain View;", "aff_country_unique_index": "0;1;0;0", "aff_country_unique": "France;United States" }, { "title": "Separation and Concentration in Deep Networks", "status": "Poster", "track": "main", "site": 
"https://iclr.cc/virtual/2021/poster/2842", "id": "8HhkbjrWLdE", "poster": "", "openreview": "https://openreview.net/forum?id=8HhkbjrWLdE", "slides": "https://iclr.cc/virtual/2021/poster/2842", "video": "https://iclr.cc/virtual/2021/poster/2842", "author_site": "John Zarka, Florentin Guth, St\u00e9phane Mallat", "tldr": "", "abstract": "Numerical experiments demonstrate that deep neural network classifiers progressively separate class distributions around their mean, achieving linear separability on the training set, and increasing the Fisher discriminant ratio. We explain this mechanism with two types of operators. We prove that a rectifier without biases applied to sign-invariant tight frames can separate class means and increase Fisher ratios. On the opposite, a soft-thresholding on tight frames can reduce within-class variabilities while preserving class means. Variance reduction bounds are proved for Gaussian mixture models. For image classification, we show that separation of class means can be achieved with rectified wavelet tight frames that are not learned. It defines a scattering transform. Learning $1 \\times 1$ convolutional tight frames along scattering channels and applying a soft-thresholding reduces within-class variabilities. The resulting scattering network reaches the classification accuracy of ResNet-18 on CIFAR-10 and ImageNet, with fewer layers and no learned biases.", "keywords": "fisher ratio;neural collapse;mean separation;concentration;variance reduction;deep learning;image classification", "primary_area": "", "supplementary_material": "/attachment/ac83f62890e38683600abf1995c0e8f5ce012674.zip", "author": "John Zarka;Florentin Guth;St\u00e9phane Mallat", "authorids": "~John_Zarka1;~Florentin_Guth1;~St\u00e9phane_Mallat1", "gender": ";;M", "homepage": ";;https://www.di.ens.fr/~mallat/", "dblp": ";223/6081;61/3978", "google_scholar": ";opC_fpQAAAAJ;https://scholar.google.com.tw/citations?user=g_YTmSgAAAAJ", "orcid": ";;", "linkedin": ";;", "or_profile": "~John_Zarka1;~Florentin_Guth1;~St\u00e9phane_Mallat1", "aff": "Ecole Normale Superieure;Ecole Normale Sup\u00e9rieure;College de France", "aff_domain": "ens.fr;ens.fr;college-de-france.fr", "position": "PhD student;PhD student;Full Professor", "bibtex": "@inproceedings{\nzarka2021separation,\ntitle={Separation and Concentration in Deep Networks},\nauthor={John Zarka and Florentin Guth and St{\\'e}phane Mallat},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=8HhkbjrWLdE}\n}", "github": "", "project": "", "reviewers": "AnonReviewer3;AnonReviewer2;AnonReviewer1;AnonReviewer4", "pdf_size": 0, "rating": "6;6;7;8", "confidence": "4;3;3;4", "wc_review": "338;131;313;829", "wc_reply_reviewers": "0;0;0;53", "wc_reply_authors": "237;50;223;664", "reply_reviewers": "0;0;0;1", "reply_authors": "1;1;1;1", "rating_avg": [ 6.75, 0.82915619758885 ], "confidence_avg": [ 3.5, 0.5 ], "wc_review_avg": [ 402.75, 258.73961331810017 ], "wc_reply_reviewers_avg": [ 13.25, 22.949673200287624 ], "wc_reply_authors_avg": [ 293.5, 226.23273414782398 ], "reply_reviewers_avg": [ 0.25, 0.4330127018922193 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 11, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": 0.30151134457776363, "gs_citation": 41, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=6824537402330659377&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 7, "pdf": "https://openreview.net/pdf?id=8HhkbjrWLdE", "email": 
"ens.fr;ens.fr;college-de-france.fr", "author_num": 3, "aff_unique_index": "0;1;2", "aff_unique_norm": "Ecole Normale Superieure;Ecole Normale Sup\u00e9rieure;College de France", "aff_unique_dep": ";;", "aff_unique_url": "https://www.ens.fr;https://www.ens.fr;https://www.college-de-france.fr", "aff_unique_abbr": "ENS;ENS;Coll\u00e8ge de France", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0;0", "aff_country_unique": "France" }, { "id": "8IX8Qum6jGR", "title": "A Deep Graph Neural Networks Architecture Design: From Global Pyramid-like Shrinkage Skeleton to Local Link Rewiring", "track": "main", "status": "Withdraw", "tldr": "", "abstract": " Expressivity plays a fundamental role in evaluating deep neural networks, and it is closely related to understanding the limit of performance improvement. In this paper, we propose a three-pipeline training framework based on critical expressivity, including global model contraction, weight evolution, and link's weight rewiring. Specifically, we propose a pyramidal-like skeleton to overcome the saddle points that affect information transfer. Then we analyze the reason for the modularity (clustering) phenomenon in network topology and use it to rewire potential erroneous weighted links. We conduct numerical experiments on node classification and the results confirm that the proposed training framework leads to a significantly improved performance in terms of fast convergence and robustness to potential erroneous weighted links. The architecture design on GNNs, in turn, verifies the expressivity of GNNs from dynamics and topological space aspects and provides useful guidelines in designing more efficient neural networks. The code is available at https://github.com/xjglgjgl/SRGNN.", "keywords": "graph neural networks;architecture design;convergence;errorneous weight links", "primary_area": "", "supplementary_material": "", "author": "Gege Zhang;Gangwei Li;Weining Shen;Huixin Zhang;Weidong Zhang", "authorids": "~Gege_Zhang2;vili@nvidia.com;~Weining_Shen1;hxzhang2013@sjtu.edu.cn;wdzhang@sjtu.edu.cn", "gender": "F;;;;", "homepage": ";;https://faculty.sites.uci.edu/weinings/;;", "dblp": "155/8397;;;;", "google_scholar": ";;;;", "orcid": ";;;;", "linkedin": ";;;;", "or_profile": "~Gege_Zhang2;vili@nvidia.com;~Weining_Shen1;hxzhang2013@sjtu.edu.cn;wdzhang@sjtu.edu.cn", "aff": ";;University of California, Irvine;;", "aff_domain": ";;uci.edu;;", "position": ";;Associate Professor;;", "bibtex": "", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer2;AnonReviewer4", "site": "https://openreview.net/forum?id=8IX8Qum6jGR", "pdf_size": 0, "rating": "4;5;5", "confidence": "4;2;3", "wc_review": "191;153;575", "wc_reply_reviewers": "0;0;0", "wc_reply_authors": "0;0;0", "reply_reviewers": "0;0;0", "reply_authors": "0;0;0", "rating_avg": [ 4.666666666666667, 0.4714045207910317 ], "confidence_avg": [ 3.0, 0.816496580927726 ], "wc_review_avg": [ 306.3333333333333, 190.6083827001204 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 0, 0 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 0, 0 ], "replies_avg": [ 4, 0 ], "authors#_avg": [ 5, 0 ], "corr_rating_confidence": -0.8660254037844385, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:MF4uZcf6z_EJ:scholar.google.com/&scioq=A+Deep+Graph+Neural+Networks+Architecture+Design:+From+Global+Pyramid-like+Shrinkage+Skeleton+to+Local+Link+Rewiring&hl=en&as_sdt=0,14", "gs_version_total": 0, "aff_unique_index": "0", 
"aff_unique_norm": "University of California, Irvine", "aff_unique_dep": "", "aff_unique_url": "https://www.uci.edu", "aff_unique_abbr": "UCI", "aff_campus_unique_index": "0", "aff_campus_unique": "Irvine", "aff_country_unique_index": "0", "aff_country_unique": "United States" }, { "id": "8IbZUle6ieH", "title": "Graph Neural Network Acceleration via Matrix Dimension Reduction", "track": "main", "status": "Reject", "tldr": "", "abstract": "Graph Neural Networks (GNNs) have become the de facto method for machine learning on graph data (e.g., social networks, protein structures, code ASTs), but they require significant time and resource to train. One alternative method is Graph Neural Tangent Kernel (GNTK), a kernel method that corresponds to infinitely wide multi-layer GNNs. GNTK's parameters can be solved directly in a single step, avoiding time-consuming gradient descent. Today, GNTK is the state-of-the-art method to achieve high training speed without compromising accuracy. Unfortunately, solving for the kernel and searching for parameters can still take hours to days on real-world graphs. The current computation of GNTK has running time $O(N^4)$, where $N$ is the number of nodes in the graph. This prevents GNTK from scaling to datasets that contain large graphs. Theoretically, we present two techniques to speed up GNTK training while preserving the generalization error: (1) We use a novel matrix decoupling method to reduce matrix dimensions during the kernel solving. This allows us to reduce the dominated computation bottleneck term from $O(N^4)$ to $O(N^3)$. (2) We apply sketching to further reduce the bottleneck term to $o(N^{\\omega})$, where $\\omega \\approx 2.373$ is the exponent of current matrix multiplication. Experimentally, we demonstrate that our approaches speed up kernel learning by up to $19\\times$ on real-world benchmark datasets.", "keywords": "Graph Neural Networks;Deep learning;Optimization;Kernel Method", "primary_area": "", "supplementary_material": "/attachment/5295a94888896e20f0bdebdec089f64d35fc342e.zip", "author": "Shunhua Jiang;Yunze Man;Zhao Song;Danyang Zhuo", "authorids": "~Shunhua_Jiang1;~Yunze_Man2;~Zhao_Song3;~Danyang_Zhuo1", "gender": ";M;M;M", "homepage": "https://www.cs.columbia.edu/~jiangsh/;https://yunzeman.github.io/;https://www.youtube.com/@zhaosong2031;https://danyangzhuo.com/", "dblp": "198/0655;230/4287.html;76/4051-2;151/7537", "google_scholar": ";xvQIEKAAAAAJ;yDZct7UAAAAJ;E3yOuvEAAAAJ", "orcid": ";;;", "linkedin": ";;;", "or_profile": "~Shunhua_Jiang1;~Yunze_Man2;~Zhao_Song3;~Danyang_Zhuo1", "aff": "Columbia University;Carnegie Mellon University;Princeton University;Duke University", "aff_domain": "columbia.edu;cmu.edu;princeton.edu;duke.edu", "position": "PhD student;MS student;Postdoc;Assistant Professor", "bibtex": "@misc{\njiang2021graph,\ntitle={Graph Neural Network Acceleration via Matrix Dimension Reduction},\nauthor={Shunhua Jiang and Yunze Man and Zhao Song and Danyang Zhuo},\nyear={2021},\nurl={https://openreview.net/forum?id=8IbZUle6ieH}\n}", "github": "", "project": "", "reviewers": "AnonReviewer4;AnonReviewer2;AnonReviewer1", "site": "https://openreview.net/forum?id=8IbZUle6ieH", "pdf_size": 0, "rating": "4;5;5", "confidence": "2;3;1", "wc_review": "463;204;184", "wc_reply_reviewers": "103;0;0", "wc_reply_authors": "1459;1013;224", "reply_reviewers": "1;0;0", "reply_authors": "3;2;1", "rating_avg": [ 4.666666666666667, 0.4714045207910317 ], "confidence_avg": [ 2.0, 0.816496580927726 ], "wc_review_avg": [ 283.6666666666667, 
127.07040917888354 ], "wc_reply_reviewers_avg": [ 34.333333333333336, 48.554665641476255 ], "wc_reply_authors_avg": [ 898.6666666666666, 510.6272830766314 ], "reply_reviewers_avg": [ 0.3333333333333333, 0.4714045207910317 ], "reply_authors_avg": [ 2.0, 0.816496580927726 ], "replies_avg": [ 11, 0 ], "authors#_avg": [ 4, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:mSiGNp2LvYoJ:scholar.google.com/&scioq=Graph+Neural+Network+Acceleration+via+Matrix+Dimension+Reduction&hl=en&as_sdt=0,5", "gs_version_total": 0, "aff_unique_index": "0;1;2;3", "aff_unique_norm": "Columbia University;Carnegie Mellon University;Princeton University;Duke University", "aff_unique_dep": ";;;", "aff_unique_url": "https://www.columbia.edu;https://www.cmu.edu;https://www.princeton.edu;https://www.duke.edu", "aff_unique_abbr": "Columbia;CMU;Princeton;Duke", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0;0;0", "aff_country_unique": "United States" }, { "id": "8KhxoxKP3iL", "title": "Analyzing Attention Mechanisms through Lens of Sample Complexity and Loss Landscape", "track": "main", "status": "Reject", "tldr": "", "abstract": "Attention mechanisms have advanced state-of-the-art deep learning models in many machine learning tasks. Despite significant empirical gains, there is a lack of theoretical analyses on their effectiveness. In this paper, we address this problem by studying the sample complexity and loss landscape of attention-based neural networks. Our results show that, under mild assumptions, every local minimum of the attention model has low prediction error, and attention models require lower sample complexity than models without attention. Besides revealing why popular self-attention works, our theoretical results also provide guidelines for designing future attention models. 
Experiments on various datasets validate our theoretical findings.", "keywords": "Attention mechanisms;deep learning;sample complexity;self-attention", "primary_area": "", "supplementary_material": "", "author": "Bingyuan Liu;Yogesh Balaji;Lingzhou Xue;Martin Renqiang Min", "authorids": "~Bingyuan_Liu2;~Yogesh_Balaji1;~Lingzhou_Xue1;~Martin_Renqiang_Min1", "gender": "M;M;M;M", "homepage": ";https://yogeshbalaji.github.io/;https://lingzhou-xue.github.io/;http://www.cs.toronto.edu/~cuty", "dblp": ";185/6906;66/80;29/7048", "google_scholar": ";0I2qH0oAAAAJ;vfiEIqUAAAAJ;T2M4JjEAAAAJ", "orcid": ";;0000-0002-8252-0637;0000-0002-8563-6133", "linkedin": ";;;martin-renqiang-min-955a8766", "or_profile": "~Bingyuan_Liu2;~Yogesh_Balaji1;~Lingzhou_Xue1;~Martin_Renqiang_Min1", "aff": "Pennsylvania State University;Department of Computer Science, University of Maryland, College Park;Pennsylvania State University;NEC Laboratories America", "aff_domain": "psu.edu;cs.umd.edu;psu.edu;nec-labs.com", "position": "PhD student;PhD student;Associate Professor;Researcher", "bibtex": "@misc{\nliu2021analyzing,\ntitle={Analyzing Attention Mechanisms through Lens of Sample Complexity and Loss Landscape},\nauthor={Bingyuan Liu and Yogesh Balaji and Lingzhou Xue and Martin Renqiang Min},\nyear={2021},\nurl={https://openreview.net/forum?id=8KhxoxKP3iL}\n}", "github": "", "project": "", "reviewers": "AnonReviewer3;AnonReviewer4;AnonReviewer2;AnonReviewer1", "site": "https://openreview.net/forum?id=8KhxoxKP3iL", "pdf_size": 0, "rating": "3;4;5;5", "confidence": "3;4;3;3", "wc_review": "157;732;1159;295", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "450;1313;1233;1103", "reply_reviewers": "0;0;0;0", "reply_authors": "1;2;2;2", "rating_avg": [ 4.25, 0.82915619758885 ], "confidence_avg": [ 3.25, 0.4330127018922193 ], "wc_review_avg": [ 585.75, 393.18149435089134 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 1024.75, 340.1899285693214 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.75, 0.4330127018922193 ], "replies_avg": [ 12, 0 ], "authors#_avg": [ 4, 0 ], "corr_rating_confidence": -0.17407765595569782, "gs_citation": 2, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=11889904351979994112&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 0, "aff_unique_index": "0;1;0;2", "aff_unique_norm": "Pennsylvania State University;University of Maryland, College Park;NEC Laboratories America", "aff_unique_dep": ";Department of Computer Science;", "aff_unique_url": "https://www.psu.edu;https://www/umd.edu;https://www.nec-labs.com", "aff_unique_abbr": "PSU;UMD;NEC Labs America", "aff_campus_unique_index": "1", "aff_campus_unique": ";College Park", "aff_country_unique_index": "0;0;0;0", "aff_country_unique": "United States" }, { "title": "On the Critical Role of Conventions in Adaptive Human-AI Collaboration", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/2573", "id": "8Ln-Bq0mZcy", "poster": "", "openreview": "https://openreview.net/forum?id=8Ln-Bq0mZcy", "slides": "https://iclr.cc/virtual/2021/poster/2573", "video": "https://iclr.cc/virtual/2021/poster/2573", "author_site": "Andy Shih, Arjun Sawhney, Jovana Kondic, Stefano Ermon, Dorsa Sadigh", "tldr": "", "abstract": "Humans can quickly adapt to new partners in collaborative tasks (e.g. playing basketball), because they understand which fundamental skills of the task (e.g. how to dribble, how to shoot) carry over across new partners. 
Humans can also quickly adapt to similar tasks with the same partners by carrying over conventions that they have developed (e.g. raising hand signals pass the ball), without learning to coordinate from scratch. To collaborate seamlessly with humans, AI agents should adapt quickly to new partners and new tasks as well. However, current approaches have not attempted to distinguish between the complexities intrinsic to a task and the conventions used by a partner, and more generally there has been little focus on leveraging conventions for adapting to new settings. In this work, we propose a learning framework that teases apart rule-dependent representation from convention-dependent representation in a principled way. We show that, under some assumptions, our rule-dependent representation is a sufficient statistic of the distribution over best-response strategies across partners. Using this separation of representations, our agents are able to adapt quickly to new partners, and to coordinate with old partners on new tasks in a zero-shot manner. We experimentally validate our approach on three collaborative tasks varying in complexity: a contextual multi-armed bandit, a block placing task, and the card game Hanabi.", "keywords": "Multi-agent games;emergent behavior;transfer learning;human-AI collaboration", "primary_area": "", "supplementary_material": "/attachment/e49da4c2c473252f5aebf37992a1dce2489661c8.zip", "author": "Andy Shih;Arjun Sawhney;Jovana Kondic;Stefano Ermon;Dorsa Sadigh", "authorids": "~Andy_Shih1;~Arjun_Sawhney1;~Jovana_Kondic1;~Stefano_Ermon1;~Dorsa_Sadigh1", "gender": ";M;F;M;F", "homepage": "https://cs.stanford.edu/~andyshih/;;;http://cs.stanford.edu/~ermon/;https://dorsa.fyi/", "dblp": "https://dblp.uni-trier.de/pers/hd/s/Shih:Andy;;;47/8135;117/3174", "google_scholar": "G85kxUUAAAAJ;;CmAO43YAAAAJ;;ZaJEZpYAAAAJ", "orcid": ";;;;", "linkedin": ";arjun-s-8696b510a/;jovanakondic/;;", "or_profile": "~Andy_Shih1;~Arjun_Sawhney1;~Jovana_Kondic1;~Stefano_Ermon1;~Dorsa_Sadigh1", "aff": "Stanford University;Stanford University;Princeton University;Stanford University;Stanford University", "aff_domain": "cs.stanford.edu;stanford.edu;princeton.edu;stanford.edu;stanford.edu", "position": "PhD student;MS student;Undergrad student;Assistant Professor;Assistant Professor", "bibtex": "@inproceedings{\nshih2021on,\ntitle={On the Critical Role of Conventions in Adaptive Human-{\\{}AI{\\}} Collaboration},\nauthor={Andy Shih and Arjun Sawhney and Jovana Kondic and Stefano Ermon and Dorsa Sadigh},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=8Ln-Bq0mZcy}\n}", "github": "[![github](/images/github_icon.svg) Stanford-ILIAD/Conventions-ModularPolicy](https://github.com/Stanford-ILIAD/Conventions-ModularPolicy)", "project": "", "reviewers": "AnonReviewer4;AnonReviewer3;AnonReviewer1;AnonReviewer2", "pdf_size": 0, "rating": "6;7;7;7", "confidence": "2;3;4;4", "wc_review": "359;908;429;557", "wc_reply_reviewers": "0;0;206;137", "wc_reply_authors": "372;1467;522;1135", "reply_reviewers": "0;0;1;1", "reply_authors": "1;2;2;3", "rating_avg": [ 6.75, 0.4330127018922193 ], "confidence_avg": [ 3.25, 0.82915619758885 ], "wc_review_avg": [ 563.25, 211.32483881456056 ], "wc_reply_reviewers_avg": [ 85.75, 89.15260792596031 ], "wc_reply_authors_avg": [ 874.0, 446.0039237495563 ], "reply_reviewers_avg": [ 0.5, 0.5 ], "reply_authors_avg": [ 2.0, 0.7071067811865476 ], "replies_avg": [ 16, 0 ], "authors#_avg": [ 5, 0 ], 
"corr_rating_confidence": 0.8703882797784891, "gs_citation": 47, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=11035601410057323120&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 7, "pdf": "https://openreview.net/pdf?id=8Ln-Bq0mZcy", "email": "cs.stanford.edu;stanford.edu;princeton.edu;stanford.edu;stanford.edu", "author_num": 5, "aff_unique_index": "0;0;1;0;0", "aff_unique_norm": "Stanford University;Princeton University", "aff_unique_dep": ";", "aff_unique_url": "https://www.stanford.edu;https://www.princeton.edu", "aff_unique_abbr": "Stanford;Princeton", "aff_campus_unique_index": "0;0;0;0", "aff_campus_unique": "Stanford;", "aff_country_unique_index": "0;0;0;0;0", "aff_country_unique": "United States" }, { "title": "Implicit Normalizing Flows", "status": "Spotlight", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/2983", "id": "8PS8m9oYtNy", "poster": "", "openreview": "https://openreview.net/forum?id=8PS8m9oYtNy", "slides": "https://iclr.cc/virtual/2021/poster/2983", "video": "https://iclr.cc/virtual/2021/poster/2983", "author_site": "Cheng Lu, Jianfei Chen, Chongxuan Li, Qiuhao Wang, Jun Zhu", "tldr": "", "abstract": "Normalizing flows define a probability distribution by an explicit invertible transformation $\\boldsymbol{\\mathbf{z}}=f(\\boldsymbol{\\mathbf{x}})$. In this work, we present implicit normalizing flows (ImpFlows), which generalize normalizing flows by allowing the mapping to be implicitly defined by the roots of an equation $F(\\boldsymbol{\\mathbf{z}}, \\boldsymbol{\\mathbf{x}})= \\boldsymbol{\\mathbf{0}}$. ImpFlows build on residual flows (ResFlows) with a proper balance between expressiveness and tractability. Through theoretical analysis, we show that the function space of ImpFlow is strictly richer than that of ResFlows. Furthermore, for any ResFlow with a fixed number of blocks, there exists some function that ResFlow has a non-negligible approximation error. However, the function is exactly representable by a single-block ImpFlow. We propose a scalable algorithm to train and draw samples from ImpFlows. 
Empirically, we evaluate ImpFlow on several classification and density modeling tasks, and ImpFlow outperforms ResFlow with a comparable amount of parameters on all the benchmarks.", "keywords": "Normalizing flows;deep generative models;probabilistic inference;implicit functions", "primary_area": "", "supplementary_material": "/attachment/b530a9a64d0b8c3e33e0a8c9623dff64ac6b10fd.zip", "author": "Cheng Lu;Jianfei Chen;Chongxuan Li;Qiuhao Wang;Jun Zhu", "authorids": "~Cheng_Lu5;~Jianfei_Chen1;~Chongxuan_Li1;~Qiuhao_Wang1;~Jun_Zhu2", "gender": "M;M;M;;M", "homepage": "https://luchengthu.github.io/;http://ml.cs.tsinghua.edu.cn/~jianfei;http://ml.cs.tsinghua.edu.cn/~chongxuan;https://zero-lab-pku.github.io/personwise/wangqiuhao/;http://ml.cs.tsinghua.edu.cn/~jun", "dblp": "91/1482-11;48/6809-1;161/9965;;50/2644-1", "google_scholar": "vPE9VRoAAAAJ;di5RZ1MAAAAJ;UKMcQn4AAAAJ;;axsP38wAAAAJ", "orcid": ";;0000-0002-0912-9076;;", "linkedin": ";;;;", "or_profile": "~Cheng_Lu5;~Jianfei_Chen1;~Chongxuan_Li1;~Qiuhao_Wang1;~Jun_Zhu2", "aff": "Tsinghua University;Tsinghua University;Tsinghua University;Peking University;Tsinghua University", "aff_domain": "tsinghua.edu.cn;tsinghua.edu.cn;tsinghua.edu.cn;pku.edu.cn;mail.tsinghua.edu.cn", "position": "PhD student;Postdoc;Postdoc;MS student;Professor", "bibtex": "@inproceedings{\nlu2021implicit,\ntitle={Implicit Normalizing Flows},\nauthor={Cheng Lu and Jianfei Chen and Chongxuan Li and Qiuhao Wang and Jun Zhu},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=8PS8m9oYtNy}\n}", "github": "[![github](/images/github_icon.svg) thu-ml/implicit-normalizing-flows](https://github.com/thu-ml/implicit-normalizing-flows)", "project": "", "reviewers": "AnonReviewer2;AnonReviewer3;AnonReviewer4;AnonReviewer1", "pdf_size": 0, "rating": "7;7;8;8", "confidence": "3;4;4;4", "wc_review": "454;349;410;709", "wc_reply_reviewers": "20;0;0;774", "wc_reply_authors": "204;411;176;1040", "reply_reviewers": "1;0;0;1", "reply_authors": "1;1;1;2", "rating_avg": [ 7.5, 0.5 ], "confidence_avg": [ 3.75, 0.4330127018922193 ], "wc_review_avg": [ 480.5, 137.0921223119695 ], "wc_reply_reviewers_avg": [ 198.5, 332.3653862844325 ], "wc_reply_authors_avg": [ 457.75, 348.1999246122836 ], "reply_reviewers_avg": [ 0.5, 0.5 ], "reply_authors_avg": [ 1.25, 0.4330127018922193 ], "replies_avg": [ 12, 0 ], "authors#_avg": [ 5, 0 ], "corr_rating_confidence": 0.5773502691896257, "gs_citation": 39, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=12318247723954884767&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 8, "pdf": "https://openreview.net/pdf?id=8PS8m9oYtNy", "email": "tsinghua.edu.cn;tsinghua.edu.cn;tsinghua.edu.cn;pku.edu.cn;mail.tsinghua.edu.cn", "author_num": 5, "aff_unique_index": "0;0;0;1;0", "aff_unique_norm": "Tsinghua University;Peking University", "aff_unique_dep": ";", "aff_unique_url": "https://www.tsinghua.edu.cn;http://www.pku.edu.cn", "aff_unique_abbr": "THU;Peking U", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0;0;0;0", "aff_country_unique": "China" }, { "id": "8QAXsAOSBjE", "title": "Reusing Preprocessing Data as Auxiliary Supervision in Conversational Analysis", "track": "main", "status": "Reject", "tldr": "", "abstract": "Conversational analysis systems are trained using noisy human labels and often require heavy preprocessing during multi-modal feature extraction. Using noisy labels in single-task learning increases the risk of over-fitting. 
However, auxiliary tasks could improve the performance of the primary task learning. This approach is known as Primary Multi-Task Learning (MTL). A challenge of MTL is the selection of beneficial auxiliary tasks that avoid negative transfer. In this paper, we explore how the preprocessed data used for feature engineering can be re-used as auxiliary tasks in Primary MTL, thereby promoting the productive use of data in the form of auxiliary supervision learning. Our main contributions are: (1) the identification of sixteen beneficial auxiliary tasks, (2) the method of distributing learning capacity between the primary and auxiliary tasks, and (3) the relative supervision hierarchy between the primary and auxiliary tasks. Extensive experiments on IEMOCAP and SEMAINE data validate the improvements over single-task approaches, and suggest that it may generalize across multiple primary tasks.", "keywords": "Multitask Learning;Multimodal Conversational Analysis", "primary_area": "", "supplementary_material": "/attachment/404df574404e897f4e8158c6a4bb8a8f068c001a.zip", "author": "Joshua Yee Kim;Kalina Yacef", "authorids": "~Joshua_Yee_Kim1;kalina.yacef@sydney.edu.au", "gender": "M;", "homepage": "http://www.joshuakim.io/;", "dblp": ";", "google_scholar": "a7gWVEUAAAAJ;", "orcid": "0000-0002-0605-2114;", "linkedin": "joshuakimyeehaun/;", "or_profile": "~Joshua_Yee_Kim1;kalina.yacef@sydney.edu.au", "aff": "University of Sydney;", "aff_domain": "sydney.edu.au;", "position": "PhD student;", "bibtex": "@misc{\nkim2021reusing,\ntitle={Reusing Preprocessing Data as Auxiliary Supervision in Conversational Analysis},\nauthor={Joshua Yee Kim and Kalina Yacef},\nyear={2021},\nurl={https://openreview.net/forum?id=8QAXsAOSBjE}\n}", "github": "", "project": "", "reviewers": "AnonReviewer4;AnonReviewer3;AnonReviewer1;AnonReviewer2", "site": "https://openreview.net/forum?id=8QAXsAOSBjE", "pdf_size": 0, "rating": "5;5;6;6", "confidence": "4;5;4;4", "wc_review": "240;454;297;243", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "765;1469;432;501", "reply_reviewers": "0;0;0;0", "reply_authors": "1;2;1;1", "rating_avg": [ 5.5, 0.5 ], "confidence_avg": [ 4.25, 0.4330127018922193 ], "wc_review_avg": [ 308.5, 87.01293007363905 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 791.75, 410.28610444420366 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.25, 0.4330127018922193 ], "replies_avg": [ 10, 0 ], "authors#_avg": [ 2, 0 ], "corr_rating_confidence": -0.5773502691896257, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:1pcid5sqlL0J:scholar.google.com/&scioq=Reusing+Preprocessing+Data+as+Auxiliary+Supervision+in+Conversational+Analysis&hl=en&as_sdt=0,33", "gs_version_total": 0, "aff_unique_index": "0", "aff_unique_norm": "University of Sydney", "aff_unique_dep": "", "aff_unique_url": "https://www.sydney.edu.au", "aff_unique_abbr": "USYD", "aff_country_unique_index": "0", "aff_country_unique": "Australia" }, { "id": "8SP2-AiWttb", "title": "Imbalanced Gradients: A New Cause of Overestimated Adversarial Robustness", "track": "main", "status": "Reject", "tldr": "", "abstract": "Evaluating the robustness of a defense model is a challenging task in adversarial robustness research. Obfuscated gradients, a type of gradient masking, have previously been found to exist in many defense methods and cause a false signal of robustness. 
In this paper, we identify a more subtle situation called \\emph{Imbalanced Gradients} that can also cause overestimated adversarial robustness. The phenomenon of imbalanced gradients occurs when the gradient of one term of the margin loss dominates and pushes the attack towards a suboptimal direction. To exploit imbalanced gradients, we formulate a \\emph{Margin Decomposition (MD)} attack that decomposes a margin loss into individual terms and then explores the attackability of these terms separately via a two-stage process. We examine 12 state-of-the-art defense models, and find that models exploiting label smoothing easily cause imbalanced gradients, on which our MD attacks can decrease their PGD robustness (evaluated by PGD attack) by over 23\\%. For 6 out of the 12 defenses, our attack can reduce their PGD robustness by at least 9\\%. The results suggest that imbalanced gradients need to be carefully addressed for more reliable adversarial robustness.", "keywords": "Adversarial Attack;Robustness Evaluation;Adversarial Defense;Deep Neural Networks", "primary_area": "", "supplementary_material": "/attachment/7bda1a42ba83c73ac6b3cf699c92412998099c62.zip", "author": "Linxi Jiang;Xingjun Ma;Zejia Weng;James Bailey;Yu-Gang Jiang", "authorids": "lxjiang18@fudan.edu.cn;~Xingjun_Ma1;zjweng16@fudan.edu.cn;~James_Bailey1;~Yu-Gang_Jiang1", "gender": ";M;;;M", "homepage": ";http://xingjunma.com/;;;https://fvl.fudan.edu.cn/people/yugangjiang/", "dblp": ";195/8270;;;24/5818", "google_scholar": ";https://scholar.google.com.au/citations?user=XQViiyYAAAAJ;;;f3_FP8AAAAAJ", "orcid": ";;;;", "linkedin": ";xingjun-ma-173532129/;;;", "or_profile": "lxjiang18@fudan.edu.cn;~Xingjun_Ma1;zjweng16@fudan.edu.cn;~James_Bailey1;~Yu-Gang_Jiang1", "aff": ";Deakin University;;;Fudan University", "aff_domain": ";deakin.edu.au;;;fudan.edu.cn", "position": ";Assistant Professor;;;Full Professor", "bibtex": "@misc{\njiang2021imbalanced,\ntitle={Imbalanced Gradients: A New Cause of Overestimated Adversarial Robustness},\nauthor={Linxi Jiang and Xingjun Ma and Zejia Weng and James Bailey and Yu-Gang Jiang},\nyear={2021},\nurl={https://openreview.net/forum?id=8SP2-AiWttb}\n}", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer3;AnonReviewer4;AnonReviewer2", "site": "https://openreview.net/forum?id=8SP2-AiWttb", "pdf_size": 0, "rating": "4;5;5;6", "confidence": "5;2;5;4", "wc_review": "437;285;618;276", "wc_reply_reviewers": "0;0;296;0", "wc_reply_authors": "510;400;774;553", "reply_reviewers": "0;0;1;0", "reply_authors": "1;1;2;1", "rating_avg": [ 5.0, 0.7071067811865476 ], "confidence_avg": [ 4.0, 1.224744871391589 ], "wc_review_avg": [ 404.0, 139.1312330140145 ], "wc_reply_reviewers_avg": [ 74.0, 128.17175976009693 ], "wc_reply_authors_avg": [ 559.25, 135.96208111087444 ], "reply_reviewers_avg": [ 0.25, 0.4330127018922193 ], "reply_authors_avg": [ 1.25, 0.4330127018922193 ], "replies_avg": [ 12, 0 ], "authors#_avg": [ 5, 0 ], "corr_rating_confidence": -0.28867513459481287, "gs_citation": 25, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=1401876808027941756&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 7, "aff_unique_index": "0;1", "aff_unique_norm": "Deakin University;Fudan University", "aff_unique_dep": ";", "aff_unique_url": "https://www.deakin.edu.au;https://www.fudan.edu.cn", "aff_unique_abbr": "Deakin;Fudan", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;1", "aff_country_unique": "Australia;China" }, { "title": "On the Curse of 
Memory in Recurrent Neural Networks: Approximation and Optimization Analysis", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/3345", "id": "8Sqhl-nF50", "poster": "", "openreview": "https://openreview.net/forum?id=8Sqhl-nF50", "slides": "https://iclr.cc/virtual/2021/poster/3345", "video": "https://iclr.cc/virtual/2021/poster/3345", "author_site": "Zhong Li, Jiequn Han, Weinan E, Qianxiao Li", "tldr": "", "abstract": "We study the approximation properties and optimization dynamics of recurrent neural networks (RNNs) when applied to learn input-output relationships in temporal data. We consider the simple but representative setting of using continuous-time linear RNNs to learn from data generated by linear relationships. Mathematically, the latter can be understood as a sequence of linear functionals. We prove a universal approximation theorem of such linear functionals and characterize the approximation rate. Moreover, we perform a fine-grained dynamical analysis of training linear RNNs by gradient methods. A unifying theme uncovered is the non-trivial effect of memory, a notion that can be made precise in our framework, on both approximation and optimization: when there is long-term memory in the target, it takes a large number of neurons to approximate it. Moreover, the training process will suffer from slowdowns. In particular, both of these effects become exponentially more pronounced with increasing memory - a phenomenon we call the \u201ccurse of memory\u201d. These analyses represent a basic step towards a concrete mathematical understanding of new phenomena that may arise in learning temporal relationships using recurrent architectures.", "keywords": "recurrent neural network;dynamical system;universal approximation;optimization;curse of memory", "primary_area": "", "supplementary_material": "", "author": "Zhong Li;Jiequn Han;Weinan E;Qianxiao Li", "authorids": "~Zhong_Li2;~Jiequn_Han1;~Weinan_E1;~Qianxiao_Li1", "gender": "M;M;;M", "homepage": "https://www.microsoft.com/en-us/research/people/lzhong/;https://users.flatironinstitute.org/~jhan/;;https://blog.nus.edu.sg/qianxiaoli/", "dblp": ";190/7087;06/9390;172/0930.html", "google_scholar": "https://scholar.google.com/citations?view_op=list_works;el5gT4AAAAAJ;;https://scholar.google.com.sg/citations?user=zLgReYoAAAAJ", "orcid": ";;;0000-0002-3903-3737", "linkedin": ";;;", "or_profile": "~Zhong_Li2;~Jiequn_Han1;~Weinan_E1;~Qianxiao_Li1", "aff": "Peking University;Princeton University;;National University of Singapore", "aff_domain": "pku.edu.cn;princeton.edu;;nus.edu.sg", "position": "PhD student;Postdoc;;Assistant Professor", "bibtex": "@inproceedings{\nli2021on,\ntitle={On the Curse of Memory in Recurrent Neural Networks: Approximation and Optimization Analysis},\nauthor={Zhong Li and Jiequn Han and Weinan E and Qianxiao Li},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=8Sqhl-nF50}\n}", "github": "", "project": "", "reviewers": "AnonReviewer2;AnonReviewer3;AnonReviewer4;AnonReviewer1", "pdf_size": 0, "rating": "3;6;8;8", "confidence": "2;3;4;4", "wc_review": "302;599;676;370", "wc_reply_reviewers": "300;0;272;147", "wc_reply_authors": "2059;1814;874;342", "reply_reviewers": "1;0;2;1", "reply_authors": "3;3;2;1", "rating_avg": [ 6.25, 2.0463381929681126 ], "confidence_avg": [ 3.25, 0.82915619758885 ], "wc_review_avg": [ 486.75, 155.06349505928208 ], "wc_reply_reviewers_avg": [ 179.75, 118.69367085063972 ], 
"wc_reply_authors_avg": [ 1272.25, 695.7795538099693 ], "reply_reviewers_avg": [ 1.0, 0.7071067811865476 ], "reply_authors_avg": [ 2.25, 0.82915619758885 ], "replies_avg": [ 19, 0 ], "authors#_avg": [ 4, 0 ], "corr_rating_confidence": 0.9945577827230725, "gs_citation": 43, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=10157340831286414344&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 5, "pdf": "https://openreview.net/pdf?id=8Sqhl-nF50", "email": "pku.edu.cn;princeton.edu;;nus.edu.sg", "author_num": 4, "aff_unique_index": "0;1;2", "aff_unique_norm": "Peking University;Princeton University;National University of Singapore", "aff_unique_dep": ";;", "aff_unique_url": "http://www.pku.edu.cn;https://www.princeton.edu;https://www.nus.edu.sg", "aff_unique_abbr": "Peking U;Princeton;NUS", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;1;2", "aff_country_unique": "China;United States;Singapore" }, { "title": "On the Transfer of Disentangled Representations in Realistic Settings", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/2681", "id": "8VXvj1QNRl1", "poster": "", "openreview": "https://openreview.net/forum?id=8VXvj1QNRl1", "slides": "https://iclr.cc/virtual/2021/poster/2681", "video": "https://iclr.cc/virtual/2021/poster/2681", "author_site": "Andrea Dittadi, Frederik Tr\u00e4uble, Francesco Locatello, Manuel Wuthrich, Vaibhav Agrawal, Ole Winther, Stefan Bauer, Bernhard Schoelkopf", "tldr": "", "abstract": "Learning meaningful representations that disentangle the underlying structure of the data generating process is considered to be of key importance in machine learning. While disentangled representations were found to be useful for diverse tasks such as abstract reasoning and fair classification, their scalability and real-world impact remain questionable.\nWe introduce a new high-resolution dataset with 1M simulated images and over 1,800 annotated real-world images of the same setup. In contrast to previous work, this new dataset exhibits correlations, a complex underlying structure, and allows to evaluate transfer to unseen simulated and real-world settings where the encoder i) remains in distribution or ii) is out of distribution.\nWe propose new architectures in order to scale disentangled representation learning to realistic high-resolution settings and conduct a large-scale empirical study of disentangled representations on this dataset. 
We observe that disentanglement is a good predictor for out-of-distribution (OOD) task performance.", "keywords": "representation learning;disentanglement;real-world", "primary_area": "", "supplementary_material": "", "author": "Andrea Dittadi;Frederik Tr\u00e4uble;Francesco Locatello;Manuel Wuthrich;Vaibhav Agrawal;Ole Winther;Stefan Bauer;Bernhard Sch\u00f6lkopf", "authorids": "~Andrea_Dittadi1;~Frederik_Tr\u00e4uble1;~Francesco_Locatello1;~Manuel_Wuthrich1;~Vaibhav_Agrawal1;~Ole_Winther1;~Stefan_Bauer1;~Bernhard_Sch\u00f6lkopf1", "gender": "M;M;M;M;;M;;", "homepage": "https://addtt.github.io;https://ei.is.tuebingen.mpg.de/person/ftraeuble;https://twitter.com/FrancescoLocat8;;https://ei.is.tuebingen.mpg.de/person/vagrawal;https://olewinther.github.io/;https://cifar.ca/bios/stefan-bauer/;", "dblp": ";;195/6074;https://dblp.uni-trier.de/pers/hd/w/W=uuml=thrich:Manuel;;36/1568;;", "google_scholar": "PrvuuaAAAAAJ;https://scholar.google.de/citations?user=oc2OOyMAAAAJ;;;;7VAwhzUAAAAJ;O-oICE8AAAAJ;", "orcid": ";;;;;0000-0002-1966-3205;;", "linkedin": ";;;;;owinther/;;", "or_profile": "~Andrea_Dittadi1;~Frederik_Tr\u00e4uble1;~Francesco_Locatello1;~Manuel_Wuthrich1;~Vaibhav_Agrawal1;~Ole_Winther1;~Stefan_Bauer1;~Bernhard_Sch\u00f6lkopf1", "aff": "Amazon;Max Planck Institute for Intelligent Systems;Amazon;Max Planck Institute for Intelligent Systems;;Technical University of Denmark;Max Planck Institute for Intelligent Systems, Max-Planck Institute;", "aff_domain": "amazon.com;is.tuebingen.mpg.de;amazon.com;mpg.tuebingen.de;;dtu.dk;tuebingen.mpg.de;", "position": "Intern;PhD student;Senior Applied Scientist;Postdoc;;Full Professor;Research Group Leader;", "bibtex": "@inproceedings{\ndittadi2021on,\ntitle={On the Transfer of Disentangled Representations in Realistic Settings},\nauthor={Andrea Dittadi and Frederik Tr{\\\"a}uble and Francesco Locatello and Manuel Wuthrich and Vaibhav Agrawal and Ole Winther and Stefan Bauer and Bernhard Sch{\\\"o}lkopf},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=8VXvj1QNRl1}\n}", "github": "", "project": "", "reviewers": "AnonReviewer3;AnonReviewer1;AnonReviewer2;AnonReviewer4", "pdf_size": 0, "rating": "2;5;7;9", "confidence": "5;4;5;4", "wc_review": "317;324;1495;268", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "538;531;778;46", "reply_reviewers": "0;0;0;0", "reply_authors": "1;1;1;1", "rating_avg": [ 5.75, 2.5860201081971503 ], "confidence_avg": [ 4.5, 0.5 ], "wc_review_avg": [ 601.0, 516.6018776582215 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 473.25, 265.9618158683686 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 9, 0 ], "authors#_avg": [ 8, 0 ], "corr_rating_confidence": -0.4833682445228318, "gs_citation": 95, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=12802052970920370542&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 5, "pdf": "https://openreview.net/pdf?id=8VXvj1QNRl1", "email": "amazon.com;is.tuebingen.mpg.de;amazon.com;mpg.tuebingen.de;;dtu.dk;tuebingen.mpg.de;", "author_num": 8, "aff_unique_index": "0;1;0;1;2;1", "aff_unique_norm": "Amazon;Max Planck Institute for Intelligent Systems;Technical University of Denmark", "aff_unique_dep": "Amazon.com, Inc.;Intelligent Systems;", "aff_unique_url": "https://www.amazon.com;https://www.mpi-is.mpg.de;https://www.tek.dk", "aff_unique_abbr": "Amazon;MPI-IS;DTU", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": 
"0;1;0;1;2;1", "aff_country_unique": "United States;Germany;Denmark" }, { "id": "8W7LTo_zxdE", "title": "Variational Deterministic Uncertainty Quantification", "track": "main", "status": "Reject", "tldr": "", "abstract": "Building on recent advances in uncertainty quantification using a single deep deterministic model (DUQ), we introduce variational Deterministic Uncertainty Quantification (vDUQ). We overcome several shortcomings of DUQ by recasting it as a Gaussian process (GP) approximation. Our principled approximation is based on an inducing point GP in combination with Deep Kernel Learning. This enables vDUQ to use rigorous probabilistic foundations, and work not only on classification but also on regression problems. We avoid uncertainty collapse away from the training data by regularizing the spectral norm of the deep feature extractor. Our method matches SotA accuracy, 96.2\\% on CIFAR-10, while maintaining the speed of softmax models, and provides uncertainty estimates competitive with Deep Ensembles. We demonstrate our method in regression problems and by estimating uncertainty in causal inference for personalized medicine", "keywords": "Uncertainty estimation;gaussian processes;deep learning;variational inference", "primary_area": "", "supplementary_material": "", "author": "Joost van Amersfoort;Lewis Smith;Andrew Jesson;Oscar Key;Yarin Gal", "authorids": "~Joost_van_Amersfoort1;~Lewis_Smith1;~Andrew_Jesson1;~Oscar_Key1;~Yarin_Gal1", "gender": "M;;M;M;", "homepage": ";https://www.robots.ox.ac.uk/~lsgs;https://oatml.cs.ox.ac.uk/members/andrew_jesson/;https://oscarkey.github.io;http://www.cs.ox.ac.uk/people/yarin.gal/website//", "dblp": ";;;276/1203;67/9076", "google_scholar": "https://scholar.google.co.uk/citations?user=C0LaV8IAAAAJ;eWbHsRIAAAAJ;ElJ_fC4AAAAJ;;https://scholar.google.co.uk/citations?user=SIayDoQAAAAJ", "orcid": ";0000-0001-6632-8162;;;", "linkedin": ";;;;", "or_profile": "~Joost_van_Amersfoort1;~Lewis_Smith1;~Andrew_Jesson1;~Oscar_Key1;~Yarin_Gal1", "aff": "University of Oxford;University of Oxford;Department of Computer Science, University of Oxford;University College London;University of Oxford", "aff_domain": "ox.ac.uk;oxford.ac.uk;cs.ox.ac.uk;ucl.ac.uk;ox.ac.uk", "position": "PhD student;PhD student;PhD student;PhD student;Associate Professor", "bibtex": "@misc{\namersfoort2021variational,\ntitle={Variational Deterministic Uncertainty Quantification},\nauthor={Joost van Amersfoort and Lewis Smith and Andrew Jesson and Oscar Key and Yarin Gal},\nyear={2021},\nurl={https://openreview.net/forum?id=8W7LTo_zxdE}\n}", "github": "", "project": "", "reviewers": "AnonReviewer4;AnonReviewer2;AnonReviewer1;AnonReviewer3", "site": "https://openreview.net/forum?id=8W7LTo_zxdE", "pdf_size": 0, "rating": "2;5;5;5", "confidence": "3;4;4;4", "wc_review": "230;564;558;407", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "549;692;641;638", "reply_reviewers": "0;0;0;0", "reply_authors": "1;1;1;1", "rating_avg": [ 4.25, 1.299038105676658 ], "confidence_avg": [ 3.75, 0.4330127018922193 ], "wc_review_avg": [ 439.75, 136.46313604779863 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 630.0, 51.45386282875174 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 10, 0 ], "authors#_avg": [ 5, 0 ], "corr_rating_confidence": 1.0, "gs_citation": 1, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=2972603681658450379&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 0, "aff_unique_index": "0;0;0;1;0", "aff_unique_norm": 
"University of Oxford;University College London", "aff_unique_dep": ";", "aff_unique_url": "https://www.ox.ac.uk;https://www.ucl.ac.uk", "aff_unique_abbr": "Oxford;UCL", "aff_campus_unique_index": "1", "aff_campus_unique": ";Oxford", "aff_country_unique_index": "0;0;0;0;0", "aff_country_unique": "United Kingdom" }, { "id": "8Woqvdszj8B", "title": "Illuminating Dark Knowledge via Random Matrix Ensembles", "track": "main", "status": "Withdraw", "tldr": "", "abstract": "It is all but certain that machine learning models based on deep neural networks will soon feature ubiquitously in a wide variety of critical products and services that people rely on. This should be a major cause of concern given that we still lack a rigorous understanding of the failure modes of these systems, and can hardly make guarantees about the conditions under which the models are expected to work. In particular, we would like to understand how these models manage to generalize so well, even when seemingly overparametrized, effectively evading many of the intuitions expected from statistical learning theory. We argue that Distillation (Caruana et al., 2006, Hinton et al., 2014) provides us with a rich playground for understanding what enables generalization in a concrete setting. We carry out a precise high-dimensional analysis of generalization under distillation in a real world setting, eschewing ad hoc assumptions, and instead consider models actually encountered in the wild. ", "keywords": "", "primary_area": "", "supplementary_material": "", "author": "Anthony Ndirango", "authorids": "~Anthony_Ndirango1", "gender": "", "homepage": "", "dblp": "", "google_scholar": "", "orcid": "", "linkedin": "", "or_profile": "~Anthony_Ndirango1", "aff": "", "aff_domain": "", "position": "", "bibtex": "", "github": "", "project": "", "reviewers": "AnonReviewer2;AnonReviewer3;AnonReviewer4;AnonReviewer1", "site": "https://openreview.net/forum?id=8Woqvdszj8B", "pdf_size": 0, "rating": "1;2;2;4", "confidence": "4;4;3;3", "wc_review": "351;480;296;335", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "0;0;0;0", "reply_reviewers": "0;0;0;0", "reply_authors": "0;0;0;0", "rating_avg": [ 2.25, 1.0897247358851685 ], "confidence_avg": [ 3.5, 0.5 ], "wc_review_avg": [ 365.5, 69.06699645995909 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 0, 0 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 0, 0 ], "replies_avg": [ 5, 0 ], "authors#_avg": [ 1, 0 ], "corr_rating_confidence": -0.6882472016116854, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:JVb4NN5ztZoJ:scholar.google.com/&scioq=Illuminating+Dark+Knowledge+via+Random+Matrix+Ensembles&hl=en&as_sdt=0,5", "gs_version_total": 0 }, { "title": "PC2WF: 3D Wireframe Reconstruction from Raw Point Clouds", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/3169", "id": "8X2eaSZxTP", "poster": "", "openreview": "https://openreview.net/forum?id=8X2eaSZxTP", "slides": "https://iclr.cc/virtual/2021/poster/3169", "video": "https://iclr.cc/virtual/2021/poster/3169", "author_site": "Yujia Liu, Stefano D'Aronco, Konrad Schindler, Jan D Wegner", "tldr": "", "abstract": "We introduce PC2WF, the first end-to-end trainable deep network architecture to convert a 3D point cloud into a wireframe model. The network takes as input an unordered set of 3D points sampled from the surface of some object, and outputs a wireframe of that object, i.e., a sparse set of corner points linked by line segments. 
Recovering the wireframe is a challenging task, where the numbers of both vertices and edges are different for every instance, and a-priori unknown. Our architecture gradually builds up the model: It starts by encoding the points into feature vectors. Based on those features, it identifies a pool of candidate vertices, then prunes those candidates to a final set of corner vertices and refines their locations. Next, the corners are linked with an exhaustive set of candidate edges, which is again pruned to obtain the final wireframe. All steps are trainable, and errors can be backpropagated through the entire sequence. We validate the proposed model on a publicly available synthetic dataset, for which the ground truth wireframes are accessible, as well as on a new real-world dataset. Our model produces wireframe abstractions of good quality and outperforms several baselines.", "keywords": "deep neural network;3d point cloud;wireframe model", "primary_area": "", "supplementary_material": "/attachment/245be12b020d26a6c6b623585936687da8af5cb6.zip", "author": "Yujia Liu;Stefano D'Aronco;Konrad Schindler;Jan Dirk Wegner", "authorids": "~Yujia_Liu3;~Stefano_D'Aronco1;~Konrad_Schindler1;~Jan_Dirk_Wegner1", "gender": "F;M;M;M", "homepage": "https://scholar.google.com/citations?hl=en&user=IwBPrmkAAAAJ;;https://igp.ethz.ch/personen/person-detail.html?persid=143986;https://igp.ethz.ch/personen/person-detail.html?persid=186562", "dblp": ";164/6077;73/488;66/8991", "google_scholar": ";https://scholar.google.it/citations?user=vLYzYl4AAAAJ;FZuNgqIAAAAJ;sxLG1rgAAAAJ", "orcid": ";0000-0003-0142-1731;0000-0002-3172-9246;0000-0002-0290-6901", "linkedin": ";;konrad-schindler-5b0b22153/;", "or_profile": "~Yujia_Liu3;~Stefano_D'Aronco1;~Konrad_Schindler1;~Jan_Dirk_Wegner3", "aff": "Swiss Federal Institute of Technology;Swiss Federal Institute of Technology;Swiss Federal Institute of Technology;University of Zurich", "aff_domain": "ethz.ch;ethz.ch;ethz.ch;uzh.ch", "position": "PhD student;Postdoc;Professor;Associate Professor", "bibtex": "@inproceedings{\nliu2021pcwf,\ntitle={{\\{}PC{\\}}2WF: 3D Wireframe Reconstruction from Raw Point Clouds},\nauthor={Yujia Liu and Stefano D'Aronco and Konrad Schindler and Jan Dirk Wegner},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=8X2eaSZxTP}\n}", "github": "", "project": "", "reviewers": "AnonReviewer4;AnonReviewer3;AnonReviewer2;AnonReviewer1", "pdf_size": 0, "rating": "6;6;7;7", "confidence": "5;3;4;3", "wc_review": "340;200;802;393", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "330;152;776;863", "reply_reviewers": "0;0;0;0", "reply_authors": "1;1;1;3", "rating_avg": [ 6.5, 0.5 ], "confidence_avg": [ 3.75, 0.82915619758885 ], "wc_review_avg": [ 433.75, 223.99595420453468 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 530.25, 297.61079869520864 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.5, 0.8660254037844386 ], "replies_avg": [ 11, 0 ], "authors#_avg": [ 4, 0 ], "corr_rating_confidence": -0.30151134457776363, "gs_citation": 49, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=6987500480707884796&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 4, "pdf": "https://openreview.net/pdf?id=8X2eaSZxTP", "email": "ethz.ch;ethz.ch;ethz.ch;uzh.ch", "author_num": 4, "aff_unique_index": "0;0;0;1", "aff_unique_norm": "Swiss Federal Institute of Technology;University of Zurich", "aff_unique_dep": ";", "aff_unique_url": 
"https://www.ethz.ch;https://www.unizh.ch", "aff_unique_abbr": "ETH Zurich;UZH", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0;0;0", "aff_country_unique": "Switzerland" }, { "id": "8Xi5MLFE_IW", "title": "Episodic Memory for Learning Subjective-Timescale Models", "track": "main", "status": "Reject", "tldr": "", "abstract": "In model-based learning, an agent\u2019s model is commonly defined over transitions between consecutive states of an environment even though planning often requires reasoning over multi-step timescales, with intermediate states either unnecessary, or worse, accumulating prediction error. In contrast, intelligent behaviour in biological organisms is characterised by the ability to plan over varying temporal scales depending on the context. Inspired by the recent works on human time perception, we devise a novel approach to learning a transition dynamics model, based on the sequences of episodic memories that define the agent's subjective timescale \u2013 over which it learns world dynamics and over which future planning is performed. We implement this in the framework of active inference and demonstrate that the resulting subjective-timescale model (STM) can systematically vary the temporal extent of its predictions while preserving the same computational efficiency. Additionally, we show that STM predictions are more likely to introduce future salient events (for example new objects coming into view), incentivising exploration of new areas of the environment. As a result, STM produces more informative action-conditioned roll-outs that assist the agent in making better decisions. We validate significant improvement in our STM agent's performance in the Animal-AI environment against a baseline system, trained using the environment's objective-timescale dynamics.", "keywords": "Episodic Memory;Time Perception;Active Inference;Model-based Reinforcement Learning", "primary_area": "", "supplementary_material": "", "author": "Alexey Zakharov;Matthew Crosby;Zafeirios Fountas", "authorids": "~Alexey_Zakharov1;m.crosby@imperial.ac.uk;~Zafeirios_Fountas1", "gender": ";;M", "homepage": ";;http://zfountas.com/", "dblp": ";;", "google_scholar": ";;https://scholar.google.co.uk/citations?user=aaEGHR4AAAAJ", "orcid": ";;0000-0002-6312-3409", "linkedin": ";;zfountas/", "or_profile": "~Alexey_Zakharov1;m.crosby@imperial.ac.uk;~Zafeirios_Fountas1", "aff": ";;University College London", "aff_domain": ";;ucl.ac.uk", "position": ";;Honorary research fellow", "bibtex": "@misc{\nzakharov2021episodic,\ntitle={Episodic Memory for Learning Subjective-Timescale Models},\nauthor={Alexey Zakharov and Matthew Crosby and Zafeirios Fountas},\nyear={2021},\nurl={https://openreview.net/forum?id=8Xi5MLFE_IW}\n}", "github": "", "project": "", "reviewers": "AnonReviewer3;AnonReviewer4;AnonReviewer1", "site": "https://openreview.net/forum?id=8Xi5MLFE_IW", "pdf_size": 0, "rating": "4;4;5", "confidence": "4;3;3", "wc_review": "557;796;575", "wc_reply_reviewers": "0;306;392", "wc_reply_authors": "819;1271;653", "reply_reviewers": "0;1;1", "reply_authors": "2;2;1", "rating_avg": [ 4.333333333333333, 0.4714045207910317 ], "confidence_avg": [ 3.3333333333333335, 0.4714045207910317 ], "wc_review_avg": [ 642.6666666666666, 108.67177902084586 ], "wc_reply_reviewers_avg": [ 232.66666666666666, 168.22471743342462 ], "wc_reply_authors_avg": [ 914.3333333333334, 261.14789339035883 ], "reply_reviewers_avg": [ 0.6666666666666666, 0.4714045207910317 ], "reply_authors_avg": [ 
1.6666666666666667, 0.4714045207910317 ], "replies_avg": [ 11, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": -0.4999999999999999, "gs_citation": 4, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=1648703160273291180&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 4, "aff_unique_index": "0", "aff_unique_norm": "University College London", "aff_unique_dep": "", "aff_unique_url": "https://www.ucl.ac.uk", "aff_unique_abbr": "UCL", "aff_country_unique_index": "0", "aff_country_unique": "United Kingdom" }, { "id": "8Y-Y7RVo8vn", "title": "Improved generalization by noise enhancement", "track": "main", "status": "Withdraw", "tldr": "", "abstract": "Recent studies have demonstrated that noise in stochastic gradient descent (SGD) is closely related to generalization; A larger SGD noise, if not too large, results in better generalization. Since the covariance of the SGD noise is proportional to $\\eta^2/B$, where $\\eta$ is the learning rate and $B$ is the minibatch size of SGD, so far the SGD noise has been controlled by changing $\\eta$ and/or $B$. However, too large $\\eta$ results in instability in the training dynamics and a small $B$ prevents scalable parallel computation. It is thus desirable to develop a method of controlling the SGD noise without changing $\\eta$ and $B$. In this paper, we propose a method that achieves this goal using ``noise enhancement'', which is easily implemented in practice. We expound the underlying theoretical idea and demonstrate that the noise enhancement actually improves generalization for real datasets. It turns out that large-batch training with the noise enhancement even shows better generalization compared with small-batch training.\n", "keywords": "deep learning;generalization;stochastic gradient descent;large-batch training", "primary_area": "", "supplementary_material": "/attachment/48ef287546aa9af2e305926b454700addc514515.zip", "author": "Takashi Mori;Masahito Ueda", "authorids": "~Takashi_Mori1;ueda@phys.s.u-tokyo.ac.jp", "gender": "M;", "homepage": "https://sites.google.com/view/takashimori/home;", "dblp": ";", "google_scholar": "https://scholar.google.co.jp/citations?hl=ja;", "orcid": ";", "linkedin": ";", "or_profile": "~Takashi_Mori1;ueda@phys.s.u-tokyo.ac.jp", "aff": "RIKEN;", "aff_domain": "riken.jp;", "position": "Postdoc;", "bibtex": "", "github": "", "project": "", "reviewers": "AnonReviewer4;AnonReviewer3;AnonReviewer1;AnonReviewer2", "site": "https://openreview.net/forum?id=8Y-Y7RVo8vn", "pdf_size": 0, "rating": "3;4;4;4", "confidence": "5;5;4;5", "wc_review": "324;381;668;546", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "0;0;0;0", "reply_reviewers": "0;0;0;0", "reply_authors": "0;0;0;0", "rating_avg": [ 3.75, 0.4330127018922193 ], "confidence_avg": [ 4.75, 0.4330127018922193 ], "wc_review_avg": [ 479.75, 135.86459251769756 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 0, 0 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 0, 0 ], "replies_avg": [ 6, 0 ], "authors#_avg": [ 2, 0 ], "corr_rating_confidence": -0.3333333333333333, "gs_citation": 2, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=10101031302351396984&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 4, "aff_unique_index": "0", "aff_unique_norm": "RIKEN", "aff_unique_dep": "", "aff_unique_url": "https://www.riken.jp", "aff_unique_abbr": "RIKEN", "aff_country_unique_index": "0", "aff_country_unique": "Japan" }, { "id": "8YFhXYe1Ps", "title": "Interpretability Through Invertibility: A Deep Convolutional 
Network With Ideal Counterfactuals And Isosurfaces", "track": "main", "status": "Reject", "tldr": "", "abstract": "Current state of the art computer vision applications rely on highly complex models. Their interpretability is mostly limited to post-hoc methods which are not guaranteed to be faithful to the model. To elucidate a model\u2019s decision, we present a novel interpretable model based on an invertible deep convolutional network. Our model generates meaningful, faithful, and ideal counterfactuals. Using PCA on the classifier\u2019s input, we can also create \u201cisofactuals\u201d\u2013 image interpolations with the same outcome but visually meaningful different features. Counter- and isofactuals can be used to identify positive and negative evidence in an image. This can also be visualized with heatmaps. We evaluate our approach against gradient-based attribution methods, which we find to produce meaningless adversarial perturbations. Using our method, we reveal biases in three different datasets. In a human subject experiment, we test whether non-experts find our method useful to spot spurious correlations learned by a model. Our work is a step towards more trustworthy explanations for computer vision.", "keywords": "Interpretable Machine Learning;Counterfactuals;Computer Vision;Human Evaluation;User Study", "primary_area": "", "supplementary_material": "/attachment/74b00c8e6e7ea503ac48760cf7c184afada069eb.zip", "author": "Leon Sixt;Martin Schuessler;Philipp Wei\u00df;Tim Landgraf", "authorids": "~Leon_Sixt1;schuessler@tu-berlin.de;philipp@itp.tu-berlin.de;~Tim_Landgraf1", "gender": "M;;;", "homepage": "https://userpage.fu-berlin.de/leonsixt/;;;", "dblp": ";;;04/10008", "google_scholar": "XtejLN8AAAAJ;;;https://scholar.google.de/citations?user=ChX0opIAAAAJ", "orcid": ";;;0000-0003-4951-5235", "linkedin": ";;;", "or_profile": "~Leon_Sixt1;schuessler@tu-berlin.de;philipp@itp.tu-berlin.de;~Tim_Landgraf1", "aff": "Google;;;Freie Universit\u00e4t Berlin", "aff_domain": "google.com;;;fu-berlin.de", "position": "Internship;;;Assistant Professor", "bibtex": "@misc{\nsixt2021interpretability,\ntitle={Interpretability Through Invertibility: A Deep Convolutional Network With Ideal Counterfactuals And Isosurfaces},\nauthor={Leon Sixt and Martin Schuessler and Philipp Wei{\\ss} and Tim Landgraf},\nyear={2021},\nurl={https://openreview.net/forum?id=8YFhXYe1Ps}\n}", "github": "", "project": "", "reviewers": "AnonReviewer3;AnonReviewer4;AnonReviewer2;AnonReviewer1;AnonReviewer5", "site": "https://openreview.net/forum?id=8YFhXYe1Ps", "pdf_size": 0, "rating": "5;5;6;6;6", "confidence": "5;3;4;4;4", "wc_review": "672;247;1161;412;290", "wc_reply_reviewers": "0;0;0;0;0", "wc_reply_authors": "840;466;1589;331;393", "reply_reviewers": "0;0;0;0;0", "reply_authors": "2;1;3;1;1", "rating_avg": [ 5.6, 0.48989794855663565 ], "confidence_avg": [ 4.0, 0.6324555320336759 ], "wc_review_avg": [ 556.4, 336.58674959065155 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 723.8, 467.3873768085741 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.6, 0.8 ], "replies_avg": [ 15, 0 ], "authors#_avg": [ 4, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 5, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=14945620224065556570&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 0, "aff_unique_index": "0;1", "aff_unique_norm": "Google;Freie Universit\u00e4t Berlin", "aff_unique_dep": "Google;", "aff_unique_url": "https://www.google.com;https://www.fu-berlin.de", "aff_unique_abbr": 
"Google;FU Berlin", "aff_campus_unique_index": "0", "aff_campus_unique": "Mountain View;", "aff_country_unique_index": "0;1", "aff_country_unique": "United States;Germany" }, { "id": "8_7yhptEWD", "title": "On the Neural Tangent Kernel of Equilibrium Models", "track": "main", "status": "Reject", "tldr": "", "abstract": "Existing analyses of the neural tangent kernel (NTK) for infinite-depth networks show that the kernel typically becomes degenerate as the number of layers grows. This raises the question of how to apply such methods to practical \"infinite depth\" architectures such as the recently-proposed deep equilibrium (DEQ) model, which directly computes the infinite-depth limit of a weight-tied network via root-finding. In this work, we show that because of the input injection component of these networks, DEQ models have non-degenerate NTKs even in the infinite depth limit. Furthermore, we show that these kernels themselves can be computed by an analogous root-finding problem as in traditional DEQs, and highlight methods for computing the NTK for both fully-connected and convolutional variants. We evaluate these models empirically, showing they match or improve upon the performance of existing regularized NTK methods.", "keywords": "deel learning;equilibrium model;neural tangent kernel", "primary_area": "", "supplementary_material": "", "author": "Zhili Feng;J Zico Kolter", "authorids": "~Zhili_Feng1;~J_Zico_Kolter1", "gender": ";M", "homepage": "https://zhilif.github.io/;http://www.zicokolter.com", "dblp": "189/7590;67/2526", "google_scholar": "_lnL4aQAAAAJ;UXh1I6UAAAAJ", "orcid": ";", "linkedin": ";", "or_profile": "~Zhili_Feng1;~Zico_Kolter1", "aff": "Bosch;Carnegie Mellon University", "aff_domain": "bosch.com;cmu.edu", "position": "Intern;Full Professor", "bibtex": "@misc{\nfeng2021on,\ntitle={On the Neural Tangent Kernel of Equilibrium Models},\nauthor={Zhili Feng and J Zico Kolter},\nyear={2021},\nurl={https://openreview.net/forum?id=8_7yhptEWD}\n}", "github": "", "project": "", "reviewers": "AnonReviewer4;AnonReviewer3;AnonReviewer1;AnonReviewer2", "site": "https://openreview.net/forum?id=8_7yhptEWD", "pdf_size": 0, "rating": "3;4;4;6", "confidence": "4;3;4;4", "wc_review": "254;302;453;198", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "99;131;351;124", "reply_reviewers": "0;0;0;0", "reply_authors": "1;1;1;1", "rating_avg": [ 4.25, 1.0897247358851685 ], "confidence_avg": [ 3.75, 0.4330127018922193 ], "wc_review_avg": [ 301.75, 94.76385123030828 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 176.25, 101.5907845230068 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 9, 0 ], "authors#_avg": [ 2, 0 ], "corr_rating_confidence": 0.13245323570650439, "gs_citation": 2, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=7234461327107346483&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 4, "aff_unique_index": "0;1", "aff_unique_norm": "Robert Bosch GmbH;Carnegie Mellon University", "aff_unique_dep": ";", "aff_unique_url": "https://www.bosch.com;https://www.cmu.edu", "aff_unique_abbr": "Bosch;CMU", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;1", "aff_country_unique": "Germany;United States" }, { "id": "8_Ve-wi_IOx", "title": "Interpretable Super-Resolution via a Learned Time-Series Representation", "track": "main", "status": "Withdraw", "tldr": "", "abstract": "We develop an interpretable and learnable Wigner-Ville distribution that produces a super-resolved 
quadratic signal representation for time-series analysis.\nOur approach has two main hallmarks. \nFirst, it interpolates between known time-frequency representations (TFRs) in that it can reach super-resolution with increased time and frequency resolution beyond what the Heisenberg uncertainty principle prescribes and thus beyond commonly employed TFRs. \nSecond, it is interpretable thanks to an explicit low-dimensional and physical parametrization of the Wigner-Ville distribution.\nWe demonstrate that our approach is able to learn highly adapted TFRs and is ready and able to tackle various large-scale classification tasks, where we reach state-of-the-art performance compared to baseline and learned TFRs.", "keywords": "time frequency representation;time series;wigner ville;cohen class;wavelet transform;scalogram;bird;speech", "primary_area": "", "supplementary_material": "", "author": "Randall Balestriero;Herv\u00e9 Glotin;Richard Baraniuk", "authorids": "~Randall_Balestriero1;~Herv\u00e9_Glotin1;~Richard_Baraniuk1", "gender": "M;M;", "homepage": "https://randallbalestriero.github.io/;http://glotin.univ-tln.fr;http://richb.rice.edu/", "dblp": "175/5364;http://dblp.uni-trier.de/pers/hd/g/Glotin:Herv=eacute=;32/2804", "google_scholar": "S1x_xqcAAAAJ;DqieizcAAAAJ;https://scholar.google.com.tw/citations?user=N-BBA20AAAAJ", "orcid": ";http://orcid.org/0000-0001-7338-8518;", "linkedin": "randallbalestriero/;herv%C3%A9-glotin-06249b21/;richard-baraniuk", "or_profile": "~Randall_Balestriero1;~Herv\u00e9_Glotin1;~Richard_Baraniuk1", "aff": "Rice University;CNRS university Toulon;William Marsh Rice University", "aff_domain": "rice.edu;univ-tln.fr;rice.edu", "position": "PhD student;Full Professor;C. Sidney Burrus Professor", "bibtex": "", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer3;AnonReviewer2;AnonReviewer4", "site": "https://openreview.net/forum?id=8_Ve-wi_IOx", "pdf_size": 0, "rating": "4;4;6;6", "confidence": "2;5;4;4", "wc_review": "478;831;507;102", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "0;0;479;44", "reply_reviewers": "0;0;0;0", "reply_authors": "0;0;1;1", "rating_avg": [ 5.0, 1.0 ], "confidence_avg": [ 3.75, 1.0897247358851685 ], "wc_review_avg": [ 479.5, 258.27165930469414 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 130.75, 201.86304144146843 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 0.5, 0.5 ], "replies_avg": [ 7, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": 0.22941573387056177, "gs_citation": 7, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=85654950853379360&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 4, "aff_unique_index": "0;1;0", "aff_unique_norm": "Rice University;CNRS", "aff_unique_dep": ";university Toulon", "aff_unique_url": "https://www.rice.edu;https://www.cnrs.fr", "aff_unique_abbr": "Rice;CNRS", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;1;0", "aff_country_unique": "United States;France" }, { "id": "8bZC3CyF-f7", "title": "Align-RUDDER: Learning From Few Demonstrations by Reward Redistribution", "track": "main", "status": "Reject", "tldr": "", "abstract": "Reinforcement Learning algorithms require a large number of samples to solve complex tasks with sparse and delayed rewards. \nComplex tasks are often hierarchically composed of sub-tasks.\nA step in the Q-function indicates solving a sub-task, where the expectation of the return increases. 
\nRUDDER identifies these steps and then redistributes reward to them, thus immediately giving reward if sub-tasks are solved. \nSince the delay of rewards is reduced, learning is considerably sped up.\nHowever, for complex tasks, current exploration strategies struggle with discovering episodes with high rewards.\nTherefore, we assume that episodes with high rewards are given as demonstrations and do not have to be discovered by exploration.\nTypically the number of demonstrations is small and RUDDER's LSTM model does not learn well.\nHence, we introduce Align-RUDDER, which is RUDDER with two major modifications. \nFirst, Align-RUDDER assumes that episodes with high rewards are given as demonstrations, \nreplacing RUDDER\u2019s safe exploration and lessons replay buffer.\nSecond, we substitute RUDDER\u2019s LSTM model by a profile model that is obtained from multiple sequence alignment of demonstrations. \nProfile models can be constructed from as few as two demonstrations.\nAlign-RUDDER inherits the concept of reward redistribution, which speeds up learning by reducing the delay of rewards. \nAlign-RUDDER outperforms competitors on complex artificial tasks with delayed reward and few demonstrations.\nOn the MineCraft ObtainDiamond task, Align-RUDDER is able to mine a diamond, though not frequently.", "keywords": "", "primary_area": "", "supplementary_material": "/attachment/e5d8a24a10b8abdaa385cf965b8d646f6b3f5f1e.zip", "author": "Vihang Prakash Patil;Markus Hofmarcher;Marius-Constantin Dinu;Matthias Dorfer;Patrick M Blies;Johannes Brandstetter;Jose Arjona-Medina;Sepp Hochreiter", "authorids": "~Vihang_Prakash_Patil1;~Markus_Hofmarcher1;dinu@ml.jku.at;~Matthias_Dorfer1;~Patrick_M_Blies1;~Johannes_Brandstetter1;~Jose_Arjona-Medina1;~Sepp_Hochreiter1", "gender": "M;M;;;;M;;M", "homepage": "https://vihangp.github.io;;;https://www.jku.at/en/institut-fuer-computational-perception/ueber-uns/ehemalige-mitarbeiter/matthias-dorfer/;;;;https://www.jku.at/en/institute-for-machine-learning/about-us/team/sepp-hochreiter/", "dblp": "https://dblp.uni-trier.de/pid/275/2942;224/9960;;;;251/8691;;h/SeppHochreiter.html", "google_scholar": "1iwYpk0AAAAJ;FD27EMIAAAAJ;;;;KiRvOHcAAAAJ;;https://scholar.google.at/citations?user=tvUH3WMAAAAJ", "orcid": ";;;;;;;0000-0001-7449-2528", "linkedin": ";;;;;;;https://linkedin.com/in/sepp-hochreiter-41514846", "or_profile": "~Vihang_Prakash_Patil1;~Markus_Hofmarcher1;dinu@ml.jku.at;~Matthias_Dorfer1;~Patrick_M_Blies1;~Johannes_Brandstetter1;~Jose_Arjona-Medina1;~Sepp_Hochreiter1", "aff": "Johannes Kepler University Linz;Johannes Kepler Universit\u00e4t Linz;;;;Johannes Kepler University Linz;;Johannes Kepler University Linz", "aff_domain": "jku.at;jku.at;;;;jku.at;;jku.at", "position": "PhD student;PhD student;;;;Assistant Professor;;Full Professor", "bibtex": "@misc{\npatil2021alignrudder,\ntitle={Align-{\\{}RUDDER{\\}}: Learning From Few Demonstrations by Reward Redistribution},\nauthor={Vihang Prakash Patil and Markus Hofmarcher and Marius-Constantin Dinu and Matthias Dorfer and Patrick M Blies and Johannes Brandstetter and Jose Arjona-Medina and Sepp Hochreiter},\nyear={2021},\nurl={https://openreview.net/forum?id=8bZC3CyF-f7}\n}", "github": "", "project": "", "reviewers": "AnonReviewer4;AnonReviewer1;AnonReviewer3;AnonReviewer2", "site": "https://openreview.net/forum?id=8bZC3CyF-f7", "pdf_size": 0, "rating": "5;6;7;7", "confidence": "4;3;4;4", "wc_review": "449;584;599;443", "wc_reply_reviewers": "0;0;61;42", "wc_reply_authors": "671;503;824;419", "reply_reviewers": 
"0;0;1;1", "reply_authors": "1;1;2;1", "rating_avg": [ 6.25, 0.82915619758885 ], "confidence_avg": [ 3.75, 0.4330127018922193 ], "wc_review_avg": [ 518.75, 72.97388231415401 ], "wc_reply_reviewers_avg": [ 25.75, 26.61179249881526 ], "wc_reply_authors_avg": [ 604.25, 155.97656073910593 ], "reply_reviewers_avg": [ 0.5, 0.5 ], "reply_authors_avg": [ 1.25, 0.4330127018922193 ], "replies_avg": [ 14, 0 ], "authors#_avg": [ 8, 0 ], "corr_rating_confidence": 0.17407765595569782, "gs_citation": 64, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=17099796649634976721&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 11, "aff_unique_index": "0;1;0;0", "aff_unique_norm": "Johannes Kepler University;Johannes Kepler University Linz", "aff_unique_dep": ";", "aff_unique_url": "https://www.jku.at;https://www.jku.at", "aff_unique_abbr": "JKU;JKU", "aff_campus_unique_index": "0;0;0;0", "aff_campus_unique": "Linz", "aff_country_unique_index": "0;0;0;0", "aff_country_unique": "Austria" }, { "title": "FOCAL: Efficient Fully-Offline Meta-Reinforcement Learning via Distance Metric Learning and Behavior Regularization", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/3343", "id": "8cpHIfgY4Dj", "poster": "", "openreview": "https://openreview.net/forum?id=8cpHIfgY4Dj", "slides": "https://iclr.cc/virtual/2021/poster/3343", "video": "https://iclr.cc/virtual/2021/poster/3343", "author_site": "Lanqing Li, Rui Yang, Dijun Luo", "tldr": "", "abstract": "We study the offline meta-reinforcement learning (OMRL) problem, a paradigm which enables reinforcement learning (RL) algorithms to quickly adapt to unseen tasks without any interactions with the environments, making RL truly practical in many real-world applications. This problem is still not fully understood, for which two major challenges need to be addressed. First, offline RL usually suffers from bootstrapping errors of out-of-distribution state-actions which leads to divergence of value functions. Second, meta-RL requires efficient and robust task inference learned jointly with control policy. In this work, we enforce behavior regularization on learned policy as a general approach to offline RL, combined with a deterministic context encoder for efficient task inference. We propose a novel negative-power distance metric on bounded context embedding space, whose gradients propagation is detached from the Bellman backup. We provide analysis and insight showing that some simple design choices can yield substantial improvements over recent approaches involving meta-RL and distance metric learning. 
To the best of our knowledge, our method is the first model-free and end-to-end OMRL algorithm, which is computationally efficient and demonstrated to outperform prior algorithms on several meta-RL benchmarks.", "keywords": "offline/batch reinforcement learning;meta-reinforcement learning;multi-task reinforcement learning;distance metric learning;contrastive learning", "primary_area": "", "supplementary_material": "", "author": "Lanqing Li;Rui Yang;Dijun Luo", "authorids": "~Lanqing_Li1;yangrui19@mails.tsinghua.edu.cn;~Dijun_Luo1", "gender": "M;;M", "homepage": "https://lanqingli1993.github.io/;;https://sites.google.com/site/dijunluo/", "dblp": "275/9979;;", "google_scholar": "n8IjgKkAAAAJ;;y_1aniIAAAAJ", "orcid": "0000-0003-1998-4022;;", "linkedin": "lanqing-li-%EF%BC%88%E6%9D%8E%E8%93%9D%E9%9D%92%EF%BC%89-49209a83/;;", "or_profile": "~Lanqing_Li1;yangrui19@mails.tsinghua.edu.cn;~Dijun_Luo1", "aff": "Tencent AI Lab;;Tencent AI Lab", "aff_domain": "tencent.com;;tencent.com", "position": "Research Scientist;;Researcher", "bibtex": "@inproceedings{\nli2021focal,\ntitle={{FOCAL}: Efficient Fully-Offline Meta-Reinforcement Learning via Distance Metric Learning and Behavior Regularization},\nauthor={Lanqing Li and Rui Yang and Dijun Luo},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=8cpHIfgY4Dj}\n}", "github": "[![github](/images/github_icon.svg) FOCAL-ICLR/FOCAL-ICLR](https://github.com/FOCAL-ICLR/FOCAL-ICLR)", "project": "", "reviewers": "AnonReviewer4;AnonReviewer3;AnonReviewer2", "pdf_size": 0, "rating": "5;5;7", "confidence": "3;4;3", "wc_review": "476;455;471", "wc_reply_reviewers": "0;0;0", "wc_reply_authors": "629;1079;515", "reply_reviewers": "0;0;0", "reply_authors": "1;2;1", "rating_avg": [ 5.666666666666667, 0.9428090415820634 ], "confidence_avg": [ 3.3333333333333335, 0.4714045207910317 ], "wc_review_avg": [ 467.3333333333333, 8.9566858950296 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 741.0, 243.4912729442269 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.3333333333333333, 0.4714045207910317 ], "replies_avg": [ 10, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": -0.49999999999999983, "gs_citation": 87, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=9761035246816366860&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 4, "pdf": "https://openreview.net/pdf?id=8cpHIfgY4Dj", "email": "tencent.com;;tencent.com", "author_num": 3, "aff_unique_index": "0;0", "aff_unique_norm": "Tencent", "aff_unique_dep": "Tencent AI Lab", "aff_unique_url": "https://ai.tencent.com", "aff_unique_abbr": "Tencent AI Lab", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0", "aff_country_unique": "China" }, { "title": "MoVie: Revisiting Modulated Convolutions for Visual Counting and Beyond", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/3071", "id": "8e6BrwU6AjQ", "poster": "", "openreview": "https://openreview.net/forum?id=8e6BrwU6AjQ", "slides": "https://iclr.cc/virtual/2021/poster/3071", "video": "https://iclr.cc/virtual/2021/poster/3071", "author_site": "Duy-Kien Nguyen, Vedanuj Goswami, Xinlei Chen", "tldr": "", "abstract": "This paper focuses on visual counting, which aims to predict the number of occurrences given a natural image and a query (e.g. a question or a category). 
Unlike most prior works that use explicit, symbolic models which can be computationally expensive and limited in generalization, we propose a simple and effective alternative by revisiting modulated convolutions that fuse the query and the image locally. Following the design of residual bottleneck, we call our method MoVie, short for Modulated conVolutional bottlenecks. Notably, MoVie reasons implicitly and holistically and only needs a single forward-pass during inference. Nevertheless, MoVie showcases strong performance for counting: 1) advancing the state-of-the-art on counting-specific VQA tasks while being more efficient; 2) outperforming prior-art on difficult benchmarks like COCO for common object counting; 3) helped us secure the first place of 2020 VQA challenge when integrated as a module for \u2018number\u2019 related questions in generic VQA models. Finally, we show evidence that modulated convolutions such as MoVie can serve as a general mechanism for reasoning tasks beyond counting.", "keywords": "visual counting;visual question answering;common object counting;visual reasoning;modulated convolution", "primary_area": "", "supplementary_material": "/attachment/5aeea56934d90105d467d7556bfe54c115be6be9.zip", "author": "Duy Kien Nguyen;Vedanuj Goswami;Xinlei Chen", "authorids": "~Duy_Kien_Nguyen1;~Vedanuj_Goswami1;~Xinlei_Chen1", "gender": "M;M;M", "homepage": ";https://vedanuj.github.io/;http://xinleic.xyz", "dblp": "218/5480.html;156/5885;", "google_scholar": "welhhBIAAAAJ;bh08FeIAAAAJ;bSU7LYoAAAAJ", "orcid": ";;", "linkedin": "https://linkedin.com/in/duy-kien-nguyen-940b63109;;", "or_profile": "~Duy_Kien_Nguyen1;~Vedanuj_Goswami1;~Xinlei_Chen1", "aff": "University of Amsterdam;;Meta", "aff_domain": "uva.nl;;meta.com", "position": "PhD student;;Researcher", "bibtex": "@inproceedings{\nnguyen2021movie,\ntitle={MoVie: Revisiting Modulated Convolutions for Visual Counting and Beyond},\nauthor={Duy Kien Nguyen and Vedanuj Goswami and Xinlei Chen},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=8e6BrwU6AjQ}\n}", "github": "[![github](/images/github_icon.svg) facebookresearch/mmf](https://github.com/facebookresearch/mmf/tree/master/projects/movie_mcan)", "project": "", "reviewers": "AnonReviewer2;AnonReviewer3;AnonReviewer1;AnonReviewer4", "pdf_size": 0, "rating": "6;6;7;7", "confidence": "5;3;3;4", "wc_review": "410;832;511;844", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "360;488;376;472", "reply_reviewers": "0;0;0;0", "reply_authors": "1;1;1;1", "rating_avg": [ 6.5, 0.5 ], "confidence_avg": [ 3.75, 0.82915619758885 ], "wc_review_avg": [ 649.25, 192.144964805222 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 424.0, 56.568542494923804 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 9, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": -0.30151134457776363, "gs_citation": 36, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=3698897327076309369&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 7, "pdf": "https://openreview.net/pdf?id=8e6BrwU6AjQ", "email": "uva.nl;;meta.com", "author_num": 3, "aff_unique_index": "0;1", "aff_unique_norm": "University of Amsterdam;Meta", "aff_unique_dep": ";Meta Platforms, Inc.", "aff_unique_url": "https://www.uva.nl;https://meta.com", "aff_unique_abbr": "UvA;Meta", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;1", "aff_country_unique": 
"Netherlands;United States" }, { "id": "8iW8HOidj1_", "title": "Dream and Search to Control: Latent Space Planning for Continuous Control", "track": "main", "status": "Reject", "tldr": "", "abstract": "Learning and planning with latent space dynamics has been shown to be useful for sample efficiency in model-based reinforcement learning (MBRL) for discrete and continuous control tasks. In particular, recent work, for discrete action spaces, demonstrated the effectiveness of latent-space planning via Monte-Carlo Tree Search (MCTS) for bootstrapping MBRL during learning and at test time. However, the potential gains from latent-space tree search have not yet been demonstrated for environments with continuous action spaces. In this work, we propose and explore an MBRL approach for continuous action spaces based on tree-based planning over learned latent dynamics. We show that it is possible to demonstrate the types of bootstrapping benefits as previously shown for discrete spaces. In particular, the approach achieves improved sample efficiency and performance on a majority of challenging continuous-control benchmarks compared to the state-of-the-art. ", "keywords": "Reinforcement Learning;Model Based RL;Continuous Control;Search;Planning;MCTS", "primary_area": "", "supplementary_material": "/attachment/5dddb4386eb44a5d3ec32ae7d5f66514a4817eb1.zip", "author": "Anurag Koul;Varun Kumar Vijay;Alan Fern;Somdeb Majumdar", "authorids": "~Anurag_Koul1;~Varun_Kumar_Vijay1;~Alan_Fern1;~Somdeb_Majumdar1", "gender": "M;;M;M", "homepage": "http://koulanurag.github.io/;;http://www.eecs.oregonstate.edu/~afern;https://www.intel.ai/bio/somdeb-majumdar/", "dblp": "209/9666;;49/6764;63/8320", "google_scholar": "K-Q0Xq4AAAAJ;;https://scholar.google.com.tw/citations?user=GaKxFrcAAAAJ;", "orcid": ";;;", "linkedin": "koulanurag/;varun-vijay-24384295/;;somdebmajumdar/", "or_profile": "~Anurag_Koul1;~Varun_Kumar_Vijay1;~Alan_Fern1;~Somdeb_Majumdar1", "aff": "Oregon State University;;;Intel", "aff_domain": "oregonstate.edu;;;intel.com", "position": "PhD student;;;AI/ML Researcher", "bibtex": "@misc{\nkoul2021dream,\ntitle={Dream and Search to Control: Latent Space Planning for Continuous Control},\nauthor={Anurag Koul and Varun Kumar Vijay and Alan Fern and Somdeb Majumdar},\nyear={2021},\nurl={https://openreview.net/forum?id=8iW8HOidj1_}\n}", "github": "", "project": "", "reviewers": "AnonReviewer2;AnonReviewer1;AnonReviewer4;AnonReviewer3", "site": "https://openreview.net/forum?id=8iW8HOidj1_", "pdf_size": 0, "rating": "4;4;5;6", "confidence": "4;5;4;4", "wc_review": "659;416;354;276", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "477;341;249;83", "reply_reviewers": "0;0;0;0", "reply_authors": "1;1;1;1", "rating_avg": [ 4.75, 0.82915619758885 ], "confidence_avg": [ 4.25, 0.4330127018922193 ], "wc_review_avg": [ 426.25, 143.2417100568127 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 287.5, 143.24367350776788 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 10, 0 ], "authors#_avg": [ 4, 0 ], "corr_rating_confidence": -0.5222329678670935, "gs_citation": 5, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=6888332924823605158&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 5, "aff_unique_index": "0;1", "aff_unique_norm": "Oregon State University;Intel", "aff_unique_dep": ";Intel Corporation", "aff_unique_url": "https://oregonstate.edu;https://www.intel.com", "aff_unique_abbr": "OSU;Intel", "aff_campus_unique_index": "", 
"aff_campus_unique": "", "aff_country_unique_index": "0;0", "aff_country_unique": "United States" }, { "id": "8mVSD0ETOXl", "title": "Prediction of Enzyme Specificity using Protein Graph Convolutional Neural Networks", "track": "main", "status": "Reject", "tldr": "", "abstract": "Specific molecular recognition by proteins, for example, protease enzymes, is critical for maintaining the robustness of key life processes. The substrate specificity landscape of a protease enzyme comprises the set of all sequence motifs that are recognized/cut, or just as importantly, not recognized/cut by the enzyme. Current methods for predicting protease specificity landscapes rely on learning sequence patterns in experimentally derived data with a single enzyme, but are not robust to even small mutational changes. A comprehensive evaluation of specificity requires consideration of the three-dimensional structure and energetics of molecular interactions. In this work, we present a protein graph convolutional neural network (PGCN), which uses a physically intuitive, structure-based molecular interaction graph generated using the Rosetta energy function that describes the topology and energetic features, to determine substrate specificity. We use the PGCN to recapitulate and predict the specificity of the NS3/4 protease from the Hepatitic C virus. We compare our PGCN with previously used machine learning models and show that its performance in classification tasks is equivalent or better. Because PGCN is based on physical interactions, it is inherently more interpretable; determination of feature importance reveals key sub-graph patterns responsible for molecular recognition that are biochemically reasonable. The PGCN model also readily lends itself to the design of novel enzymes with tailored specificity against disease targets.", "keywords": "graph convolutional neural networks;protease specificity;proteins;Rosetta energy function", "primary_area": "", "supplementary_material": "", "author": "Changpeng Lu;Samuel Z Stentz;Joseph H Lubin;Sijian Wang;Sagar D Khare", "authorids": "~Changpeng_Lu1;samuelstentz@gatech.edu;jhl133@scarletmail.rutgers.edu;sijian.wang@stat.rutgers.edu;~Sagar_D_Khare1", "gender": ";;;;", "homepage": ";;;;", "dblp": ";;;;", "google_scholar": ";;;;", "orcid": ";;;;", "linkedin": ";;;;", "or_profile": "~Changpeng_Lu1;samuelstentz@gatech.edu;jhl133@scarletmail.rutgers.edu;sijian.wang@stat.rutgers.edu;~Sagar_D_Khare1", "aff": ";;;;", "aff_domain": ";;;;", "position": ";;;;", "bibtex": "@misc{\nlu2021prediction,\ntitle={Prediction of Enzyme Specificity using Protein Graph Convolutional Neural Networks},\nauthor={Changpeng Lu and Samuel Z Stentz and Joseph H Lubin and Sijian Wang and Sagar D Khare},\nyear={2021},\nurl={https://openreview.net/forum?id=8mVSD0ETOXl}\n}", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer3;AnonReviewer4;AnonReviewer2", "site": "https://openreview.net/forum?id=8mVSD0ETOXl", "pdf_size": 0, "rating": "3;3;4;4", "confidence": "5;4;4;3", "wc_review": "194;450;824;369", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "0;0;0;0", "reply_reviewers": "0;0;0;0", "reply_authors": "0;0;0;0", "rating_avg": [ 3.5, 0.5 ], "confidence_avg": [ 4.0, 0.7071067811865476 ], "wc_review_avg": [ 459.25, 230.01671134941478 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 0, 0 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 0, 0 ], "replies_avg": [ 5, 0 ], "authors#_avg": [ 5, 0 ], "corr_rating_confidence": -0.7071067811865475, 
"gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:H3qkVR3AevoJ:scholar.google.com/&scioq=Prediction+of+Enzyme+Specificity+using+Protein+Graph+Convolutional+Neural+Networks&hl=en&as_sdt=0,5", "gs_version_total": 0 }, { "id": "8m_XkdqjZAr", "title": "AE-SMOTE: A Multi-Modal Minority Oversampling Framework", "track": "main", "status": "Withdraw", "tldr": "", "abstract": "Real-world binary classification tasks are in many cases unbalanced i.e. the minority class is much smaller than the majority class. This skewness is challenging for machine learning algorithms as they tend to focus on the majority and greatly misclassify the minority. Oversampling the minority using \\emph{SMOTE} before training the model is a popular method to address this challenge. Inspired by \\emph{SMOTE}, we propose \\emph{AE-SMOTE}, which by using an autoencoder, (1) maps the features to a dense continuous latent space, (2) applies oversampling by interpolation in the latent space, and (3) maps the synthetic samples back to the original feature space. While \\emph{SMOTE} supports discrete (categorical) features, almost all variants and extensions of \\emph{SMOTE} do not. Wrapping any one of these \\emph{SMOTE} variants with an autoencoder will enable it to support multi-modal datasets that include discrete features. We have empirically shown the effectiveness of the proposed approach on 35 publicly available datasets.", "keywords": "Data Augmentation;Binary Classification;Autoencoder;Tabular Data;Imbalanced Data", "primary_area": "", "supplementary_material": "", "author": "Sajad Darabi;Yotam Elor", "authorids": "~Sajad_Darabi1;yotam.elor@gmail.com", "gender": "M;", "homepage": "http://sajaddarabi.com;", "dblp": ";", "google_scholar": "prEGOhQAAAAJ;", "orcid": ";", "linkedin": "sdarabi;", "or_profile": "~Sajad_Darabi1;yotam.elor@gmail.com", "aff": ";", "aff_domain": ";", "position": ";", "bibtex": "", "github": "", "project": "", "reviewers": "AnonReviewer3;AnonReviewer2;AnonReviewer1", "site": "https://openreview.net/forum?id=8m_XkdqjZAr", "pdf_size": 0, "rating": "3;4;4", "confidence": "4;5;3", "wc_review": "541;969;311", "wc_reply_reviewers": "0;0;0", "wc_reply_authors": "0;0;0", "reply_reviewers": "0;0;0", "reply_authors": "0;0;0", "rating_avg": [ 3.6666666666666665, 0.4714045207910317 ], "confidence_avg": [ 4.0, 0.816496580927726 ], "wc_review_avg": [ 607.0, 272.65118130436673 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 0, 0 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 0, 0 ], "replies_avg": [ 4, 0 ], "authors#_avg": [ 2, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 1, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=17065224396972142502&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 0 }, { "id": "8nXkyH2_s6", "title": "Neural networks behave as hash encoders: An empirical study", "track": "main", "status": "Reject", "tldr": "", "abstract": "The input space of a neural network with ReLU-like activations is partitioned into multiple linear regions, each corresponding to a specific activation pattern of the included ReLU-like activations. We demonstrate that this partition exhibits the following encoding properties across a variety of deep learning models: (1) {\\it determinism}: almost every linear region contains at most one training example. 
We can therefore represent almost every training example by a unique activation pattern, which is parameterized by a {\\it neural code}; and (2) {\\it categorization}: according to the neural code, simple algorithms, such as $K$-Means, $K$-NN, and logistic regression, can achieve fairly good performance on both training and test data. These encoding properties surprisingly suggest that {\\it normal neural networks well-trained for classification behave as hash encoders without any extra efforts.} In addition, the encoding properties exhibit variability in different scenarios. {Further experiments demonstrate that {\\it model size}, {\\it training time}, {\\it training sample size}, {\\it regularization}, and {\\it label noise} contribute in shaping the encoding properties, while the impacts of the first three are dominant.} We then define an {\\it activation hash phase chart} to represent the space expanded by {model size}, training time, training sample size, and the encoding properties, which is divided into three canonical regions: {\\it under-expressive regime}, {\\it critically-expressive regime}, and {\\it sufficiently-expressive regime}.", "keywords": "Explainability of deep learning", "primary_area": "", "supplementary_material": "/attachment/bd9d2d5013dfee7cf06fdfc96e5e7a6178f321ff.zip", "author": "Fengxiang He;Shiye Lei;Jianmin Ji;Dacheng Tao", "authorids": "~Fengxiang_He1;leishiye@gmail.com;jianmin@ustc.edu.cn;~Dacheng_Tao1", "gender": ";;;", "homepage": "https://fengxianghe.github.io/;;;", "dblp": "225/4682;;;", "google_scholar": "QSx-Yu0AAAAJ;;;", "orcid": ";;;", "linkedin": "fengxiang-he-35b173122;;;", "or_profile": "~Fengxiang_He1;leishiye@gmail.com;jianmin@ustc.edu.cn;~Dacheng_Tao1", "aff": "University of Sydney;;;", "aff_domain": "sydney.edu.au;;;", "position": "PhD student;;;", "bibtex": "@misc{\nhe2021neural,\ntitle={Neural networks behave as hash encoders: An empirical study},\nauthor={Fengxiang He and Shiye Lei and Jianmin Ji and Dacheng Tao},\nyear={2021},\nurl={https://openreview.net/forum?id=8nXkyH2_s6}\n}", "github": "", "project": "", "reviewers": "AnonReviewer2;AnonReviewer4;AnonReviewer1;AnonReviewer3", "site": "https://openreview.net/forum?id=8nXkyH2_s6", "pdf_size": 0, "rating": "5;6;6;7", "confidence": "4;2;4;4", "wc_review": "343;514;849;2255", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "772;789;2089;2213", "reply_reviewers": "0;0;0;0", "reply_authors": "1;1;3;3", "rating_avg": [ 6.0, 0.7071067811865476 ], "confidence_avg": [ 3.5, 0.8660254037844386 ], "wc_review_avg": [ 990.25, 752.5441432235056 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 1465.75, 686.6772804600427 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 2.0, 1.0 ], "replies_avg": [ 15, 0 ], "authors#_avg": [ 4, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 4, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=15476738600544841597&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 4, "aff_unique_index": "0", "aff_unique_norm": "University of Sydney", "aff_unique_dep": "", "aff_unique_url": "https://www.sydney.edu.au", "aff_unique_abbr": "USYD", "aff_country_unique_index": "0", "aff_country_unique": "Australia" }, { "title": "Selectivity considered harmful: evaluating the causal impact of class selectivity in DNNs", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/2799", "id": "8nl0k08uMi", "poster": "", "openreview": "https://openreview.net/forum?id=8nl0k08uMi", "slides": 
"https://iclr.cc/virtual/2021/poster/2799", "video": "https://iclr.cc/virtual/2021/poster/2799", "author_site": "Matthew Leavitt, Ari Morcos", "tldr": "", "abstract": "The properties of individual neurons are often analyzed in order to understand the biological and artificial neural networks in which they're embedded. Class selectivity\u2014typically defined as how different a neuron's responses are across different classes of stimuli or data samples\u2014is commonly used for this purpose. However, it remains an open question whether it is necessary and/or sufficient for deep neural networks (DNNs) to learn class selectivity in individual units. We investigated the causal impact of class selectivity on network function by directly regularizing for or against class selectivity. Using this regularizer to reduce class selectivity across units in convolutional neural networks increased test accuracy by over 2% in ResNet18 and 1% in ResNet50 trained on Tiny ImageNet. For ResNet20 trained on CIFAR10 we could reduce class selectivity by a factor of 2.5 with no impact on test accuracy, and reduce it nearly to zero with only a small (~2%) drop in test accuracy. In contrast, regularizing to increase class selectivity significantly decreased test accuracy across all models and datasets. These results indicate that class selectivity in individual units is neither sufficient nor strictly necessary, and can even impair DNN performance. They also encourage caution when focusing on the properties of single units as representative of the mechanisms by which DNNs function.", "keywords": "interpretability;explainability;empirical analysis;deep learning;selectivity", "primary_area": "", "supplementary_material": "", "author": "Matthew L Leavitt;Ari S. Morcos", "authorids": "~Matthew_L_Leavitt1;~Ari_S._Morcos1", "gender": "M;M", "homepage": "https://mleavitt.net;http://www.arimorcos.com", "dblp": "260/0952;217/3720", "google_scholar": ";v-A_7UsAAAAJ", "orcid": ";", "linkedin": ";", "or_profile": "~Matthew_L_Leavitt1;~Ari_Morcos1", "aff": "Meta Facebook;Meta AI (FAIR)", "aff_domain": "fb.com;meta.com", "position": "AI Resident;Research Scientist", "bibtex": "@inproceedings{\nleavitt2021selectivity,\ntitle={Selectivity considered harmful: evaluating the causal impact of class selectivity in {\\{}DNN{\\}}s},\nauthor={Matthew L Leavitt and Ari S. 
Morcos},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=8nl0k08uMi}\n}", "github": "", "project": "", "reviewers": "AnonReviewer3;AnonReviewer4;AnonReviewer2", "pdf_size": 0, "rating": "6;6;7", "confidence": "3;4;4", "wc_review": "412;513;477", "wc_reply_reviewers": "0;0;36", "wc_reply_authors": "662;1076;1106", "reply_reviewers": "0;0;1", "reply_authors": "1;2;2", "rating_avg": [ 6.333333333333333, 0.4714045207910317 ], "confidence_avg": [ 3.6666666666666665, 0.4714045207910317 ], "wc_review_avg": [ 467.3333333333333, 41.79579989531112 ], "wc_reply_reviewers_avg": [ 12.0, 16.97056274847714 ], "wc_reply_authors_avg": [ 948.0, 202.6030601940652 ], "reply_reviewers_avg": [ 0.3333333333333333, 0.4714045207910317 ], "reply_authors_avg": [ 1.6666666666666667, 0.4714045207910317 ], "replies_avg": [ 12, 0 ], "authors#_avg": [ 2, 0 ], "corr_rating_confidence": 0.4999999999999999, "gs_citation": 58, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=10956841151792828564&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 4, "pdf": "https://openreview.net/pdf?id=8nl0k08uMi", "email": "fb.com;meta.com", "author_num": 2, "aff_unique_index": "0;0", "aff_unique_norm": "Meta", "aff_unique_dep": "Meta Platforms, Inc.", "aff_unique_url": "https://meta.com", "aff_unique_abbr": "Meta", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0", "aff_country_unique": "United States" }, { "id": "8pz6GXZ3YT", "title": "Why Lottery Ticket Wins? A Theoretical Perspective of Sample Complexity on Sparse Neural Networks", "track": "main", "status": "Reject", "tldr": "", "abstract": "The $\\textit{lottery ticket hypothesis}$ (LTH) states that learning on a properly pruned network (the $\\textit{winning ticket}$) has improved test accuracy over the originally unpruned network. Although LTH has been justified empirically in a broad range of deep neural network (DNN) involved applications like computer vision and natural language processing, the theoretical validation of the improved generalization of a winning ticket remains elusive. To the best of our knowledge, our work, for the first time, characterizes the performance of training a sparse neural network by analyzing the geometric structure of the objective function and the sample complexity to achieve zero generalization error. We show that the convex region near a desirable model with guaranteed generalization enlarges as the neural network model is pruned, indicating the structural importance of a winning ticket. Moreover, as the algorithm for training a sparse neural network is specified as (accelerated) stochastic gradient descent algorithm, we theoretically show that the number of samples required for achieving zero generalization error is proportional to the number of the non-pruned model weights in the hidden layer. With a fixed number of samples, training a pruned neural network enjoys a faster convergence rate to the desirable model than training the original unpruned one, providing a formal justification of the improved generalization of the winning ticket. 
Our theoretical results are acquired from learning a sparse neural network of one hidden layer, while experimental results are further provided to justify the implications in pruning multi-layer neural networks.", "keywords": "Sparse neural network;Lottery Ticket Hypothesis;network pruning;generalization analysis;optimization landscape;sample complexity", "primary_area": "", "supplementary_material": "/attachment/9ef35caff8162d2bc16da66cc78f63c1b06c7a4b.zip", "author": "Shuai Zhang;Meng Wang;Sijia Liu;Pin-Yu Chen;Jinjun Xiong", "authorids": "~Shuai_Zhang6;~Meng_Wang4;~Sijia_Liu1;~Pin-Yu_Chen1;~Jinjun_Xiong1", "gender": "M;F;M;M;", "homepage": "https://inchs708.github.io/shuaizhang.github.io/index.html;https://www.ecse.rpi.edu/~wang/index.html;https://lsjxjtu.github.io/;http://www.pinyuchen.com;https://www.xlab-ub.com", "dblp": "71/208-15;93/6765-3;128/6972-1;39/8969;81/1130", "google_scholar": "https://scholar.google.com/citations?view_op=list_works;;C7dO_UgAAAAJ;jxwlCUUAAAAJ;tRt1xPYAAAAJ", "orcid": "0000-0001-8280-6988;;;0000-0003-1039-8369;0000-0002-2620-4859", "linkedin": ";;;pin-yu-chen-940062a2;jinjun-xiong-314774/", "or_profile": "~Shuai_Zhang6;~Meng_Wang4;~Sijia_Liu1;~Pin-Yu_Chen1;~Jinjun_Xiong1", "aff": "Rensselaer Polytechnic Institute;Rensselaer Polytechnic Institute;Michigan State University;International Business Machines;International Business Machines", "aff_domain": "rpi.edu;rpi.edu;msu.edu;ibm.com;ibm.com", "position": "PhD student;Associate Professor;Assistant Professor;Research Staff Member;Researcher", "bibtex": "@misc{\nzhang2021why,\ntitle={Why Lottery Ticket Wins? A Theoretical Perspective of Sample Complexity on Sparse Neural Networks},\nauthor={Shuai Zhang and Meng Wang and Sijia Liu and Pin-Yu Chen and Jinjun Xiong},\nyear={2021},\nurl={https://openreview.net/forum?id=8pz6GXZ3YT}\n}", "github": "", "project": "", "reviewers": "AnonReviewer2;AnonReviewer3;AnonReviewer4;AnonReviewer1", "site": "https://openreview.net/forum?id=8pz6GXZ3YT", "pdf_size": 0, "rating": "5;5;6;7", "confidence": "2;4;3;4", "wc_review": "436;421;437;373", "wc_reply_reviewers": "0;0;136;175", "wc_reply_authors": "745;1010;1749;1239", "reply_reviewers": "0;0;1;1", "reply_authors": "1;2;3;3", "rating_avg": [ 5.75, 0.82915619758885 ], "confidence_avg": [ 3.25, 0.82915619758885 ], "wc_review_avg": [ 416.75, 26.042033330752037 ], "wc_reply_reviewers_avg": [ 77.75, 78.96320345578692 ], "wc_reply_authors_avg": [ 1185.75, 369.20006432827176 ], "reply_reviewers_avg": [ 0.5, 0.5 ], "reply_authors_avg": [ 2.25, 0.82915619758885 ], "replies_avg": [ 17, 0 ], "authors#_avg": [ 5, 0 ], "corr_rating_confidence": 0.4545454545454545, "gs_citation": 39, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=16945853212555763401&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 14, "aff_unique_index": "0;0;1;2;2", "aff_unique_norm": "Rensselaer Polytechnic Institute;Michigan State University;International Business Machines Corporation", "aff_unique_dep": ";;", "aff_unique_url": "https://www.rpi.edu;https://www.msu.edu;https://www.ibm.com", "aff_unique_abbr": "RPI;MSU;IBM", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0;0;0;0", "aff_country_unique": "United States" }, { "title": "Unsupervised Representation Learning for Time Series with Temporal Neighborhood Coding", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/2897", "id": "8qDwejCuCN", "poster": "", "openreview": "https://openreview.net/forum?id=8qDwejCuCN", "slides": 
"https://iclr.cc/virtual/2021/poster/2897", "video": "https://iclr.cc/virtual/2021/poster/2897", "author_site": "Sana Tonekaboni, Danny Eytan, Anna Goldenberg", "tldr": "", "abstract": "Time series are often complex and rich in information but sparsely labeled and therefore challenging to model. In this paper, we propose a self-supervised framework for learning robust and generalizable representations for time series. Our approach, called Temporal Neighborhood Coding (TNC), takes advantage of the local smoothness of a signal's generative process to define neighborhoods in time with stationary properties. Using a debiased contrastive objective, our framework learns time series representations by ensuring that in the encoding space, the distribution of signals from within a neighborhood is distinguishable from the distribution of non-neighboring signals. Our motivation stems from the medical field, where the ability to model the dynamic nature of time series data is especially valuable for identifying, tracking, and predicting the underlying patients' latent states in settings where labeling data is practically impossible. We compare our method to recently developed unsupervised representation learning approaches and demonstrate superior performance on clustering and classification tasks for multiple datasets.", "keywords": "", "primary_area": "", "supplementary_material": "/attachment/a90619176ecef8837a869e76bf17465c5a84a5ea.zip", "author": "Sana Tonekaboni;Danny Eytan;Anna Goldenberg", "authorids": "~Sana_Tonekaboni1;biliary.colic@gmail.com;~Anna_Goldenberg1", "gender": ";;F", "homepage": ";;http://goldenberglab.ca/", "dblp": ";;06/3543", "google_scholar": ";;https://scholar.google.com.tw/citations?user=cEepZOEAAAAJ", "orcid": ";;0000-0002-2416-833X", "linkedin": ";;", "or_profile": "~Sana_Tonekaboni1;biliary.colic@gmail.com;~Anna_Goldenberg1", "aff": ";;University of Toronto", "aff_domain": ";;utoronto.ca", "position": ";;Associate Professor", "bibtex": "@inproceedings{\ntonekaboni2021unsupervised,\ntitle={Unsupervised Representation Learning for Time Series with Temporal Neighborhood Coding},\nauthor={Sana Tonekaboni and Danny Eytan and Anna Goldenberg},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=8qDwejCuCN}\n}", "github": "[![github](/images/github_icon.svg) sanatonek/TNC_representation_learning](https://github.com/sanatonek/TNC_representation_learning) + [![Papers with Code](/images/pwc_icon.svg) 1 community implementation](https://paperswithcode.com/paper/?openreview=8qDwejCuCN)", "project": "", "reviewers": "AnonReviewer4;AnonReviewer3;AnonReviewer1;AnonReviewer2", "pdf_size": 0, "rating": "6;6;6;8", "confidence": "4;4;4;3", "wc_review": "758;555;314;401", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "869;764;272;472", "reply_reviewers": "0;0;0;0", "reply_authors": "2;1;1;1", "rating_avg": [ 6.5, 0.8660254037844386 ], "confidence_avg": [ 3.75, 0.4330127018922193 ], "wc_review_avg": [ 507.0, 168.66386690693415 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 594.25, 236.16347621933414 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.25, 0.4330127018922193 ], "replies_avg": [ 10, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": -1.0, "gs_citation": 403, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=12306257235943010010&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 3, "pdf": "https://openreview.net/pdf?id=8qDwejCuCN", "email": ";;utoronto.ca", 
"author_num": 3, "aff_unique_index": "0", "aff_unique_norm": "University of Toronto", "aff_unique_dep": "", "aff_unique_url": "https://www.utoronto.ca", "aff_unique_abbr": "U of T", "aff_country_unique_index": "0", "aff_country_unique": "Canada" }, { "id": "8q_ca26L1fz", "title": "Revisiting Graph Neural Networks for Link Prediction", "track": "main", "status": "Reject", "tldr": "", "abstract": "Graph neural networks (GNNs) have achieved great success in recent years. Three most common applications include node classification, link prediction, and graph classification. While there is rich literature on node classification and graph classification, GNNs for link prediction is relatively less studied and less understood. Two representative classes of methods exist: GAE and SEAL. GAE (Graph Autoencoder) first uses a GNN to learn node embeddings for all nodes, and then aggregates the embeddings of the source and target nodes as their link representation. SEAL extracts a subgraph around the source and target nodes, labels the nodes in the subgraph, and then uses a GNN to learn a link representation from the labeled subgraph. In this paper, we thoroughly discuss the differences between these two classes of methods, and conclude that simply aggregating \\textit{node} embeddings does not lead to effective \\textit{link} representations, while learning from \\textit{properly labeled subgraphs} around links provides highly expressive and generalizable link representations. Experiments on the recent large-scale OGB link prediction datasets show that SEAL has up to 195\\% performance gains over GAE methods, achieving new state-of-the-art results on 3 out of 4 datasets.", "keywords": "Graph Neural Networks;Link Prediction", "primary_area": "", "supplementary_material": "", "author": "Muhan Zhang;Pan Li;Yinglong Xia;Kai Wang;Long Jin", "authorids": "~Muhan_Zhang1;~Pan_Li2;yxia@fb.com;wangkai@fb.com;longjin@fb.com", "gender": "M;;;;", "homepage": "https://muhanzhang.github.io/;;;;", "dblp": "157/5518;https://dblp.org/pers/hd/l/Li_0005:Pan;;;", "google_scholar": "https://scholar.google.com.hk/citations?user=OBBqkosAAAAJ;IroP0EwAAAAJ;;;", "orcid": "0000-0002-7680-6401;;;;", "linkedin": "jerry-muhan-zhang-a33a1777/;pan-li-b951105a/;;;", "or_profile": "~Muhan_Zhang1;~Pan_Li2;yxia@fb.com;wangkai@fb.com;longjin@fb.com", "aff": "Meta Facebook;Purdue University;;;", "aff_domain": "fb.com;purdue.edu;;;", "position": "Research Scientist;Assistant Professor;;;", "bibtex": "@misc{\nzhang2021revisiting,\ntitle={Revisiting Graph Neural Networks for Link Prediction},\nauthor={Muhan Zhang and Pan Li and Yinglong Xia and Kai Wang and Long Jin},\nyear={2021},\nurl={https://openreview.net/forum?id=8q_ca26L1fz}\n}", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer3;AnonReviewer4;AnonReviewer2", "site": "https://openreview.net/forum?id=8q_ca26L1fz", "pdf_size": 0, "rating": "3;3;4;5", "confidence": "4;5;3;4", "wc_review": "330;171;672;554", "wc_reply_reviewers": "719;0;131;0", "wc_reply_authors": "1586;307;802;785", "reply_reviewers": "2;0;1;0", "reply_authors": "3;1;1;1", "rating_avg": [ 3.75, 0.82915619758885 ], "confidence_avg": [ 4.0, 0.7071067811865476 ], "wc_review_avg": [ 431.75, 194.2992215630315 ], "wc_reply_reviewers_avg": [ 212.5, 297.2780684813463 ], "wc_reply_authors_avg": [ 870.0, 458.65945973020115 ], "reply_reviewers_avg": [ 0.75, 0.82915619758885 ], "reply_authors_avg": [ 1.5, 0.8660254037844386 ], "replies_avg": [ 19, 0 ], "authors#_avg": [ 5, 0 ], "corr_rating_confidence": 
-0.42640143271122083, "gs_citation": 84, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=15061800415731504402&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 2, "aff_unique_index": "0;1", "aff_unique_norm": "Meta;Purdue University", "aff_unique_dep": "Meta Platforms, Inc.;", "aff_unique_url": "https://meta.com;https://www.purdue.edu", "aff_unique_abbr": "Meta;Purdue", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0", "aff_country_unique": "United States" }, { "id": "8qsqXlyn-Lp", "title": "Factoring out Prior Knowledge from Low-Dimensional Embeddings", "track": "main", "status": "Reject", "tldr": "", "abstract": "Low-dimensional embedding techniques such as tSNE and UMAP allow visualizing high-dimensional data and therewith facilitate the discovery of interesting structure. Although they are widely used, they visualize data as is, rather than in light of the background knowledge we have about the data. What we already know, however, strongly determines what is novel and hence interesting. In this paper we propose two methods for factoring out prior knowledge in the form of distance matrices from low-dimensional embeddings. To factor out prior knowledge from tSNE embeddings, we propose JEDI that adapts the tSNE objective in a principled way using Jensen-Shannon divergence. To factor out prior knowledge from any downstream embedding approach, we propose CONFETTI , in which we directly operate on the input distance matrices. Extensive experiments on both synthetic and real world data show that both methods work well, providing embeddings that exhibit meaningful structure that would otherwise remain hidden.", "keywords": "embedding;visualization;prior;tsne;umap", "primary_area": "", "supplementary_material": "", "author": "Edith Heiter;Jonas Fischer;Jilles Vreeken", "authorids": "eheiter@mmci.uni-saarland.de;~Jonas_Fischer1;jv@cispa.de", "gender": ";;", "homepage": ";;", "dblp": ";;", "google_scholar": ";;", "orcid": ";;", "linkedin": ";;", "or_profile": ";;", "aff": ";;", "aff_domain": ";;", "position": ";;", "bibtex": "@misc{\nheiter2021factoring,\ntitle={Factoring out Prior Knowledge from Low-Dimensional Embeddings},\nauthor={Edith Heiter and Jonas Fischer and Jilles Vreeken},\nyear={2021},\nurl={https://openreview.net/forum?id=8qsqXlyn-Lp}\n}", "github": "", "project": "", "reviewers": "AnonReviewer4;AnonReviewer2;AnonReviewer3;AnonReviewer1", "site": "https://openreview.net/forum?id=8qsqXlyn-Lp", "pdf_size": 0, "rating": "5;5;5;6", "confidence": "3;4;4;4", "wc_review": "468;733;180;351", "wc_reply_reviewers": "0;146;0;16", "wc_reply_authors": "707;639;341;447", "reply_reviewers": "0;1;0;1", "reply_authors": "1;1;1;1", "rating_avg": [ 5.25, 0.4330127018922193 ], "confidence_avg": [ 3.75, 0.4330127018922193 ], "wc_review_avg": [ 433.0, 201.22002882417047 ], "wc_reply_reviewers_avg": [ 40.5, 61.259693110560065 ], "wc_reply_authors_avg": [ 533.5, 146.4334319750787 ], "reply_reviewers_avg": [ 0.5, 0.5 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 11, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": 0.3333333333333333, "gs_citation": 1, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=10133103998724373958&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 5 }, { "id": "8uv1YXVi80", "title": "Dynamic Probabilistic Pruning: Training sparse networks based on stochastic and dynamic masking", "track": "main", "status": "Withdraw", "tldr": "", "abstract": "Deep Learning (DL) models are known to be heavily 
over-parametrized, resulting in a large memory footprint and power consumption. This hampers the use of such models in hardware-constrained edge technologies such as wearables and mobile devices. Model compression during training can be achieved by promoting sparse network structures both through weight regularization and by leveraging dynamic pruning methods. State-of-the-art pruning methods are however mostly magnitude-based which impedes their use in e.g. binary settings. Importantly, most of the pruning methods do not provide a structural sparsity, resulting in an inefficient memory allocation and access for hardware implementations. In this paper, we propose a novel dynamic pruning solution that we term Dynamic Probabilistic Pruning (DPP). DPP leverages Gumbel top-K sampling to select subsets of weights during training, which enables exploring which weights are most relevant. Our approach allows for setting an explicit per-neuron layer-wise sparsity level and structural pruning across weights and feature maps, without relying on weight magnitude heuristics. Relevantly, our method generates a hardware-oriented structural sparsity for fully-connected and convolutional layers that facilitates memory allocation and access, in contrast with conventional unstructured pruning.\nWe show that DPP achieves competitive sparsity levels and classification accuracy on MNIST and CIFAR-10, CIFAR-100 datasets compared to a state-of-the-art baseline for various DL architectures, while respecting per-neuron sparsity constraints.", "keywords": "deep probabilistic subsampling;sparse deep learning;structured pruning;hardware-oriented pruning", "primary_area": "", "supplementary_material": "", "author": "Lizeth Gonzalez Carabarin;Iris A.M. Huijben;Bastiaan S. Veeling;Alexandre Schmid;Ruud Van Sloun", "authorids": "~Lizeth_Gonzalez_Carabarin1;~Iris_A.M._Huijben1;~Bastiaan_S._Veeling1;alexandre.schmid@epfl.ch;~Ruud_Van_Sloun1", "gender": ";;;;F", "homepage": ";;;;https://www.tue.nl/en/research/researchers/ruud-van-sloun", "dblp": ";;;;162/9715.html", "google_scholar": ";;;;gQQJgocAAAAJ", "orcid": ";;;;", "linkedin": "lizethgonzalezcarabarin/;;;;", "or_profile": "~Lizeth_Gonzalez_Carabarin1;~Iris_A.M._Huijben1;~Bastiaan_S._Veeling1;alexandre.schmid@epfl.ch;~Ruud_Van_Sloun1", "aff": "Eindhoven University of Technology;;;;Eindhoven University of Technology", "aff_domain": "tue.nl;;;;tue.nl", "position": "Postdoc;;;;Assistant Professor", "bibtex": "", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer4;AnonReviewer2;AnonReviewer3", "site": "https://openreview.net/forum?id=8uv1YXVi80", "pdf_size": 0, "rating": "2;4;5;5", "confidence": "5;4;3;4", "wc_review": "1650;302;393;804", "wc_reply_reviewers": "56;0;0;0", "wc_reply_authors": "1093;571;319;522", "reply_reviewers": "1;0;0;0", "reply_authors": "3;2;2;2", "rating_avg": [ 4.0, 1.224744871391589 ], "confidence_avg": [ 4.0, 0.7071067811865476 ], "wc_review_avg": [ 787.25, 532.8036106296578 ], "wc_reply_reviewers_avg": [ 14.0, 24.24871130596428 ], "wc_reply_authors_avg": [ 626.25, 285.56030448926197 ], "reply_reviewers_avg": [ 0.25, 0.4330127018922193 ], "reply_authors_avg": [ 2.25, 0.4330127018922193 ], "replies_avg": [ 15, 0 ], "authors#_avg": [ 5, 0 ], "corr_rating_confidence": -0.8660254037844386, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:FOJXABquo08J:scholar.google.com/&scioq=Dynamic+Probabilistic+Pruning:+Training+sparse+networks+based+on+stochastic+and+dynamic+masking&hl=en&as_sdt=0,5", 
"gs_version_total": 0, "aff_unique_index": "0;0", "aff_unique_norm": "Eindhoven University of Technology", "aff_unique_dep": "", "aff_unique_url": "https://www.tue.nl", "aff_unique_abbr": "TU/e", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0", "aff_country_unique": "Netherlands" }, { "id": "8wa7HrUsElL", "title": "D3C: Reducing the Price of Anarchy in Multi-Agent Learning", "track": "main", "status": "Reject", "tldr": "", "abstract": "Even in simple multi-agent systems, fixed incentives can lead to outcomes that are poor for the group and each individual agent. We propose a method, D3C, for online adjustment of agent incentives that reduces the loss incurred at a Nash equilibrium. Agents adjust their incentives by learning to mix their incentive with that of other agents, until a compromise is reached in a distributed fashion. We show that D3C improves outcomes for each agent and the group as a whole in several social dilemmas including a traffic network with Braess\u2019s paradox, a prisoner\u2019s dilemma, and several reinforcement learning domains.", "keywords": "multiagent;social dilemma;reinforcement learning", "primary_area": "", "supplementary_material": "/attachment/40924c19c5afc5cb9687f36468897af613c11944.zip", "author": "Ian Gemp;Kevin McKee;Richard Everett;Edgar Alfredo Duenez-Guzman;Yoram Bachrach;David Balduzzi;Andrea Tacchetti", "authorids": "~Ian_Gemp1;kevinrmckee@google.com;~Richard_Everett1;~Edgar_Alfredo_Duenez-Guzman1;~Yoram_Bachrach2;~David_Balduzzi1;~Andrea_Tacchetti1", "gender": "M;;;M;;M;M", "homepage": "https://imgemp.github.io/;;;http://duenez.evolicious.org;;https://sites.google.com/site/dbalduzzi/;http://web.mit.edu/~atacchet/www/", "dblp": "66/10996;;215/4855;57/8801;;http://dblp.uni-trier.de/pers/hc/b/Balduzzi:David;127/6624", "google_scholar": "5vo3MeEAAAAJ;;;Hl5wdRMAAAAJ;;xA3Jd5gAAAAJ;https://scholar.google.co.uk/citations?user=HKybSogAAAAJ", "orcid": ";;;0000-0002-6212-9104;;;0000-0001-9311-9171", "linkedin": ";;;duenez/;;;andreatacchetti/", "or_profile": "~Ian_Gemp1;kevinrmckee@google.com;~Richard_Everett1;~Edgar_Alfredo_Duenez-Guzman1;~Yoram_Bachrach2;~David_Balduzzi1;~Andrea_Tacchetti1", "aff": "Google DeepMind;;Google DeepMind;Google DeepMind;;XTX Markets;Google DeepMind", "aff_domain": "google.com;;google.com;deepmind.com;;xtxmarkets.com;google.com", "position": "Research Scientist;;Research Scientist;Researcher;;Researcher;Research Scientist", "bibtex": "@misc{\ngemp2021dc,\ntitle={D3C: Reducing the Price of Anarchy in Multi-Agent Learning},\nauthor={Ian Gemp and Kevin McKee and Richard Everett and Edgar Alfredo Duenez-Guzman and Yoram Bachrach and David Balduzzi and Andrea Tacchetti},\nyear={2021},\nurl={https://openreview.net/forum?id=8wa7HrUsElL}\n}", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer3;AnonReviewer4;AnonReviewer2", "site": "https://openreview.net/forum?id=8wa7HrUsElL", "pdf_size": 0, "rating": "3;6;6;7", "confidence": "3;2;3;3", "wc_review": "272;335;271;442", "wc_reply_reviewers": "0;91;0;0", "wc_reply_authors": "768;1482;427;468", "reply_reviewers": "0;1;0;0", "reply_authors": "1;3;1;1", "rating_avg": [ 5.5, 1.5 ], "confidence_avg": [ 2.75, 0.4330127018922193 ], "wc_review_avg": [ 330.0, 69.6670653896086 ], "wc_reply_reviewers_avg": [ 22.75, 39.40415587219196 ], "wc_reply_authors_avg": [ 786.25, 422.71289015122306 ], "reply_reviewers_avg": [ 0.25, 0.4330127018922193 ], "reply_authors_avg": [ 1.5, 0.8660254037844386 ], "replies_avg": [ 13, 0 ], "authors#_avg": [ 7, 0 ], 
"corr_rating_confidence": -0.19245008972987526, "gs_citation": 22, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=13137073377160548209&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 8, "aff_unique_index": "0;0;0;1;0", "aff_unique_norm": "Google;XTX Markets", "aff_unique_dep": "Google DeepMind;", "aff_unique_url": "https://deepmind.com;", "aff_unique_abbr": "DeepMind;", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0;0;0", "aff_country_unique": "United Kingdom;" }, { "title": "Zero-shot Synthesis with Group-Supervised Learning", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/3059", "id": "8wqCDnBmnrT", "poster": "", "openreview": "https://openreview.net/forum?id=8wqCDnBmnrT", "slides": "https://iclr.cc/virtual/2021/poster/3059", "video": "https://iclr.cc/virtual/2021/poster/3059", "author_site": "Yunhao Ge, Sami Abu-El-Haija, Gan Xin, Laurent Itti", "tldr": "", "abstract": "Visual cognition of primates is superior to that of artificial neural networks in its ability to \u201cenvision\u201d a visual object, even a newly-introduced one, in different attributes including pose, position, color, texture, etc. To aid neural networks to envision objects with different attributes, we propose a family of objective functions, expressed on groups of examples, as a novel learning framework that we term Group-Supervised Learning (GSL). GSL allows us to decompose inputs into a disentangled representation with swappable components, that can be recombined to synthesize new samples. For instance, images of red boats & blue cars can be decomposed and recombined to synthesize novel images of red cars. We propose an implementation based on auto-encoder, termed group-supervised zero-shot synthesis network (GZS-Net) trained with our learning framework, that can produce a high-quality red car even if no such example is witnessed during training. We test our model and learning framework on existing benchmarks, in addition to a new dataset that we open-source. 
We qualitatively and quantitatively demonstrate that GZS-Net trained with GSL outperforms state-of-the-art methods", "keywords": "Disentangled representation learning;Group-supervised learning;Zero-shot synthesis;Knowledge factorization", "primary_area": "", "supplementary_material": "/attachment/a3622b4545bd79a31a81da80dc2154fc19805298.zip", "author": "Yunhao Ge;Sami Abu-El-Haija;Gan Xin;Laurent Itti", "authorids": "~Yunhao_Ge1;~Sami_Abu-El-Haija1;~Gan_Xin1;~Laurent_Itti1", "gender": "M;M;M;M", "homepage": "https://gyhandy.github.io/;http://www.haija.org;;http://ilab.usc.edu", "dblp": "204/1908;127/6620;;31/3256", "google_scholar": "https://scholar.google.ca/citations?user=QhjGr4oAAAAJ;t80qlTcAAAAJ;;xhUvqK8AAAAJ", "orcid": ";;;0000-0002-0168-2977", "linkedin": "yunhao-ge-720727135/;samihaija/;ganxin/;", "or_profile": "~Yunhao_Ge1;~Sami_Abu-El-Haija1;~Gan_Xin1;~Laurent_Itti1", "aff": "University of Southern California;University of Southern California;University of Southern California;University of Southern California", "aff_domain": "usc.edu;usc.edu;usc.edu;usc.edu", "position": "PhD student;PhD student;MS student;Professor", "bibtex": "@inproceedings{\nge2021zeroshot,\ntitle={Zero-shot Synthesis with Group-Supervised Learning},\nauthor={Yunhao Ge and Sami Abu-El-Haija and Gan Xin and Laurent Itti},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=8wqCDnBmnrT}\n}", "github": "", "project": "", "reviewers": "AnonReviewer4;AnonReviewer3;AnonReviewer1;AnonReviewer2", "pdf_size": 0, "rating": "6;7;7;8", "confidence": "4;3;4;3", "wc_review": "150;176;602;160", "wc_reply_reviewers": "23;24;131;21", "wc_reply_authors": "436;205;148;223", "reply_reviewers": "1;1;1;1", "reply_authors": "2;1;1;1", "rating_avg": [ 7.0, 0.7071067811865476 ], "confidence_avg": [ 3.5, 0.5 ], "wc_review_avg": [ 272.0, 190.7511467855436 ], "wc_reply_reviewers_avg": [ 49.75, 46.92214296044033 ], "wc_reply_authors_avg": [ 253.0, 109.22225048038517 ], "reply_reviewers_avg": [ 1.0, 0.0 ], "reply_authors_avg": [ 1.25, 0.4330127018922193 ], "replies_avg": [ 16, 0 ], "authors#_avg": [ 4, 0 ], "corr_rating_confidence": -0.7071067811865476, "gs_citation": 56, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=17286236408692157227&as_sdt=400005&sciodt=0,14&hl=en", "gs_version_total": 8, "pdf": "https://openreview.net/pdf?id=8wqCDnBmnrT", "email": "usc.edu;usc.edu;usc.edu;usc.edu", "author_num": 4, "aff_unique_index": "0;0;0;0", "aff_unique_norm": "University of Southern California", "aff_unique_dep": "", "aff_unique_url": "https://www.usc.edu", "aff_unique_abbr": "USC", "aff_campus_unique_index": "0;0;0;0", "aff_campus_unique": "Los Angeles", "aff_country_unique_index": "0;0;0;0", "aff_country_unique": "United States" }, { "title": "Adaptive Procedural Task Generation for Hard-Exploration Problems", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/3094", "id": "8xLkv08d70T", "poster": "", "openreview": "https://openreview.net/forum?id=8xLkv08d70T", "slides": "https://iclr.cc/virtual/2021/poster/3094", "video": "https://iclr.cc/virtual/2021/poster/3094", "author_site": "Kuan Fang, Yuke Zhu, Silvio Savarese, Li Fei-Fei", "tldr": "", "abstract": "We introduce Adaptive Procedural Task Generation (APT-Gen), an approach to progressively generate a sequence of tasks as curricula to facilitate reinforcement learning in hard-exploration problems. 
At the heart of our approach, a task generator learns to create tasks from a parameterized task space via a black-box procedural generation module. To enable curriculum learning in the absence of a direct indicator of learning progress, we propose to train the task generator by balancing the agent's performance in the generated tasks and the similarity to the target tasks. Through adversarial training, the task similarity is adaptively estimated by a task discriminator defined on the agent's experiences, allowing the generated tasks to approximate target tasks of unknown parameterization or outside of the predefined task space. Our experiments on the grid world and robotic manipulation task domains show that APT-Gen achieves substantially better performance than various existing baselines by generating suitable tasks of rich variations.", "keywords": "reinforcement learning;curriculum learning;procedural generation;task generation", "primary_area": "", "supplementary_material": "", "author": "Kuan Fang;Yuke Zhu;Silvio Savarese;L. Fei-Fei", "authorids": "~Kuan_Fang3;~Yuke_Zhu1;~Silvio_Savarese1;~L._Fei-Fei1", "gender": ";M;M;", "homepage": ";https://cs.utexas.edu/~yukez/;;", "dblp": ";133/1772;50/3578;", "google_scholar": ";mWGyYMsAAAAJ;ImpbxLsAAAAJ;", "orcid": ";;;", "linkedin": ";;;", "or_profile": "~Kuan_Fang3;~Yuke_Zhu1;~Silvio_Savarese1;~L._Fei-Fei1", "aff": ";Computer Science Department, University of Texas, Austin;Stanford University;", "aff_domain": ";cs.utexas.edu;stanford.edu;", "position": ";Assistant Professor;Associate professor;", "bibtex": "@inproceedings{\nfang2021adaptive,\ntitle={Adaptive Procedural Task Generation for Hard-Exploration Problems},\nauthor={Kuan Fang and Yuke Zhu and Silvio Savarese and L. Fei-Fei},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=8xLkv08d70T}\n}", "github": "", "project": "", "reviewers": "AnonReviewer3;AnonReviewer1;AnonReviewer4;AnonReviewer2", "pdf_size": 0, "rating": "4;6;6;7", "confidence": "4;4;4;3", "wc_review": "464;477;482;210", "wc_reply_reviewers": "112;63;295;18", "wc_reply_authors": "1503;557;1232;451", "reply_reviewers": "1;1;2;1", "reply_authors": "3;2;3;2", "rating_avg": [ 5.75, 1.0897247358851685 ], "confidence_avg": [ 3.75, 0.4330127018922193 ], "wc_review_avg": [ 408.25, 114.64810290624088 ], "wc_reply_reviewers_avg": [ 122.0, 105.2687038012723 ], "wc_reply_authors_avg": [ 935.75, 443.8385827077227 ], "reply_reviewers_avg": [ 1.25, 0.4330127018922193 ], "reply_authors_avg": [ 2.5, 0.5 ], "replies_avg": [ 22, 0 ], "authors#_avg": [ 4, 0 ], "corr_rating_confidence": -0.6622661785325219, "gs_citation": 35, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=5267572907625726884&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 3, "pdf": "https://openreview.net/pdf?id=8xLkv08d70T", "email": ";cs.utexas.edu;stanford.edu;", "author_num": 4, "aff_unique_index": "0;1", "aff_unique_norm": "University of Texas at Austin;Stanford University", "aff_unique_dep": "Computer Science Department;", "aff_unique_url": "https://www.utexas.edu;https://www.stanford.edu", "aff_unique_abbr": "UT Austin;Stanford", "aff_campus_unique_index": "0;1", "aff_campus_unique": "Austin;Stanford", "aff_country_unique_index": "0;0", "aff_country_unique": "United States" }, { "title": "Continual learning in recurrent neural networks", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/2851", "id": "8xeBUgD8u9", "poster": "", "openreview": 
"https://openreview.net/forum?id=8xeBUgD8u9", "slides": "https://iclr.cc/virtual/2021/poster/2851", "video": "https://iclr.cc/virtual/2021/poster/2851", "author_site": "Benjamin Ehret, Christian Henning, Maria Cervera, Alexander Meulemans, Johannes von Oswald, Benjamin F Grewe", "tldr": "", "abstract": "While a diverse collection of continual learning (CL) methods has been proposed to prevent catastrophic forgetting, a thorough investigation of their effectiveness for processing sequential data with recurrent neural networks (RNNs) is lacking. Here, we provide the first comprehensive evaluation of established CL methods on a variety of sequential data benchmarks. Specifically, we shed light on the particularities that arise when applying weight-importance methods, such as elastic weight consolidation, to RNNs. In contrast to feedforward networks, RNNs iteratively reuse a shared set of weights and require working memory to process input samples. We show that the performance of weight-importance methods is not directly affected by the length of the processed sequences, but rather by high working memory requirements, which lead to an increased need for stability at the cost of decreased plasticity for learning subsequent tasks. We additionally provide theoretical arguments supporting this interpretation by studying linear RNNs. Our study shows that established CL methods can be successfully ported to the recurrent case, and that a recent regularization approach based on hypernetworks outperforms weight-importance methods, thus emerging as a promising candidate for CL in RNNs. Overall, we provide insights on the differences between CL in feedforward networks and RNNs, while guiding towards effective solutions to tackle CL on sequential data.", "keywords": "Recurrent Neural Networks;Continual Learning", "primary_area": "", "supplementary_material": "/attachment/70ab27c591f864455171a518f622293f084ee064.zip", "author": "Benjamin Ehret;Christian Henning;Maria Cervera;Alexander Meulemans;Johannes von Oswald;Benjamin F Grewe", "authorids": "behret@ethz.ch;~Christian_Henning1;~Maria_Cervera1;~Alexander_Meulemans1;~Johannes_von_Oswald2;~Benjamin_F_Grewe1", "gender": ";M;;M;Not Specified;M", "homepage": ";https://www.ini.uzh.ch/en/institute/people?uname=christian;;http://alexandermeulemans.com/;https://as.inf.ethz.ch/people/members/voswaldj/index.html;https://www.ini.uzh.ch/en/institute/people?uname=bgrewe", "dblp": ";;;267/9546;242/8029;", "google_scholar": ";u6QSFrsAAAAJ;;https://scholar.google.ch/citations?user=nnMccw4AAAAJ;https://scholar.google.ch/citations?user=jdnL-PgAAAAJ;https://scholar.google.de/citations?user=ZA-1rh8AAAAJ", "orcid": ";;;;;0000-0001-8560-2120", "linkedin": ";christian-henning/;;alexander-meulemans-72589b146/;johswald/?originalSubdomain=de;", "or_profile": "behret@ethz.ch;~Christian_Henning1;~Maria_Cervera1;~Alexander_Meulemans1;~Johannes_von_Oswald2;~Benjamin_F_Grewe1", "aff": ";Swiss Federal Institute of Technology;;Swiss Federal Institute of Technology;Swiss Federal Institute of Technology;ETHZ - ETH Zurich", "aff_domain": ";ethz.ch;;ethz.ch;ethz.ch;ethz.ch", "position": ";PhD student;;PhD student;PhD student;Assistant Professor", "bibtex": "@inproceedings{\nehret2021continual,\ntitle={Continual learning in recurrent neural networks},\nauthor={Benjamin Ehret and Christian Henning and Maria Cervera and Alexander Meulemans and Johannes von Oswald and Benjamin F Grewe},\nbooktitle={International Conference on Learning 
Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=8xeBUgD8u9}\n}", "github": "[![github](/images/github_icon.svg) mariacer/cl_in_rnns](https://github.com/mariacer/cl_in_rnns) + [![Papers with Code](/images/pwc_icon.svg) 2 community implementations](https://paperswithcode.com/paper/?openreview=8xeBUgD8u9)", "project": "", "reviewers": "AnonReviewer2;AnonReviewer4;AnonReviewer1", "pdf_size": 0, "rating": "6;7;7", "confidence": "3;4;4", "wc_review": "194;209;346", "wc_reply_reviewers": "0;0;0", "wc_reply_authors": "493;543;986", "reply_reviewers": "0;0;0", "reply_authors": "1;1;2", "rating_avg": [ 6.666666666666667, 0.4714045207910317 ], "confidence_avg": [ 3.6666666666666665, 0.4714045207910317 ], "wc_review_avg": [ 249.66666666666666, 68.39265717571993 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 674.0, 221.55962327704628 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.3333333333333333, 0.4714045207910317 ], "replies_avg": [ 10, 0 ], "authors#_avg": [ 6, 0 ], "corr_rating_confidence": 0.9999999999999997, "gs_citation": 48, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=11490619605153761902&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 12, "pdf": "https://openreview.net/pdf?id=8xeBUgD8u9", "email": ";ethz.ch;;ethz.ch;ethz.ch;ethz.ch", "author_num": 6, "aff_unique_index": "0;0;0;1", "aff_unique_norm": "Swiss Federal Institute of Technology;ETH Zurich", "aff_unique_dep": ";", "aff_unique_url": "https://www.ethz.ch;https://www.ethz.ch", "aff_unique_abbr": "ETH Zurich;ETHZ", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0;0;0", "aff_country_unique": "Switzerland" }, { "title": "How Does Mixup Help With Robustness and Generalization?", "status": "Spotlight", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/2734", "id": "8yKEo06dKNo", "poster": "", "openreview": "https://openreview.net/forum?id=8yKEo06dKNo", "slides": "https://iclr.cc/virtual/2021/poster/2734", "video": "https://iclr.cc/virtual/2021/poster/2734", "author_site": "Linjun Zhang, Zhun Deng, Kenji Kawaguchi, Amirata Ghorbani, James Zou", "tldr": "", "abstract": "Mixup is a popular data augmentation technique based on convex combinations of pairs of examples and their labels. This simple technique has been shown to substantially improve both the robustness and the generalization of the trained model. However, it is not well understood why such improvement occurs. In this paper, we provide a theoretical analysis to demonstrate how using Mixup in training helps model robustness and generalization. For robustness, we show that minimizing the Mixup loss corresponds to approximately minimizing an upper bound of the adversarial loss. This explains why models obtained by Mixup training exhibit robustness to several kinds of adversarial attacks such as the Fast Gradient Sign Method (FGSM). For generalization, we prove that Mixup augmentation corresponds to a specific type of data-adaptive regularization which reduces overfitting.
Our analysis provides new insights and a framework to understand Mixup.\n", "keywords": "Mixup;adversarial robustness;generalization", "primary_area": "", "supplementary_material": "", "author": "Linjun Zhang;Zhun Deng;Kenji Kawaguchi;Amirata Ghorbani;James Zou", "authorids": "linjun.zhang@rutgers.edu;~Zhun_Deng1;~Kenji_Kawaguchi1;~Amirata_Ghorbani2;~James_Zou1", "gender": ";M;;M;", "homepage": ";https://www.zhundeng.org/;https://ml.comp.nus.edu.sg/#members;http://web.stanford.edu/~amiratag;", "dblp": ";204/4353;;https://dblp.org/pers/hd/g/Ghorbani:Amirata;", "google_scholar": ";nkmi-moAAAAJ;aLl3rYoAAAAJ;BtgIFycAAAAJ;23ZXZvEAAAAJ", "orcid": ";;;;", "linkedin": ";;;amirata-ghorbani-68438765;", "or_profile": "linjun.zhang@rutgers.edu;~Zhun_Deng1;~Kenji_Kawaguchi1;~Amirata_Ghorbani2;~James_Zou1", "aff": ";Harvard University;Harvard University;Stanford University;Stanford University", "aff_domain": ";harvard.edu;harvard.edu;stanford.edu;stanford.edu", "position": ";PhD student;Postdoctoral fellow;PhD student;Assistant Professor", "bibtex": "@inproceedings{\nzhang2021how,\ntitle={How Does Mixup Help With Robustness and Generalization?},\nauthor={Linjun Zhang and Zhun Deng and Kenji Kawaguchi and Amirata Ghorbani and James Zou},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=8yKEo06dKNo}\n}", "github": "", "project": "", "reviewers": "AnonReviewer2;AnonReviewer3;AnonReviewer1;AnonReviewer5", "pdf_size": 0, "rating": "6;7;7;8", "confidence": "3;4;4;3", "wc_review": "131;219;147;375", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "85;230;410;349", "reply_reviewers": "0;0;0;0", "reply_authors": "1;1;1;1", "rating_avg": [ 7.0, 0.7071067811865476 ], "confidence_avg": [ 3.5, 0.5 ], "wc_review_avg": [ 218.0, 96.51424765287247 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 268.5, 124.1541380703841 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 11, 0 ], "authors#_avg": [ 5, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 313, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=6686389082608515508&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 6, "pdf": "https://openreview.net/pdf?id=8yKEo06dKNo", "email": ";harvard.edu;harvard.edu;stanford.edu;stanford.edu", "author_num": 5, "aff_unique_index": "0;0;1;1", "aff_unique_norm": "Harvard University;Stanford University", "aff_unique_dep": ";", "aff_unique_url": "https://www.harvard.edu;https://www.stanford.edu", "aff_unique_abbr": "Harvard;Stanford", "aff_campus_unique_index": "1;1", "aff_campus_unique": ";Stanford", "aff_country_unique_index": "0;0;0;0", "aff_country_unique": "United States" }, { "id": "8znruLfUZnT", "title": "Frequency Regularized Deep Convolutional Dictionary Learning and Application to Blind Denoising", "track": "main", "status": "Reject", "tldr": "", "abstract": "Sparse representation via a learned dictionary is a powerful prior for natural images. In recent years, unrolled sparse coding algorithms (e.g. LISTA) have proven to be useful for constructing interpretable deep-learning networks that perform on par with state-of-the-art models on image-restoration tasks. In this study we are concerned with extending the work of such convolutional dictionary learning (CDL) models. We propose to construct strided convolutional dictionaries with a single analytic low-pass filter and a set of learned filters regularized to occupy the complementary frequency space. 
By doing so, we address the necessary modeling assumptions of natural images with respect to convolutional sparse coding and reduce the mutual coherence and redundancy of the learned filters. We show improved denoising performance at reduced computational complexity when compared to other CDL methods, and competitive results when compared to popular deep-learning models. We further propose to parameterize the thresholds in the soft-thresholding operator of LISTA to be proportional to the estimated noise-variance from an input image. We demonstrate that this parameterization enhances robustness to noise-level mismatch between training and inference.", "keywords": "Dictionary learning;unrolled algorithm;convolutional sparse coding;interpretable deep learning;inverse problems;blind denoising;LISTA", "primary_area": "", "supplementary_material": "/attachment/c9abfcc682e44e4750153b77eb6968240bc5fb60.zip", "author": "Nikola Pavle Janjusevic;Amirhossein Khalilian-Gourtani;Yao Wang", "authorids": "~Nikola_Pavle_Janjusevic1;akg404@nyu.edu;~Yao_Wang2", "gender": "M;;F", "homepage": "https://nikopj.github.io;;https://engineering.nyu.edu/faculty/yao-wang", "dblp": ";;", "google_scholar": ";;https://scholar.google.com/citations?hl=en", "orcid": ";;0000-0003-3199-3802", "linkedin": "nikola-janju%C5%A1evi%C4%87-554204157/;;", "or_profile": "~Nikola_Pavle_Janjusevic1;akg404@nyu.edu;~Yao_Wang2", "aff": "New York University;;New York University Tandon School of Engineering", "aff_domain": "nyu.edu;;nyu.edu", "position": "PhD student;;Full Professor", "bibtex": "@misc{\njanjusevic2021frequency,\ntitle={Frequency Regularized Deep Convolutional Dictionary Learning and Application to Blind Denoising},\nauthor={Nikola Pavle Janjusevic and Amirhossein Khalilian-Gourtani and Yao Wang},\nyear={2021},\nurl={https://openreview.net/forum?id=8znruLfUZnT}\n}", "github": "", "project": "", "reviewers": "AnonReviewer2;AnonReviewer3;AnonReviewer5", "site": "https://openreview.net/forum?id=8znruLfUZnT", "pdf_size": 0, "rating": "3;4;4", "confidence": "3;5;4", "wc_review": "378;281;1419", "wc_reply_reviewers": "0;0;0", "wc_reply_authors": "579;617;730", "reply_reviewers": "0;0;0", "reply_authors": "1;1;1", "rating_avg": [ 3.6666666666666665, 0.4714045207910317 ], "confidence_avg": [ 4.0, 0.816496580927726 ], "wc_review_avg": [ 692.6666666666666, 515.1196193334342 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 642.0, 64.13007614736371 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 7, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": 0.8660254037844385, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:n1kxNrJuNwoJ:scholar.google.com/&scioq=Frequency+Regularized+Deep+Convolutional+Dictionary+Learning+and+Application+to+Blind+Denoising&hl=en&as_sdt=0,5", "gs_version_total": 0, "aff_unique_index": "0;0", "aff_unique_norm": "New York University", "aff_unique_dep": "", "aff_unique_url": "https://www.nyu.edu", "aff_unique_abbr": "NYU", "aff_campus_unique_index": "1", "aff_campus_unique": ";Tandon School of Engineering", "aff_country_unique_index": "0;0", "aff_country_unique": "United States" }, { "title": "Learning a Latent Search Space for Routing Problems using Variational Autoencoders", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/3009", "id": "90JprVrJBO", "poster": "", "openreview": "https://openreview.net/forum?id=90JprVrJBO", "slides": "https://iclr.cc/virtual/2021/poster/3009", "video": 
"https://iclr.cc/virtual/2021/poster/3009", "author_site": "Andr\u00e9 Hottung, Bhanu Bhandari, Kevin Tierney", "tldr": "", "abstract": "Methods for automatically learning to solve routing problems are rapidly improving in performance. While most of these methods excel at generating solutions quickly, they are unable to effectively utilize longer run times because they lack a sophisticated search component. We present a learning-based optimization approach that allows a guided search in the distribution of high-quality solutions for a problem instance. More precisely, our method uses a conditional variational autoencoder that learns to map points in a continuous (latent) search space to high-quality, instance-specific routing problem solutions. The learned space can then be searched by any unconstrained continuous optimization method. We show that even using a standard differential evolution search strategy our approach is able to outperform existing purely machine learning based approaches. ", "keywords": "heuristic search;variational autoencoders;learning to optimize;routing problems;traveling salesperson problem;vehicle routing problem;combinatorial optimization", "primary_area": "", "supplementary_material": "", "author": "Andr\u00e9 Hottung;Bhanu Bhandari;Kevin Tierney", "authorids": "~Andr\u00e9_Hottung1;bhanubhandar@cs.umass.edu;~Kevin_Tierney1", "gender": ";;M", "homepage": ";;http://www.tierney.de", "dblp": ";;13/7407", "google_scholar": "zzqATFsAAAAJ;;https://scholar.google.de/citations?user=G-EGfLEAAAAJ", "orcid": "0000-0002-7251-9093;;0000-0002-5931-4907", "linkedin": ";;kevinbtierney/", "or_profile": "~Andr\u00e9_Hottung1;bhanubhandar@cs.umass.edu;~Kevin_Tierney1", "aff": "Bielefeld University;;Bielefeld University", "aff_domain": "uni-bielefeld.de;;uni-bielefeld.de", "position": "PhD student;;Full Professor", "bibtex": "@inproceedings{\nhottung2021learning,\ntitle={Learning a Latent Search Space for Routing Problems using Variational Autoencoders},\nauthor={Andr{\\'e} Hottung and Bhanu Bhandari and Kevin Tierney},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=90JprVrJBO}\n}", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer3;AnonReviewer4;AnonReviewer2", "pdf_size": 0, "rating": "5;6;7;7", "confidence": "4;3;2;4", "wc_review": "521;369;662;326", "wc_reply_reviewers": "424;0;0;0", "wc_reply_authors": "1456;607;808;623", "reply_reviewers": "1;0;0;0", "reply_authors": "3;1;1;1", "rating_avg": [ 6.25, 0.82915619758885 ], "confidence_avg": [ 3.25, 0.82915619758885 ], "wc_review_avg": [ 469.5, 132.66593383382187 ], "wc_reply_reviewers_avg": [ 106.0, 183.597385602301 ], "wc_reply_authors_avg": [ 873.5, 345.4594766394461 ], "reply_reviewers_avg": [ 0.25, 0.4330127018922193 ], "reply_authors_avg": [ 1.5, 0.8660254037844386 ], "replies_avg": [ 14, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": -0.4545454545454545, "gs_citation": 90, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=3807353712834589580&as_sdt=400005&sciodt=0,14&hl=en", "gs_version_total": 3, "pdf": "https://openreview.net/pdf?id=90JprVrJBO", "email": "uni-bielefeld.de;;uni-bielefeld.de", "author_num": 3, "aff_unique_index": "0;0", "aff_unique_norm": "Bielefeld University", "aff_unique_dep": "", "aff_unique_url": "https://www.uni-bielefeld.de/", "aff_unique_abbr": "Uni Bielefeld", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0", "aff_country_unique": "Germany" }, { 
"id": "98fWAc-sFkv", "title": "A Unified Bayesian Framework for Discriminative and Generative Continual Learning", "track": "main", "status": "Reject", "tldr": "", "abstract": "Continual Learning is a learning paradigm where learning systems are trained on a sequence of tasks. The goal here is to perform well on the current task without suffering from a performance drop on the previous tasks. Two notable directions among the recent advances in continual learning with neural networks are (1) variational Bayes based regularization by learning priors from previous tasks, and, (2) learning the structure of deep networks to adapt to new tasks. So far, these two approaches have been orthogonal. We present a novel Bayesian framework for continual learning based on learning the structure of deep neural networks, addressing the shortcomings of both these approaches. The proposed framework learns the deep structure for each task by learning which weights to be used, and supports inter-task transfer through the overlapping of different sparse subsets of weights learned by different tasks. An appealing aspect of our proposed continual learning framework is that it is applicable to both discriminative (supervised) and generative (unsupervised) settings. Experimental results on supervised and unsupervised benchmarks shows that our model performs comparably or better than recent advances in continual learning.", "keywords": "continual learning;bayesian learning", "primary_area": "", "supplementary_material": "/attachment/2e56a2d07c032b16f6a95d30699df086e4830ae0.zip", "author": "Abhishek Kumar;Sunabha Chatterjee;Piyush Rai", "authorids": "abhi.kumar.chaudhary@gmail.com;sunabhac@gmail.com;~Piyush_Rai1", "gender": ";;M", "homepage": ";;http://cse.iitk.ac.in/users/piyush/", "dblp": ";;02/525", "google_scholar": ";;https://scholar.google.com.tw/citations?user=D50grEgAAAAJ", "orcid": ";;", "linkedin": ";;", "or_profile": "abhi.kumar.chaudhary@gmail.com;sunabhac@gmail.com;~Piyush_Rai1", "aff": ";;IIT Kanpur, IIT Kanpur", "aff_domain": ";;cse.iitk.ac.in", "position": ";;Associate Professor", "bibtex": "@misc{\nkumar2021a,\ntitle={A Unified Bayesian Framework for Discriminative and Generative Continual Learning},\nauthor={Abhishek Kumar and Sunabha Chatterjee and Piyush Rai},\nyear={2021},\nurl={https://openreview.net/forum?id=98fWAc-sFkv}\n}", "github": "", "project": "", "reviewers": "AnonReviewer4;AnonReviewer1;AnonReviewer2;AnonReviewer3", "site": "https://openreview.net/forum?id=98fWAc-sFkv", "pdf_size": 0, "rating": "4;6;7;8", "confidence": "4;3;4;5", "wc_review": "416;350;325;462", "wc_reply_reviewers": "207;87;19;153", "wc_reply_authors": "891;1346;146;833", "reply_reviewers": "1;1;1;2", "reply_authors": "2;2;1;2", "rating_avg": [ 6.25, 1.479019945774904 ], "confidence_avg": [ 4.0, 0.7071067811865476 ], "wc_review_avg": [ 388.25, 54.02025083244246 ], "wc_reply_reviewers_avg": [ 116.5, 70.53190767305249 ], "wc_reply_authors_avg": [ 804.0, 428.7009447155441 ], "reply_reviewers_avg": [ 1.25, 0.4330127018922193 ], "reply_authors_avg": [ 1.75, 0.4330127018922193 ], "replies_avg": [ 18, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": 0.47809144373375745, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:Q79yGTa5DsgJ:scholar.google.com/&scioq=A+Unified+Bayesian+Framework+for+Discriminative+and+Generative+Continual+Learning&hl=en&as_sdt=0,5", "gs_version_total": 0, "aff_unique_index": "0", "aff_unique_norm": "Indian Institute of Technology Kanpur", "aff_unique_dep": "", 
"aff_unique_url": "https://www.iitk.ac.in", "aff_unique_abbr": "IITK", "aff_campus_unique_index": "0", "aff_campus_unique": "Kanpur", "aff_country_unique_index": "0", "aff_country_unique": "India" }, { "id": "98ntbCuqf4i", "title": "MQES: Max-Q Entropy Search for Efficient Exploration in Continuous Reinforcement Learning", "track": "main", "status": "Reject", "tldr": "", "abstract": "The principle of optimism in the face of (aleatoric and epistemic) uncertainty has been utilized to design efficient exploration strategies for Reinforcement Learning (RL). Different from most prior work targeting at discrete action space, we propose a generally information-theoretic exploration principle called Max-Q Entropy Search (MQES) for continuous RL algorithms.\nMQES formulates the exploration policy to maximize the information about the globally optimal distribution of $Q$ function, which could explore optimistically and avoid over-exploration by recognizing the epistemic and aleatoric uncertainty, respectively. To make MQES practically tractable, we firstly incorporate distributional and ensemble $Q$ function approximations to MQES, which could formulate the epistemic and aleatoric uncertainty accordingly. Then, we introduce a constraint to stabilize the training and solve the constrained MQES problem to derive the exploration policy in closed form. Empirical evaluations show that MQES outperforms state-of-the-art algorithms on Mujoco environments.", "keywords": "", "primary_area": "", "supplementary_material": "", "author": "Jinyi Liu;Zhi Wang;Jianye HAO;YAN ZHENG", "authorids": "~Jinyi_Liu1;~Zhi_Wang4;~Jianye_HAO1;~YAN_ZHENG1", "gender": ";;M;M", "homepage": ";;http://www.icdai.org/jianye.html;https://yanzzzzz.github.io", "dblp": "192/6688-2;;21/7664.html;10/2381-2", "google_scholar": "kaQS7NAAAAAJ;VoB6-2cAAAAJ;;https://scholar.google.com.hk/citations?user=tJuhd1kAAAAJ", "orcid": ";;0000-0002-0422-8235;", "linkedin": "\u91d1\u6bc5-\u5218-5b7447118;;;", "or_profile": "~Jinyi_Liu1;~Zhi_Wang4;~Jianye_HAO1;~YAN_ZHENG1", "aff": "Tianjin University;Huawei Technologies Ltd.;Tianjin University;Tianjin Unibersity, China", "aff_domain": "tju.edu.cn;huawei.com;tju.edu.cn;tju.edu.cn", "position": "MS student;Researcher;Associate Professor;Associate Professor", "bibtex": "@misc{\nliu2021mqes,\ntitle={{\\{}MQES{\\}}: Max-Q Entropy Search for Efficient Exploration in Continuous Reinforcement Learning},\nauthor={Jinyi Liu and Zhi Wang and Jianye HAO and YAN ZHENG},\nyear={2021},\nurl={https://openreview.net/forum?id=98ntbCuqf4i}\n}", "github": "", "project": "", "reviewers": "AnonReviewer2;AnonReviewer4;AnonReviewer5;AnonReviewer1;AnonReviewer3", "site": "https://openreview.net/forum?id=98ntbCuqf4i", "pdf_size": 0, "rating": "3;4;4;5;6", "confidence": "4;3;3;4;3", "wc_review": "224;742;532;301;248", "wc_reply_reviewers": "0;0;0;0;0", "wc_reply_authors": "459;1031;834;619;444", "reply_reviewers": "0;0;0;0;0", "reply_authors": "1;2;2;1;1", "rating_avg": [ 4.4, 1.0198039027185568 ], "confidence_avg": [ 3.4, 0.4898979485566356 ], "wc_review_avg": [ 409.4, 198.91063319993728 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 677.4, 225.89431157069893 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.4, 0.4898979485566356 ], "replies_avg": [ 14, 0 ], "authors#_avg": [ 4, 0 ], "corr_rating_confidence": -0.32025630761017426, "gs_citation": 0, "gs_cited_by_link": 
"https://scholar.google.com/scholar?q=related:XuYQQlefIXwJ:scholar.google.com/&scioq=MQES:+Max-Q+Entropy+Search+for+Efficient+Exploration+in+Continuous+Reinforcement+Learning&hl=en&as_sdt=0,5", "gs_version_total": 0, "aff_unique_index": "0;1;0;0", "aff_unique_norm": "Tianjin University;Huawei", "aff_unique_dep": ";Huawei Technologies", "aff_unique_url": "http://www.tju.edu.cn;https://www.huawei.com", "aff_unique_abbr": "TJU;Huawei", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0;0;0", "aff_country_unique": "China" }, { "id": "99M-4QlinPr", "title": "Efficient Competitive Self-Play Policy Optimization", "track": "main", "status": "Reject", "tldr": "", "abstract": "Reinforcement learning from self-play has recently reported many successes. Self-play, where the agents compete with themselves, is often used to generate training data for iterative policy improvement. In previous work, heuristic rules are designed to choose an opponent for the current learner. Typical rules include choosing the latest agent, the best agent, or a random historical agent. However, these rules may be inefficient in practice and sometimes do not guarantee convergence even in the simplest matrix games. This paper proposes a new algorithmic framework for competitive self-play reinforcement learning in two-player zero-sum games. We recognize the fact that the Nash equilibrium coincides with the saddle point of the stochastic payoff function, which motivates us to borrow ideas from classical saddle point optimization literature. Our method simultaneously trains several agents and intelligently takes each other as opponents based on a simple adversarial rule derived from a principled perturbation-based saddle optimization method. We prove theoretically that our algorithm converges to an approximate equilibrium with high probability in convex-concave games under standard assumptions. 
Beyond the theory, we further show the empirical superiority of our method over baseline methods relying on the aforementioned opponent-selection heuristics in matrix games, grid-world soccer, Gomoku, and simulated robot sumo, with neural net policy function approximators.", "keywords": "self-play;policy optimization;two-player zero-sum game;multiagent", "primary_area": "", "supplementary_material": "/attachment/00dbe72abaed677675e23547c2d73eba50334181.zip", "author": "Yuanyi Zhong;Yuan Zhou;Jian Peng", "authorids": "~Yuanyi_Zhong1;~Yuan_Zhou1;~Jian_Peng1", "gender": ";M;M", "homepage": ";http://yuanz.web.illinois.edu;http://jianpeng.web.engr.illinois.edu/", "dblp": "194/2743;40/7018;29/4181-1", "google_scholar": "PtmjwooAAAAJ;https://scholar.google.com.tw/citations?user=aR34e1gAAAAJ;https://scholar.google.com.tw/citations?user=4wcAVXAAAAAJ", "orcid": ";;", "linkedin": ";;", "or_profile": "~Yuanyi_Zhong1;~Yuan_Zhou1;~Jian_Peng1", "aff": "University of Illinois Urbana Champaign;University of Illinois, Urbana Champaign;University of Illinois, Urbana Champaign", "aff_domain": "illinois.edu;illinois.edu;illinois.edu", "position": "PhD student;Assistant Professor;Assistant Professor", "bibtex": "@misc{\nzhong2021efficient,\ntitle={Efficient Competitive Self-Play Policy Optimization},\nauthor={Yuanyi Zhong and Yuan Zhou and Jian Peng},\nyear={2021},\nurl={https://openreview.net/forum?id=99M-4QlinPr}\n}", "github": "", "project": "", "reviewers": "AnonReviewer3;AnonReviewer2;AnonReviewer1;AnonReviewer4", "site": "https://openreview.net/forum?id=99M-4QlinPr", "pdf_size": 0, "rating": "3;5;5;7", "confidence": "4;3;4;3", "wc_review": "458;359;534;210", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "0;0;0;0", "reply_reviewers": "0;0;0;0", "reply_authors": "0;0;0;0", "rating_avg": [ 5.0, 1.4142135623730951 ], "confidence_avg": [ 3.5, 0.5 ], "wc_review_avg": [ 390.25, 121.16182360793353 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 0, 0 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 0, 0 ], "replies_avg": [ 5, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": -0.7071067811865476, "gs_citation": 6, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=10644744871992727550&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 3, "aff_unique_index": "0;0;0", "aff_unique_norm": "University of Illinois Urbana-Champaign", "aff_unique_dep": "", "aff_unique_url": "https://illinois.edu", "aff_unique_abbr": "UIUC", "aff_campus_unique_index": "0;0;0", "aff_campus_unique": "Urbana-Champaign", "aff_country_unique_index": "0;0;0", "aff_country_unique": "United States" }, { "id": "9CG8RW_p3Y", "title": "Fundamental Limits and Tradeoffs in Invariant Representation Learning", "track": "main", "status": "Reject", "tldr": "", "abstract": "Many machine learning applications involve learning representations that achieve two competing goals: To maximize information or accuracy with respect to a target while simultaneously maximizing invariance or independence with respect to a subset of features. Typical examples include privacy-preserving learning, domain adaptation, and algorithmic fairness, just to name a few. In fact, all of the above problems admit a common minimax game-theoretic formulation, whose equilibrium represents a fundamental tradeoff between accuracy and invariance. In this paper, we provide an information-theoretic analysis of this general and important problem under both classification and regression settings. 
In both cases, we analyze the inherent tradeoffs between accuracy and invariance by providing a geometric characterization of the feasible region in the information plane, where we connect the geometric properties of this feasible region to the fundamental limitations of the tradeoff problem. In the regression setting, we also derive a tight lower bound on the Lagrangian objective that quantifies the tradeoff between accuracy and invariance. Our results shed new light on this fundamental problem by providing insights on the interplay between accuracy and invariance. These results deepen our understanding of this fundamental problem and may be useful in guiding the design of adversarial representation learning algorithms.\n", "keywords": "Representation learning", "primary_area": "", "supplementary_material": "", "author": "Han Zhao;Chen Dan;Bryon Aragam;Tommi S. Jaakkola;Geoff Gordon;Pradeep Kumar Ravikumar", "authorids": "~Han_Zhao1;~Chen_Dan1;~Bryon_Aragam1;~Tommi_S._Jaakkola1;~Geoff_Gordon2;~Pradeep_Kumar_Ravikumar1", "gender": "M;M;;;;M", "homepage": "https://hanzhaoml.github.io/;https://chendancmu.github.io/;http://bryonaragam.com/;;;http://www.cs.cmu.edu/~pradeepr/", "dblp": "03/3520-2;156/6710;140/7564;;;94/3594", "google_scholar": "x942ipYAAAAJ;hQQFfuwAAAAJ;u-W3_9QAAAAJ;;;https://scholar.google.com.tw/citations?user=Q4DTPw4AAAAJ", "orcid": "0000-0002-8579-1600;;;;;", "linkedin": ";;;;;", "or_profile": "~Han_Zhao1;~Chen_Dan1;~Bryon_Aragam1;~Tommi_S._Jaakkola1;~Geoff_Gordon2;~Pradeep_Kumar_Ravikumar1", "aff": "University of Illinois, Urbana Champaign;Carnegie Mellon University;Booth School of Business;;;School of Computer Science, Carnegie Mellon University", "aff_domain": "illinois.edu;cmu.edu;chicagobooth.edu;;;cs.cmu.edu", "position": "Assistant Professor;PhD student;Assistant Professor;;;Associate Professor", "bibtex": "@misc{\nzhao2021fundamental,\ntitle={Fundamental Limits and Tradeoffs in Invariant Representation Learning},\nauthor={Han Zhao and Chen Dan and Bryon Aragam and Tommi S. 
Jaakkola and Geoff Gordon and Pradeep Kumar Ravikumar},\nyear={2021},\nurl={https://openreview.net/forum?id=9CG8RW_p3Y}\n}", "github": "", "project": "", "reviewers": "AnonReviewer3;AnonReviewer1;AnonReviewer5", "site": "https://openreview.net/forum?id=9CG8RW_p3Y", "pdf_size": 0, "rating": "5;5;5", "confidence": "1;3;2", "wc_review": "186;249;327", "wc_reply_reviewers": "0;0;75", "wc_reply_authors": "316;307;308", "reply_reviewers": "0;0;1", "reply_authors": "1;1;2", "rating_avg": [ 5.0, 0.0 ], "confidence_avg": [ 2.0, 0.816496580927726 ], "wc_review_avg": [ 254.0, 57.67148342118486 ], "wc_reply_reviewers_avg": [ 25.0, 35.35533905932738 ], "wc_reply_authors_avg": [ 310.3333333333333, 4.027681991198191 ], "reply_reviewers_avg": [ 0.3333333333333333, 0.4714045207910317 ], "reply_authors_avg": [ 1.3333333333333333, 0.4714045207910317 ], "replies_avg": [ 10, 0 ], "authors#_avg": [ 6, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 40, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=8445640749516339359&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 7, "aff_unique_index": "0;1;2;1", "aff_unique_norm": "University of Illinois Urbana-Champaign;Carnegie Mellon University;University of Chicago Booth School of Business", "aff_unique_dep": ";;Booth School of Business", "aff_unique_url": "https://illinois.edu;https://www.cmu.edu;https://www.chicagobooth.edu", "aff_unique_abbr": "UIUC;CMU;Booth", "aff_campus_unique_index": "0;2;3", "aff_campus_unique": "Urbana-Champaign;;Chicago;Pittsburgh", "aff_country_unique_index": "0;0;0;0", "aff_country_unique": "United States" }, { "id": "9DQ0SdY4UIz", "title": "Effective Subspace Indexing via Interpolation on Stiefel and Grassmann manifolds", "track": "main", "status": "Reject", "tldr": "", "abstract": "We propose a novel local Subspace Indexing Model with Interpolation (SIM-I) for low-dimensional embedding of image datasets. Our SIM-I is constructed via two steps: in the first step we build a piece-wise linear affinity-aware subspace model under a given partition of the dataset; in the second step we interpolate between several adjacent linear subspace models constructed previously using the \"center of mass\" calculation on Stiefel and Grassmann manifolds. The resulting subspace indexing model built by SIM-I is a globally non-linear low-dimensional embedding of the original data set. Furthermore, the interpolation step produces a \"smoothed\" version of the piece-wise linear embedding mapping constructed in the first step, and can be viewed as a regularization procedure. We provide experimental results validating the effectiveness of SIM-I, which improves PCA recovery on the SIFT dataset and nearest-neighbor classification success rates on the MNIST and CIFAR-10 datasets.
", "keywords": "subspace indexing;locality preserving projection;Stiefel and Grassmann manifolds", "primary_area": "", "supplementary_material": "/attachment/3e031ab58b001fbe9327cee1b68f0d21ddbca479.zip", "author": "Wenqing Hu;Tiefeng Jiang;Zhu Li", "authorids": "~Wenqing_Hu1;~Tiefeng_Jiang1;~Zhu_Li2", "gender": "Not Specified;;", "homepage": "https://huwenqing0606.github.io/;;", "dblp": "131/1965;;", "google_scholar": "Hgvn4eQAAAAJ;;", "orcid": "0000-0002-6116-9104;;", "linkedin": ";;", "or_profile": "~Wenqing_Hu1;~Tiefeng_Jiang1;~Zhu_Li2", "aff": "Missouri University of Science and Technology;University of Minnesota-Twin Cities;University of Missouri-Kansas City", "aff_domain": "mst.edu;;", "position": "Assistant Professor;;", "bibtex": "@misc{\nhu2021effective,\ntitle={Effective Subspace Indexing via Interpolation on Stiefel and Grassmann manifolds},\nauthor={Wenqing Hu and Tiefeng Jiang and Zhu Li},\nyear={2021},\nurl={https://openreview.net/forum?id=9DQ0SdY4UIz}\n}", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer4;AnonReviewer2;AnonReviewer3", "site": "https://openreview.net/forum?id=9DQ0SdY4UIz", "pdf_size": 0, "rating": "3;4;4;5", "confidence": "2;3;5;4", "wc_review": "144;828;273;517", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "0;0;0;0", "reply_reviewers": "0;0;0;0", "reply_authors": "0;0;0;0", "rating_avg": [ 4.0, 0.7071067811865476 ], "confidence_avg": [ 3.5, 1.118033988749895 ], "wc_review_avg": [ 440.5, 260.75707085331356 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 0, 0 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 0, 0 ], "replies_avg": [ 5, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": 0.6324555320336759, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:_U0t5ZMXmRwJ:scholar.google.com/&scioq=Effective+Subspace+Indexing+via+Interpolation+on+Stiefel+and+Grassmann+manifolds&hl=en&as_sdt=0,5", "gs_version_total": 0, "aff_unique_index": "0;1;2", "aff_unique_norm": "Missouri University of Science and Technology;University of Minnesota;University of Missouri-Kansas City", "aff_unique_dep": ";;", "aff_unique_url": "https://www.mst.edu;https://www.minnesota.edu;https://www.umkc.edu", "aff_unique_abbr": "Missouri S&T;UMN;UMKC", "aff_campus_unique_index": "1;2", "aff_campus_unique": ";Twin Cities;Kansas City", "aff_country_unique_index": "0;0;0", "aff_country_unique": "United States" }, { "id": "9D_Ovq4Mgho", "title": "Network-Agnostic Knowledge Transfer for Medical Image Segmentation", "track": "main", "status": "Reject", "tldr": "", "abstract": "Conventional transfer learning leverages weights of pre-trained networks, but mandates the need for similar neural architectures. Alternatively, knowledge distillation can transfer knowledge between heterogeneous networks but often requires access to the original training data or additional generative networks. Knowledge transfer between networks can be improved by being agnostic to the choice of network architecture and reducing the dependence on original training data. We propose a knowledge transfer approach from a teacher to a student network wherein we train the student on an independent transferal dataset, whose annotations are generated by the teacher. Experiments were conducted on five state-of-the-art networks for semantic segmentation and seven datasets across three imaging modalities. 
We studied knowledge transfer from a single teacher, combination of knowledge transfer and fine-tuning, and knowledge transfer from multiple teachers. The student model with a single teacher achieved similar performance as the teacher; and the student model with multiple teachers achieved better performance than the teachers. The salient features of our algorithm include: 1) no need for original training data or generative networks, 2) knowledge transfer between different architectures, 3) ease of implementation for downstream tasks by using the downstream task dataset as the transferal dataset, 4) knowledge transfer of an ensemble of models, trained independently, into one student model. Extensive experiments demonstrate that the proposed algorithm is effective for knowledge transfer and easily tunable. ", "keywords": "Knowledge Transfer;Deep Learning;Medical Image Segmentation;Pseudo Annotation", "primary_area": "", "supplementary_material": "", "author": "Shuhang Wang;Eugene Cheah;Elham Yousef Kalafi;Mercy Asiedu;Alex Benjamin;Vivek Kumar Singh;Ge Zhang;Viksit Kumar;Anthony Edward Samir", "authorids": "~Shuhang_Wang1;echeah1@mgh.harvard.edu;ekalafi@mgh.harvard.edu;masiedu1@mgh.harvard.edu;abenjamin2@mgh.harvard.edu;vsingh11@mgh.harvard.edu;gzhang11@mgh.harvard.edu;vkumar14@mgh.harvard.edu;~Anthony_Edward_Samir1", "gender": "M;;;;;;;;M", "homepage": ";;;;;;;;http://scholar.harvard.edu/anthonysamir", "dblp": ";;;;;;;;", "google_scholar": "BsjQtDoAAAAJ;;;;;;;;", "orcid": ";;;;;;;;", "linkedin": ";;;;;;;;", "or_profile": "~Shuhang_Wang1;echeah1@mgh.harvard.edu;ekalafi@mgh.harvard.edu;masiedu1@mgh.harvard.edu;abenjamin2@mgh.harvard.edu;vsingh11@mgh.harvard.edu;gzhang11@mgh.harvard.edu;vkumar14@mgh.harvard.edu;~Anthony_Edward_Samir1", "aff": ";;;;;;;;Massachusetts General Hospital, Harvard University", "aff_domain": ";;;;;;;;mgh.harvard.edu", "position": ";;;;;;;;Assistant Professor", "bibtex": "@misc{\nwang2021networkagnostic,\ntitle={Network-Agnostic Knowledge Transfer for Medical Image Segmentation},\nauthor={Shuhang Wang and Eugene Cheah and Elham Yousef Kalafi and Mercy Asiedu and Alex Benjamin and Vivek Kumar Singh and Ge Zhang and Viksit Kumar and Anthony Edward Samir},\nyear={2021},\nurl={https://openreview.net/forum?id=9D_Ovq4Mgho}\n}", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer3;AnonReviewer4", "site": "https://openreview.net/forum?id=9D_Ovq4Mgho", "pdf_size": 0, "rating": "3;4;7", "confidence": "4;3;4", "wc_review": "149;866;156", "wc_reply_reviewers": "0;0;0", "wc_reply_authors": "697;2239;740", "reply_reviewers": "0;0;0", "reply_authors": "1;4;1", "rating_avg": [ 4.666666666666667, 1.699673171197595 ], "confidence_avg": [ 3.6666666666666665, 0.4714045207910317 ], "wc_review_avg": [ 390.3333333333333, 336.35926560086847 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 1225.3333333333333, 716.9855104688115 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 2.0, 1.4142135623730951 ], "replies_avg": [ 11, 0 ], "authors#_avg": [ 9, 0 ], "corr_rating_confidence": 0.2773500981126145, "gs_citation": 1, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=3813109494732843321&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 4, "aff_unique_index": "0", "aff_unique_norm": "Harvard University", "aff_unique_dep": "Massachusetts General Hospital", "aff_unique_url": "https://www.harvard.edu", "aff_unique_abbr": "Harvard", "aff_country_unique_index": "0", "aff_country_unique": "United States" }, { "title": "Uncertainty Estimation and 
Calibration with Finite-State Probabilistic RNNs", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/2658", "id": "9EKHN1jOlA", "poster": "", "openreview": "https://openreview.net/forum?id=9EKHN1jOlA", "slides": "https://iclr.cc/virtual/2021/poster/2658", "video": "https://iclr.cc/virtual/2021/poster/2658", "author_site": "Cheng Wang, Carolin Lawrence, Mathias Niepert", "tldr": "", "abstract": "Uncertainty quantification is crucial for building reliable and trustable machine learning systems. We propose to estimate uncertainty in recurrent neural networks (RNNs) via stochastic discrete state transitions over recurrent timesteps. The uncertainty of the model can be quantified by running a prediction several times, each time sampling from the recurrent state transition distribution, leading to potentially different results if the model is uncertain. Alongside uncertainty quantification, our proposed method offers several advantages in different settings. The proposed method can (1) learn deterministic and probabilistic automata from data, (2) learn well-calibrated models on real-world classification tasks, (3) improve the performance of out-of-distribution detection, and (4) control the exploration-exploitation trade-off in reinforcement learning. An implementation is available.", "keywords": "uncertainty estimation;calibration;RNN", "primary_area": "", "supplementary_material": "/attachment/cb49f3ce8a99370ff897c4da68dfb00f5e7ad55a.zip", "author": "Cheng Wang;Carolin Lawrence;Mathias Niepert", "authorids": "~Cheng_Wang9;~Carolin_Lawrence1;~Mathias_Niepert1", "gender": ";M;M", "homepage": "https://carolinlawrence.github.io/;http://www.matlog.net;https://deepsemantic.github.io.", "dblp": "191/6056;n/MathiasNiepert;", "google_scholar": "9xtF8-MAAAAJ;https://scholar.google.de/citations?user=p5vLzq0AAAAJ;https://scholar.google.de/citations?user=L2CUcFsAAAAJ", "orcid": ";;", "linkedin": "carolin-lawrence/;;", "or_profile": "~Carolin_Lawrence1;~Mathias_Niepert1;~cheng_wang1", "aff": "NEC Laboratories Europe;NEC;Amazon", "aff_domain": "neclab.eu;neclab.eu;amazon.com", "position": "Researcher;Research Scientist;Applied Scientist", "bibtex": "@inproceedings{\nwang2021uncertainty,\ntitle={Uncertainty Estimation and Calibration with Finite-State Probabilistic {\\{}RNN{\\}}s},\nauthor={Cheng Wang and Carolin Lawrence and Mathias Niepert},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=9EKHN1jOlA}\n}", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer2;AnonReviewer4;AnonReviewer3", "pdf_size": 0, "rating": "6;7;7;7", "confidence": "4;2;3;2", "wc_review": "378;386;352;281", "wc_reply_reviewers": "78;33;0;0", "wc_reply_authors": "560;220;198;115", "reply_reviewers": "1;1;0;0", "reply_authors": "2;2;1;1", "rating_avg": [ 6.75, 0.4330127018922193 ], "confidence_avg": [ 2.75, 0.82915619758885 ], "wc_review_avg": [ 349.25, 41.360458169609295 ], "wc_reply_reviewers_avg": [ 27.75, 31.98730216820418 ], "wc_reply_authors_avg": [ 273.25, 170.12256611043696 ], "reply_reviewers_avg": [ 0.5, 0.5 ], "reply_authors_avg": [ 1.5, 0.5 ], "replies_avg": [ 14, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": -0.8703882797784891, "gs_citation": 10, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=6154730047340298888&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 3, "pdf": "https://openreview.net/pdf?id=9EKHN1jOlA", "email": "neclab.eu;neclab.eu;amazon.com", "author_num": 3, 
"aff_unique_index": "0;1;2", "aff_unique_norm": "NEC Laboratories Europe;NEC Corporation;Amazon", "aff_unique_dep": ";;Amazon.com, Inc.", "aff_unique_url": "https://www.nec-labs.eu;https://www.nec.com;https://www.amazon.com", "aff_unique_abbr": "NEC LE;NEC;Amazon", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;1;2", "aff_country_unique": "Unknown;Japan;United States" }, { "title": "Async-RED: A Provably Convergent Asynchronous Block Parallel Stochastic Method using Deep Denoising Priors", "status": "Spotlight", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/3077", "id": "9EsrXMzlFQY", "poster": "", "openreview": "https://openreview.net/forum?id=9EsrXMzlFQY", "slides": "https://iclr.cc/virtual/2021/poster/3077", "video": "https://iclr.cc/virtual/2021/poster/3077", "author_site": "Yu Sun, Jiaming Liu, Yiran Sun, Brendt Wohlberg, Ulugbek Kamilov", "tldr": "", "abstract": "Regularization by denoising (RED) is a recently developed framework for solving inverse problems by integrating advanced denoisers as image priors. Recent work has shown its state-of-the-art performance when combined with pre-trained deep denoisers. However, current RED algorithms are inadequate for parallel processing on multicore systems. We address this issue by proposing a new{asynchronous RED (Async-RED) algorithm that enables asynchronous parallel processing of data, making it significantly faster than its serial counterparts for large-scale inverse problems. The computational complexity of Async-RED is further reduced by using a random subset of measurements at every iteration. We present a complete theoretical analysis of the algorithm by establishing its convergence under explicit assumptions on the data-fidelity and the denoiser. We validate Async-RED on image recovery using pre-trained deep denoisers as priors.", "keywords": "Regularization by denoising;Computational imaging;asynchronous parallel algorithm;Deep denoising priors", "primary_area": "", "supplementary_material": "/attachment/5c966ab57a6bc5503214725599856a145f5437f6.zip", "author": "Yu Sun;Jiaming Liu;Yiran Sun;Brendt Wohlberg;Ulugbek Kamilov", "authorids": "~Yu_Sun11;jiaming.liu@wustl.edu;yiran.s@wustl.edu;~Brendt_Wohlberg2;~Ulugbek_Kamilov1", "gender": "M;;;M;Not Specified", "homepage": ";;;http://brendt.wohlberg.net/;https://ukmlv.github.io", "dblp": "62/3689-22;;;45/5430;73/9223", "google_scholar": "https://scholar.google.com/citations?hl=en;;;https://scholar.google.com/citations?hl=en;https://scholar.google.com.tw/citations?user=3qYUSDwAAAAJ", "orcid": "0000-0001-7225-9677;;;0000-0002-4767-1843;0000-0001-6770-3278", "linkedin": ";;;;", "or_profile": "~Yu_Sun11;jiaming.liu@wustl.edu;yiran.s@wustl.edu;~Brendt_Wohlberg2;~Ulugbek_Kamilov1", "aff": "Washington University, St. Louis;;;Los Alamos National Laboratory;Washington University, St. 
Louis", "aff_domain": "wustl.edu;;;lanl.gov;wustl.edu", "position": "PhD student;;;Scientist;Assistant Professor", "bibtex": "@inproceedings{\nsun2021asyncred,\ntitle={Async-{\\{}RED{\\}}: A Provably Convergent Asynchronous Block Parallel Stochastic Method using Deep Denoising Priors},\nauthor={Yu Sun and Jiaming Liu and Yiran Sun and Brendt Wohlberg and Ulugbek Kamilov},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=9EsrXMzlFQY}\n}", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer4;AnonReviewer2;AnonReviewer3", "pdf_size": 0, "rating": "6;7;7;8", "confidence": "2;3;5;2", "wc_review": "260;250;431;285", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "635;494;852;208", "reply_reviewers": "0;0;0;0", "reply_authors": "1;1;2;1", "rating_avg": [ 7.0, 0.7071067811865476 ], "confidence_avg": [ 3.0, 1.224744871391589 ], "wc_review_avg": [ 306.5, 73.00171230868493 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 547.25, 233.7192493142146 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.25, 0.4330127018922193 ], "replies_avg": [ 11, 0 ], "authors#_avg": [ 5, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 18, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=12618004746706410893&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 4, "pdf": "https://openreview.net/pdf?id=9EsrXMzlFQY", "email": "wustl.edu;;;lanl.gov;wustl.edu", "author_num": 5, "aff_unique_index": "0;1;0", "aff_unique_norm": "Washington University in St. Louis;Los Alamos National Laboratory", "aff_unique_dep": ";", "aff_unique_url": "https://wustl.edu;https://www.lanl.gov", "aff_unique_abbr": "WUSTL;LANL", "aff_campus_unique_index": "0;0", "aff_campus_unique": "St. Louis;", "aff_country_unique_index": "0;0;0", "aff_country_unique": "United States" }, { "title": "DrNAS: Dirichlet Neural Architecture Search", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/3124", "id": "9FWas6YbmB3", "poster": "", "openreview": "https://openreview.net/forum?id=9FWas6YbmB3", "slides": "https://iclr.cc/virtual/2021/poster/3124", "video": "https://iclr.cc/virtual/2021/poster/3124", "author_site": "Xiangning Chen, Ruochen Wang, Minhao Cheng, Xiaocheng Tang, Cho-Jui Hsieh", "tldr": "", "abstract": "This paper proposes a novel differentiable architecture search method by formulating it into a distribution learning problem. We treat the continuously relaxed architecture mixing weight as random variables, modeled by Dirichlet distribution. With recently developed pathwise derivatives, the Dirichlet parameters can be easily optimized with gradient-based optimizer in an end-to-end manner. This formulation improves the generalization ability and induces stochasticity that naturally encourages exploration in the search space. Furthermore, to alleviate the large memory consumption of differentiable NAS, we propose a simple yet effective progressive learning scheme that enables searching directly on large-scale tasks, eliminating the gap between search and evaluation phases. Extensive experiments demonstrate the effectiveness of our method. Specifically, we obtain a test error of 2.46\\% for CIFAR-10, 23.7\\% for ImageNet under the mobile setting. 
On NAS-Bench-201, we also achieve state-of-the-art results on all three datasets and provide insights for the effective design of neural architecture search algorithms.", "keywords": "", "primary_area": "", "supplementary_material": "/attachment/18c53bb6a5368cf3b278b250c76588e58c1565f0.zip", "author": "Xiangning Chen;Ruochen Wang;Minhao Cheng;Xiaocheng Tang;Cho-Jui Hsieh", "authorids": "~Xiangning_Chen1;~Ruochen_Wang2;~Minhao_Cheng1;~Xiaocheng_Tang1;~Cho-Jui_Hsieh1", "gender": "M;M;M;;M", "homepage": ";https://ruocwang.github.io/;https://cmhcbb.github.io/;https://mktal.github.io/;http://web.cs.ucla.edu/~chohsieh/index.html", "dblp": "56/7393;33/120;174/1717;03/6299;14/2770", "google_scholar": "vNcBx1sAAAAJ;8fXrlRAAAAAJ;_LkC1yoAAAAJ;fSrzDjIAAAAJ;Wy89g4IAAAAJ", "orcid": ";;0000-0003-3965-4215;;", "linkedin": ";ruochen-wang-1699b1113/;;xiaochengt/;", "or_profile": "~Xiangning_Chen1;~Ruochen_Wang2;~Minhao_Cheng1;~Xiaocheng_Tang1;~Cho-Jui_Hsieh1", "aff": "University of California, Los Angeles;University of California, Los Angeles;University of California, Los Angeles;;University of California, Los Angeles", "aff_domain": "cs.ucla.edu;ucla.edu;ucla.edu;;ucla.edu", "position": "PhD student;MS student;PhD student;;Assistant Professor", "bibtex": "@inproceedings{\nchen2021drnas,\ntitle={Dr{\\{}NAS{\\}}: Dirichlet Neural Architecture Search},\nauthor={Xiangning Chen and Ruochen Wang and Minhao Cheng and Xiaocheng Tang and Cho-Jui Hsieh},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=9FWas6YbmB3}\n}", "github": "[![github](/images/github_icon.svg) xiangning-chen/DrNAS](https://github.com/xiangning-chen/DrNAS)", "project": "", "reviewers": "AnonReviewer1;AnonReviewer3;AnonReviewer2;AnonReviewer4", "pdf_size": 0, "rating": "5;6;6;7", "confidence": "3;3;2;4", "wc_review": "801;276;230;516", "wc_reply_reviewers": "0;0;0;128", "wc_reply_authors": "811;636;789;1292", "reply_reviewers": "0;0;0;1", "reply_authors": "2;1;1;4", "rating_avg": [ 6.0, 0.7071067811865476 ], "confidence_avg": [ 3.0, 0.7071067811865476 ], "wc_review_avg": [ 455.75, 226.9916022675729 ], "wc_reply_reviewers_avg": [ 32.0, 55.42562584220407 ], "wc_reply_authors_avg": [ 882.0, 246.122936761286 ], "reply_reviewers_avg": [ 0.25, 0.4330127018922193 ], "reply_authors_avg": [ 2.0, 1.224744871391589 ], "replies_avg": [ 15, 0 ], "authors#_avg": [ 5, 0 ], "corr_rating_confidence": 0.5, "gs_citation": 166, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=10097373512584874749&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 10, "pdf": "https://openreview.net/pdf?id=9FWas6YbmB3", "email": "cs.ucla.edu;ucla.edu;ucla.edu;;ucla.edu", "author_num": 5, "aff_unique_index": "0;0;0;0", "aff_unique_norm": "University of California, Los Angeles", "aff_unique_dep": "", "aff_unique_url": "https://www.ucla.edu", "aff_unique_abbr": "UCLA", "aff_campus_unique_index": "0;0;0;0", "aff_campus_unique": "Los Angeles", "aff_country_unique_index": "0;0;0;0", "aff_country_unique": "United States" }, { "title": "Reweighting Augmented Samples by Minimizing the Maximal Expected Loss", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/3131", "id": "9G5MIc-goqB", "poster": "", "openreview": "https://openreview.net/forum?id=9G5MIc-goqB", "slides": "https://iclr.cc/virtual/2021/poster/3131", "video": "https://iclr.cc/virtual/2021/poster/3131", "author_site": "Mingyang Yi, LU HOU, Lifeng Shang, Xin Jiang, Qun Liu, Zhi-Ming Ma", "tldr": "", "abstract": "Data 
augmentation is an effective technique to improve the generalization of deep neural networks. However, previous data augmentation methods usually treat the augmented samples equally without considering their individual impacts on the model. To address this, for the augmented samples from the same training example, we propose to assign different weights to them. We construct the maximal expected loss which is the supremum over any reweighted loss on augmented samples. Inspired by adversarial training, we minimize this maximal expected loss (MMEL) and obtain a simple and interpretable closed-form solution: more attention should be paid to augmented samples with large loss values (i.e., harder examples). Minimizing this maximal expected loss enables the model to perform well under any reweighting strategy. The proposed method can generally be applied on top of any data augmentation methods. Experiments are conducted on both natural language understanding tasks with token-level data augmentation, and image classification tasks with commonly-used image augmentation techniques like random crop and horizontal flip. Empirical results show that the proposed method improves the generalization performance of the model.", "keywords": "data augmentation;sample reweighting", "primary_area": "", "supplementary_material": "", "author": "Mingyang Yi;Lu Hou;Lifeng Shang;Xin Jiang;Qun Liu;Zhi-Ming Ma", "authorids": "~Mingyang_Yi1;~Lu_Hou2;~Lifeng_Shang1;~Xin_Jiang1;~Qun_Liu1;~Zhi-Ming_Ma1", "gender": "M;M;M;M;;F", "homepage": "http://mingyangyi.github.io;;;http://liuquncn.github.io/;http://homepage.amss.ac.cn/research/homePage/8eb59241e2e74d828fb84eec0efadba5/myHomePage.html;https://houlu369.github.io/", "dblp": ";70/4288;42/4142-2;75/4402-1;;", "google_scholar": "RlOZiPUAAAAJ;https://scholar.google.com.hk/citations?user=jMQIjYoAAAAJ;DUfcez0AAAAJ;2HhiGzcAAAAJ;;https://scholar.google.com.hk/citations?user=rnjoL5cAAAAJ", "orcid": ";;0000-0002-9117-8247;0000-0002-7000-1792;;", "linkedin": ";;xin-jiang-9577b76/;qunliu/;;", "or_profile": "~Mingyang_Yi1;~Lifeng_Shang1;~Xin_Jiang1;~Qun_Liu1;~Zhi-Ming_Ma1;~LU_HOU1", "aff": "Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Chinese Academy of Sciences;Huawei Technologies Ltd.;Noah\u2019s Ark Lab, Huawei Technologies;Huawei Noah's Ark Lab;Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Chinese Academy of Sciences;Huawei Technologies Ltd.", "aff_domain": "amss.ac.cn;huawei.com;huawei.com;huawei.com;amss.ac.cn;huawei.com", "position": "PhD student;Researcher;Principal Researcher;Chief Scientist of Speech and Language Computing;Full Professor;researcher", "bibtex": "@inproceedings{\nyi2021reweighting,\ntitle={Reweighting Augmented Samples by Minimizing the Maximal Expected Loss},\nauthor={Mingyang Yi and Lu Hou and Lifeng Shang and Xin Jiang and Qun Liu and Zhi-Ming Ma},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=9G5MIc-goqB}\n}", "github": "", "project": "", "reviewers": "AnonReviewer4;AnonReviewer1;AnonReviewer3", "pdf_size": 0, "rating": "6;7;7", "confidence": "4;4;3", "wc_review": "419;625;238", "wc_reply_reviewers": "0;0;0", "wc_reply_authors": "971;1024;529", "reply_reviewers": "0;0;0", "reply_authors": "2;2;1", "rating_avg": [ 6.666666666666667, 0.4714045207910317 ], "confidence_avg": [ 3.6666666666666665, 0.4714045207910317 ], "wc_review_avg": [ 427.3333333333333, 158.10193617480533 ], "wc_reply_reviewers_avg": [ 0, 0 ], 
"wc_reply_authors_avg": [ 841.3333333333334, 221.91039232587153 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.6666666666666667, 0.4714045207910317 ], "replies_avg": [ 11, 0 ], "authors#_avg": [ 6, 0 ], "corr_rating_confidence": -0.4999999999999999, "gs_citation": 24, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=16983914225242467666&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 3, "pdf": "https://openreview.net/pdf?id=9G5MIc-goqB", "email": "amss.ac.cn;huawei.com;huawei.com;huawei.com;amss.ac.cn;huawei.com", "author_num": 6, "aff_unique_index": "0;1;1;1;0;1", "aff_unique_norm": "Chinese Academy of Sciences;Huawei", "aff_unique_dep": "Academy of Mathematics and Systems Science;Huawei Technologies", "aff_unique_url": "http://www.cas.cn;https://www.huawei.com", "aff_unique_abbr": "CAS;Huawei", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0;0;0;0;0", "aff_country_unique": "China" }, { "title": "HalentNet: Multimodal Trajectory Forecasting with Hallucinative Intents", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/2883", "id": "9GBZBPn0Jx", "poster": "", "openreview": "https://openreview.net/forum?id=9GBZBPn0Jx", "slides": "https://iclr.cc/virtual/2021/poster/2883", "video": "https://iclr.cc/virtual/2021/poster/2883", "author_site": "Deyao Zhu, Mohamed Zahran, Erran Li, Mohamed Elhoseiny", "tldr": "", "abstract": "Motion forecasting is essential for making intelligent decisions in robotic navigation. As a result, the multi-agent behavioral prediction has become a core component of modern human-robot interaction applications such as autonomous driving. Due to various intentions and interactions among agents, agent trajectories can have multiple possible futures. Hence, the motion forecasting model's ability to cover possible modes becomes essential to enable accurate prediction. Towards this goal, we introduce HalentNet to better model the future motion distribution in addition to a traditional trajectory regression learning objective by incorporating generative augmentation losses. We model intents with unsupervised discrete random variables whose training is guided by a collaboration between two key signals: A discriminative loss that encourages intents' diversity and a hallucinative loss that explores intent transitions (i.e., mixed intents) and encourages their smoothness. This regulates the neural network behavior to be more accurately predictive on uncertain scenarios due to the active yet careful exploration of possible future agent behavior. Our model's learned representation leads to better and more semantically meaningful coverage of the trajectory distribution. Our experiments show that our method can improve over the state-of-the-art trajectory forecasting benchmarks, including vehicles and pedestrians, for about 20% on average FDE and 50% on road boundary violation rate when predicting 6 seconds future. We also conducted human experiments to show that our predicted trajectories received 39.6% more votes than the runner-up approach and 32.2% more votes than our variant without hallucinative mixed intent loss. The code will be released soon. 
", "keywords": "", "primary_area": "", "supplementary_material": "", "author": "Deyao Zhu;Mohamed Zahran;Li Erran Li;Mohamed Elhoseiny", "authorids": "~Deyao_Zhu1;~Mohamed_Zahran1;~Li_Erran_Li1;~Mohamed_Elhoseiny1", "gender": "M;M;;M", "homepage": "https://tsutikgiau.github.io/;;http://www.cs.columbia.edu/~lierranli/;http://www.mohamed-elhoseiny.com", "dblp": "251/6017;;l/ErranLLi.html;125/2894", "google_scholar": "dENNKrsAAAAJ;https://scholar.google.com.eg/citations?user=Wdv4WLYAAAAJ;GkMfzy4AAAAJ;iRBUTOAAAAAJ", "orcid": ";0000-0002-4082-814X;;0000-0001-9659-1551", "linkedin": "deyao-zhu-205774154/;mzahran001/;;mohamed-elhoseiny-8a836215/", "or_profile": "~Deyao_Zhu1;~Mohamed_Zahran1;~Li_Erran_Li1;~Mohamed_Elhoseiny1", "aff": "KAUST;Udacity;Columbia University;KAUST", "aff_domain": "kaust.edu.sa;udacity.com;columbia.edu;kaust.edu.sa", "position": "PhD student;Program Experience Manager;Adjunct Professor;Associate Professor", "bibtex": "@inproceedings{\nzhu2021halentnet,\ntitle={HalentNet: Multimodal Trajectory Forecasting with Hallucinative Intents},\nauthor={Deyao Zhu and Mohamed Zahran and Li Erran Li and Mohamed Elhoseiny},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=9GBZBPn0Jx}\n}", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer4;AnonReviewer2;AnonReviewer3", "pdf_size": 0, "rating": "5;6;6;8", "confidence": "5;3;4;3", "wc_review": "464;303;695;236", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "1245;848;729;160", "reply_reviewers": "0;0;0;0", "reply_authors": "3;2;2;1", "rating_avg": [ 6.25, 1.0897247358851685 ], "confidence_avg": [ 3.75, 0.82915619758885 ], "wc_review_avg": [ 424.5, 176.79437208237144 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 745.5, 388.2940251922504 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 2.0, 0.7071067811865476 ], "replies_avg": [ 14, 0 ], "authors#_avg": [ 4, 0 ], "corr_rating_confidence": -0.7608859102526822, "gs_citation": 7, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=4524069867673893710&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 4, "pdf": "https://openreview.net/pdf?id=9GBZBPn0Jx", "email": "kaust.edu.sa;udacity.com;columbia.edu;kaust.edu.sa", "author_num": 4, "aff_unique_index": "0;1;2;0", "aff_unique_norm": "King Abdullah University of Science and Technology;Udacity;Columbia University", "aff_unique_dep": ";;", "aff_unique_url": "https://www.kaust.edu.sa;https://www.udacity.com;https://www.columbia.edu", "aff_unique_abbr": "KAUST;Udacity;Columbia", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;1;1;0", "aff_country_unique": "Saudi Arabia;United States" }, { "id": "9GUTgHZgKCH", "title": "Reducing the number of neurons of Deep ReLU Networks based on the current theory of Regularization", "track": "main", "status": "Reject", "tldr": "", "abstract": "We introduce a new Reduction Algorithm which makes use of the properties of ReLU neurons to reduce significantly the number of neurons in a trained Deep Neural Network. 
This algorithm is based on the recent theory of implicit and explicit regularization in Deep ReLU Networks from (Maennel et al, 2018) and the authors.\n\nWe discuss two experiments which illustrate the efficiency of the algorithm to reduce the number of neurons significantly with provably almost no change of the learned function within the training data (and therefore almost no loss in accuracy).", "keywords": "Reduction;Compression;Regularization;Theory;Pruning;Deep;Interpretability;Generalization", "primary_area": "", "supplementary_material": "", "author": "Jakob Heiss;Alexis Stockinger;Josef Teichmann", "authorids": "~Jakob_Heiss1;~Alexis_Stockinger1;josef.teichmann@math.ethz.ch", "gender": "M;;", "homepage": "http://jakob.heisss.at;;", "dblp": ";;", "google_scholar": "mKqte6AAAAAJ;;", "orcid": "0000-0003-1447-6782;0000-0002-1072-9511;", "linkedin": "jakob-heiss/;;", "or_profile": "~Jakob_Heiss1;~Alexis_Stockinger1;josef.teichmann@math.ethz.ch", "aff": "Swiss Federal Institute of Technology;Swiss Federal Institute of Technology;", "aff_domain": "ethz.ch;ethz.ch;", "position": "PhD student;MS student;", "bibtex": "@misc{\nheiss2021reducing,\ntitle={Reducing the number of neurons of Deep Re{\\{}LU{\\}} Networks based on the current theory of Regularization},\nauthor={Jakob Heiss and Alexis Stockinger and Josef Teichmann},\nyear={2021},\nurl={https://openreview.net/forum?id=9GUTgHZgKCH}\n}", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer3;AnonReviewer5;AnonReviewer2;AnonReviewer4", "site": "https://openreview.net/forum?id=9GUTgHZgKCH", "pdf_size": 0, "rating": "2;2;2;3;4", "confidence": "5;4;5;4;4", "wc_review": "140;657;66;318;512", "wc_reply_reviewers": "0;157;0;0;0", "wc_reply_authors": "263;904;120;858;1418", "reply_reviewers": "0;1;0;0;0", "reply_authors": "1;2;1;2;3", "rating_avg": [ 2.6, 0.8 ], "confidence_avg": [ 4.4, 0.48989794855663565 ], "wc_review_avg": [ 338.6, 221.64079046962453 ], "wc_reply_reviewers_avg": [ 31.4, 62.8 ], "wc_reply_authors_avg": [ 712.6, 470.88835194767773 ], "reply_reviewers_avg": [ 0.2, 0.4000000000000001 ], "reply_authors_avg": [ 1.8, 0.7483314773547883 ], "replies_avg": [ 16, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": -0.6123724356957946, "gs_citation": 2, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=1012934606518264380&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 0, "aff_unique_index": "0;0", "aff_unique_norm": "Swiss Federal Institute of Technology", "aff_unique_dep": "", "aff_unique_url": "https://www.ethz.ch", "aff_unique_abbr": "ETH Zurich", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0", "aff_country_unique": "Switzerland" }, { "title": "Progressive Skeletonization: Trimming more fat from a network at initialization", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/2957", "id": "9GsFOUyUPi", "poster": "", "openreview": "https://openreview.net/forum?id=9GsFOUyUPi", "slides": "https://iclr.cc/virtual/2021/poster/2957", "video": "https://iclr.cc/virtual/2021/poster/2957", "author_site": "Pau de Jorge Aranda, Amartya Sanyal, Harkirat Singh Behl, Philip Torr, Gr\u00e9gory Rogez, Puneet Dokania", "tldr": "", "abstract": "Recent studies have shown that skeletonization (pruning parameters) of networks at initialization provides all the practical benefits of sparsity both at inference and training time, while only marginally degrading their performance. 
However, we observe that beyond a certain level of sparsity (approx 95%), these approaches fail to preserve the network performance, and to our surprise, in many cases perform even worse than trivial random pruning. To this end, we propose an objective to find a skeletonized network with maximum foresight connection sensitivity (FORCE) whereby the trainability, in terms of connection sensitivity, of a pruned network is taken into consideration. We then propose two approximate procedures to maximize our objective (1) Iterative SNIP: allows parameters that were unimportant at earlier stages of skeletonization to become important at later stages; and (2) FORCE: iterative process that allows exploration by allowing already pruned parameters to resurrect at later stages of skeletonization. Empirical analysis on a large suite of experiments show that our approach, while providing at least as good performance as other recent approaches on moderate pruning levels, provide remarkably improved performance on high pruning levels (could remove up to 99.5% parameters while keeping the networks trainable).", "keywords": "Pruning;Pruning at initialization;Sparsity", "primary_area": "", "supplementary_material": "", "author": "Pau de Jorge;Amartya Sanyal;Harkirat Behl;Philip Torr;Gr\u00e9gory Rogez;Puneet K. Dokania", "authorids": "~Pau_de_Jorge1;~Amartya_Sanyal1;~Harkirat_Behl1;~Philip_Torr1;~Gr\u00e9gory_Rogez1;~Puneet_K._Dokania1", "gender": "M;M;M;;M;M", "homepage": "https://europe.naverlabs.com/people_user/Pau-De-Jorge/;https://amartya18x.github.io;https://harkiratbehl.github.io/;http://www.robots.ox.ac.uk/~tvg/;https://europe.naverlabs.com/people_user/gregory-rogez/;http://puneetkdokania.github.io/", "dblp": "267/5657;203/8807;199/2125;;49/4408;150/4211", "google_scholar": "https://scholar.google.hk/citations?user=9voBw90AAAAJ;;R7k23-0AAAAJ;;Atzr3VgAAAAJ;https://scholar.google.fr/citations?user=WsM7ybkAAAAJ", "orcid": ";0000-0002-4190-0449;;;;", "linkedin": "pau-de-jorge-aranda/;;;;gr\u00e9gory-rogez/?originalSubdomain=fr;", "or_profile": "~Pau_de_Jorge1;~Amartya_Sanyal1;~Harkirat_Behl1;~Philip_Torr1;~Gregory_Rogez3;~Puneet_Dokania1", "aff": "University of Oxford;University of Oxford;University of Oxford;University of Oxford;Naver Labs Europe;University of Oxford", "aff_domain": "ox.ac.uk;ox.ac.uk;ox.ac.uk;ox.ac.uk;naverlabs.com;oxford.ac.uk", "position": "PhD student;PhD student;PhD student;Full Professor;Group Lead - Senior Scientist;Senior Researcher", "bibtex": "@inproceedings{\njorge2021progressive,\ntitle={Progressive Skeletonization: Trimming more fat from a network at initialization},\nauthor={Pau de Jorge and Amartya Sanyal and Harkirat Behl and Philip Torr and Gr{\\'e}gory Rogez and Puneet K. 
Dokania},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=9GsFOUyUPi}\n}", "github": "[![github](/images/github_icon.svg) naver/force](https://github.com/naver/force)", "project": "", "reviewers": "AnonReviewer2;AnonReviewer1;AnonReviewer4", "pdf_size": 0, "rating": "6;7;7", "confidence": "3;3;4", "wc_review": "189;305;481", "wc_reply_reviewers": "0;0;0", "wc_reply_authors": "368;333;706", "reply_reviewers": "0;0;0", "reply_authors": "1;1;1", "rating_avg": [ 6.666666666666667, 0.4714045207910317 ], "confidence_avg": [ 3.3333333333333335, 0.4714045207910317 ], "wc_review_avg": [ 325.0, 120.04443621703867 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 469.0, 168.1923502025781 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 7, 0 ], "authors#_avg": [ 6, 0 ], "corr_rating_confidence": 0.4999999999999999, "gs_citation": 118, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=5929326556429040468&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 9, "pdf": "https://openreview.net/pdf?id=9GsFOUyUPi", "email": "ox.ac.uk;ox.ac.uk;ox.ac.uk;ox.ac.uk;naverlabs.com;oxford.ac.uk", "author_num": 6, "aff_unique_index": "0;0;0;0;1;0", "aff_unique_norm": "University of Oxford;NAVER LABS", "aff_unique_dep": ";", "aff_unique_url": "https://www.ox.ac.uk;https://labs.naver.com", "aff_unique_abbr": "Oxford;NLE", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0;0;0;1;0", "aff_country_unique": "United Kingdom;Unknown" }, { "title": "Multi-timescale Representation Learning in LSTM Language Models", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/3095", "id": "9ITXiTrAoT", "poster": "", "openreview": "https://openreview.net/forum?id=9ITXiTrAoT", "slides": "https://iclr.cc/virtual/2021/poster/3095", "video": "https://iclr.cc/virtual/2021/poster/3095", "author_site": "Shivangi Mahto, Vy Vo, Javier Turek, Alexander Huth", "tldr": "", "abstract": "Language models must capture statistical dependencies between words at timescales ranging from very short to very long. Earlier work has demonstrated that dependencies in natural language tend to decay with distance between words according to a power law. However, it is unclear how this knowledge can be used for analyzing or designing neural network language models. In this work, we derived a theory for how the memory gating mechanism in long short-term memory (LSTM) language models can capture power law decay. We found that unit timescales within an LSTM, which are determined by the forget gate bias, should follow an Inverse Gamma distribution. Experiments then showed that LSTM language models trained on natural English text learn to approximate this theoretical distribution. Further, we found that explicitly imposing the theoretical distribution upon the model during training yielded better language model perplexity overall, with particular improvements for predicting low-frequency (rare) words. Moreover, the explicit multi-timescale model selectively routes information about different types of words through units with different timescales, potentially improving model interpretability. These results demonstrate the importance of careful, theoretically-motivated analysis of memory and timescale in language models.", "keywords": "Language Model;LSTM;timescales", "primary_area": "", "supplementary_material": "", "author": "Shivangi Mahto;Vy Ai Vo;Javier S. 
Turek;Alexander Huth", "authorids": "shivangi@utexas.edu;vy.vo@intel.com;~Javier_S._Turek1;~Alexander_Huth1", "gender": ";;;", "homepage": ";;;https://www.cs.utexas.edu/~huth/", "dblp": ";;;44/8860.html", "google_scholar": ";;;JNXWWkIAAAAJ", "orcid": ";;;", "linkedin": ";;;", "or_profile": "shivangi@utexas.edu;vy.vo@intel.com;~Javier_S._Turek1;~Alexander_Huth1", "aff": ";;;The University of Texas at Austin", "aff_domain": ";;;utexas.edu", "position": ";;;Assistant Professor", "bibtex": "@inproceedings{\nmahto2021multitimescale,\ntitle={Multi-timescale Representation Learning in {\\{}LSTM{\\}} Language Models},\nauthor={Shivangi Mahto and Vy Ai Vo and Javier S. Turek and Alexander Huth},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=9ITXiTrAoT}\n}", "github": "", "project": "", "reviewers": "AnonReviewer2;AnonReviewer1;AnonReviewer4;AnonReviewer3", "pdf_size": 0, "rating": "6;7;7;8", "confidence": "3;4;4;4", "wc_review": "850;561;449;226", "wc_reply_reviewers": "45;0;0;0", "wc_reply_authors": "1151;876;430;414", "reply_reviewers": "1;0;0;0", "reply_authors": "2;2;1;1", "rating_avg": [ 7.0, 0.7071067811865476 ], "confidence_avg": [ 3.75, 0.4330127018922193 ], "wc_review_avg": [ 521.5, 224.74930478201708 ], "wc_reply_reviewers_avg": [ 11.25, 19.48557158514987 ], "wc_reply_authors_avg": [ 717.75, 311.3730680389683 ], "reply_reviewers_avg": [ 0.25, 0.4330127018922193 ], "reply_authors_avg": [ 1.5, 0.5 ], "replies_avg": [ 12, 0 ], "authors#_avg": [ 4, 0 ], "corr_rating_confidence": 0.816496580927726, "gs_citation": 35, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=14412836618537439668&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 7, "pdf": "https://openreview.net/pdf?id=9ITXiTrAoT", "email": ";;;utexas.edu", "author_num": 4, "aff_unique_index": "0", "aff_unique_norm": "University of Texas at Austin", "aff_unique_dep": "", "aff_unique_url": "https://www.utexas.edu", "aff_unique_abbr": "UT Austin", "aff_campus_unique_index": "0", "aff_campus_unique": "Austin", "aff_country_unique_index": "0", "aff_country_unique": "United States" }, { "id": "9MdLwggYa02", "title": "ROMUL: Scale Adaptative Population Based Training", "track": "main", "status": "Reject", "tldr": "", "abstract": "In most pragmatic settings, data augmentation and regularization are essential, and require hyperparameter search.\nPopulation based training (PBT) is an effective tool for efficiently finding them as well as schedules over hyperparameters.\nIn this paper, we compare existing PBT algorithms and contribute a new one: ROMUL, for RObust MULtistep search, which adapts its stepsize over the course of training.\nWe report competitive results with standard models on CIFAR (image classification) as well as Penn Tree Bank (language modeling), which both depend on heavy regularization.\nWe also open-source hoptim, a PBT library agnostic to the training framework, which is simple to use, reentrant, and provides good defaults with ROMUL.", "keywords": "hyperparameter search;population based training;differential evolution;hyperparameter optimization;online optimization;deep learning", "primary_area": "", "supplementary_material": "/attachment/f9b704af9013b8982f9edd49e262b95d6a2bdf38.zip", "author": "Daniel HAZIZA;J\u00e9r\u00e9my Rapin;Gabriel Synnaeve", "authorids": "~Daniel_HAZIZA2;jrapin@fb.com;~Gabriel_Synnaeve1", "gender": ";;M", "homepage": ";;", "dblp": ";;http://dblp.uni-trier.de/pers/hd/s/Synnaeve:Gabriel", "google_scholar": 
"https://scholar.google.com/citations?view_op=list_works;;wN9rBkcAAAAJ", "orcid": ";;", "linkedin": ";;", "or_profile": "~Daniel_HAZIZA2;jrapin@fb.com;~Gabriel_Synnaeve1", "aff": "Facebook AI Research (FAIR);;Meta Facebook", "aff_domain": "fb.com;;fb.com", "position": "Research Engineer;;Research Scientist", "bibtex": "@misc{\nhaziza2021romul,\ntitle={{\\{}ROMUL{\\}}: Scale Adaptative Population Based Training},\nauthor={Daniel HAZIZA and J{\\'e}r{\\'e}my Rapin and Gabriel Synnaeve},\nyear={2021},\nurl={https://openreview.net/forum?id=9MdLwggYa02}\n}", "github": "", "project": "", "reviewers": "AnonReviewer2;AnonReviewer1;AnonReviewer3;AnonReviewer4", "site": "https://openreview.net/forum?id=9MdLwggYa02", "pdf_size": 0, "rating": "3;4;4;6", "confidence": "4;4;4;5", "wc_review": "246;579;632;323", "wc_reply_reviewers": "0;238;189;0", "wc_reply_authors": "418;908;743;416", "reply_reviewers": "0;1;1;0", "reply_authors": "1;2;1;1", "rating_avg": [ 4.25, 1.0897247358851685 ], "confidence_avg": [ 4.25, 0.4330127018922193 ], "wc_review_avg": [ 445.0, 163.86732438164725 ], "wc_reply_reviewers_avg": [ 106.75, 108.14660188836264 ], "wc_reply_authors_avg": [ 621.25, 212.41866090341497 ], "reply_reviewers_avg": [ 0.5, 0.5 ], "reply_authors_avg": [ 1.25, 0.4330127018922193 ], "replies_avg": [ 13, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": 0.9271726499455306, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:YQwV3ePH5WIJ:scholar.google.com/&scioq=ROMUL:+Scale+Adaptative+Population+Based+Training&hl=en&as_sdt=0,33", "gs_version_total": 0, "aff_unique_index": "0;0", "aff_unique_norm": "Meta", "aff_unique_dep": "Facebook AI Research", "aff_unique_url": "https://research.facebook.com", "aff_unique_abbr": "FAIR", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0", "aff_country_unique": "United States" }, { "title": "Graph Convolution with Low-rank Learnable Local Filters", "status": "Spotlight", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/2849", "id": "9OHFhefeB86", "poster": "", "openreview": "https://openreview.net/forum?id=9OHFhefeB86", "slides": "https://iclr.cc/virtual/2021/poster/2849", "video": "https://iclr.cc/virtual/2021/poster/2849", "author_site": "Xiuyuan Cheng, Zichen Miao, Qiang Qiu", "tldr": "", "abstract": "Geometric variations like rotation, scaling, and viewpoint changes pose a significant challenge to visual understanding. One common solution is to directly model certain intrinsic structures, e.g., using landmarks. However, it then becomes non-trivial to build effective deep models, especially when the underlying non-Euclidean grid is irregular and coarse. Recent deep models using graph convolutions provide an appropriate framework to handle such non-Euclidean data, but many of them, particularly those based on global graph Laplacians, lack expressiveness to capture local features required for representation of signals lying on the non-Euclidean grid. The current paper introduces a new type of graph convolution with learnable low-rank local filters, which is provably more expressive than previous spectral graph convolution methods. The model also provides a unified framework for both spectral and spatial graph convolutions. To improve model robustness, regularization by local graph Laplacians is introduced. The representation stability against input graph data perturbation is theoretically proved, making use of the graph filter locality and the local graph regularization. 
Experiments on spherical mesh data, real-world facial expression recognition/skeleton-based action recognition data, and data with simulated graph noise show the empirical advantage of the proposed model.", "keywords": "", "primary_area": "", "supplementary_material": "/attachment/5b804f43368e21a94b5d7a911ce39d00b34e8bf2.zip", "author": "Xiuyuan Cheng;Zichen Miao;Qiang Qiu", "authorids": "~Xiuyuan_Cheng1;~Zichen_Miao1;~Qiang_Qiu1", "gender": ";M;", "homepage": ";https://zichenmiao.github.io;https://web.ics.purdue.edu/~qqiu/", "dblp": "79/9747;206/1549;97/360", "google_scholar": "I2gwdssAAAAJ;Kmv2KIkAAAAJ;jdLtt_YAAAAJ", "orcid": ";;", "linkedin": ";;", "or_profile": "~Xiuyuan_Cheng1;~Zichen_Miao1;~Qiang_Qiu1", "aff": "Duke University;Purdue University;Purdue University", "aff_domain": "duke.edu;purdue.edu;purdue.edu", "position": "Assistant Professor;PhD student;Assistant Professor", "bibtex": "@inproceedings{\ncheng2021graph,\ntitle={Graph Convolution with Low-rank Learnable Local Filters},\nauthor={Xiuyuan Cheng and Zichen Miao and Qiang Qiu},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=9OHFhefeB86}\n}", "github": "[![github](/images/github_icon.svg) ZichenMiao/GNN-with-Low-rank-Learnable-Local-Filters](https://github.com/ZichenMiao/GNN-with-Low-rank-Learnable-Local-Filters) + [![Papers with Code](/images/pwc_icon.svg) 1 community implementation](https://paperswithcode.com/paper/?openreview=9OHFhefeB86)", "project": "", "reviewers": "AnonReviewer4;AnonReviewer3;AnonReviewer2;AnonReviewer1", "pdf_size": 0, "rating": "7;7;7;8", "confidence": "3;3;3;5", "wc_review": "143;185;405;477", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "0;0;0;0", "reply_reviewers": "0;0;0;0", "reply_authors": "0;0;0;0", "rating_avg": [ 7.25, 0.4330127018922193 ], "confidence_avg": [ 3.5, 0.8660254037844386 ], "wc_review_avg": [ 302.5, 141.60067090236544 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 0, 0 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 0, 0 ], "replies_avg": [ 5, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": 1.0, "gs_citation": 20, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=11972287774874578634&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 5, "pdf": "https://openreview.net/pdf?id=9OHFhefeB86", "email": "duke.edu;purdue.edu;purdue.edu", "author_num": 3, "aff_unique_index": "0;1;1", "aff_unique_norm": "Duke University;Purdue University", "aff_unique_dep": ";", "aff_unique_url": "https://www.duke.edu;https://www.purdue.edu", "aff_unique_abbr": "Duke;Purdue", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0;0", "aff_country_unique": "United States" }, { "title": "BiPointNet: Binary Neural Network for Point Clouds", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/3065", "id": "9QLRCVysdlO", "poster": "", "openreview": "https://openreview.net/forum?id=9QLRCVysdlO", "slides": "https://iclr.cc/virtual/2021/poster/3065", "video": "https://iclr.cc/virtual/2021/poster/3065", "author_site": "Haotong Qin, Zhongang Cai, Mingyuan Zhang, Yifu Ding, Haiyu Zhao, Shuai Yi, Xianglong Liu, Hao Su", "tldr": "", "abstract": "To alleviate the resource constraint for real-time point cloud applications that run on edge devices, in this paper we present BiPointNet, the first model binarization approach for efficient deep learning on point clouds. 
We discover that the immense performance drop of binarized models for point clouds mainly stems from two challenges: aggregation-induced feature homogenization that leads to a degradation of information entropy, and scale distortion that hinders optimization and invalidates scale-sensitive structures. With theoretical justifications and in-depth analysis, our BiPointNet introduces Entropy-Maximizing Aggregation (EMA) to modulate the distribution before aggregation for the maximum information entropy, and Layer-wise Scale Recovery (LSR) to efficiently restore feature representation capacity. Extensive experiments show that BiPointNet outperforms existing binarization methods by convincing margins, at the level even comparable with the full precision counterpart. We highlight that our techniques are generic, guaranteeing significant improvements on various fundamental tasks and mainstream backbones. Moreover, BiPointNet gives an impressive 14.7\u00d7 speedup and 18.9\u00d7 storage saving on real-world resource-constrained devices.", "keywords": "point clouds;efficient deep learning;binary neural networks", "primary_area": "", "supplementary_material": "", "author": "Haotong Qin;Zhongang Cai;Mingyuan Zhang;Yifu Ding;Haiyu Zhao;Shuai Yi;Xianglong Liu;Hao Su", "authorids": "qinhaotong@buaa.edu.cn;~Zhongang_Cai1;~Mingyuan_Zhang1;~Yifu_Ding2;~Haiyu_Zhao1;~Shuai_Yi3;~Xianglong_Liu2;~Hao_Su1", "gender": ";M;M;F;M;M;M;M", "homepage": ";https://caizhongang.com;https://mingyuan-zhang.github.io/;https://yifu-ding.github.io/;;https://scholar.google.com/citations?hl=zh-CN&pli=1&user=afbbNmwAAAAJ;http://www.nlsde.buaa.edu.cn/~xlliu;http://ai.ucsd.edu/~haosu", "dblp": ";232/3190;;;;150/6633;55/7901;09/4945-1", "google_scholar": ";WrDKqIAAAAAJ;2QLD4fAAAAAJ;RCEI1r0AAAAJ;sMQV1ecAAAAJ;https://scholar.google.com/citations?hl=zh-CN;https://scholar.google.com.hk/citations?user=8VY7ZDcAAAAJ;1P8Zu04AAAAJ", "orcid": ";0000-0002-1810-3855;;0000-0002-3612-8757;0000-0002-0415-4248;;;", "linkedin": ";caizhongang/;;yifu-ding-253614186/;;;;", "or_profile": "qinhaotong@buaa.edu.cn;~Zhongang_Cai1;~Mingyuan_Zhang1;~Yifu_Ding2;~Haiyu_Zhao1;~Shuai_Yi3;~Xianglong_Liu2;~Hao_Su1", "aff": ";Nanyang Technological University;Sensetime;Beihang University;Sensetime International Pte. 
Ltd.;SenseTime Group Limited;Beihang University;University of California, San Diego", "aff_domain": ";ntu.edu.sg;sensetime.com;buaa.edu.cn;sensetime.com;sensetime.com;buaa.edu.cn;ucsd.edu", "position": ";PhD student;Researcher;Undergrad student;Senior Researcher;Researcher;Associate Professor;Assistant Professor", "bibtex": "@inproceedings{\nqin2021bipointnet,\ntitle={BiPointNet: Binary Neural Network for Point Clouds},\nauthor={Haotong Qin and Zhongang Cai and Mingyuan Zhang and Yifu Ding and Haiyu Zhao and Shuai Yi and Xianglong Liu and Hao Su},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=9QLRCVysdlO}\n}", "github": "[![github](/images/github_icon.svg) htqin/BiPointNet](https://github.com/htqin/BiPointNet)", "project": "", "reviewers": "AnonReviewer1;AnonReviewer2;AnonReviewer3;AnonReviewer4", "pdf_size": 0, "rating": "4;7;7;8", "confidence": "5;3;3;3", "wc_review": "435;135;165;339", "wc_reply_reviewers": "260;0;0;0", "wc_reply_authors": "2955;214;379;330", "reply_reviewers": "1;0;0;0", "reply_authors": "5;1;1;1", "rating_avg": [ 6.5, 1.5 ], "confidence_avg": [ 3.5, 0.8660254037844386 ], "wc_review_avg": [ 268.5, 123.72045101760662 ], "wc_reply_reviewers_avg": [ 65.0, 112.58330249197702 ], "wc_reply_authors_avg": [ 969.5, 1147.8938321987796 ], "reply_reviewers_avg": [ 0.25, 0.4330127018922193 ], "reply_authors_avg": [ 2.0, 1.7320508075688772 ], "replies_avg": [ 15, 0 ], "authors#_avg": [ 8, 0 ], "corr_rating_confidence": -0.9622504486493763, "gs_citation": 60, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=2821902497514525897&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 4, "pdf": "https://openreview.net/pdf?id=9QLRCVysdlO", "email": ";ntu.edu.sg;sensetime.com;buaa.edu.cn;sensetime.com;sensetime.com;buaa.edu.cn;ucsd.edu", "author_num": 8, "aff_unique_index": "0;1;2;3;4;2;5", "aff_unique_norm": "Nanyang Technological University;SenseTime;Beihang University;Sensetime International Pte. Ltd.;SenseTime Group Limited;University of California, San Diego", "aff_unique_dep": ";;;;;", "aff_unique_url": "https://www.ntu.edu.sg;https://www.sensetime.com;http://www.buaa.edu.cn/;https://www.sensetime.com;https://www.sensetime.com;https://www.ucsd.edu", "aff_unique_abbr": "NTU;SenseTime;BUAA;;SenseTime;UCSD", "aff_campus_unique_index": "1", "aff_campus_unique": ";San Diego", "aff_country_unique_index": "0;1;1;0;1;1;2", "aff_country_unique": "Singapore;China;United States" }, { "title": "Solving Compositional Reinforcement Learning Problems via Task Reduction", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/3368", "id": "9SS69KwomAM", "poster": "", "openreview": "https://openreview.net/forum?id=9SS69KwomAM", "slides": "https://iclr.cc/virtual/2021/poster/3368", "video": "https://iclr.cc/virtual/2021/poster/3368", "author_site": "Yunfei Li, Yilin Wu, Huazhe Xu, Xiaolong Wang, Yi Wu", "tldr": "", "abstract": "We propose a novel learning paradigm, Self-Imitation via Reduction (SIR), for solving compositional reinforcement learning problems. SIR is based on two core ideas: task reduction and self-imitation. Task reduction tackles a hard-to-solve task by actively reducing it to an easier task whose solution is known by the RL agent. Once the original hard task is successfully solved by task reduction, the agent naturally obtains a self-generated solution trajectory to imitate. 
By continuously collecting and imitating such demonstrations, the agent is able to progressively expand the solved subspace in the entire task space. Experiment results show that SIR can significantly accelerate and improve learning on a variety of challenging sparse-reward continuous-control problems with compositional structures. Code and videos are available at https://sites.google.com/view/sir-compositional.", "keywords": "compositional task;sparse reward;reinforcement learning;task reduction;imitation learning", "primary_area": "", "supplementary_material": "/attachment/54b84ef85ce580bdd6d9d567ff3dabf2d3fcc004.zip", "author": "Yunfei Li;Yilin Wu;Huazhe Xu;Xiaolong Wang;Yi Wu", "authorids": "~Yunfei_Li1;wuyilin98@gmail.com;~Huazhe_Xu1;~Xiaolong_Wang3;~Yi_Wu1", "gender": ";;M;M;M", "homepage": "https://irisli17.github.io/;;http://hxu.rocks;https://xiaolonw.github.io/;https://jxwuyi.weebly.com", "dblp": ";;164/9006;91/952-4;", "google_scholar": "https://scholar.google.com/citations?hl=en;;t9HPFawAAAAJ;Y8O9N_0AAAAJ;dusV5HMAAAAJ", "orcid": "0000-0003-0988-9400;;;;", "linkedin": ";;;;", "or_profile": "~Yunfei_Li1;wuyilin98@gmail.com;~Huazhe_Xu1;~Xiaolong_Wang3;~Yi_Wu1", "aff": "Institute for Interdisciplinary Information Sciences, Tsinghua University;;University of California, Berkeley;University of California, San Diego;Tsinghua University", "aff_domain": "tsinghua.edu.cn;;berkeley.edu;ucsd.edu;tsinghua.edu.cn", "position": "PhD student;;Ph.D. Student;Assistant Professor;Assistant Professor", "bibtex": "@inproceedings{\nli2021solving,\ntitle={Solving Compositional Reinforcement Learning Problems via Task Reduction},\nauthor={Yunfei Li and Yilin Wu and Huazhe Xu and Xiaolong Wang and Yi Wu},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=9SS69KwomAM}\n}", "github": "[![github](/images/github_icon.svg) IrisLi17/self-imitation-via-reduction](https://github.com/IrisLi17/self-imitation-via-reduction)", "project": "", "reviewers": "AnonReviewer4;AnonReviewer3;AnonReviewer2;AnonReviewer1", "pdf_size": 0, "rating": "3;5;6;7", "confidence": "3;4;4;3", "wc_review": "605;389;419;680", "wc_reply_reviewers": "0;270;0;0", "wc_reply_authors": "1071;821;332;268", "reply_reviewers": "0;1;0;0", "reply_authors": "2;2;1;1", "rating_avg": [ 5.25, 1.479019945774904 ], "confidence_avg": [ 3.5, 0.5 ], "wc_review_avg": [ 523.25, 122.62213299400725 ], "wc_reply_reviewers_avg": [ 67.5, 116.91342951089922 ], "wc_reply_authors_avg": [ 623.0, 335.6389429133634 ], "reply_reviewers_avg": [ 0.25, 0.4330127018922193 ], "reply_authors_avg": [ 1.5, 0.5 ], "replies_avg": [ 13, 0 ], "authors#_avg": [ 5, 0 ], "corr_rating_confidence": 0.16903085094570333, "gs_citation": 27, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=15628616147808752058&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 4, "pdf": "https://openreview.net/pdf?id=9SS69KwomAM", "email": "tsinghua.edu.cn;;berkeley.edu;ucsd.edu;tsinghua.edu.cn", "author_num": 5, "aff_unique_index": "0;1;2;0", "aff_unique_norm": "Tsinghua University;University of California, Berkeley;University of California, San Diego", "aff_unique_dep": "Institute for Interdisciplinary Information Sciences;;", "aff_unique_url": "https://www.tsinghua.edu.cn;https://www.berkeley.edu;https://www.ucsd.edu", "aff_unique_abbr": "Tsinghua;UC Berkeley;UCSD", "aff_campus_unique_index": "1;2", "aff_campus_unique": ";Berkeley;San Diego", "aff_country_unique_index": "0;1;1;0", "aff_country_unique": "China;United States" 
}, { "id": "9UFIOHeVEh", "title": "Identifying the Sources of Uncertainty in Object Classification", "track": "main", "status": "Reject", "tldr": "", "abstract": "In image-based object classification, the visual appearance of objects determines which class they are assigned to. External variables that are independent of the object, such as the perspective or the lighting conditions, can modify the object's appearance resulting in ambiguous images that lead to misclassifications. Previous work has proposed methods for estimating the uncertainty of predictions and measure their confidence. However, such methods do not indicate which variables are the potential sources that cause uncertainty. In this paper, we propose a method for image-based object classification that uses disentangled representations to indicate which are the external variables that contribute the most to the uncertainty of the predictions. This information can be used to identify the external variables that should be modified to decrease the uncertainty and improve the classification.", "keywords": "Classification;Interpretability;Disentangled Representations;Uncertainty Estimation", "primary_area": "", "supplementary_material": "", "author": "Luis Armando P\u00e9rez Rey;Berk \u0130\u015fler;Mike Holenderski;Dmitri Jarnikov", "authorids": "~Luis_Armando_P\u00e9rez_Rey1;berk.isler94@gmail.com;~Mike_Holenderski1;d.s.jarnikov@tue.nl", "gender": "M;;M;", "homepage": ";;;", "dblp": ";;79/6621.html;", "google_scholar": ";;vK0cp1QAAAAJ;", "orcid": ";;;", "linkedin": ";;;", "or_profile": "~Luis_Armando_P\u00e9rez_Rey1;berk.isler94@gmail.com;~Mike_Holenderski1;d.s.jarnikov@tue.nl", "aff": "Eindhoven University of Technology;;Eindhoven University of Technology;", "aff_domain": "tue.nl;;tue.nl;", "position": "PhD student;;Assistant Professor;", "bibtex": "@misc{\nrey2021identifying,\ntitle={Identifying the Sources of Uncertainty in Object Classification},\nauthor={Luis Armando P{\\'e}rez Rey and Berk {\\.I}{\\c{s}}ler and Mike Holenderski and Dmitri Jarnikov},\nyear={2021},\nurl={https://openreview.net/forum?id=9UFIOHeVEh}\n}", "github": "", "project": "", "reviewers": "AnonReviewer3;AnonReviewer1;AnonReviewer4", "site": "https://openreview.net/forum?id=9UFIOHeVEh", "pdf_size": 0, "rating": "3;3;3", "confidence": "4;5;3", "wc_review": "825;394;417", "wc_reply_reviewers": "0;0;0", "wc_reply_authors": "0;0;0", "reply_reviewers": "0;0;0", "reply_authors": "0;0;0", "rating_avg": [ 3.0, 0.0 ], "confidence_avg": [ 4.0, 0.816496580927726 ], "wc_review_avg": [ 545.3333333333334, 197.97699080673212 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 0, 0 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 0, 0 ], "replies_avg": [ 4, 0 ], "authors#_avg": [ 4, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:ByGDTllhTlUJ:scholar.google.com/&scioq=Identifying+the+Sources+of+Uncertainty+in+Object+Classification&hl=en&as_sdt=0,5", "gs_version_total": 0, "aff_unique_index": "0;0", "aff_unique_norm": "Eindhoven University of Technology", "aff_unique_dep": "", "aff_unique_url": "https://www.tue.nl", "aff_unique_abbr": "TU/e", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0", "aff_country_unique": "Netherlands" }, { "id": "9WlOIHve8dU", "title": "Learning Binary Trees via Sparse Relaxation", "track": "main", "status": "Reject", "tldr": "", "abstract": "One of the most classical problems in machine learning is how to learn 
binary trees that split data into meaningful partitions. From classification/regression via decision trees to hierarchical clustering, binary trees are useful because they (a) are often easy to visualize; (b) make computationally-efficient predictions; and (c) allow for flexible partitioning. Because of this there has been extensive research on how to learn such trees. Optimization generally falls into one of three categories: 1. greedy node-by-node optimization; 2. probabilistic relaxations for differentiability; 3. mixed-integer programming (MIP). Each of these have downsides: greedy can myopically choose poor splits, probabilistic relaxations do not have principled ways to prune trees, MIP methods can be slow on large problems and may not generalize. In this work we derive a novel sparse relaxation for binary tree learning. By sparsely relaxing a new MIP, our approach is able to learn tree splits and tree pruning using state-of-the-art gradient-based approaches. We demonstrate how our approach is easily visualizable, is efficient, and is competitive with current work in classification/regression and hierarchical clustering.", "keywords": "optimization;binary trees", "primary_area": "", "supplementary_material": "", "author": "Valentina Zantedeschi;Matt Kusner;Vlad Niculae", "authorids": "~Valentina_Zantedeschi2;~Matt_Kusner1;~Vlad_Niculae2", "gender": "F;M;M", "homepage": "http://vzantedeschi.com/;http://mkusner.github.io;https://vene.ro", "dblp": "179/2187;120/7700.html;40/10489", "google_scholar": "tdUUrS8AAAAJ;57KRSu8AAAAJ;7_3UAgQAAAAJ", "orcid": ";;", "linkedin": "valentina-zantedeschi-36a65a83/;;", "or_profile": "~Valentina_Zantedeschi2;~Matt_Kusner1;~Vlad_Niculae2", "aff": "INRIA;University College London;University of Amsterdam", "aff_domain": "inria.fr;ucl.ac.uk;uva.nl", "position": "Postdoc;Associate Professor;Assistant Professor", "bibtex": "@misc{\nzantedeschi2021learning,\ntitle={Learning Binary Trees via Sparse Relaxation},\nauthor={Valentina Zantedeschi and Matt Kusner and Vlad Niculae},\nyear={2021},\nurl={https://openreview.net/forum?id=9WlOIHve8dU}\n}", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer2;AnonReviewer3;AnonReviewer4", "site": "https://openreview.net/forum?id=9WlOIHve8dU", "pdf_size": 0, "rating": "3;4;6;7", "confidence": "3;4;4;3", "wc_review": "119;571;672;205", "wc_reply_reviewers": "0;214;0;0", "wc_reply_authors": "291;812;401;170", "reply_reviewers": "0;1;0;0", "reply_authors": "1;1;1;1", "rating_avg": [ 5.0, 1.5811388300841898 ], "confidence_avg": [ 3.5, 0.5 ], "wc_review_avg": [ 391.75, 234.48813935890232 ], "wc_reply_reviewers_avg": [ 53.5, 92.66471820493493 ], "wc_reply_authors_avg": [ 418.5, 241.4316673512404 ], "reply_reviewers_avg": [ 0.25, 0.4330127018922193 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 13, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 6, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=17872604539339185286&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 0, "aff_unique_index": "0;1;2", "aff_unique_norm": "INRIA;University College London;University of Amsterdam", "aff_unique_dep": ";;", "aff_unique_url": "https://www.inria.fr;https://www.ucl.ac.uk;https://www.uva.nl", "aff_unique_abbr": "INRIA;UCL;UvA", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;1;2", "aff_country_unique": "France;United Kingdom;Netherlands" }, { "id": "9Y7_c5ZAd5i", "title": "A Sharp Analysis of Model-based Reinforcement Learning with 
Self-Play", "track": "main", "status": "Reject", "tldr": "", "abstract": "Model-based algorithms---algorithms that explore the environment through building and utilizing an estimated model---are widely used in reinforcement learning practice and theoretically shown to achieve optimal sample efficiency for single-agent reinforcement learning in Markov Decision Processes (MDPs). However, for multi-agent reinforcement learning in Markov games, the current best known sample complexity for model-based algorithms is rather suboptimal and compares unfavorably against recent model-free approaches. In this paper, we present a sharp analysis of model-based self-play algorithms for multi-agent Markov games. We design an algorithm \\emph{Optimistic Nash Value Iteration} (Nash-VI) for two-player zero-sum Markov games that is able to output an $\\epsilon$-approximate Nash policy in $\\tilde{\\mathcal{O}}(H^3SAB/\\epsilon^2)$ episodes of game playing, where $S$ is the number of states, $A,B$ are the number of actions for the two players respectively, and $H$ is the horizon length. This significantly improves over the best known model-based guarantee of $\\tilde{\\mathcal{O}}(H^4S^2AB/\\epsilon^2)$, and is the first that matches the information-theoretic lower bound $\\Omega(H^3S(A+B)/\\epsilon^2)$ except for a $\\min\\{A,B\\}$ factor. In addition, our guarantee compares favorably against the best known model-free algorithm if $\\min\\{A,B\\}=o(H^3)$, and outputs a single Markov policy while existing sample-efficient model-free algorithms output a nested mixture of Markov policies that is in general non-Markov and rather inconvenient to store and execute. We further adapt our analysis to designing a provably efficient task-agnostic algorithm for zero-sum Markov games, and designing the first line of provably sample-efficient algorithms for multi-player general-sum Markov games.", "keywords": "Reinforcement learning theory;Markov games;model-based RL;task-agnostic RL;multi-agent RL", "primary_area": "", "supplementary_material": "", "author": "Qinghua Liu;Tiancheng Yu;Yu Bai;Chi Jin", "authorids": "~Qinghua_Liu1;~Tiancheng_Yu1;~Yu_Bai1;~Chi_Jin1", "gender": "M;M;;M", "homepage": "http://qinghual2020.github.io/;https://yutc.me;https://yubai.org;https://sites.google.com/view/cjin/home", "dblp": ";215/4910;03/6325-17.html;126/1802-1", "google_scholar": "CotFJJsAAAAJ;mVkGg80AAAAJ;owqhKD8AAAAJ;GINhGvwAAAAJ", "orcid": ";;;", "linkedin": ";;;", "or_profile": "~Qinghua_Liu1;~Tiancheng_Yu1;~Yu_Bai1;~Chi_Jin1", "aff": "Princeton University;Massachusetts Institute of Technology;Salesforce Research;Princeton University", "aff_domain": "princeton.edu;mit.edu;salesforce.com;princeton.edu", "position": "PhD student;PhD student;Research Scientist;Assistant Professor", "bibtex": "@misc{\nliu2021a,\ntitle={A Sharp Analysis of Model-based Reinforcement Learning with Self-Play},\nauthor={Qinghua Liu and Tiancheng Yu and Yu Bai and Chi Jin},\nyear={2021},\nurl={https://openreview.net/forum?id=9Y7_c5ZAd5i}\n}", "github": "", "project": "", "reviewers": "AnonReviewer3;AnonReviewer2;AnonReviewer1;AnonReviewer4", "site": "https://openreview.net/forum?id=9Y7_c5ZAd5i", "pdf_size": 0, "rating": "4;5;7;8", "confidence": "5;4;2;4", "wc_review": "1085;679;526;591", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "1064;176;346;427", "reply_reviewers": "0;0;0;0", "reply_authors": "3;1;1;1", "rating_avg": [ 6.0, 1.5811388300841898 ], "confidence_avg": [ 3.75, 1.0897247358851685 ], "wc_review_avg": [ 720.25, 217.4757170352589 ], 
"wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 503.25, 336.182521110185 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.5, 0.8660254037844386 ], "replies_avg": [ 13, 0 ], "authors#_avg": [ 4, 0 ], "corr_rating_confidence": -0.5803810000880094, "gs_citation": 168, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=10803750602455622490&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 6, "aff_unique_index": "0;1;2;0", "aff_unique_norm": "Princeton University;Massachusetts Institute of Technology;Salesforce", "aff_unique_dep": ";;Salesforce Research", "aff_unique_url": "https://www.princeton.edu;https://web.mit.edu;https://research.salesforce.com", "aff_unique_abbr": "Princeton;MIT;Salesforce", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0;0;0", "aff_country_unique": "United States" }, { "title": "Model Patching: Closing the Subgroup Performance Gap with Data Augmentation", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/2873", "id": "9YlaeLfuhJF", "poster": "", "openreview": "https://openreview.net/forum?id=9YlaeLfuhJF", "slides": "https://iclr.cc/virtual/2021/poster/2873", "video": "https://iclr.cc/virtual/2021/poster/2873", "author_site": "Karan Goel, Albert Gu, Yixuan Li, Christopher Re", "tldr": "", "abstract": "Classifiers in machine learning are often brittle when deployed. Particularly concerning are models with inconsistent performance on specific subgroups of a class, e.g., exhibiting disparities in skin cancer classification in the presence or absence of a spurious bandage. To mitigate these performance differences, we introduce model patching, a two-stage framework for improving robustness that encourages the model to be invariant to subgroup differences, and focus on class information shared by subgroups. Model patching first models subgroup features within a class and learns semantic transformations between them, and then trains a classifier with data augmentations that deliberately manipulate subgroup features. We instantiate model patching with CAMEL, which (1) uses a CycleGAN to learn the intra-class, inter-subgroup augmentations, and (2) balances subgroup performance using a theoretically-motivated subgroup consistency regularizer, accompanied by a new robust objective. We demonstrate CAMEL\u2019s effectiveness on 3 benchmark datasets, with reductions in robust error of up to 33% relative to the best baseline. 
Lastly, CAMEL successfully patches a model that fails due to spurious features on a real-world skin cancer dataset.", "keywords": "Robust Machine Learning;Data Augmentation;Consistency Training;Invariant Representations", "primary_area": "", "supplementary_material": "/attachment/3035e1e9b08d5a7cb5e12d5a81ab08a406d30bc4.zip", "author": "Karan Goel;Albert Gu;Yixuan Li;Christopher Re", "authorids": "~Karan_Goel1;~Albert_Gu1;~Yixuan_Li1;~Christopher_Re1", "gender": "M;M;F;", "homepage": "http://krandiash.github.io;;http://pages.cs.wisc.edu/~sharonli/;", "dblp": "175/1290;130/0612;144/6087-1;", "google_scholar": ";DVCHv1kAAAAJ;https://scholar.google.com/citations?hl=en;", "orcid": ";0000-0002-4946-6042;;", "linkedin": ";;liyixuan;", "or_profile": "~Karan_Goel1;~Albert_Gu1;~Yixuan_Li1;~Christopher_Re1", "aff": "Stanford University;Stanford University;Cornell University;", "aff_domain": "stanford.edu;stanford.edu;cornell.edu;", "position": "PhD student;PhD student;Graduate Student;", "bibtex": "@inproceedings{\ngoel2021model,\ntitle={Model Patching: Closing the Subgroup Performance Gap with Data Augmentation},\nauthor={Karan Goel and Albert Gu and Yixuan Li and Christopher Re},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=9YlaeLfuhJF}\n}", "github": "[![github](/images/github_icon.svg) HazyResearch/model-patching](https://github.com/HazyResearch/model-patching)", "project": "", "reviewers": "AnonReviewer2;AnonReviewer4;AnonReviewer1;AnonReviewer3", "pdf_size": 0, "rating": "7;7;7;8", "confidence": "3;2;3;4", "wc_review": "267;549;456;475", "wc_reply_reviewers": "0;0;0;12", "wc_reply_authors": "575;561;328;522", "reply_reviewers": "0;0;0;1", "reply_authors": "1;1;1;1", "rating_avg": [ 7.25, 0.4330127018922193 ], "confidence_avg": [ 3.0, 0.7071067811865476 ], "wc_review_avg": [ 436.75, 103.98166905757957 ], "wc_reply_reviewers_avg": [ 3.0, 5.196152422706632 ], "wc_reply_authors_avg": [ 496.5, 99.20307454912877 ], "reply_reviewers_avg": [ 0.25, 0.4330127018922193 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 11, 0 ], "authors#_avg": [ 4, 0 ], "corr_rating_confidence": 0.816496580927726, "gs_citation": 148, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=501938357242145399&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 3, "pdf": "https://openreview.net/pdf?id=9YlaeLfuhJF", "email": "stanford.edu;stanford.edu;cornell.edu;", "author_num": 4, "aff_unique_index": "0;0;1", "aff_unique_norm": "Stanford University;Cornell University", "aff_unique_dep": ";", "aff_unique_url": "https://www.stanford.edu;https://www.cornell.edu", "aff_unique_abbr": "Stanford;Cornell", "aff_campus_unique_index": "0;0", "aff_campus_unique": "Stanford;", "aff_country_unique_index": "0;0;0", "aff_country_unique": "United States" }, { "id": "9_J4DrgC_db", "title": "Deep Coherent Exploration For Continuous Control", "track": "main", "status": "Reject", "tldr": "", "abstract": "In policy search methods for reinforcement learning (RL), exploration is often performed by injecting noise either in action space at each step independently or in parameter space over each full trajectory. In prior work, it has been shown that with linear policies, a more balanced trade-off between these two exploration strategies is beneficial. However, that method did not scale to policies using deep neural networks. 
In this paper, we introduce Deep Coherent Exploration, a general and scalable exploration framework for deep RL algorithms on continuous control, that generalizes step-based and trajectory-based exploration. This framework models the last layer parameters of the policy network as latent variables and uses a recursive inference step within the policy update to handle these latent variables in a scalable manner. We find that Deep Coherent Exploration improves the speed and stability of learning of A2C, PPO, and SAC on several continuous control tasks.", "keywords": "reinforcement learning;exploration;latent variable models", "primary_area": "", "supplementary_material": "/attachment/9d141784be0b073b5d701d52a3528bfe8ef93a02.zip", "author": "Yijie Zhang;Herke van Hoof", "authorids": "~Yijie_Zhang1;~Herke_van_Hoof4", "gender": "M;M", "homepage": "https://sites.google.com/view/yijiezhang/home;https://staff.fnwi.uva.nl/h.c.vanhoof/", "dblp": ";123/6759", "google_scholar": "HVR4014AAAAJ;https://scholar.google.ca/citations?user=9owUkLYAAAAJ", "orcid": ";", "linkedin": ";", "or_profile": "~Yijie_Zhang1;~Herke_van_Hoof4", "aff": "University of Copenhagen;University of Amsterdam", "aff_domain": "di.ku.dk;uva.nl", "position": "PhD student;Assistant Professor", "bibtex": "@misc{\nzhang2021deep,\ntitle={Deep Coherent Exploration For Continuous Control},\nauthor={Yijie Zhang and Herke van Hoof},\nyear={2021},\nurl={https://openreview.net/forum?id=9_J4DrgC_db}\n}", "github": "", "project": "", "reviewers": "AnonReviewer4;AnonReviewer2;AnonReviewer3;AnonReviewer1", "site": "https://openreview.net/forum?id=9_J4DrgC_db", "pdf_size": 0, "rating": "4;4;7;7", "confidence": "4;3;2;3", "wc_review": "232;473;287;358", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "306;722;112;145", "reply_reviewers": "0;0;0;0", "reply_authors": "1;1;1;1", "rating_avg": [ 5.5, 1.5 ], "confidence_avg": [ 3.0, 0.7071067811865476 ], "wc_review_avg": [ 337.5, 90.08468238274473 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 321.25, 242.7358389278353 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 11, 0 ], "authors#_avg": [ 2, 0 ], "corr_rating_confidence": -0.7071067811865476, "gs_citation": 14, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=3533811578867306615&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 6, "aff_unique_index": "0;1", "aff_unique_norm": "University of Copenhagen;University of Amsterdam", "aff_unique_dep": ";", "aff_unique_url": "https://www.ku.dk;https://www.uva.nl", "aff_unique_abbr": "UCPH;UvA", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;1", "aff_country_unique": "Denmark;Netherlands" }, { "id": "9az9VKjOx00", "title": "TopoTER: Unsupervised Learning of Topology Transformation Equivariant Representations", "track": "main", "status": "Reject", "tldr": "", "abstract": "We present the Topology Transformation Equivariant Representation (TopoTER) learning, a general paradigm of unsupervised learning of node representations of graph data for the wide applicability to Graph Convolutional Neural Networks (GCNNs). We formalize the TopoTER from an information-theoretic perspective, by maximizing the mutual information between topology transformations and node representations before and after the transformations. We derive that maximizing such mutual information can be relaxed to minimizing the cross entropy between the applied topology transformation and its estimation from node representations. 
In particular, we seek to sample a subset of node pairs from the original graph and flip the edge connectivity between each pair to transform the graph topology. Then, we self-train a representation encoder to learn node representations by reconstructing the topology transformations from the feature representations of the original and transformed graphs. In experiments, we apply the TopoTER to the downstream node and graph classification tasks, and results show that the TopoTER outperforms the state-of-the-art unsupervised approaches.", "keywords": "Unsupervised learning;node representations;mutual information", "primary_area": "", "supplementary_material": "", "author": "Xiang Gao;Wei Hu;Guo-Jun Qi", "authorids": "~Xiang_Gao2;~Wei_Hu6;~Guo-Jun_Qi1", "gender": "M;F;M", "homepage": ";http://www.wict.pku.edu.cn/huwei/;http://maple-lab.net/gqi/", "dblp": ";52/173-3.html;41/943", "google_scholar": ";https://scholar.google.com.hk/citations?user=5oFf8Q4AAAAJ;https://scholar.google.com.tw/citations?user=Nut-uvoAAAAJ", "orcid": "0000-0002-2679-4019;0000-0002-9860-0922;0000-0003-3508-1851", "linkedin": "gyshgx868/;;", "or_profile": "~Xiang_Gao2;~Wei_Hu6;~Guo-Jun_Qi1", "aff": "Peking University;;Futurewei Technologies", "aff_domain": "pku.edu.cn;;futurewei.com", "position": "PhD student;;Chief AI Scientist and Technical VP", "bibtex": "@misc{\ngao2021topoter,\ntitle={Topo{\\{}TER{\\}}: Unsupervised Learning of Topology Transformation Equivariant Representations},\nauthor={Xiang Gao and Wei Hu and Guo-Jun Qi},\nyear={2021},\nurl={https://openreview.net/forum?id=9az9VKjOx00}\n}", "github": "", "project": "", "reviewers": "AnonReviewer4;AnonReviewer2;AnonReviewer1;AnonReviewer3", "site": "https://openreview.net/forum?id=9az9VKjOx00", "pdf_size": 0, "rating": "5;6;6;7", "confidence": "4;2;4;3", "wc_review": "481;135;591;91", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "547;20;713;8", "reply_reviewers": "0;0;0;0", "reply_authors": "1;1;1;1", "rating_avg": [ 6.0, 0.7071067811865476 ], "confidence_avg": [ 3.25, 0.82915619758885 ], "wc_review_avg": [ 324.5, 215.6078616377427 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 322.0, 313.5705662207472 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 11, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": -0.4264014327112209, "gs_citation": 2, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=14273766096083605539&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 0, "aff_unique_index": "0;1", "aff_unique_norm": "Peking University;Futurewei Technologies", "aff_unique_dep": ";", "aff_unique_url": "http://www.pku.edu.cn;https://www.futurewei.com", "aff_unique_abbr": "Peking U;Futurewei", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;1", "aff_country_unique": "China;United States" }, { "id": "9hgEG-k57Zj", "title": "Addressing Distribution Shift in Online Reinforcement Learning with Offline Datasets", "track": "main", "status": "Reject", "tldr": "", "abstract": "Recent progress in offline reinforcement learning (RL) has made it possible to train strong RL agents from previously-collected, static datasets. However, depending on the quality of the trained agents and the application being considered, it is often desirable to improve such offline RL agents with further online interaction. 
As it turns out, fine-tuning offline RL agents is a non-trivial challenge, due to distribution shift \u2013 the agent encounters out-of-distribution samples during online interaction, which may cause bootstrapping error in Q-learning and instability during fine-tuning. In order to address the issue, we present a simple yet effective framework, which incorporates a balanced replay scheme and an ensemble distillation scheme. First, we propose to keep separate offline and online replay buffers, and carefully balance the number of samples from each buffer during updates. By utilizing samples from a wider distribution, i.e., both online and offline samples, we stabilize the Q-learning. Next, we present an ensemble distillation scheme, where we train an ensemble of independent actor-critic agents, then distill the policies into a single policy. In turn, we improve the policy using the Q-ensemble during fine-tuning, which allows the policy updates to be more robust to error in each individual Q-function. We demonstrate the superiority of our method on MuJoCo datasets from the recently proposed D4RL benchmark suite.\n", "keywords": "reinforcement learning;offline reinforcement learning;control;distribution shift", "primary_area": "", "supplementary_material": "", "author": "Seunghyun Lee;Younggyo Seo;Kimin Lee;Pieter Abbeel;Jinwoo Shin", "authorids": "~Seunghyun_Lee2;~Younggyo_Seo1;~Kimin_Lee1;~Pieter_Abbeel2;~Jinwoo_Shin1", "gender": "M;M;M;M;M", "homepage": "https://sites.google.com/view/seunghyun-lee/home;https://younggyo.me/;https://sites.google.com/view/kiminlee;https://people.eecs.berkeley.edu/~pabbeel/;https://sites.google.com/site/mijirim/", "dblp": "23/774;265/5586;183/6849;;31/7062", "google_scholar": "NOJNXdAAAAAJ;tI1-YwIAAAAJ;92M8xv4AAAAJ;https://scholar.google.com.tw/citations?user=vtwH6GkAAAAJ;https://scholar.google.com.tw/citations?user=m3eDp7kAAAAJ", "orcid": ";;;;", "linkedin": ";;;;", "or_profile": "~Seunghyun_Lee2;~Younggyo_Seo1;~Kimin_Lee1;~Pieter_Abbeel2;~Jinwoo_Shin1", "aff": "Korea Advanced Institute of Science & Technology;Microsoft Research Asia;University of California, Berkeley;Covariant;Korea Advanced Institute of Science & Technology", "aff_domain": "kaist.ac.kr;microsoft.com;berkeley.edu;covariant.ai;kaist.ac.kr", "position": "MS student;Intern;Postdoc;Founder;Associate Professor", "bibtex": "@misc{\nlee2021addressing,\ntitle={Addressing Distribution Shift in Online Reinforcement Learning with Offline Datasets},\nauthor={Seunghyun Lee and Younggyo Seo and Kimin Lee and Pieter Abbeel and Jinwoo Shin},\nyear={2021},\nurl={https://openreview.net/forum?id=9hgEG-k57Zj}\n}", "github": "", "project": "", "reviewers": "AnonReviewer2;AnonReviewer1;AnonReviewer4;AnonReviewer3", "site": "https://openreview.net/forum?id=9hgEG-k57Zj", "pdf_size": 0, "rating": "3;4;5;6", "confidence": "4;4;4;2", "wc_review": "1116;503;580;212", "wc_reply_reviewers": "520;0;0;0", "wc_reply_authors": "2083;1055;916;438", "reply_reviewers": "1;0;0;0", "reply_authors": "4;2;2;1", "rating_avg": [ 4.5, 1.118033988749895 ], "confidence_avg": [ 3.5, 0.8660254037844386 ], "wc_review_avg": [ 602.75, 326.56498204798385 ], "wc_reply_reviewers_avg": [ 130.0, 225.16660498395404 ], "wc_reply_authors_avg": [ 1123.0, 599.6453118302519 ], "reply_reviewers_avg": [ 0.25, 0.4330127018922193 ], "reply_authors_avg": [ 2.25, 1.0897247358851685 ], "replies_avg": [ 16, 0 ], "authors#_avg": [ 5, 0 ], "corr_rating_confidence": -0.7745966692414834, "gs_citation": 5, "gs_cited_by_link": 
"https://scholar.google.com/scholar?cites=7069483268834015318&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 2, "aff_unique_index": "0;1;2;3;0", "aff_unique_norm": "Korea Advanced Institute of Science and Technology;Microsoft;University of California, Berkeley;Covariant", "aff_unique_dep": ";Research;;", "aff_unique_url": "https://www.kaist.ac.kr;https://www.microsoft.com/en-us/research/group/asia;https://www.berkeley.edu;", "aff_unique_abbr": "KAIST;MSR Asia;UC Berkeley;", "aff_campus_unique_index": "1;2", "aff_campus_unique": ";Asia;Berkeley", "aff_country_unique_index": "0;1;2;0", "aff_country_unique": "South Korea;China;United States;" }, { "title": "Neural Attention Distillation: Erasing Backdoor Triggers from Deep Neural Networks", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/2760", "id": "9l0K4OM-oXE", "poster": "", "openreview": "https://openreview.net/forum?id=9l0K4OM-oXE", "slides": "https://iclr.cc/virtual/2021/poster/2760", "video": "https://iclr.cc/virtual/2021/poster/2760", "author_site": "Yige Li, Xixiang Lyu, Nodens Koren, Lingjuan Lyu, Bo Li, Xingjun Ma", "tldr": "", "abstract": "Deep neural networks (DNNs) are known vulnerable to backdoor attacks, a training time attack that injects a trigger pattern into a small proportion of training data so as to control the model's prediction at the test time. Backdoor attacks are notably dangerous since they do not affect the model's performance on clean examples, yet can fool the model to make the incorrect prediction whenever the trigger pattern appears during testing. In this paper, we propose a novel defense framework Neural Attention Distillation (NAD) to erase backdoor triggers from backdoored DNNs. NAD utilizes a teacher network to guide the finetuning of the backdoored student network on a small clean subset of data such that the intermediate-layer attention of the student network aligns with that of the teacher network. The teacher network can be obtained by an independent finetuning process on the same clean subset. We empirically show, against 6 state-of-the-art backdoor attacks, NAD can effectively erase the backdoor triggers using only 5\\% clean training data without causing obvious performance degradation on clean examples. 
Our code is available at https://github.com/bboylyg/NAD.", "keywords": "Backdoor Defense;Deep Neural Networks;Neural Attention Distillation", "primary_area": "", "supplementary_material": "/attachment/ee182e3a36e450b2ef5debb728e2ce5876dea5a0.zip", "author": "Yige Li;Xixiang Lyu;Nodens Koren;Lingjuan Lyu;Bo Li;Xingjun Ma", "authorids": "~Yige_Li1;xxlv@mail.xidian.edu.cn;~Nodens_Koren1;~Lingjuan_Lyu1;~Bo_Li19;~Xingjun_Ma1", "gender": "M;;;F;F;M", "homepage": ";;;https://sites.google.com/view/lingjuan-lyu;http://boli.cs.illinois.edu/;http://xingjunma.com/", "dblp": "01/2511;;;178/9876;50/3402-26;195/8270", "google_scholar": "h0cS2nQAAAAJ;;;;K8vJkTcAAAAJ;https://scholar.google.com.au/citations?user=XQViiyYAAAAJ", "orcid": ";;;;;", "linkedin": "%E4%B8%80%E6%88%88-%E6%9D%8E-78a53b1b8/;;;;;xingjun-ma-173532129/", "or_profile": "~Yige_Li1;xxlv@mail.xidian.edu.cn;~Nodens_Koren1;~Lingjuan_Lyu1;~Bo_Li19;~Xingjun_Ma1", "aff": "Xidian University;;;Sony;University of Illinois, Urbana Champaign;Deakin University", "aff_domain": "xidian.edu.cn;;;sony.com;illinois.edu;deakin.edu.au", "position": "PhD student;;;scientist;Assistant Professor;Assistant Professor", "bibtex": "@inproceedings{\nli2021neural,\ntitle={Neural Attention Distillation: Erasing Backdoor Triggers from Deep Neural Networks},\nauthor={Yige Li and Xixiang Lyu and Nodens Koren and Lingjuan Lyu and Bo Li and Xingjun Ma},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=9l0K4OM-oXE}\n}", "github": "[![github](/images/github_icon.svg) bboylyg/NAD](https://github.com/bboylyg/NAD)", "project": "", "reviewers": "AnonReviewer2;AnonReviewer4;AnonReviewer1;AnonReviewer3", "pdf_size": 0, "rating": "6;7;7;7", "confidence": "4;4;5;4", "wc_review": "270;421;373;712", "wc_reply_reviewers": "69;22;21;0", "wc_reply_authors": "722;211;442;517", "reply_reviewers": "1;1;1;0", "reply_authors": "2;1;1;1", "rating_avg": [ 6.75, 0.4330127018922193 ], "confidence_avg": [ 4.25, 0.4330127018922193 ], "wc_review_avg": [ 444.0, 164.06553568620072 ], "wc_reply_reviewers_avg": [ 28.0, 25.248762345905195 ], "wc_reply_authors_avg": [ 473.0, 182.7169942835094 ], "reply_reviewers_avg": [ 0.75, 0.4330127018922193 ], "reply_authors_avg": [ 1.25, 0.4330127018922193 ], "replies_avg": [ 14, 0 ], "authors#_avg": [ 6, 0 ], "corr_rating_confidence": 0.3333333333333333, "gs_citation": 554, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=11473045902984731830&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 4, "pdf": "https://openreview.net/pdf?id=9l0K4OM-oXE", "email": "xidian.edu.cn;;;sony.com;illinois.edu;deakin.edu.au", "author_num": 6, "aff_unique_index": "0;1;2;3", "aff_unique_norm": "Xidian University;Sony Corporation;University of Illinois Urbana-Champaign;Deakin University", "aff_unique_dep": ";;;", "aff_unique_url": "http://www.xidian.edu.cn/;https://www.sony.com;https://illinois.edu;https://www.deakin.edu.au", "aff_unique_abbr": "Xidian;Sony;UIUC;Deakin", "aff_campus_unique_index": "1", "aff_campus_unique": ";Urbana-Champaign", "aff_country_unique_index": "0;1;2;3", "aff_country_unique": "China;Japan;United States;Australia" }, { "id": "9l9WD4ahJgs", "title": "Automatic Data Augmentation for Generalization in Reinforcement Learning", "track": "main", "status": "Reject", "tldr": "", "abstract": "Deep reinforcement learning (RL) agents often fail to generalize beyond their training environments. To alleviate this problem, recent work has proposed the use of data augmentation. 
However, different tasks tend to benefit from different types of augmentations and selecting the right one typically requires expert knowledge. In this paper, we introduce three approaches for automatically finding an effective augmentation for any RL task. These are combined with two novel regularization terms for the policy and value function, required to make the use of data augmentation theoretically sound for actor-critic algorithms. We evaluate our method on the Procgen benchmark which consists of 16 procedurally generated environments and show that it improves test performance by 40% relative to standard RL algorithms. Our approach also outperforms methods specifically designed to improve generalization in RL, thus setting a new state-of-the-art on Procgen. In addition, our agent learns policies and representations which are more robust to changes in the environment that are irrelevant for solving the task, such as the background. ", "keywords": "reinforcement learning;generalization;data augmentation", "primary_area": "", "supplementary_material": "/attachment/77ab2cccc5da44d1c07b8c5f66976a187519930d.zip", "author": "Roberta Raileanu;Maxwell Goldstein;Denis Yarats;Ilya Kostrikov;Rob Fergus", "authorids": "~Roberta_Raileanu2;~Maxwell_Goldstein1;~Denis_Yarats1;~Ilya_Kostrikov1;~Rob_Fergus1", "gender": ";M;M;M;F", "homepage": "https://wp.nyu.edu/cilvr/;http://denis-yarats.info/;;http://cs.nyu.edu/fergus/;https://rraileanu.github.io/", "dblp": ";200/8142;https://dblp.org/pers/k/Kostrikov:Ilya.html;77/3763;215/5579", "google_scholar": ";7kaXqgMAAAAJ;PTS2AOgAAAAJ;https://scholar.google.com.tw/citations?user=GgQ9GEkAAAAJ;9hVXpJ0AAAAJ", "orcid": ";;;;", "linkedin": ";;;;roberta-raileanu-44b25660/", "or_profile": "~Maxwell_Goldstein1;~Denis_Yarats1;~Ilya_Kostrikov1;~Rob_Fergus1;~Roberta_Raileanu1", "aff": "New York University;New York University;New York University;Google;New York University", "aff_domain": "nyu.edu;cs.nyu.edu;nyu.edu;google.com;nyu.edu", "position": "PhD student;PhD student;PhD student;Research scientist;PhD student", "bibtex": "@misc{\nraileanu2021automatic,\ntitle={Automatic Data Augmentation for Generalization in Reinforcement Learning},\nauthor={Roberta Raileanu and Maxwell Goldstein and Denis Yarats and Ilya Kostrikov and Rob Fergus},\nyear={2021},\nurl={https://openreview.net/forum?id=9l9WD4ahJgs}\n}", "github": "", "project": "", "reviewers": "AnonReviewer4;AnonReviewer3;AnonReviewer2;AnonReviewer1", "site": "https://openreview.net/forum?id=9l9WD4ahJgs", "pdf_size": 0, "rating": "4;6;7;7", "confidence": "4;3;3;3", "wc_review": "917;288;440;240", "wc_reply_reviewers": "1243;0;0;0", "wc_reply_authors": "2646;438;591;205", "reply_reviewers": "2;0;0;0", "reply_authors": "4;1;1;1", "rating_avg": [ 6.0, 1.224744871391589 ], "confidence_avg": [ 3.25, 0.4330127018922193 ], "wc_review_avg": [ 471.25, 267.734359954041 ], "wc_reply_reviewers_avg": [ 310.75, 538.2347884520286 ], "wc_reply_authors_avg": [ 970.0, 977.3517790437586 ], "reply_reviewers_avg": [ 0.5, 0.8660254037844386 ], "reply_authors_avg": [ 1.75, 1.299038105676658 ], "replies_avg": [ 17, 0 ], "authors#_avg": [ 5, 0 ], "corr_rating_confidence": -0.9428090415820632, "gs_citation": 145, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=11787479877857738831&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 7, "aff_unique_index": "0;0;0;1;0", "aff_unique_norm": "New York University;Google", "aff_unique_dep": ";Google", "aff_unique_url": "https://www.nyu.edu;https://www.google.com", "aff_unique_abbr": 
"NYU;Google", "aff_campus_unique_index": "1", "aff_campus_unique": ";Mountain View", "aff_country_unique_index": "0;0;0;0;0", "aff_country_unique": "United States" }, { "id": "9nIulvlci5", "title": "Neural Random Projection: From the Initial Task To the Input Similarity Problem", "track": "main", "status": "Reject", "tldr": "", "abstract": "The data representation plays an important role in evaluating similarity between objects. In this paper, we propose a novel approach for implicit data representation to evaluate similarity of input data using a trained neural network. In contrast to the previous approach, which uses gradients for representation, we utilize only the outputs from the last hidden layer of a neural network and do not use a backward step. The proposed technique explicitly takes into account the initial task and significantly reduces the size of the vector representation, as well as the computation time. Generally, a neural network obtains representations related only to the problem being solved, which makes the last hidden layer representation useless for input similarity task.\nIn this paper, we consider two reasons for the decline in the quality of representations: correlation between neurons and insufficient size of the last hidden layer. To reduce the correlation between neurons we use orthogonal weight initialization for each layer and modify the loss function to ensure orthogonality of the weights during training. Moreover, we show that activation functions can potentially increase correlation. To solve this problem, we apply modified Batch-Normalization with Dropout. Using orthogonal weight matrices allow us to consider such neural networks as an application of the Random Projection method and get a lower bound estimate for the size of the last hidden layer. We perform experiments on MNIST and physical examination datasets. In both experiments, initially, we split a set of labels into two disjoint subsets to train a neural network for binary classification problem, and then use this model to measure similarity between input data and define hidden classes. We also cluster the inputs to evaluate how well objects from the same hidden class are grouped together. 
Our experimental results show that the proposed approach achieves competitive results on the input similarity task while reducing both computation time and the size of the input representation.", "keywords": "", "primary_area": "", "supplementary_material": "/attachment/c6a5e905d0153c85b325170778c05234d8b5a600.zip", "author": "Alan Savushkin;Nikita Benkovich;Dmitry Golubev", "authorids": "~Alan_Savushkin1;nikita.benkovich@kaspersky.com;dmitry.s.golubev@kaspersky.com", "gender": ";;", "homepage": "https://vk.com/alansavushkin;;", "dblp": ";;", "google_scholar": ";;", "orcid": ";;", "linkedin": ";;", "or_profile": "~Alan_Savushkin1;nikita.benkovich@kaspersky.com;dmitry.s.golubev@kaspersky.com", "aff": "Lomonosov Moscow State University;;", "aff_domain": "msu.ru;;", "position": "PhD student;;", "bibtex": "@misc{\nsavushkin2021neural,\ntitle={Neural Random Projection: From the Initial Task To the Input Similarity Problem},\nauthor={Alan Savushkin and Nikita Benkovich and Dmitry Golubev},\nyear={2021},\nurl={https://openreview.net/forum?id=9nIulvlci5}\n}", "github": "", "project": "", "reviewers": "AnonReviewer5;AnonReviewer1;AnonReviewer4", "site": "https://openreview.net/forum?id=9nIulvlci5", "pdf_size": 0, "rating": "3;4;7", "confidence": "4;4;3", "wc_review": "615;440;230", "wc_reply_reviewers": "0;0;0", "wc_reply_authors": "607;304;51", "reply_reviewers": "0;0;0", "reply_authors": "1;1;1", "rating_avg": [ 4.666666666666667, 1.699673171197595 ], "confidence_avg": [ 3.6666666666666665, 0.4714045207910317 ], "wc_review_avg": [ 428.3333333333333, 157.39193823770714 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 320.6666666666667, 227.29178505954755 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 7, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": -0.9707253433941508, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:oibM9GSXBVQJ:scholar.google.com/&scioq=Neural+Random+Projection:+From+the+Initial+Task+To+the+Input+Similarity+Problem&hl=en&as_sdt=0,33", "gs_version_total": 3, "aff_unique_index": "0", "aff_unique_norm": "Lomonosov Moscow State University", "aff_unique_dep": "", "aff_unique_url": "https://www.msu.ru", "aff_unique_abbr": "MSU", "aff_campus_unique_index": "0", "aff_campus_unique": "Moscow", "aff_country_unique_index": "0", "aff_country_unique": "Russian Federation" }, { "id": "9p2CltauWEY", "title": "On Size Generalization in Graph Neural Networks", "track": "main", "status": "Reject", "tldr": "", "abstract": "Graph neural networks (GNNs) can process graphs of different sizes but their capacity to generalize across sizes is still not well understood. Size generalization is key to numerous GNN applications, from solving combinatorial optimization problems to learning in molecular biology. In such problems, obtaining labels and training on large graphs can be prohibitively expensive, but training on smaller graphs is possible. \n\nThis paper puts forward the size-generalization question and characterizes important aspects of that problem theoretically and empirically.\nWe prove that even for very simple tasks, such as counting the number of nodes or edges in a graph, GNNs do not naturally generalize to graphs of larger size. Instead, their generalization performance is closely related to the distribution of local patterns of connectivity and features and how that distribution changes from small to large graphs. 
Specifically, we prove that for many tasks, there are weight assignments for GNNs that can perfectly solve the task on small graphs but fail on large graphs, if there is a discrepancy between their local patterns. We further demonstrate on several tasks, that training GNNs on small graphs results in solutions which do not generalize to larger graphs. We then formalize size generalization as a domain-adaption problem and describe two learning setups where size generalization can be improved. First, as a self-supervised learning problem (SSL) over the target domain of large graphs. Second as a semi-supervised learning problem when few samples are available in the target domain. We demonstrate the efficacy of these solutions on a diverse set of benchmark graph datasets. ", "keywords": "graph neural networks;gnn;generalization;Weisfeiler-Lehman", "primary_area": "", "supplementary_material": "", "author": "Gilad Yehudai;Ethan Fetaya;Eli Meirom;Gal Chechik;Haggai Maron", "authorids": "~Gilad_Yehudai2;~Ethan_Fetaya1;~Eli_Meirom2;~Gal_Chechik1;~Haggai_Maron1", "gender": "M;M;;;M", "homepage": ";http://www.cs.toronto.edu/~ethanf/;;https://chechiklab.biu.ac.il/~gal/;https://haggaim.github.io/", "dblp": "239/4344;01/10046;132/8961;c/GalChechik;181/6629", "google_scholar": "opVT1qkAAAAJ;zLuqh-0AAAAJ;ZYEgD7wAAAAJ;Wk2gAZUAAAAJ;https://scholar.google.co.il/citations?user=4v8uJrIAAAAJ", "orcid": ";0000-0003-3125-1665;;0000-0001-9164-5303;", "linkedin": ";;;;", "or_profile": "~Gilad_Yehudai2;~Ethan_Fetaya1;~Eli_Meirom2;~Gal_Chechik1;~Haggai_Maron1", "aff": "Weizmann Institute of Science;Bar Ilan University;NVIDIA;NVIDIA;NVIDIA", "aff_domain": "weizmann.ac.il;biu.ac.il;nvidia.com;nvidia.com;nvidia.com", "position": "PhD student;Assistant Professor;Researcher;Principal Researcher;Research Scientist", "bibtex": "@misc{\nyehudai2021on,\ntitle={On Size Generalization in Graph Neural Networks},\nauthor={Gilad Yehudai and Ethan Fetaya and Eli Meirom and Gal Chechik and Haggai Maron},\nyear={2021},\nurl={https://openreview.net/forum?id=9p2CltauWEY}\n}", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer3;AnonReviewer2;AnonReviewer4", "site": "https://openreview.net/forum?id=9p2CltauWEY", "pdf_size": 0, "rating": "4;5;5;7", "confidence": "5;4;3;3", "wc_review": "517;563;505;559", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "802;828;743;350", "reply_reviewers": "0;0;0;0", "reply_authors": "1;1;1;1", "rating_avg": [ 5.25, 1.0897247358851685 ], "confidence_avg": [ 3.75, 0.82915619758885 ], "wc_review_avg": [ 536.0, 25.39685019840059 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 680.75, 193.42618100970716 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 14, 0 ], "authors#_avg": [ 5, 0 ], "corr_rating_confidence": -0.7608859102526822, "gs_citation": 6, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=11313912761449354467&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 0, "aff_unique_index": "0;1;2;2;2", "aff_unique_norm": "Weizmann Institute of Science;Bar-Ilan University;NVIDIA", "aff_unique_dep": ";;NVIDIA Corporation", "aff_unique_url": "https://www.weizmann.org.il;https://www.biu.ac.il;https://www.nvidia.com", "aff_unique_abbr": "Weizmann;BIU;NVIDIA", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0;1;1;1", "aff_country_unique": "Israel;United States" }, { "title": "Representation Learning via Invariant Causal Mechanisms", "status": "Poster", "track": "main", "site": 
"https://iclr.cc/virtual/2021/poster/3350", "id": "9p2ekP904Rs", "poster": "", "openreview": "https://openreview.net/forum?id=9p2ekP904Rs", "slides": "https://iclr.cc/virtual/2021/poster/3350", "video": "https://iclr.cc/virtual/2021/poster/3350", "author_site": "Jovana Mitrovic, Brian McWilliams, Jacob C Walker, Lars Buesing, Charles Blundell", "tldr": "", "abstract": "Self-supervised learning has emerged as a strategy to reduce the reliance on costly supervised signal by pretraining representations only using unlabeled data. These methods combine heuristic proxy classification tasks with data augmentations and have achieved significant success, but our theoretical understanding of this success remains limited. In this paper we analyze self-supervised representation learning using a causal framework. We show how data augmentations can be more effectively utilized through explicit invariance constraints on the proxy classifiers employed during pretraining. Based on this, we propose a novel self-supervised objective, Representation Learning via Invariant Causal Mechanisms (ReLIC), that enforces invariant prediction of proxy targets across augmentations through an invariance regularizer which yields improved generalization guarantees. Further, using causality we generalize contrastive learning, a particular kind of self-supervised method, and provide an alternative theoretical explanation for the success of these methods. Empirically, ReLIC significantly outperforms competing methods in terms of robustness and out-of-distribution generalization on ImageNet, while also significantly outperforming these methods on Atari achieving above human-level performance on 51 out of 57 games.", "keywords": "Representation Learning;Self-supervised Learning;Contrastive Methods;Causality", "primary_area": "", "supplementary_material": "/attachment/ad5f69b153c672dbc420d8cb6357c7ca92830209.zip", "author": "Jovana Mitrovic;Brian McWilliams;Jacob C Walker;Lars Holger Buesing;Charles Blundell", "authorids": "~Jovana_Mitrovic1;~Brian_McWilliams2;~Jacob_C_Walker1;~Lars_Holger_Buesing1;~Charles_Blundell1", "gender": ";M;;M;", "homepage": "http://jovana-mitrovic.github.io;https://sites.google.com/view/mcbrian/;;;http://www.gatsby.ucl.ac.uk/~ucgtcbl/", "dblp": "176/5114;;135/1696;https://dblp.uni-trier.de/pers/hd/b/Buesing:Lars;35/8396", "google_scholar": ";https://scholar.google.ch/citations?user=IS4VSXAAAAAJ;0dR_wD0AAAAJ;1h_mxPMAAAAJ;https://scholar.google.co.uk/citations?user=f31mvPsAAAAJ", "orcid": ";;;;", "linkedin": ";;;;", "or_profile": "~Jovana_Mitrovic1;~Brian_McWilliams2;~Jacob_C_Walker1;~Lars_Holger_Buesing1;~Charles_Blundell1", "aff": "Google DeepMind;Deepmind;Google;Deepmind;Google DeepMind", "aff_domain": "google.com;google.com;google.com;google.com;google.com", "position": "Research Scientist;Research Scientist;Research Scientist;Postdoc;Research Scientist", "bibtex": "@inproceedings{\nmitrovic2021representation,\ntitle={Representation Learning via Invariant Causal Mechanisms},\nauthor={Jovana Mitrovic and Brian McWilliams and Jacob C Walker and Lars Holger Buesing and Charles Blundell},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=9p2ekP904Rs}\n}", "github": "", "project": "", "reviewers": "AnonReviewer5;AnonReviewer4;AnonReviewer3;AnonReviewer2", "pdf_size": 0, "rating": "5;6;6;7", "confidence": "3;4;4;4", "wc_review": "997;770;501;174", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "735;635;323;293", "reply_reviewers": 
"0;0;0;0", "reply_authors": "1;1;1;1", "rating_avg": [ 6.0, 0.7071067811865476 ], "confidence_avg": [ 3.75, 0.4330127018922193 ], "wc_review_avg": [ 610.5, 307.1420681052988 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 496.5, 192.08006143272652 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 9, 0 ], "authors#_avg": [ 5, 0 ], "corr_rating_confidence": 0.816496580927726, "gs_citation": 301, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=9927624035495201614&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 7, "pdf": "https://openreview.net/pdf?id=9p2ekP904Rs", "email": "google.com;google.com;google.com;google.com;google.com", "author_num": 5, "aff_unique_index": "0;1;0;1;0", "aff_unique_norm": "Google;DeepMind", "aff_unique_dep": "Google DeepMind;", "aff_unique_url": "https://deepmind.com;https://deepmind.com", "aff_unique_abbr": "DeepMind;DeepMind", "aff_campus_unique_index": "1", "aff_campus_unique": ";Mountain View", "aff_country_unique_index": "0;0;1;0;0", "aff_country_unique": "United Kingdom;United States" }, { "title": "Vulnerability-Aware Poisoning Mechanism for Online RL with Unknown Dynamics", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/2665", "id": "9r30XCjf5Dt", "poster": "", "openreview": "https://openreview.net/forum?id=9r30XCjf5Dt", "slides": "https://iclr.cc/virtual/2021/poster/2665", "video": "https://iclr.cc/virtual/2021/poster/2665", "author_site": "Yanchao Sun, Da Huo, Furong Huang", "tldr": "", "abstract": "Poisoning attacks on Reinforcement Learning (RL) systems could take advantage of RL algorithm\u2019s vulnerabilities and cause failure of the learning. However, prior works on poisoning RL usually either unrealistically assume the attacker knows the underlying Markov Decision Process (MDP), or directly apply the poisoning methods in supervised learning to RL. In this work, we build a generic poisoning framework for online RL via a comprehensive investigation of heterogeneous poisoning models in RL. Without any prior knowledge of the MDP, we propose a strategic poisoning algorithm called Vulnerability-Aware Adversarial Critic Poison (VA2C-P), which works for on-policy deep RL agents, closing the gap that no poisoning method exists for policy-based RL agents. VA2C-P uses a novel metric, stability radius in RL, that measures the vulnerability of RL algorithms. 
Experiments on multiple deep RL agents and multiple environments show that our poisoning algorithm successfully prevents agents from learning a good policy or teaches the agents to converge to a target policy, with a limited attacking budget.", "keywords": "poisoning attack;policy gradient;vulnerability of RL;deep RL", "primary_area": "", "supplementary_material": "/attachment/0efe08dc370f47ca747671cdcd4f823f477ede64.zip", "author": "Yanchao Sun;Da Huo;Furong Huang", "authorids": "~Yanchao_Sun1;~Da_Huo1;~Furong_Huang1", "gender": "F;M;F", "homepage": "https://ycsun2017.github.io/home/index.html;https://github.com/sjtuhuoda;https://furong-huang.com", "dblp": "132/6840;;72/8513", "google_scholar": "bloBY_QAAAAJ;;13yyuCcAAAAJ", "orcid": "0000-0002-1137-9939;;", "linkedin": ";;", "or_profile": "~Yanchao_Sun1;~Da_Huo1;~Furong_Huang1", "aff": "University of Maryland, College Park;Shanghai Jiaotong University;University of Maryland", "aff_domain": "umd.edu;sjtu.edu.cn;cs.umd.edu", "position": "PhD student;Undergrad student;Assistant Professor", "bibtex": "@inproceedings{\nsun2021vulnerabilityaware,\ntitle={Vulnerability-Aware Poisoning Mechanism for Online {\\{}RL{\\}} with Unknown Dynamics},\nauthor={Yanchao Sun and Da Huo and Furong Huang},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=9r30XCjf5Dt}\n}", "github": "", "project": "", "reviewers": "AnonReviewer2;AnonReviewer3;AnonReviewer4;AnonReviewer1", "pdf_size": 0, "rating": "6;6;6;7", "confidence": "4;3;3;5", "wc_review": "514;302;1104;574", "wc_reply_reviewers": "182;0;30;0", "wc_reply_authors": "2102;588;2606;778", "reply_reviewers": "3;0;1;0", "reply_authors": "7;1;6;1", "rating_avg": [ 6.25, 0.4330127018922193 ], "confidence_avg": [ 3.75, 0.82915619758885 ], "wc_review_avg": [ 623.5, 295.24693055136066 ], "wc_reply_reviewers_avg": [ 53.0, 75.47847375245475 ], "wc_reply_authors_avg": [ 1518.5, 856.9275056852825 ], "reply_reviewers_avg": [ 1.0, 1.224744871391589 ], "reply_authors_avg": [ 3.75, 2.7726341266023544 ], "replies_avg": [ 25, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": 0.8703882797784891, "gs_citation": 57, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=1597225910956727416&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 4, "pdf": "https://openreview.net/pdf?id=9r30XCjf5Dt", "email": "umd.edu;sjtu.edu.cn;cs.umd.edu", "author_num": 3, "aff_unique_index": "0;1;0", "aff_unique_norm": "University of Maryland;Shanghai Jiao Tong University", "aff_unique_dep": ";", "aff_unique_url": "https://www/umd.edu;https://www.sjtu.edu.cn", "aff_unique_abbr": "UMD;SJTU", "aff_campus_unique_index": "0", "aff_campus_unique": "College Park;", "aff_country_unique_index": "0;1;0", "aff_country_unique": "United States;China" }, { "id": "9sF3n8eAco", "title": "All-You-Can-Fit 8-Bit Flexible Floating-Point Format for Accurate and Memory-Efficient Inference of Deep Neural Networks", "track": "main", "status": "Reject", "tldr": "", "abstract": "Modern deep neural network (DNN) models generally require a huge amount of weight and activation values to achieve good inference outcomes. Those data inevitably demand a massive off-chip memory capacity/bandwidth, and the situation gets even worse if they are represented in high-precision floating-point formats. Effort has been made for representing those data in different 8-bit floating-point formats, nevertheless, a notable accuracy loss is still unavoidable. 
In this paper we introduce an extremely flexible 8-bit floating-point (FFP8) format whose defining factors \u2013 the bit width of exponent/fraction field, the exponent bias, and even the presence of the sign bit \u2013 are all configurable. We also present a methodology to properly determine those factors so that the accuracy of model inference can be maximized. The foundation of this methodology is based on a key observation \u2013 both the maximum magnitude and the value distribution are quite dissimilar between weights and activations in most DNN models. Experimental results demonstrate that the proposed FFP8 format achieves an extremely low accuracy loss of $0.1\\%\\sim 0.3\\%$ for several representative image classification models even without the need of model retraining. Besides, it is easy to turn a classical floating-point processing unit into an FFP8-compliant one, and the extra hardware cost is minor.", "keywords": "8-bit floating-point format;accuracy loss minimization;numerics;memory-efficient inference;deep learning", "primary_area": "", "supplementary_material": "", "author": "Juinn-Dar Huang;Cheng-Wei Huang;Tim-Wei Chen", "authorids": "~Juinn-Dar_Huang1;~Cheng-Wei_Huang1;~Tim-Wei_Chen1", "gender": "M;;M", "homepage": ";https://www.facebook.com/haha123465/;https://www.facebook.com/profile.php?id=100000427022940", "dblp": "43/453.html;;", "google_scholar": "wldtZtsAAAAJ;;", "orcid": "0000-0001-5961-7863;;", "linkedin": ";;", "or_profile": "~Juinn-Dar_Huang1;~Tim-Wei_Chen1;~Cheng_Wei_Huang1", "aff": "National Yang Ming Chiao Tung University;National Yang Ming Chiao Tung University;National Chiao Tung University", "aff_domain": "nycu.edu.tw;nctu.edu.tw;nctu.edu.tw", "position": "Full Professor;MS student;MS student", "bibtex": "@misc{\nhuang2021allyoucanfit,\ntitle={All-You-Can-Fit 8-Bit Flexible Floating-Point Format for Accurate and Memory-Efficient Inference of Deep Neural Networks},\nauthor={Juinn-Dar Huang and Cheng-Wei Huang and Tim-Wei Chen},\nyear={2021},\nurl={https://openreview.net/forum?id=9sF3n8eAco}\n}", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer2;AnonReviewer3;AnonReviewer4", "site": "https://openreview.net/forum?id=9sF3n8eAco", "pdf_size": 0, "rating": "3;4;6;7", "confidence": "5;4;4;3", "wc_review": "413;399;186;387", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "1958;1348;582;199", "reply_reviewers": "0;0;0;0", "reply_authors": "3;3;1;1", "rating_avg": [ 5.0, 1.5811388300841898 ], "confidence_avg": [ 4.0, 0.7071067811865476 ], "wc_review_avg": [ 346.25, 92.97681162526493 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 1021.75, 680.6799449814869 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 2.0, 1.0 ], "replies_avg": [ 13, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": -0.894427190999916, "gs_citation": 5, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=11963830791453999847&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 3, "aff_unique_index": "0;0;1", "aff_unique_norm": "National Yang Ming Chiao Tung University;National Chiao Tung University", "aff_unique_dep": ";", "aff_unique_url": "https://www.nycu.edu.tw;https://www.nctu.edu.tw", "aff_unique_abbr": "NYCU;NCTU", "aff_campus_unique_index": "0;0;0", "aff_campus_unique": "Taiwan", "aff_country_unique_index": "0;0;0", "aff_country_unique": "China" }, { "id": "9t0CV2iD5gE", "title": "Robust Learning Rate Selection for Stochastic Optimization via Splitting Diagnostic", "track": "main", "status": "Reject", "tldr": "", 
"abstract": "This paper proposes SplitSGD, a new dynamic learning rate schedule for stochastic optimization. This method decreases the learning rate for better adaptation to the local geometry of the objective function whenever a stationary phase is detected, that is, the iterates are likely to bounce at around a vicinity of a local minimum. The detection is performed by splitting the single thread into two and using the inner product of the gradients from the two threads as a measure of stationarity. Owing to this simple yet provably valid stationarity detection, SplitSGD is easy- to-implement and essentially does not incur additional computational cost than standard SGD. Through a series of extensive experiments, we show that this method is appropriate for both convex problems and training (non-convex) neural networks, with performance compared favorably to other stochastic optimization methods. Importantly, this method is observed to be very robust with a set of default parameters for a wide range of problems and, moreover, yields better generalization performance than other adaptive gradient methods such as Adam.", "keywords": "Optimization;Deep Learning;Stationarity;Adaptive", "primary_area": "", "supplementary_material": "/attachment/d589c6e4e170bcb77e2db70004603b7b50a3fb6e.zip", "author": "Matteo Sordello;Hangfeng He;Weijie J Su", "authorids": "~Matteo_Sordello1;~Hangfeng_He3;~Weijie_J_Su1", "gender": "M;M;M", "homepage": "https://www.matteosordello.com/;https://hornhehhf.github.io;http://stat.wharton.upenn.edu/~suw/", "dblp": ";190/7762-1.html;228/9127", "google_scholar": "fDNznZoAAAAJ;BbpI6QoAAAAJ;Uhf4nBkAAAAJ", "orcid": ";0000-0001-5136-1218;", "linkedin": "matteosordello/;;", "or_profile": "~Matteo_Sordello1;~Hangfeng_He3;~Weijie_J_Su1", "aff": "University of Pennsylvania;University of Pennsylvania;University of Pennsylvania", "aff_domain": "upenn.edu;upenn.edu;upenn.edu", "position": "PhD student;PhD student;Assistant Professor", "bibtex": "@misc{\nsordello2021robust,\ntitle={Robust Learning Rate Selection for Stochastic Optimization via Splitting Diagnostic},\nauthor={Matteo Sordello and Hangfeng He and Weijie J Su},\nyear={2021},\nurl={https://openreview.net/forum?id=9t0CV2iD5gE}\n}", "github": "", "project": "", "reviewers": "AnonReviewer4;AnonReviewer2;AnonReviewer1;AnonReviewer3", "site": "https://openreview.net/forum?id=9t0CV2iD5gE", "pdf_size": 0, "rating": "3;5;7;7", "confidence": "4;4;3;3", "wc_review": "507;507;455;621", "wc_reply_reviewers": "432;0;0;0", "wc_reply_authors": "1438;834;405;537", "reply_reviewers": "1;0;0;0", "reply_authors": "2;1;1;1", "rating_avg": [ 5.5, 1.6583123951777 ], "confidence_avg": [ 3.5, 0.5 ], "wc_review_avg": [ 522.5, 60.70214164261422 ], "wc_reply_reviewers_avg": [ 108.0, 187.06148721743875 ], "wc_reply_authors_avg": [ 803.5, 397.91487783193026 ], "reply_reviewers_avg": [ 0.25, 0.4330127018922193 ], "reply_authors_avg": [ 1.25, 0.4330127018922193 ], "replies_avg": [ 12, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": -0.9045340337332909, "gs_citation": 10, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=580558404104635569&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 5, "aff_unique_index": "0;0;0", "aff_unique_norm": "University of Pennsylvania", "aff_unique_dep": "", "aff_unique_url": "https://www.upenn.edu", "aff_unique_abbr": "UPenn", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0;0", "aff_country_unique": "United States" }, { "title": "Evaluation of 
Similarity-based Explanations", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/2619", "id": "9uvhpyQwzM_", "poster": "", "openreview": "https://openreview.net/forum?id=9uvhpyQwzM_", "slides": "https://iclr.cc/virtual/2021/poster/2619", "video": "https://iclr.cc/virtual/2021/poster/2619", "author_site": "Kazuaki Hanawa, Sho Yokoi, Satoshi Hara, Kentaro Inui", "tldr": "", "abstract": "Explaining the predictions made by complex machine learning models helps users to understand and accept the predicted outputs with confidence. One promising way is to use similarity-based explanation that provides similar instances as evidence to support model predictions. Several relevance metrics are used for this purpose. In this study, we investigated relevance metrics that can provide reasonable explanations to users. Specifically, we adopted three tests to evaluate whether the relevance metrics satisfy the minimal requirements for similarity-based explanation. Our experiments revealed that the cosine similarity of the gradients of the loss performs best, which would be a recommended choice in practice. In addition, we showed that some metrics perform poorly in our tests and analyzed the reasons of their failure. We expect our insights to help practitioners in selecting appropriate relevance metrics and also aid further researches for designing better relevance metrics for explanations.", "keywords": "Interpretability;Explainability", "primary_area": "", "supplementary_material": "", "author": "Kazuaki Hanawa;Sho Yokoi;Satoshi Hara;Kentaro Inui", "authorids": "~Kazuaki_Hanawa1;~Sho_Yokoi1;~Satoshi_Hara1;~Kentaro_Inui1", "gender": "M;;M;M", "homepage": ";http://www.cl.ecei.tohoku.ac.jp/~yokoi/;https://sites.google.com/site/sato9hara/;http://www.cl.ecei.tohoku.ac.jp/~inui/", "dblp": "199/1765;184/8316;08/778-1;90/3315", "google_scholar": ";https://scholar.google.co.jp/citations?user=EW2QPKoAAAAJ;https://scholar.google.co.jp/citations?user=ELhfkiMAAAAJ;https://scholar.google.co.jp/citations?user=38_o3-kAAAAJ", "orcid": ";0009-0002-4437-5245;;0000-0001-6510-604X", "linkedin": ";shoyokoi/;;kentaro-inui-52401a31/", "or_profile": "~Kazuaki_Hanawa1;~Sho_Yokoi1;~Satoshi_Hara1;~Kentaro_Inui1", "aff": "Tohoku University;Tohoku University;Osaka University;Tohoku University", "aff_domain": "tohoku.ac.jp;tohoku.ac.jp;osaka-u.ac.jp;tohoku.ac.jp", "position": "PhD student;Assistant Professor;Associate Professor;Full Professor", "bibtex": "@inproceedings{\nhanawa2021evaluation,\ntitle={Evaluation of Similarity-based Explanations},\nauthor={Kazuaki Hanawa and Sho Yokoi and Satoshi Hara and Kentaro Inui},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=9uvhpyQwzM_}\n}", "github": "[![github](/images/github_icon.svg) k-hanawa/criteria_for_instance_based_explanation](https://github.com/k-hanawa/criteria_for_instance_based_explanation) + [![Papers with Code](/images/pwc_icon.svg) 1 community implementation](https://paperswithcode.com/paper/?openreview=9uvhpyQwzM_)", "project": "", "reviewers": "AnonReviewer3;AnonReviewer1;AnonReviewer2;AnonReviewer4", "pdf_size": 0, "rating": "5;6;6;7", "confidence": "4;3;3;4", "wc_review": "475;559;302;192", "wc_reply_reviewers": "0;36;0;0", "wc_reply_authors": "457;679;500;264", "reply_reviewers": "0;1;0;0", "reply_authors": "1;2;1;1", "rating_avg": [ 6.0, 0.7071067811865476 ], "confidence_avg": [ 3.5, 0.5 ], "wc_review_avg": [ 382.0, 143.59491634455588 ], "wc_reply_reviewers_avg": [ 9.0, 
15.588457268119896 ], "wc_reply_authors_avg": [ 475.0, 147.55168585956582 ], "reply_reviewers_avg": [ 0.25, 0.4330127018922193 ], "reply_authors_avg": [ 1.25, 0.4330127018922193 ], "replies_avg": [ 11, 0 ], "authors#_avg": [ 4, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 67, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=2157018204021335072&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 7, "pdf": "https://openreview.net/pdf?id=9uvhpyQwzM_", "email": "tohoku.ac.jp;tohoku.ac.jp;osaka-u.ac.jp;tohoku.ac.jp", "author_num": 4, "aff_unique_index": "0;0;1;0", "aff_unique_norm": "Tohoku University;Osaka University", "aff_unique_dep": ";", "aff_unique_url": "https://www.tohoku.ac.jp;https://www.osaka-u.ac.jp", "aff_unique_abbr": "Tohoku U;Osaka U", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0;0;0", "aff_country_unique": "Japan" }, { "id": "9vCLOXwprc", "title": "Iterated graph neural network system", "track": "main", "status": "Reject", "tldr": "", "abstract": "We present Iterated Graph Neural Network System (IGNNS), a new framework of Graph Neural Networks (GNNs), which can deal with undirected graph and directed graph in a unified way. The core component of IGNNS is the Iterated Function System (IFS), which is an important research field in fractal geometry. The key idea of IGNNS is to use a pair of affine transformations to characterize the process of message passing between graph nodes and assign an adjoint probability vector to them to form an IFS layer with probability. After embedding in the latent space, the node features are sent to IFS layer for iterating, and then obtain the high-level representation of graph nodes. We also analyze the geometric properties of IGNNS from the perspective of dynamical system. We prove that if the IFS induced by IGNNS is contractive, then the fractal representation of graph nodes converges to the fractal set of IFS in Hausdorff distance and the ergodic representation of that converges to a constant matrix in Frobenius norm. We have carried out a series of semi supervised node classification experiments on citation network datasets such as citeser, Cora and PubMed. 
The experimental results show that the performance of our method is obviously better than the related methods.", "keywords": "", "primary_area": "", "supplementary_material": "/attachment/b7bdc400b82470a3a975401de0d7845a2d4b6543.zip", "author": "Hanju Li", "authorids": "~Hanju_Li1", "gender": "", "homepage": "https://xueshu.baidu.com/scholarID/CN-BZF0S81K", "dblp": "", "google_scholar": "", "orcid": "", "linkedin": "", "or_profile": "~Hanju_Li1", "aff": "", "aff_domain": "", "position": "", "bibtex": "@misc{\nli2021iterated,\ntitle={Iterated graph neural network system},\nauthor={Hanju Li},\nyear={2021},\nurl={https://openreview.net/forum?id=9vCLOXwprc}\n}", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer3;AnonReviewer4;AnonReviewer2", "site": "https://openreview.net/forum?id=9vCLOXwprc", "pdf_size": 0, "rating": "4;5;6;6", "confidence": "4;3;4;2", "wc_review": "219;572;489;298", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "1078;1550;1242;538", "reply_reviewers": "0;0;0;0", "reply_authors": "2;3;3;1", "rating_avg": [ 5.25, 0.82915619758885 ], "confidence_avg": [ 3.25, 0.82915619758885 ], "wc_review_avg": [ 394.5, 141.90577859974553 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 1102.0, 367.0749242320973 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 2.25, 0.82915619758885 ], "replies_avg": [ 14, 0 ], "authors#_avg": [ 1, 0 ], "corr_rating_confidence": -0.4545454545454545, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:mEIe3h1UNv8J:scholar.google.com/&scioq=Iterated+graph+neural+network+system&hl=en&as_sdt=0,5", "gs_version_total": 0 }, { "id": "9w03rTs7w5", "title": "Transfer among Agents: An Efficient Multiagent Transfer Learning Framework", "track": "main", "status": "Reject", "tldr": "", "abstract": "Transfer Learning has shown great potential to enhance the single-agent Reinforcement Learning (RL) efficiency, by sharing learned policies of previous tasks. Similarly, in multiagent settings, the learning performance can also be promoted if agents can share knowledge between each other. However, it remains an open question of how an agent should learn from other agents' knowledge. In this paper, we propose a novel multiagent option-based policy transfer (MAOPT) framework to improve multiagent learning efficiency. Our framework learns what advice to give to each agent and when to terminate it by modeling multiagent policy transfer as the option learning problem. MAOPT provides different kinds of variants which can be classified into two types in terms of the experience used during training. One type is the MAOPT with the Global Option Advisor which has the access to the global information of the environment. However, in many realistic scenarios, we can only obtain each agent's local information due to the partial observation. The other type contains MAOPT with the Local Option Advisor and MAOPT with the Successor Representation Option (SRO) which are suitable for this setting and collect each agent's local experience for the update. In many cases, each agent's experience is inconsistent with each other which causes the option-value estimation to oscillate and to become inaccurate. SRO is used to handle the experience inconsistency by decoupling the dynamics of the environment from the rewards to learn the option-value function under each agent's preference. MAOPT can be easily combined with existing deep RL approaches. 
Experimental results show it significantly boosts the performance of existing deep RL methods in both discrete and continuous state spaces.", "keywords": "Multiagent learning;transfer learning;reinforcement learning", "primary_area": "", "supplementary_material": "", "author": "Tianpei Yang;Jianye HAO;Weixun Wang;Hongyao Tang;Zhaopeng Meng;Hangyu Mao;Dong Li;Wulong Liu;Yujing Hu;Yingfeng Chen;Changjie Fan", "authorids": "~Tianpei_Yang1;~Jianye_HAO1;~Weixun_Wang1;~Hongyao_Tang1;~Zhaopeng_Meng1;maohangyu1@huawei.com;lidong106@huawei.com;~Wulong_Liu1;huyujing@corp.netease.com;~Yingfeng_Chen1;~Changjie_Fan1", "gender": "F;M;;M;;;;M;;;M", "homepage": "https://tianpeiyang.github.io/;http://www.icdai.org/jianye.html;http://n.musk.ndu.com;https://bluecontra.github.io/;http://cic.tju.edu.cn/info/1104/1205.htm;;;;;;", "dblp": "184/8221;21/7664.html;84/998;220/4275;67/8175;;;36/9257.html;;;71/882", "google_scholar": "https://scholar.google.com/citations?hl=zh-CN;;;yIqzRH4AAAAJ;;;;https://scholar.google.ca/citations?user=od00FfIAAAAJ;;;", "orcid": "0000-0002-5497-7146;0000-0002-0422-8235;;;;;;;;;0000-0001-5420-0516", "linkedin": "tianpei-yang/;;;;;;;wulong-liu-28006155/;;;", "or_profile": "~Tianpei_Yang1;~Jianye_HAO1;~Weixun_Wang1;~Hongyao_Tang1;~Zhaopeng_Meng1;maohangyu1@huawei.com;lidong106@huawei.com;~Wulong_Liu1;huyujing@corp.netease.com;~Yingfeng_Chen1;~Changjie_Fan1", "aff": "Tianjin University;Tianjin University;Tianjin University;Noah's Ark Lab, Huawei;Tianjin University;;;Huawei Noah's Ark Lab;;;Netease, Fuxi AI Lab", "aff_domain": "tju.edu.cn;tju.edu.cn;tju.edu.cn;huawei.com;tju.edu.cn;;;huawei.com;;;corp.netease.com", "position": "PhD student;Associate Professor;PhD student;Researcher;Full Professor;;;Researcher;;;Principal Researcher", "bibtex": "@misc{\nyang2021transfer,\ntitle={Transfer among Agents: An Efficient Multiagent Transfer Learning Framework},\nauthor={Tianpei Yang and Jianye HAO and Weixun Wang and Hongyao Tang and Zhaopeng Meng and Hangyu Mao and Dong Li and Wulong Liu and Yujing Hu and Yingfeng Chen and Changjie Fan},\nyear={2021},\nurl={https://openreview.net/forum?id=9w03rTs7w5}\n}", "github": "", "project": "", "reviewers": "AnonReviewer2;AnonReviewer4;AnonReviewer3;AnonReviewer1;AnonReviewer5", "site": "https://openreview.net/forum?id=9w03rTs7w5", "pdf_size": 0, "rating": "4;6;6;6;6", "confidence": "4;3;3;5;3", "wc_review": "375;448;403;433;410", "wc_reply_reviewers": "0;0;0;0;182", "wc_reply_authors": "1310;321;185;389;280", "reply_reviewers": "0;0;0;0;1", "reply_authors": "3;1;2;1;2", "rating_avg": [ 5.6, 0.7999999999999999 ], "confidence_avg": [ 3.6, 0.8 ], "wc_review_avg": [ 413.8, 25.198412648418945 ], "wc_reply_reviewers_avg": [ 36.4, 72.8 ], "wc_reply_authors_avg": [ 497.0, 411.83540401475926 ], "reply_reviewers_avg": [ 0.2, 0.4 ], "reply_authors_avg": [ 1.8, 0.7483314773547883 ], "replies_avg": [ 17, 0 ], "authors#_avg": [ 11, 0 ], "corr_rating_confidence": -0.25, "gs_citation": 3, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=2789720231803497797&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 0, "aff_unique_index": "0;0;0;1;0;1;2", "aff_unique_norm": "Tianjin University;Huawei;Netease", "aff_unique_dep": ";Noah's Ark Lab;Fuxi AI Lab", "aff_unique_url": "http://www.tju.edu.cn;https://www.huawei.com;https://www.netease.com", "aff_unique_abbr": "TJU;Huawei;Netease", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0;0;0;0;0;0", "aff_country_unique": "China" }, { "id": "9wHe4F-lpp", "title": "FTBNN: 
Rethinking Non-linearity for 1-bit CNNs and Going Beyond", "track": "main", "status": "Reject", "tldr": "", "abstract": "Binary neural networks (BNNs), where both weights and activations are binarized into 1 bit, have been widely studied in recent years due to its great benefit of highly accelerated computation and substantially reduced memory footprint that appeal to the development of resource constrained devices. In contrast to previous methods tending to reduce the quantization error for training BNN structures, we argue that the binarized convolution process owns an increasing linearity towards the target of minimizing such error, which in turn hampers BNN's discriminative ability. In this paper, we re-investigate and tune proper non-linear modules to fix that contradiction, leading to a strong baseline which achieves state-of-the-art performance on the large-scale ImageNet dataset in terms of accuracy and training efficiency. To go further, we find that the proposed BNN model still has much potential to be compressed by making a better use of the efficient binary operations, without losing accuracy. In addition, the limited capacity of the BNN model can also be increased with the help of group execution. Based on these insights, we are able to improve the baseline with an additional 4$\\sim$5% top-1 accuracy gain even with less computational cost. Our code and all trained models will be made public.", "keywords": "Binary neural networks;network quantization;network compression", "primary_area": "", "supplementary_material": "", "author": "Zhuo Su;Linpu Fang;Deke Guo;Dewen Hu;Matti Pietik\u00e4inen;Li Liu", "authorids": "~Zhuo_Su2;~Linpu_Fang1;~Deke_Guo1;~Dewen_Hu2;~Matti_Pietik\u00e4inen2;~Li_Liu9", "gender": ";M;M;M;M;F", "homepage": "https://zhuogege1943.com/homepage;https://github.com/fanglinpu;https://dekeguo.github.io/;;https://en.wikipedia.org/wiki/Matti_Pietik%C3%A4inen_(academic);http://lilyliliu.com/Default.aspx", "dblp": "02/10578-2;;74/6501;10/6414.html;https://dblp.org/pers/p/Pietik=auml=inen:Matti.html;33/4528-2.html", "google_scholar": "EgeikIgAAAAJ;;;;bjEpXBoAAAAJ;https://scholar.google.com.au/citations?user=9cMQrVsAAAAJ", "orcid": "0000-0002-6448-0651;;;;;0000-0002-2011-2873", "linkedin": ";;;;;", "or_profile": "~Zhuo_Su2;~Linpu_Fang1;~Deke_Guo1;~Dewen_Hu2;~Matti_Pietik\u00e4inen2;~Li_Liu9", "aff": "Oulu University;South China University of Technology;National University of Defense Technology;National University of Defense Technology;University of Oulu;National University of Defense Technology", "aff_domain": "oulu.fi;scut.edu.cn;nudt.edu.cn;nudt.edu.cn;oulu.fi;nudt.edu.cn", "position": "PhD student;PhD student;Full Professor;Full Professor;Emeritus;Full Professor", "bibtex": "@misc{\nsu2021ftbnn,\ntitle={{\\{}FTBNN{\\}}: Rethinking Non-linearity for 1-bit {\\{}CNN{\\}}s and Going Beyond},\nauthor={Zhuo Su and Linpu Fang and Deke Guo and Dewen Hu and Matti Pietik{\\\"a}inen and Li Liu},\nyear={2021},\nurl={https://openreview.net/forum?id=9wHe4F-lpp}\n}", "github": "", "project": "", "reviewers": "AnonReviewer3;AnonReviewer2;AnonReviewer4;AnonReviewer1", "site": "https://openreview.net/forum?id=9wHe4F-lpp", "pdf_size": 0, "rating": "3;4;5;6", "confidence": "5;5;4;2", "wc_review": "252;518;422;294", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "612;634;563;385", "reply_reviewers": "0;0;0;0", "reply_authors": "1;1;1;1", "rating_avg": [ 4.5, 1.118033988749895 ], "confidence_avg": [ 4.0, 1.224744871391589 ], "wc_review_avg": [ 371.5, 105.23663810669743 ], 
"wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 548.5, 97.83276547251437 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 9, 0 ], "authors#_avg": [ 6, 0 ], "corr_rating_confidence": -0.9128709291752768, "gs_citation": 3, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=1977468195366882796&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 3, "aff_unique_index": "0;1;2;2;0;2", "aff_unique_norm": "University of Oulu;South China University of Technology;National University of Defense Technology", "aff_unique_dep": ";;", "aff_unique_url": "https://www.oulu.fi;https://www.scut.edu.cn;http://www.nudt.edu.cn/", "aff_unique_abbr": "UOulu;SCUT;NUDT", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;1;1;1;0;1", "aff_country_unique": "Finland;China" }, { "title": "A Panda? No, It's a Sloth: Slowdown Attacks on Adaptive Multi-Exit Neural Network Inference", "status": "Spotlight", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/3283", "id": "9xC2tWEwBD", "poster": "", "openreview": "https://openreview.net/forum?id=9xC2tWEwBD", "slides": "https://iclr.cc/virtual/2021/poster/3283", "video": "https://iclr.cc/virtual/2021/poster/3283", "author_site": "Sanghyun Hong, Yigitcan Kaya, Ionut-Vlad Modoranu, Tudor Dumitras", "tldr": "", "abstract": "Recent increases in the computational demands of deep neural networks (DNNs), combined with the observation that most input samples require only simple models, have sparked interest in input-adaptive multi-exit architectures, such as MSDNets or Shallow-Deep Networks. These architectures enable faster inferences and could bring DNNs to low-power devices, e.g., in the Internet of Things (IoT). However, it is unknown if the computational savings provided by this approach are robust against adversarial pressure. In particular, an adversary may aim to slowdown adaptive DNNs by increasing their average inference time\u2014a threat analogous to the denial-of-service attacks from the Internet. In this paper, we conduct a systematic evaluation of this threat by experimenting with three generic multi-exit DNNs (based on VGG16, MobileNet, and ResNet56) and a custom multi-exit architecture, on two popular image classification benchmarks (CIFAR-10 and Tiny ImageNet). To this end, we show that adversarial example-crafting techniques can be modified to cause slowdown, and we propose a metric for comparing their impact on different architectures. We show that a slowdown attack reduces the efficacy of multi-exit DNNs by 90\u2013100%, and it amplifies the latency by 1.5\u20135\u00d7 in a typical IoT deployment. We also show that it is possible to craft universal, reusable perturbations and that the attack can be effective in realistic black-box scenarios, where the attacker has limited knowledge about the victim. Finally, we show that adversarial training provides limited protection against slowdowns. These results suggest that further research is needed for defending multi-exit architectures against this emerging threat. Our code is available at https://github.com/sanghyun-hong/deepsloth. 
", "keywords": "Slowdown attacks;efficient inference;input-adaptive multi-exit neural networks;adversarial examples", "primary_area": "", "supplementary_material": "", "author": "Sanghyun Hong;Yigitcan Kaya;Ionu\u021b-Vlad Modoranu;Tudor Dumitras", "authorids": "~Sanghyun_Hong1;~Yigitcan_Kaya1;modoranu.ionut.vlad@hotmail.com;~Tudor_Dumitras1", "gender": "M;;;M", "homepage": "http://www.sanghyun-hong.com;;;http://users.umiacs.umd.edu/~tdumitra/", "dblp": "135/8991;;;01/4921", "google_scholar": "https://scholar.google.com/citations?hl=en;;;", "orcid": ";;;", "linkedin": ";;;", "or_profile": "~Sanghyun_Hong1;~Yigitcan_Kaya1;modoranu.ionut.vlad@hotmail.com;~Tudor_Dumitras1", "aff": "Department of Computer Science, University of Maryland, College Park;;;University of Maryland, College Park", "aff_domain": "cs.umd.edu;;;umd.edu", "position": "PhD student;;;Associate Professor", "bibtex": "@inproceedings{\nhong2021a,\ntitle={A Panda? No, It's a Sloth: Slowdown Attacks on Adaptive Multi-Exit Neural Network Inference},\nauthor={Sanghyun Hong and Yigitcan Kaya and Ionu\u021b-Vlad Modoranu and Tudor Dumitras},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=9xC2tWEwBD}\n}", "github": "[![github](/images/github_icon.svg) sanghyun-hong/deepsloth](https://github.com/sanghyun-hong/deepsloth)", "project": "", "reviewers": "AnonReviewer1;AnonReviewer2;AnonReviewer4;AnonReviewer3", "pdf_size": 0, "rating": "3;6;7;8", "confidence": "3;3;4;5", "wc_review": "321;257;276;231", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "513;918;308;407", "reply_reviewers": "0;0;0;0", "reply_authors": "1;2;1;1", "rating_avg": [ 6.0, 1.8708286933869707 ], "confidence_avg": [ 3.75, 0.82915619758885 ], "wc_review_avg": [ 271.25, 32.86620604815834 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 536.5, 231.88197428864538 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.25, 0.4330127018922193 ], "replies_avg": [ 13, 0 ], "authors#_avg": [ 4, 0 ], "corr_rating_confidence": 0.8058229640253803, "gs_citation": 82, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=7387967890679036055&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 7, "pdf": "https://openreview.net/pdf?id=9xC2tWEwBD", "email": "cs.umd.edu;;;umd.edu", "author_num": 4, "aff_unique_index": "0;1", "aff_unique_norm": "University of Maryland, College Park;University of Maryland", "aff_unique_dep": "Department of Computer Science;", "aff_unique_url": "https://www/umd.edu;https://www/umd.edu", "aff_unique_abbr": "UMD;UMD", "aff_campus_unique_index": "0;0", "aff_campus_unique": "College Park", "aff_country_unique_index": "0;0", "aff_country_unique": "United States" }, { "id": "9y4qOAIfA9r", "title": "Does injecting linguistic structure into language models lead to better alignment with brain recordings?", "track": "main", "status": "Reject", "tldr": "", "abstract": "Neuroscientists evaluate deep neural networks for natural language processing as possible candidate models for how language is processed in the brain. These models are often trained without explicit linguistic supervision, but have been shown to learn some linguistic structure in the absence of such supervision (Manning et. al, 2020), potentially questioning the relevance of symbolic linguistic theories in modeling such cognitive processes (Warstadt & Bowman, 2020). 
We evaluate across two fMRI datasets whether language models align better with brain recordings, if their attention is biased by annotations from syntactic or semantic formalisms. Using structure from dependency or minimal recursion semantic annotations, we find alignments improve significantly for one of the datasets. For another dataset, we see more mixed results. We present an extensive analysis of these results. Our proposed approach enables the evaluation of more targeted hypotheses about the composition of meaning in the brain, expanding the range of possible scientific inferences a neuroscientist could make, and opens up new opportunities for cross-pollination between computational neuroscience and linguistics.\n\n", "keywords": "neurolinguistics;natural language processing;computational neuroscience", "primary_area": "", "supplementary_material": "", "author": "Mostafa Abdou;Ana Valeria Gonz\u00e1lez;Mariya K Toneva;Daniel Hershcovich;Anders S\u00f8gaard", "authorids": "~Mostafa_Abdou2;ana@di.ku.dk;~Mariya_K_Toneva1;~Daniel_Hershcovich1;~Anders_S\u00f8gaard1", "gender": "M;;F;M;M", "homepage": "https://scholar.google.nl/citations?user=qgbKJ24AAAAJ&hl=en;;https://mtoneva.com;http://danielhers.github.io/;https://anderssoegaard.github.io/", "dblp": ";;160/4677;145/9324;30/2756", "google_scholar": "https://scholar.google.nl/citations?user=qgbKJ24AAAAJ;;https://scholar.google.ca/citations?user=a61sk-4AAAAJ;479qIucAAAAJ;https://scholar.google.com.tw/citations?user=x3I4CrYAAAAJ", "orcid": ";;0000-0002-2407-9871;0000-0002-3966-8708;", "linkedin": ";;;danielhershcovich;", "or_profile": "~Mostafa_Abdou2;ana@di.ku.dk;~Mariya_K_Toneva1;~Daniel_Hershcovich1;~Anders_S\u00f8gaard1", "aff": "University of Copenhagen;;Carnegie Mellon University;University of Copenhagen;Copenhagen University", "aff_domain": "ku.dk;;cmu.edu;ku.dk;ku.dk", "position": "PhD student;;PhD student;Assistant Professor;Full Professor", "bibtex": "@misc{\nabdou2021does,\ntitle={Does injecting linguistic structure into language models lead to better alignment with brain recordings?},\nauthor={Mostafa Abdou and Ana Valeria Gonz{\\'a}lez and Mariya K Toneva and Daniel Hershcovich and Anders S{\\o}gaard},\nyear={2021},\nurl={https://openreview.net/forum?id=9y4qOAIfA9r}\n}", "github": "", "project": "", "reviewers": "AnonReviewer4;AnonReviewer3;AnonReviewer2;AnonReviewer1", "site": "https://openreview.net/forum?id=9y4qOAIfA9r", "pdf_size": 0, "rating": "5;6;7;7", "confidence": "3;3;4;3", "wc_review": "316;684;517;213", "wc_reply_reviewers": "0;682;0;0", "wc_reply_authors": "567;1760;279;258", "reply_reviewers": "0;2;0;0", "reply_authors": "1;4;1;1", "rating_avg": [ 6.25, 0.82915619758885 ], "confidence_avg": [ 3.25, 0.4330127018922193 ], "wc_review_avg": [ 432.5, 181.7587687018153 ], "wc_reply_reviewers_avg": [ 170.5, 295.3146626904936 ], "wc_reply_authors_avg": [ 716.0, 614.9939024087962 ], "reply_reviewers_avg": [ 0.5, 0.8660254037844386 ], "reply_authors_avg": [ 1.75, 1.299038105676658 ], "replies_avg": [ 15, 0 ], "authors#_avg": [ 5, 0 ], "corr_rating_confidence": 0.5222329678670935, "gs_citation": 13, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=8886228561372097309&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 3, "aff_unique_index": "0;1;0;0", "aff_unique_norm": "University of Copenhagen;Carnegie Mellon University", "aff_unique_dep": ";", "aff_unique_url": "https://www.ku.dk;https://www.cmu.edu", "aff_unique_abbr": "UCPH;CMU", "aff_campus_unique_index": "", "aff_campus_unique": "", 
"aff_country_unique_index": "0;1;0;0", "aff_country_unique": "Denmark;United States" }, { "title": "MetaNorm: Learning to Normalize Few-Shot Batches Across Domains", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/3313", "id": "9z_dNsC4B5t", "poster": "", "openreview": "https://openreview.net/forum?id=9z_dNsC4B5t", "slides": "https://iclr.cc/virtual/2021/poster/3313", "video": "https://iclr.cc/virtual/2021/poster/3313", "author_site": "Yingjun Du, Xiantong Zhen, Ling Shao, Cees G Snoek", "tldr": "", "abstract": "Batch normalization plays a crucial role when training deep neural networks. However, batch statistics become unstable with small batch sizes and are unreliable in the presence of distribution shifts. We propose MetaNorm, a simple yet effective meta-learning normalization. It tackles the aforementioned issues in a unified way by leveraging the meta-learning setting and learns to infer adaptive statistics for batch normalization. MetaNorm is generic, flexible and model-agnostic, making it a simple plug-and-play module that is seamlessly embedded into existing meta-learning approaches. It can be efficiently implemented by lightweight hypernetworks with low computational cost. We verify its effectiveness by extensive evaluation on representative tasks suffering from the small batch and domain shift problems: few-shot learning and domain generalization. We further introduce an even more challenging setting: few-shot domain generalization. Results demonstrate that MetaNorm consistently achieves better, or at least competitive, accuracy compared to existing batch normalization methods. ", "keywords": "Meta-learning;batch normalization;few-shot domain generalization", "primary_area": "", "supplementary_material": "", "author": "Yingjun Du;Xiantong Zhen;Ling Shao;Cees G. M. Snoek", "authorids": "~Yingjun_Du1;~Xiantong_Zhen1;~Ling_Shao1;~Cees_G._M._Snoek1", "gender": "M;M;M;M", "homepage": "https://yingjundu.github.io/;;;http://www.ceessnoek.info", "dblp": "263/6794;78/10651;;s/CeesSnoek", "google_scholar": "oAeW6rAAAAAJ;https://scholar.google.ca/citations?user=DnBb3e0AAAAJ;z84rLjoAAAAJ;https://scholar.google.nl/citations?user=0uKdbscAAAAJ", "orcid": ";;;0000-0001-9092-1556", "linkedin": "%E8%8B%B1%E5%86%9B-%E6%9D%9C-a938a0174/;;;cgmsnoek/", "or_profile": "~Yingjun_Du1;~Xiantong_Zhen1;~Ling_Shao1;~Cees_Snoek1", "aff": "University of Amsterdam;Inception Institute of Artificial Intelligence;Inception Institute of Artificial Intelligence;University of Amsterdam", "aff_domain": "uva.nl;inceptioniai.org;inceptioniai.org;uva.nl", "position": "PhD student;Senior Scientist;CEO and Chief Scientist;Full Professor", "bibtex": "@inproceedings{\ndu2021metanorm,\ntitle={MetaNorm: Learning to Normalize Few-Shot Batches Across Domains},\nauthor={Yingjun Du and Xiantong Zhen and Ling Shao and Cees G. M. 
Snoek},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=9z_dNsC4B5t}\n}", "github": "", "project": "", "reviewers": "AnonReviewer2;AnonReviewer3;AnonReviewer4;AnonReviewer1", "pdf_size": 0, "rating": "4;6;6;7", "confidence": "5;3;4;3", "wc_review": "1416;195;784;565", "wc_reply_reviewers": "848;0;0;0", "wc_reply_authors": "1433;70;725;467", "reply_reviewers": "3;0;0;0", "reply_authors": "4;1;1;1", "rating_avg": [ 5.75, 1.0897247358851685 ], "confidence_avg": [ 3.75, 0.82915619758885 ], "wc_review_avg": [ 740.0, 443.44165343368456 ], "wc_reply_reviewers_avg": [ 212.0, 367.194771204602 ], "wc_reply_authors_avg": [ 673.75, 496.57495657755436 ], "reply_reviewers_avg": [ 0.75, 1.299038105676658 ], "reply_authors_avg": [ 1.75, 1.299038105676658 ], "replies_avg": [ 16, 0 ], "authors#_avg": [ 4, 0 ], "corr_rating_confidence": -0.899228803025897, "gs_citation": 83, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=17414587175282129999&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 5, "pdf": "https://openreview.net/pdf?id=9z_dNsC4B5t", "email": "uva.nl;inceptioniai.org;inceptioniai.org;uva.nl", "author_num": 4, "aff_unique_index": "0;1;1;0", "aff_unique_norm": "University of Amsterdam;Inception Institute of Artificial Intelligence", "aff_unique_dep": ";", "aff_unique_url": "https://www.uva.nl;https://www.inceptioniai.org", "aff_unique_abbr": "UvA;", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;1;1;0", "aff_country_unique": "Netherlands;United Arab Emirates" }, { "id": "A-Sp6CR9-AA", "title": "Sandwich Batch Normalization", "track": "main", "status": "Reject", "tldr": "", "abstract": "We present Sandwich Batch Normalization ($\\textbf{SaBN}$), a frustratingly easy improvement of Batch Normalization (BN) with only a few lines of code changes. SaBN is motivated by addressing the inherent $\\textit{feature distribution heterogeneity}$ that one can be identified in many tasks, which can arise from model heterogeneity (dynamic architectures, model conditioning, etc.), or data heterogeneity (multiple input domains). A SaBN factorizes the BN affine layer into one shared $\\textit{sandwich affine}$ layer, cascaded by several parallel $\\textit{independent affine}$ layers. Its variants include further decomposing the normalization layer into multiple parallel ones, and extending similar ideas to instance normalization. We demonstrate the prevailing effectiveness of SaBN (as well as its variants) as a $\\textbf{drop-in replacement in four tasks}$: neural architecture search (NAS), image generation, adversarial training, and style transfer. Leveraging SaBN immediately boosts two state-of-the-art weight-sharing NAS algorithms significantly on NAS-Bench-201; achieves better Inception Score and FID on CIFAR-10 and ImageNet conditional image generation with three state-of-the art GANs; substantially improves the robust and standard accuracy for adversarial defense; and produces superior arbitrary stylized results. We also provide visualizations and analysis to help understand why SaBN works. All our codes and pre-trained models will be released upon acceptance. 
", "keywords": "normalization", "primary_area": "", "supplementary_material": "", "author": "Xinyu Gong;Wuyang Chen;Tianlong Chen;Zhangyang Wang", "authorids": "~Xinyu_Gong1;~Wuyang_Chen1;~Tianlong_Chen1;~Zhangyang_Wang1", "gender": "M;;M;M", "homepage": "https://gongxinyuu.github.io;;https://tianlong-chen.github.io;https://vita-group.github.io", "dblp": "215/5405;;;119/4026", "google_scholar": "A8e8UNAAAAAJ;;LE3ctn0AAAAJ;pxFyKAIAAAAJ", "orcid": "0000-0002-6993-136X;;0000-0001-7774-8197;", "linkedin": "xinyu-gong-b4ab73191/;;tianlong-chen-783862167/;", "or_profile": "~Xinyu_Gong1;~Wuyang_Chen1;~Tianlong_Chen1;~Zhangyang_Wang1", "aff": "University of Texas, Austin;;University of Texas, Austin;University of Texas, Austin", "aff_domain": "utexas.edu;;utexas.edu;utexas.edu", "position": "PhD student;;PhD student;Assistant Professor", "bibtex": "@misc{\ngong2021sandwich,\ntitle={Sandwich Batch Normalization},\nauthor={Xinyu Gong and Wuyang Chen and Tianlong Chen and Zhangyang Wang},\nyear={2021},\nurl={https://openreview.net/forum?id=A-Sp6CR9-AA}\n}", "github": "", "project": "", "reviewers": "AnonReviewer4;AnonReviewer1;AnonReviewer2;AnonReviewer3", "site": "https://openreview.net/forum?id=A-Sp6CR9-AA", "pdf_size": 0, "rating": "3;5;5;6", "confidence": "4;4;5;5", "wc_review": "476;166;322;223", "wc_reply_reviewers": "290;213;0;0", "wc_reply_authors": "1208;1166;728;521", "reply_reviewers": "1;1;0;0", "reply_authors": "2;3;1;1", "rating_avg": [ 4.75, 1.0897247358851685 ], "confidence_avg": [ 4.5, 0.5 ], "wc_review_avg": [ 296.75, 117.58268367408527 ], "wc_reply_reviewers_avg": [ 125.75, 128.66307745425647 ], "wc_reply_authors_avg": [ 905.75, 290.9951674856474 ], "reply_reviewers_avg": [ 0.5, 0.5 ], "reply_authors_avg": [ 1.75, 0.82915619758885 ], "replies_avg": [ 17, 0 ], "authors#_avg": [ 4, 0 ], "corr_rating_confidence": 0.6882472016116854, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:DI8NRowQzRUJ:scholar.google.com/&scioq=Sandwich+Batch+Normalization&hl=en&as_sdt=0,5", "gs_version_total": 2, "aff_unique_index": "0;0;0", "aff_unique_norm": "University of Texas at Austin", "aff_unique_dep": "", "aff_unique_url": "https://www.utexas.edu", "aff_unique_abbr": "UT Austin", "aff_campus_unique_index": "0;0;0", "aff_campus_unique": "Austin", "aff_country_unique_index": "0;0;0", "aff_country_unique": "United States" }, { "title": "Filtered Inner Product Projection for Crosslingual Embedding Alignment", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/2958", "id": "A2gNouoXE7", "poster": "", "openreview": "https://openreview.net/forum?id=A2gNouoXE7", "slides": "https://iclr.cc/virtual/2021/poster/2958", "video": "https://iclr.cc/virtual/2021/poster/2958", "author_site": "Vin Sachidananda, Ziyi Yang, Chenguang Zhu", "tldr": "", "abstract": "Due to widespread interest in machine translation and transfer learning, there are numerous algorithms for mapping multiple embeddings to a shared representation space. Recently, these algorithms have been studied in the setting of bilingual lexicon induction where one seeks to align the embeddings of a source and a target language such that translated word pairs lie close to one another in a common representation space. In this paper, we propose a method, Filtered Inner Product Projection (FIPP), for mapping embeddings to a common representation space. 
As semantic shifts are pervasive across languages and domains, FIPP first identifies the common geometric structure in both embeddings and then, only on the common structure, aligns the Gram matrices of these embeddings. FIPP is applicable even when the source and target embeddings are of differing dimensionalities. Additionally, FIPP provides computational benefits in ease of implementation and is faster to compute than current approaches. Following the baselines in Glavas et al. 2019, we evaluate FIPP both in the context of bilingual lexicon induction and downstream language tasks. We show that FIPP outperforms existing methods on the XLING BLI dataset for most language pairs while also providing robust performance across downstream tasks. ", "keywords": "multilingual representations;word embeddings;natural language processing", "primary_area": "", "supplementary_material": "", "author": "Vin Sachidananda;Ziyi Yang;Chenguang Zhu", "authorids": "~Vin_Sachidananda1;~Ziyi_Yang1;~Chenguang_Zhu1", "gender": ";M;M", "homepage": "https://vinsachi.com;;", "dblp": "231/7682;;48/7536-1.html", "google_scholar": "r1zoZEYAAAAJ;JkyLIM0AAAAJ;1b2kKWoAAAAJ", "orcid": ";;", "linkedin": "vin-sachidananda-bb33a138;ziyi-yang;", "or_profile": "~Vin_Sachidananda1;~Ziyi_Yang1;~Chenguang_Zhu1", "aff": "Amazon;Stanford University;", "aff_domain": "amazon.com;stanford.edu;", "position": "Researcher;PhD;", "bibtex": "@inproceedings{\nsachidananda2021filtered,\ntitle={Filtered Inner Product Projection for Crosslingual Embedding Alignment},\nauthor={Vin Sachidananda and Ziyi Yang and Chenguang Zhu},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=A2gNouoXE7}\n}", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer4;AnonReviewer3", "pdf_size": 0, "rating": "6;6;8", "confidence": "4;4;4", "wc_review": "984;531;968", "wc_reply_reviewers": "0;0;0", "wc_reply_authors": "2620;1091;870", "reply_reviewers": "0;0;0", "reply_authors": "4;2;2", "rating_avg": [ 6.666666666666667, 0.9428090415820634 ], "confidence_avg": [ 4.0, 0.0 ], "wc_review_avg": [ 827.6666666666666, 209.87668336959737 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 1527.0, 778.1161010200641 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 2.6666666666666665, 0.9428090415820634 ], "replies_avg": [ 13, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 11, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=564348295131586637&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 3, "pdf": "https://openreview.net/pdf?id=A2gNouoXE7", "email": "amazon.com;stanford.edu;", "author_num": 3, "aff_unique_index": "0;1", "aff_unique_norm": "Amazon;Stanford University", "aff_unique_dep": "Amazon.com, Inc.;", "aff_unique_url": "https://www.amazon.com;https://www.stanford.edu", "aff_unique_abbr": "Amazon;Stanford", "aff_campus_unique_index": "1", "aff_campus_unique": ";Stanford", "aff_country_unique_index": "0;0", "aff_country_unique": "United States" }, { "title": "Explainable Deep One-Class Classification", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/2521", "id": "A5VV3UyIQz", "poster": "", "openreview": "https://openreview.net/forum?id=A5VV3UyIQz", "slides": "https://iclr.cc/virtual/2021/poster/2521", "video": "https://iclr.cc/virtual/2021/poster/2521", "author_site": "Philipp Liznerski, Lukas Ruff, Robert A Vandermeulen, Billy J Franks, Marius Kloft, Klaus R Muller", "tldr": "", 
"abstract": "Deep one-class classification variants for anomaly detection learn a mapping that concentrates nominal samples in feature space causing anomalies to be mapped away. Because this transformation is highly non-linear, finding interpretations poses a significant challenge. In this paper we present an explainable deep one-class classification method, Fully Convolutional Data Description (FCDD), where the mapped samples are themselves also an explanation heatmap. FCDD yields competitive detection performance and provides reasonable explanations on common anomaly detection benchmarks with CIFAR-10 and ImageNet. On MVTec-AD, a recent manufacturing dataset offering ground-truth anomaly maps, FCDD sets a new state of the art in the unsupervised setting. Our method can incorporate ground-truth anomaly maps during training and using even a few of these (~5) improves performance significantly. Finally, using FCDD's explanations we demonstrate the vulnerability of deep one-class classification models to spurious image features such as image watermarks.", "keywords": "anomaly-detection;deep-learning;explanations;interpretability;xai;one-class-classification;deep-anomaly-detection;novelty-detection;outlier-detection", "primary_area": "", "supplementary_material": "", "author": "Philipp Liznerski;Lukas Ruff;Robert A. Vandermeulen;Billy Joe Franks;Marius Kloft;Klaus Robert Muller", "authorids": "~Philipp_Liznerski1;~Lukas_Ruff1;~Robert_A._Vandermeulen2;b_franks12@cs.uni-kl.de;~Marius_Kloft1;~Klaus_Robert_Muller1", "gender": "M;M;;;M;M", "homepage": "https://ml.informatik.uni-kl.de/;;;;http://ml.informatik.uni-kl.de/;https://www.ml.tu-berlin.de/menue/members/klaus-robert_mueller/", "dblp": "268/8258;222/9848;;;73/2217;m/KRMuller.html", "google_scholar": "6Kf3wfAAAAAJ;https://scholar.google.de/citations?user=40QzNXMAAAAJ;;;https://scholar.google.de/citations?user=l-BJCdAAAAAJ;https://scholar.google.de/citations?hl=de", "orcid": ";0000-0002-9707-297X;;;;0000-0002-3861-7685", "linkedin": ";lukasruff/;;;;", "or_profile": "~Philipp_Liznerski1;~Lukas_Ruff1;~Robert_A._Vandermeulen2;b_franks12@cs.uni-kl.de;~Marius_Kloft1;~Klaus_Robert_Muller1", "aff": "University of Kaiserslautern-Landau;TU Berlin;;;RPTU Kaiserslautern-Landau;TU Berlin", "aff_domain": "rptu.de;tu-berlin.de;;;uni-kl.de;tu-berlin.de", "position": "PhD student;PhD student;;;Professor;Full Professor", "bibtex": "@inproceedings{\nliznerski2021explainable,\ntitle={Explainable Deep One-Class Classification},\nauthor={Philipp Liznerski and Lukas Ruff and Robert A. 
Vandermeulen and Billy Joe Franks and Marius Kloft and Klaus Robert Muller},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=A5VV3UyIQz}\n}", "github": "[![github](/images/github_icon.svg) liznerski/fcdd](https://github.com/liznerski/fcdd) + [![Papers with Code](/images/pwc_icon.svg) 1 community implementation](https://paperswithcode.com/paper/?openreview=A5VV3UyIQz)", "project": "", "reviewers": "AnonReviewer3;AnonReviewer2;AnonReviewer1", "pdf_size": 0, "rating": "4;7;8", "confidence": "1;4;4", "wc_review": "148;338;369", "wc_reply_reviewers": "0;0;0", "wc_reply_authors": "144;229;105", "reply_reviewers": "0;0;0", "reply_authors": "1;1;1", "rating_avg": [ 6.333333333333333, 1.699673171197595 ], "confidence_avg": [ 3.0, 1.4142135623730951 ], "wc_review_avg": [ 285.0, 97.69680991038892 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 159.33333333333334, 51.77086267604802 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 8, 0 ], "authors#_avg": [ 6, 0 ], "corr_rating_confidence": 0.9707253433941511, "gs_citation": 297, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=1382712243609022780&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 7, "pdf": "https://openreview.net/pdf?id=A5VV3UyIQz", "email": "rptu.de;tu-berlin.de;;;uni-kl.de;tu-berlin.de", "author_num": 6, "aff_unique_index": "0;1;2;1", "aff_unique_norm": "University of Kaiserslautern-Landau;Technische Universit\u00e4t Berlin;Rheinland-Pfalz Technical University", "aff_unique_dep": ";;", "aff_unique_url": "https://www.uni-kl.de;https://www.tu-berlin.de;https://www.rptu.de", "aff_unique_abbr": "Uni KL;TU Berlin;RPTU", "aff_campus_unique_index": "1;2;1", "aff_campus_unique": ";Berlin;Kaiserslautern-Landau", "aff_country_unique_index": "0;0;0;0", "aff_country_unique": "Germany" }, { "id": "A7-rYAC-np1", "title": "Syntactic representations in the human brain: beyond effort-based metrics", "track": "main", "status": "Reject", "tldr": "", "abstract": "We are far from having a complete mechanistic understanding of the brain computations involved in language processing and of the role that syntax plays in those computations. Most language studies do not computationally model syntactic structure, and most studies that do model syntactic processing use effort-based metrics. These metrics capture the effort needed to process the syntactic information given by every word (Brennan et al., 2012; Hale et al., 2018; Brennan et al.,2016). They can reveal where in the brain syntactic processing occurs, but not what features of syntax are processed by different brain regions. Here, we move beyond effort-based metrics and propose explicit features capturing the syntactic structure that is incrementally built while a sentence is being read. Using these features and functional Magnetic Resonance Imaging (fMRI) recordings of participants reading a natural text, we study the brain representation of syntax. We find that our syntactic structure-based features are better than effort-based metrics at predicting brain activity in various parts of the language system. We show evidence of the brain representation of complex syntactic information such as phrase and clause structures. We see that regions well-predicted by syntactic features are distributed in the language system and are not distinguishable from those processing semantics. 
Our results call for a shift in the approach used for studying syntactic processing.", "keywords": "neuroscience;fMRI;syntactic representations;graph embeddings", "primary_area": "", "supplementary_material": "/attachment/e88614582e7d536880ae406a07e238ee1b4ac494.zip", "author": "Aniketh Janardhan Reddy;Leila Wehbe", "authorids": "ajreddy@cs.cmu.edu;~Leila_Wehbe1", "gender": ";F", "homepage": ";http://www.cs.cmu.edu/~lwehbe/", "dblp": ";125/4359", "google_scholar": ";YezyUawAAAAJ", "orcid": ";0000-0001-8545-2062", "linkedin": ";", "or_profile": "ajreddy@cs.cmu.edu;~Leila_Wehbe1", "aff": ";Carnegie Mellon University", "aff_domain": ";cmu.edu", "position": ";Assistant Professor", "bibtex": "@misc{\nreddy2021syntactic,\ntitle={Syntactic representations in the human brain: beyond effort-based metrics},\nauthor={Aniketh Janardhan Reddy and Leila Wehbe},\nyear={2021},\nurl={https://openreview.net/forum?id=A7-rYAC-np1}\n}", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer2;AnonReviewer3;AnonReviewer4", "site": "https://openreview.net/forum?id=A7-rYAC-np1", "pdf_size": 0, "rating": "4;5;6;8", "confidence": "3;4;4;4", "wc_review": "114;573;681;324", "wc_reply_reviewers": "0;235;0;94", "wc_reply_authors": "458;874;1304;206", "reply_reviewers": "0;1;0;1", "reply_authors": "1;2;2;2", "rating_avg": [ 5.75, 1.479019945774904 ], "confidence_avg": [ 3.75, 0.4330127018922193 ], "wc_review_avg": [ 423.0, 220.42345610211268 ], "wc_reply_reviewers_avg": [ 82.25, 96.17789506950129 ], "wc_reply_authors_avg": [ 710.5, 417.5077843585674 ], "reply_reviewers_avg": [ 0.5, 0.5 ], "reply_authors_avg": [ 1.75, 0.4330127018922193 ], "replies_avg": [ 16, 0 ], "authors#_avg": [ 2, 0 ], "corr_rating_confidence": 0.6831300510639732, "gs_citation": 12, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=16723570128001186791&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 4, "aff_unique_index": "0", "aff_unique_norm": "Carnegie Mellon University", "aff_unique_dep": "", "aff_unique_url": "https://www.cmu.edu", "aff_unique_abbr": "CMU", "aff_country_unique_index": "0", "aff_country_unique": "United States" }, { "id": "A993YzEUKB7", "title": "Extrapolatable Relational Reasoning With Comparators in Low-Dimensional Manifolds", "track": "main", "status": "Reject", "tldr": "", "abstract": "While modern deep neural architectures generalise well when test data is sampled from the same distribution as training data, they fail badly for cases when the test data distribution differs from the training distribution even along a few dimensions. This lack of out-of-distribution generalisation is increasingly manifested when the tasks become more abstract and complex, such as in relational reasoning. In this paper we propose a neuroscience-inspired inductive-biased module that can be readily amalgamated with current neural network architectures to improve out-of-distribution (o.o.d) generalisation performance on relational reasoning tasks. This module learns to project high-dimensional object representations to low-dimensional manifolds for more efficient and generalisable relational comparisons. We show that neural nets with this inductive bias achieve considerably better o.o.d generalisation performance for a range of relational reasoning tasks. 
We finally analyse the proposed inductive bias module to understand the importance of lower dimension projection, and propose an augmentation to the algorithmic alignment theory to better measure algorithmic alignment with generalisation.", "keywords": "Visual Reasoning;Relational Reasoning;Generalisation", "primary_area": "", "supplementary_material": "", "author": "Duo Wang;Mateja Jamnik;Pietro Li\u00f2", "authorids": "~Duo_Wang1;~Mateja_Jamnik1;~Pietro_Li\u00f21", "gender": "M;F;", "homepage": "https://www.cl.cam.ac.uk/~wd263/;http://www.cl.cam.ac.uk/~mj201;", "dblp": ";41/1392;l/PietroLio", "google_scholar": "https://scholar.google.co.uk/citations?user=8532hHAAAAAJ;d5QiyJkAAAAJ;", "orcid": ";0000-0003-2772-2532;", "linkedin": ";;", "or_profile": "~Duo_Wang1;~Mateja_Jamnik1;~Pietro_Li\u00f21", "aff": "University of Cambridge;University of Cambridge;", "aff_domain": "cam.ac.uk;cam.ac.uk;", "position": "PhD student;Professor in Artificial Intelligence;", "bibtex": "@misc{\nwang2021extrapolatable,\ntitle={Extrapolatable Relational Reasoning With Comparators in Low-Dimensional Manifolds},\nauthor={Duo Wang and Mateja Jamnik and Pietro Li{\\`o}},\nyear={2021},\nurl={https://openreview.net/forum?id=A993YzEUKB7}\n}", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer3;AnonReviewer4;AnonReviewer2;AnonReviewer5", "site": "https://openreview.net/forum?id=A993YzEUKB7", "pdf_size": 0, "rating": "4;4;5;5;6", "confidence": "4;5;3;4;4", "wc_review": "520;358;229;494;1018", "wc_reply_reviewers": "0;0;0;0;0", "wc_reply_authors": "498;432;299;422;736", "reply_reviewers": "0;0;0;0;0", "reply_authors": "1;1;1;1;1", "rating_avg": [ 4.8, 0.7483314773547882 ], "confidence_avg": [ 4.0, 0.6324555320336759 ], "wc_review_avg": [ 523.8, 268.1539856127445 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 477.4, 144.3988919625078 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 12, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": -0.42257712736425823, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:_q6AxFftp-gJ:scholar.google.com/&scioq=Extrapolatable+Relational+Reasoning+With+Comparators+in+Low-Dimensional+Manifolds&hl=en&as_sdt=0,5", "gs_version_total": 8, "aff_unique_index": "0;0", "aff_unique_norm": "University of Cambridge", "aff_unique_dep": "", "aff_unique_url": "https://www.cam.ac.uk", "aff_unique_abbr": "Cambridge", "aff_campus_unique_index": "0;0", "aff_campus_unique": "Cambridge", "aff_country_unique_index": "0;0", "aff_country_unique": "United Kingdom" }, { "title": "Wasserstein Embedding for Graph Learning", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/3182", "id": "AAes_3W-2z", "poster": "", "openreview": "https://openreview.net/forum?id=AAes_3W-2z", "slides": "https://iclr.cc/virtual/2021/poster/3182", "video": "https://iclr.cc/virtual/2021/poster/3182", "author_site": "Soheil Kolouri, Navid Naderializadeh, Gustavo K Rohde, Heiko Hoffmann", "tldr": "", "abstract": "We present Wasserstein Embedding for Graph Learning (WEGL), a novel and fast framework for embedding entire graphs in a vector space, in which various machine learning models are applicable for graph-level prediction tasks. We leverage new insights on defining similarity between graphs as a function of the similarity between their node embedding distributions. Specifically, we use the Wasserstein distance to measure the dissimilarity between node embeddings of different graphs. 
Unlike prior work, we avoid pairwise calculation of distances between graphs and reduce the computational complexity from quadratic to linear in the number of graphs. WEGL calculates Monge maps from a reference distribution to each node embedding and, based on these maps, creates a fixed-sized vector representation of the graph. We evaluate our new graph embedding approach on various benchmark graph-property prediction tasks, showing state-of-the-art classification performance while having superior computational efficiency. The code is available at https://github.com/navid-naderi/WEGL.", "keywords": "Wasserstein;graph embedding;graph-level prediction", "primary_area": "", "supplementary_material": "/attachment/d2fa2201388071573f2e27d0dc9726250533a09c.zip", "author": "Soheil Kolouri;Navid Naderializadeh;Gustavo K. Rohde;Heiko Hoffmann", "authorids": "~Soheil_Kolouri1;nnaderializadeh@hrl.com;~Gustavo_K._Rohde1;hhoffmann@hrl.com", "gender": ";;;", "homepage": ";;;", "dblp": ";;;", "google_scholar": ";;;", "orcid": ";;;", "linkedin": ";;;", "or_profile": ";;;", "aff": ";;;", "aff_domain": ";;;", "position": ";;;", "bibtex": "@inproceedings{\nkolouri2021wasserstein,\ntitle={Wasserstein Embedding for Graph Learning},\nauthor={Soheil Kolouri and Navid Naderializadeh and Gustavo K. Rohde and Heiko Hoffmann},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=AAes_3W-2z}\n}", "github": "[![github](/images/github_icon.svg) navid-naderi/WEGL](https://github.com/navid-naderi/WEGL)", "project": "", "reviewers": "AnonReviewer3;AnonReviewer4;AnonReviewer2;AnonReviewer1", "pdf_size": 0, "rating": "6;6;7;8", "confidence": "4;3;4;5", "wc_review": "698;383;244;1169", "wc_reply_reviewers": "0;0;0;67", "wc_reply_authors": "1399;478;428;1211", "reply_reviewers": "0;0;0;1", "reply_authors": "2;1;1;3", "rating_avg": [ 6.75, 0.82915619758885 ], "confidence_avg": [ 4.0, 0.7071067811865476 ], "wc_review_avg": [ 623.5, 355.3100758492503 ], "wc_reply_reviewers_avg": [ 16.75, 29.011851026778693 ], "wc_reply_authors_avg": [ 879.0, 431.51651185093715 ], "reply_reviewers_avg": [ 0.25, 0.4330127018922193 ], "reply_authors_avg": [ 1.75, 0.82915619758885 ], "replies_avg": [ 14, 0 ], "authors#_avg": [ 4, 0 ], "corr_rating_confidence": 0.8528028654224417, "gs_citation": 108, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=318944885595116091&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 7, "pdf": "https://openreview.net/pdf?id=AAes_3W-2z", "email": ";;;", "author_num": 4 }, { "id": "ABZSAe9gNeg", "title": "Differentially Private Synthetic Data: Applied Evaluations and Enhancements", "track": "main", "status": "Reject", "tldr": "", "abstract": "Machine learning practitioners frequently seek to leverage the most informative available data, without violating the data owner's privacy, when building predictive models. Differentially private data synthesis protects personal details from exposure, and allows for the training of differentially private machine learning models on privately generated datasets. But how can we effectively assess the efficacy of differentially private synthetic data? In this paper, we survey four differentially private generative adversarial networks for data synthesis. We evaluate each of them at scale on five standard tabular datasets, and in two applied industry scenarios. We benchmark with novel metrics from recent literature and other standard machine learning tools. 
Our results suggest some synthesizers are more applicable for different privacy budgets, and we further demonstrate complicating domain-based tradeoffs in selecting an approach. We offer experimental learning on applied machine learning scenarios with private internal data to researchers and practitioners alike. In addition, we propose QUAIL, a two model hybrid approach to generating synthetic data. We examine QUAIL's tradeoffs, and note circumstances in which it outperforms baseline differentially private supervised learning models under the same budget constraint.", "keywords": "privacy;differential privacy;generative adversarial networks;gan;security;synthetic data;evaluation;benchmarking;ensemble", "primary_area": "", "supplementary_material": "/attachment/7d9b7cc0f5d91f8133f34d7ef865d50067163846.zip", "author": "Lucas Rosenblatt;Xiaoyan Liu;Samira Pouyanfar;Eduardo de Leon;Anuj Desai;Joshua Allen", "authorids": "~Lucas_Rosenblatt1;~Xiaoyan_Liu1;sapouyan@microsoft.com;eddeleon@microsoft.com;andesai@microsoft.com;joshuaa@microsoft.com", "gender": "M;;;;;", "homepage": "https://www.lucasrosenblatt.com;https://www.linkedin.com/in/april-xiaoyan-liu-867a0479/;;;;", "dblp": "163/0926;;;;;", "google_scholar": "cDwuS6gAAAAJ;;;;;", "orcid": "0000-0001-6952-4361;;;;;", "linkedin": ";;;;;", "or_profile": "~Lucas_Rosenblatt1;~Xiaoyan_Liu1;sapouyan@microsoft.com;eddeleon@microsoft.com;andesai@microsoft.com;joshuaa@microsoft.com", "aff": "New York University;Microsoft;;;;", "aff_domain": "nyu.edu;microsoft.com;;;;", "position": "PhD student;data scientist;;;;", "bibtex": "@misc{\nrosenblatt2021differentially,\ntitle={Differentially Private Synthetic Data: Applied Evaluations and Enhancements},\nauthor={Lucas Rosenblatt and Xiaoyan Liu and Samira Pouyanfar and Eduardo de Leon and Anuj Desai and Joshua Allen},\nyear={2021},\nurl={https://openreview.net/forum?id=ABZSAe9gNeg}\n}", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer2;AnonReviewer3", "site": "https://openreview.net/forum?id=ABZSAe9gNeg", "pdf_size": 0, "rating": "4;4;4", "confidence": "4;3;4", "wc_review": "214;351;277", "wc_reply_reviewers": "174;0;0", "wc_reply_authors": "799;697;369", "reply_reviewers": "1;0;0", "reply_authors": "2;1;1", "rating_avg": [ 4.0, 0.0 ], "confidence_avg": [ 3.6666666666666665, 0.4714045207910317 ], "wc_review_avg": [ 280.6666666666667, 55.99007848618261 ], "wc_reply_reviewers_avg": [ 58.0, 82.02438661763951 ], "wc_reply_authors_avg": [ 621.6666666666666, 183.45087141309037 ], "reply_reviewers_avg": [ 0.3333333333333333, 0.4714045207910317 ], "reply_authors_avg": [ 1.3333333333333333, 0.4714045207910317 ], "replies_avg": [ 9, 0 ], "authors#_avg": [ 6, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 84, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=9103518805746204078&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 5, "aff_unique_index": "0;1", "aff_unique_norm": "New York University;Microsoft", "aff_unique_dep": ";Microsoft Corporation", "aff_unique_url": "https://www.nyu.edu;https://www.microsoft.com", "aff_unique_abbr": "NYU;Microsoft", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0", "aff_country_unique": "United States" }, { "title": "Lifelong Learning of Compositional Structures", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/2733", "id": "ADWd4TJO13G", "poster": "", "openreview": "https://openreview.net/forum?id=ADWd4TJO13G", "slides": "https://iclr.cc/virtual/2021/poster/2733", 
"video": "https://iclr.cc/virtual/2021/poster/2733", "author_site": "Jorge Mendez, ERIC EATON", "tldr": "", "abstract": "A hallmark of human intelligence is the ability to construct self-contained chunks of knowledge and adequately reuse them in novel combinations for solving different yet structurally related problems. Learning such compositional structures has been a significant challenge for artificial systems, due to the combinatorial nature of the underlying search problem. To date, research into compositional learning has largely proceeded separately from work on lifelong or continual learning. We integrate these two lines of work to present a general-purpose framework for lifelong learning of compositional structures that can be used for solving a stream of related tasks. Our framework separates the learning process into two broad stages: learning how to best combine existing components in order to assimilate a novel problem, and learning how to adapt the set of existing components to accommodate the new problem. This separation explicitly handles the trade-off between the stability required to remember how to solve earlier tasks and the flexibility required to solve new tasks, as we show empirically in an extensive evaluation.", "keywords": "lifelong learning;continual learning;compositional learning;modular networks", "primary_area": "", "supplementary_material": "", "author": "Jorge A Mendez;ERIC EATON", "authorids": "~Jorge_A_Mendez1;~ERIC_EATON1", "gender": ";M", "homepage": ";https://www.seas.upenn.edu/~mendezme/", "dblp": "22/2336;255/6609", "google_scholar": "QIZWnnQAAAAJ;87sQtnsAAAAJ", "orcid": ";0000-0002-2537-598X", "linkedin": ";", "or_profile": "~ERIC_EATON1;~Jorge_Armando_Mendez_Mendez1", "aff": "University of Pennsylvania;University of Pennsylvania", "aff_domain": "upenn.edu;upenn.edu", "position": "Faculty;PhD student", "bibtex": "@inproceedings{\nmendez2021lifelong,\ntitle={Lifelong Learning of Compositional Structures},\nauthor={Jorge A Mendez and ERIC EATON},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=ADWd4TJO13G}\n}", "github": "[![github](/images/github_icon.svg) GRASP-ML/Mendez2020Compositional](https://github.com/GRASP-ML/Mendez2020Compositional)", "project": "", "reviewers": "AnonReviewer1;AnonReviewer2;AnonReviewer5;AnonReviewer3;AnonReviewer4", "pdf_size": 0, "rating": "6;6;6;7;9", "confidence": "3;4;3;3;4", "wc_review": "301;405;797;634;459", "wc_reply_reviewers": "26;34;405;0;0", "wc_reply_authors": "1226;1256;1779;319;85", "reply_reviewers": "1;1;1;0;0", "reply_authors": "2;2;3;1;1", "rating_avg": [ 6.8, 1.16619037896906 ], "confidence_avg": [ 3.4, 0.4898979485566356 ], "wc_review_avg": [ 519.2, 175.85721480792307 ], "wc_reply_reviewers_avg": [ 93.0, 156.59629625249764 ], "wc_reply_authors_avg": [ 933.0, 632.7707325722326 ], "reply_reviewers_avg": [ 0.6, 0.48989794855663565 ], "reply_authors_avg": [ 1.8, 0.7483314773547883 ], "replies_avg": [ 20, 0 ], "authors#_avg": [ 2, 0 ], "corr_rating_confidence": 0.4900980294098034, "gs_citation": 45, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=11061523929398124661&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 6, "pdf": "https://openreview.net/pdf?id=ADWd4TJO13G", "email": "upenn.edu;upenn.edu", "author_num": 2, "aff_unique_index": "0;0", "aff_unique_norm": "University of Pennsylvania", "aff_unique_dep": "", "aff_unique_url": "https://www.upenn.edu", "aff_unique_abbr": "UPenn", "aff_campus_unique_index": "", 
"aff_campus_unique": "", "aff_country_unique_index": "0;0", "aff_country_unique": "United States" }, { "id": "ADwLLmSda3", "title": "Neural Nonnegative CP Decomposition for Hierarchical Tensor Analysis", "track": "main", "status": "Reject", "tldr": "", "abstract": "There is a significant demand for topic modeling on large-scale data with complex multi-modal structure in applications such as multi-layer network analysis, temporal document classification, and video data analysis; frequently this multi-modal data has latent hierarchical structure. We propose a new hierarchical nonnegative CANDECOMP/PARAFAC (CP) decomposition (hierarchical NCPD) model and a training method, Neural NCPD, for performing hierarchical topic modeling on multi-modal tensor data. Neural NCPD utilizes a neural network architecture and backpropagation to mitigate error propagation through hierarchical NCPD. ", "keywords": "nonnegative tensor decompositions;topic modeling;hierarchical model;CP decomposition;neural network;backpropagation", "primary_area": "", "supplementary_material": "/attachment/37a5ca9d4c925a55b23e69b9862404e9c317fa4a.zip", "author": "Joshua Vendrow;Jamie Haddock;Deanna Needell", "authorids": "jvendrow@math.ucla.edu;~Jamie_Haddock1;~Deanna_Needell2", "gender": ";F;Not Specified", "homepage": ";https://jamiehaddock.com/;https://www.math.ucla.edu/~deanna/index.html", "dblp": ";207/8176.html;03/2691", "google_scholar": ";ONLOj-oAAAAJ;", "orcid": ";0000-0002-1449-2574;0000-0002-8058-8638", "linkedin": ";;", "or_profile": "jvendrow@math.ucla.edu;~Jamie_Haddock1;~Deanna_Needell2", "aff": ";University of California, Los Angeles;University of California, Los Angeles", "aff_domain": ";ucla.edu;ucla.edu", "position": ";Postdoc;Full Professor", "bibtex": "@misc{\nvendrow2021neural,\ntitle={Neural Nonnegative {\\{}CP{\\}} Decomposition for Hierarchical Tensor Analysis},\nauthor={Joshua Vendrow and Jamie Haddock and Deanna Needell},\nyear={2021},\nurl={https://openreview.net/forum?id=ADwLLmSda3}\n}", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer5;AnonReviewer6", "site": "https://openreview.net/forum?id=ADwLLmSda3", "pdf_size": 0, "rating": "4;4;6", "confidence": "2;4;4", "wc_review": "369;386;1114", "wc_reply_reviewers": "0;0;175", "wc_reply_authors": "345;921;752", "reply_reviewers": "0;0;1", "reply_authors": "1;2;2", "rating_avg": [ 4.666666666666667, 0.9428090415820634 ], "confidence_avg": [ 3.3333333333333335, 0.9428090415820634 ], "wc_review_avg": [ 623.0, 347.25878918562546 ], "wc_reply_reviewers_avg": [ 58.333333333333336, 82.49579113843053 ], "wc_reply_authors_avg": [ 672.6666666666666, 241.7496409281488 ], "reply_reviewers_avg": [ 0.3333333333333333, 0.4714045207910317 ], "reply_authors_avg": [ 1.6666666666666667, 0.4714045207910317 ], "replies_avg": [ 12, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": 0.5, "gs_citation": 2, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=10766760303775022911&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 6, "aff_unique_index": "0;0", "aff_unique_norm": "University of California, Los Angeles", "aff_unique_dep": "", "aff_unique_url": "https://www.ucla.edu", "aff_unique_abbr": "UCLA", "aff_campus_unique_index": "0;0", "aff_campus_unique": "Los Angeles", "aff_country_unique_index": "0;0", "aff_country_unique": "United States" }, { "id": "AFm2njNEE1", "title": "Explicit Learning Topology for Differentiable Neural Architecture Search", "track": "main", "status": "Withdraw", "tldr": "", "abstract": "Differentiable neural 
architecture search (NAS) has gained much success in discovering more flexible and diverse cell types. Current methods couple the operations and topology during search, and simply derive optimal topology by a hand-craft rule. However, topology also matters for neural architectures since it controls the interactions between features of operations. In this paper, we highlight the topology learning in differentiable NAS, and propose an explicit topology modeling method, named TopoNAS, to directly decouple the operation selection and topology during search. Concretely, we introduce a set of topological variables and a combinatorial probabilistic distribution to explicitly indicate the target topology. Besides, we also leverage a passive-aggressive regularization to suppress invalid topology within supernet. Our introduced topological variables can be jointly learned with operation variables and supernet weights, and apply to various DARTS variants. Extensive experiments on CIFAR-10 and ImageNet validate the effectiveness of our proposed TopoNAS. The results show that TopoNAS does enable to search cells with more diverse and complex topology, and boost the performance significantly. For example, TopoNAS can improve DARTS by 0.16% accuracy on CIFAR-10 dataset with 40% parameters reduced or 0.35% with similar parameters.", "keywords": "", "primary_area": "", "supplementary_material": "", "author": "Tao Huang;Shan You;Yibo Yang;Zhuozhuo Tu;Fei Wang;Chen Qian;Changshui Zhang", "authorids": "~Tao_Huang5;~Shan_You3;~Yibo_Yang2;~Zhuozhuo_Tu1;~Fei_Wang9;~Chen_Qian1;~Changshui_Zhang1", "gender": "M;M;M;M;M;M;M", "homepage": "https://taohuang.info;https://shanyou92.github.io/;https://iboing.github.io/;;;;http://bigeye.au.tsinghua.edu.cn/english/Introduction.html", "dblp": "34/808-20;179/2548;28/7717/;230/4649;;;z/ChangshuiZhang", "google_scholar": "jkcRdBgAAAAJ;https://scholar.google.com/citations?hl=en;DxXXnCcAAAAJ;;ljt16JkAAAAJ;AerkT0YAAAAJ;GL9M37YAAAAJ", "orcid": ";0000-0003-1964-0430;;;;;", "linkedin": ";;;;;;", "or_profile": "~Tao_Huang5;~Shan_You3;~Yibo_Yang2;~Zhuozhuo_Tu1;~Fei_Wang9;~Chen_Qian1;~Changshui_Zhang2", "aff": "SenseTime Research;Tsinghua University;Peking University;;University of Science and Technology of China;Tsinghua University;Tsinghua University", "aff_domain": "sensetime.com;tsinghua.edu.cn;pku.edu.cn;;mail.ustc.edu.cn;mails.tsinghua.edu.cn;mail.tsinghua.edu.cn", "position": "Researcher;Postdoc;PhD student;;PhD student;PhD student;Full Professor", "bibtex": "", "github": "", "project": "", "reviewers": "AnonReviewer4;AnonReviewer3;AnonReviewer2;AnonReviewer1", "site": "https://openreview.net/forum?id=AFm2njNEE1", "pdf_size": 0, "rating": "4;4;5;5", "confidence": "5;5;3;4", "wc_review": "548;476;227;216", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "109;217;95;185", "reply_reviewers": "0;0;0;0", "reply_authors": "1;1;1;1", "rating_avg": [ 4.5, 0.5 ], "confidence_avg": [ 4.25, 0.82915619758885 ], "wc_review_avg": [ 366.75, 147.5150416059325 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 151.5, 51.01715397785337 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 9, 0 ], "authors#_avg": [ 7, 0 ], "corr_rating_confidence": -0.9045340337332909, "gs_citation": 12, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=14590785762316974676&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 2, "aff_unique_index": "0;1;2;3;1;1", "aff_unique_norm": "SenseTime;Tsinghua University;Peking University;University of 
Science and Technology of China", "aff_unique_dep": "SenseTime Research;;;", "aff_unique_url": "https://www.sensetime.com;https://www.tsinghua.edu.cn;http://www.pku.edu.cn;http://www.ustc.edu.cn", "aff_unique_abbr": "SenseTime;THU;Peking U;USTC", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0;0;0;0;0", "aff_country_unique": "China" }, { "id": "AGQGZkLBKK", "title": "The Effectiveness of Memory Replay in Large Scale Continual Learning", "track": "main", "status": "Withdraw", "tldr": "", "abstract": "We study continual learning in the large scale setting where tasks in the input sequence are not limited to classification, and the outputs can be of high dimension. Among multiple state-of-the-art methods, we found vanilla experience replay (ER) still very competitive in terms of both performance and scalability, despite its simplicity.\nHowever, a degraded performance is observed for ER with small memory. A further visualization of the feature space reveals that the intermediate representation undergoes a distributional drift.\nWhile existing methods usually replay only the input-output pairs, we hypothesize that their regularization effect is inadequate for complex deep models and diverse tasks with small replay buffer size. Following this observation, we propose to replay the activation of the intermediate layers in addition to the input-output pairs. Considering that saving raw activation maps can dramatically increase memory and compute cost, we propose the Compressed Activation Replay technique, where compressed representations of layer activation are saved to the replay buffer. We show that this approach can achieve superior regularization effect while adding negligible memory overhead to replay method. Experiments on both the large-scale Taskonomy benchmark with a diverse set of tasks and standard common datasets (Split-CIFAR and Split-miniImageNet) demonstrate the effectiveness of the proposed method.", "keywords": "Continual learning;memory replay;regularization;lifelong learning;multi-task learning", "primary_area": "", "supplementary_material": "", "author": "Yogesh Balaji;Mehrdad Farajtabar;Dong Yin;Alex Mott;Ang Li", "authorids": "~Yogesh_Balaji1;~Mehrdad_Farajtabar1;~Dong_Yin1;~Alex_Mott1;~Ang_Li1", "gender": "M;M;M;M;M", "homepage": "https://yogeshbalaji.github.io/;https://www.cc.gatech.edu/~mfarajta/;https://dongyin92.github.io/;;https://angli.ai", "dblp": "185/6906;21/9988;85/4137;;33/2805-1", "google_scholar": "0I2qH0oAAAAJ;shkKxnQAAAAJ;YtM8P88AAAAJ;;6bRXWXEAAAAJ", "orcid": ";;;;", "linkedin": ";;dong-yin-6747137b/;;angli-ai", "or_profile": "~Yogesh_Balaji1;~Mehrdad_Farajtabar1;~Dong_Yin1;~Alex_Mott1;~Ang_Li1", "aff": "Department of Computer Science, University of Maryland, College Park;Google;Google DeepMind;Google DeepMind;Google DeepMind", "aff_domain": "cs.umd.edu;google.com;google.com;deepmind.com;google.com", "position": "PhD student;Research Scientist;Research scientist;Research Engineer;Principal Researcher", "bibtex": "", "github": "", "project": "", "reviewers": "AnonReviewer2;AnonReviewer3;AnonReviewer4;AnonReviewer1", "site": "https://openreview.net/forum?id=AGQGZkLBKK", "pdf_size": 0, "rating": "3;4;5;5", "confidence": "4;5;4;4", "wc_review": "184;292;528;671", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "124;145;189;271", "reply_reviewers": "0;0;0;0", "reply_authors": "1;1;1;1", "rating_avg": [ 4.25, 0.82915619758885 ], "confidence_avg": [ 4.25, 0.4330127018922193 ], "wc_review_avg": [ 418.75, 191.5324711374027 
], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 182.25, 56.35323859371349 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 10, 0 ], "authors#_avg": [ 5, 0 ], "corr_rating_confidence": -0.17407765595569782, "gs_citation": 40, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=6920055151161991459&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 3, "aff_unique_index": "0;1;1;1;1", "aff_unique_norm": "University of Maryland, College Park;Google", "aff_unique_dep": "Department of Computer Science;Google", "aff_unique_url": "https://www/umd.edu;https://www.google.com", "aff_unique_abbr": "UMD;Google", "aff_campus_unique_index": "0;1", "aff_campus_unique": "College Park;Mountain View;", "aff_country_unique_index": "0;0;1;1;1", "aff_country_unique": "United States;United Kingdom" }, { "title": "Towards Resolving the Implicit Bias of Gradient Descent for Matrix Factorization: Greedy Low-Rank Learning", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/3291", "id": "AHOs7Sm5H7R", "poster": "", "openreview": "https://openreview.net/forum?id=AHOs7Sm5H7R", "slides": "https://iclr.cc/virtual/2021/poster/3291", "video": "https://iclr.cc/virtual/2021/poster/3291", "author_site": "Zhiyuan Li, Yuping Luo, Kaifeng Lyu", "tldr": "", "abstract": "Matrix factorization is a simple and natural test-bed to investigate the implicit regularization of gradient descent. Gunasekar et al. (2017) conjectured that gradient flow with infinitesimal initialization converges to the solution that minimizes the nuclear norm, but a series of recent papers argued that the language of norm minimization is not sufficient to give a full characterization for the implicit regularization. In this work, we provide theoretical and empirical evidence that for depth-2 matrix factorization, gradient flow with infinitesimal initialization is mathematically equivalent to a simple heuristic rank minimization algorithm, Greedy Low-Rank Learning, under some reasonable assumptions. This generalizes the rank minimization view from previous works to a much broader setting and enables us to construct counter-examples to refute the conjecture from Gunasekar et al. (2017). 
We also extend the results to the case where depth >= 3, and we show that the benefit of being deeper is that the above convergence has a much weaker dependence over initialization magnitude so that this rank minimization is more likely to take effect for initialization with practical scale.", "keywords": "matrix factorization;gradient descent;implicit regularization;implicit bias", "primary_area": "", "supplementary_material": "", "author": "Zhiyuan Li;Yuping Luo;Kaifeng Lyu", "authorids": "~Zhiyuan_Li2;~Yuping_Luo1;~Kaifeng_Lyu2", "gender": "M;M;M", "homepage": "https://zhiyuanli.ttic.edu;http://www.yuping.me;https://kaifeng.ac/", "dblp": "l/ZhiyuanLi;70/4804;220/3283", "google_scholar": "https://scholar.google.com/citations?hl=en;https://scholar.google.com/citations?hl=en;843JJtgAAAAJ", "orcid": ";;", "linkedin": ";;", "or_profile": "~Zhiyuan_Li2;~Yuping_Luo1;~Kaifeng_Lyu2", "aff": "Department of Computer Science, Princeton University;Princeton University;Princeton University", "aff_domain": "cs.princeton.edu;princeton.edu;princeton.edu", "position": "PhD student;PhD student;PhD student", "bibtex": "@inproceedings{\nli2021towards,\ntitle={Towards Resolving the Implicit Bias of Gradient Descent for Matrix Factorization: Greedy Low-Rank Learning},\nauthor={Zhiyuan Li and Yuping Luo and Kaifeng Lyu},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=AHOs7Sm5H7R}\n}", "github": "", "project": "", "reviewers": "AnonReviewer2;AnonReviewer1;AnonReviewer3;AnonReviewer4", "pdf_size": 0, "rating": "6;6;7;8", "confidence": "4;3;3;3", "wc_review": "454;716;596;1162", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "289;694;337;509", "reply_reviewers": "0;0;0;0", "reply_authors": "1;1;1;1", "rating_avg": [ 6.75, 0.82915619758885 ], "confidence_avg": [ 3.25, 0.4330127018922193 ], "wc_review_avg": [ 732.0, 265.01698058803703 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 457.25, 159.2927729057411 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 10, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": -0.5222329678670935, "gs_citation": 155, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=1379045037218310775&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 7, "pdf": "https://openreview.net/pdf?id=AHOs7Sm5H7R", "email": "cs.princeton.edu;princeton.edu;princeton.edu", "author_num": 3, "aff_unique_index": "0;0;0", "aff_unique_norm": "Princeton University", "aff_unique_dep": "Department of Computer Science", "aff_unique_url": "https://www.princeton.edu", "aff_unique_abbr": "Princeton", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0;0", "aff_country_unique": "United States" }, { "title": "SEED: Self-supervised Distillation For Visual Representation", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/2707", "id": "AHm3dbp7D1D", "poster": "", "openreview": "https://openreview.net/forum?id=AHm3dbp7D1D", "slides": "https://iclr.cc/virtual/2021/poster/2707", "video": "https://iclr.cc/virtual/2021/poster/2707", "author_site": "Zhiyuan Fang, Jianfeng Wang, Lijuan Wang, Lei Zhang, 'YZ' Yezhou Yang, Zicheng Liu", "tldr": "", "abstract": "This paper is concerned with self-supervised learning for small models. 
The problem is motivated by our empirical studies that while the widely used contrastive self-supervised learning method has shown great progress on large model training, it does not work well for small models. To address this problem, we propose a new learning paradigm, named $\\textbf{SE}$lf-Sup$\\textbf{E}$rvised $\\textbf{D}$istillation (${\\large S}$EED), where we leverage a larger network (as Teacher) to transfer its representational knowledge into a smaller architecture (as Student) in a self-supervised fashion. Instead of directly learning from unlabeled data, we train a student encoder to mimic the similarity score distribution inferred by a teacher over a set of instances. We show that ${\\large S}$EED dramatically boosts the performance of small networks on downstream tasks. Compared with self-supervised baselines, ${\\large S}$EED improves the top-1 accuracy from 42.2% to 67.6% on EfficientNet-B0 and from 36.3% to 68.2% on MobileNet-v3-Large on the ImageNet-1k dataset. ", "keywords": "Self Supervised Learning;Knowledge Distillation;Representation Learning", "primary_area": "", "supplementary_material": "", "author": "Zhiyuan Fang;Jianfeng Wang;Lijuan Wang;Lei Zhang;Yezhou Yang;Zicheng Liu", "authorids": "~Zhiyuan_Fang1;~Jianfeng_Wang4;~Lijuan_Wang1;~Lei_Zhang23;~Yezhou_Yang1;~Zicheng_Liu1", "gender": "M;M;F;M;M;M", "homepage": "https://www.public.asu.edu/~zfang29/;;https://www.microsoft.com/en-us/research/people/lijuanw/;https://yezhouyang.engineering.asu.edu;https://sites.google.com/view/zichengliu/home?pli=1;https://www.leizhang.org/", "dblp": "75/4027;;51/2527.html;78/7455;l/ZichengLiu;z/LeiZhang", "google_scholar": "https://scholar.google.com.au/citations?hl=en;vJWEw_8AAAAJ;cDcWXuIAAAAJ;k2suuZgAAAAJ;bkALdvsAAAAJ;fIlGZToAAAAJ", "orcid": ";;;;0000-0001-5894-7828;", "linkedin": ";;;;;", "or_profile": "~Zhiyuan_Fang1;~Jianfeng_Wang4;~Lijuan_Wang1;~Yezhou_Yang1;~Zicheng_Liu1;~Lei_Zhang1", "aff": "Arizona State University;Microsoft;Microsoft;Arizona State University;Microsoft;Microsoft", "aff_domain": "asu.edu;microsoft.com;microsoft.com;asu.edu;microsoft.com;microsoft.com", "position": "PhD student;Principal Researcher;Principal Researcher;Assistant Professor;partner research manager;Principal Researcher", "bibtex": "@inproceedings{\nfang2021seed,\ntitle={{\\{}SEED{\\}}: Self-supervised Distillation For Visual Representation},\nauthor={Zhiyuan Fang and Jianfeng Wang and Lijuan Wang and Lei Zhang and Yezhou Yang and Zicheng Liu},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=AHm3dbp7D1D}\n}", "github": "[![github](/images/github_icon.svg) jacobswan1/SEED](https://github.com/jacobswan1/SEED)", "project": "", "reviewers": "AnonReviewer4;AnonReviewer2;AnonReviewer1", "pdf_size": 0, "rating": "6;7;7", "confidence": "5;5;4", "wc_review": "338;253;360", "wc_reply_reviewers": "0;0;0", "wc_reply_authors": "67;708;642", "reply_reviewers": "0;0;0", "reply_authors": "1;1;1", "rating_avg": [ 6.666666666666667, 0.4714045207910317 ], "confidence_avg": [ 4.666666666666667, 0.4714045207910317 ], "wc_review_avg": [ 317.0, 46.13747572924495 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 472.3333333333333, 287.8776746389958 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 7, 0 ], "authors#_avg": [ 6, 0 ], "corr_rating_confidence": -0.4999999999999999, "gs_citation": 233, "gs_cited_by_link": 
"https://scholar.google.com/scholar?cites=8472207324878329601&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 5, "pdf": "https://openreview.net/pdf?id=AHm3dbp7D1D", "email": "asu.edu;microsoft.com;microsoft.com;asu.edu;microsoft.com;microsoft.com", "author_num": 6, "aff_unique_index": "0;1;1;0;1;1", "aff_unique_norm": "Arizona State University;Microsoft", "aff_unique_dep": ";Microsoft Corporation", "aff_unique_url": "https://www.asu.edu;https://www.microsoft.com", "aff_unique_abbr": "ASU;Microsoft", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0;0;0;0;0", "aff_country_unique": "United States" }, { "title": "Multi-Class Uncertainty Calibration via Mutual Information Maximization-based Binning", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/2923", "id": "AICNpd8ke-m", "poster": "", "openreview": "https://openreview.net/forum?id=AICNpd8ke-m", "slides": "https://iclr.cc/virtual/2021/poster/2923", "video": "https://iclr.cc/virtual/2021/poster/2923", "author_site": "Kanil Patel, William H Beluch, Bin Yang, Michael Pfeiffer, Dan Zhang", "tldr": "", "abstract": "Post-hoc multi-class calibration is a common approach for providing high-quality confidence estimates of deep neural network predictions. Recent work has shown that widely used scaling methods underestimate their calibration error, while alternative Histogram Binning (HB) methods often fail to preserve classification accuracy. When classes have small prior probabilities, HB also faces the issue of severe sample-inefficiency after the conversion into K one-vs-rest class-wise calibration problems. The goal of this paper is to resolve the identified issues of HB in order to provide calibrated confidence estimates using only a small holdout calibration dataset for bin optimization while preserving multi-class ranking accuracy. From an information-theoretic perspective, we derive the I-Max concept for binning, which maximizes the mutual information between labels and quantized logits. This concept mitigates potential loss in ranking performance due to lossy quantization, and by disentangling the optimization of bin edges and representatives allows simultaneous improvement of ranking and calibration performance. To improve the sample efficiency and estimates from a small calibration set, we propose a shared class-wise (sCW) calibration strategy, sharing one calibrator among similar classes (e.g., with similar class priors) so that the training sets of their class-wise calibration problems can be merged to train the single calibrator. The combination of sCW and I-Max binning outperforms the state of the art calibration methods on various evaluation metrics across different benchmark datasets and models, using a small calibration set (e.g., 1k samples for ImageNet).", "keywords": "uncertainty calibration;post-hoc calibration;histogram binning;mutual information;deep neural networks", "primary_area": "", "supplementary_material": "", "author": "Kanil Patel;William H. 
Beluch;Bin Yang;Michael Pfeiffer;Dan Zhang", "authorids": "~Kanil_Patel1;~William_H._Beluch1;~Bin_Yang5;~Michael_Pfeiffer1;~Dan_Zhang1", "gender": "M;;;M;", "homepage": ";;https://www.iss.uni-stuttgart.de/en/;http://www.bosch-ai.com;", "dblp": ";230/1399;77/377-9;https://dblp.uni-trier.de/pers/hd/p/Pfeiffer_0001:Michael;21/802-17", "google_scholar": "iQuvoY4AAAAJ;;sm9O9OYAAAAJ;https://scholar.google.de/citations?user=jDE5tIQAAAAJ;https://scholar.google.de/citations?user=yazO-mMAAAAJ", "orcid": ";;;0000-0001-7159-3622;0000-0003-0930-9162", "linkedin": ";;;michael-pfeiffer-0a098449/;", "or_profile": "~Kanil_Patel1;~William_H._Beluch1;~Bin_Yang5;~Michael_Pfeiffer1;~Dan_Zhang1", "aff": "Robert Bosch GmbH, Bosch;Robert Bosch GmbH, Bosch;University of Stuttgart;Bosch Center for Artificial Intelligence;Robert Bosch GmbH, Bosch", "aff_domain": "de.bosch.com;de.bosch.com;uni-stuttgart.de;bosch.com;de.bosch.com", "position": "PhD student;Research Engineer;Full Professor;Postdoc;Research Scientist", "bibtex": "@inproceedings{\npatel2021multiclass,\ntitle={Multi-Class Uncertainty Calibration via Mutual Information Maximization-based Binning},\nauthor={Kanil Patel and William H. Beluch and Bin Yang and Michael Pfeiffer and Dan Zhang},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=AICNpd8ke-m}\n}", "github": "[![github](/images/github_icon.svg) boschresearch/imax-calibration](https://github.com/boschresearch/imax-calibration)", "project": "", "reviewers": "AnonReviewer4;AnonReviewer3;AnonReviewer1", "pdf_size": 0, "rating": "5;7;7", "confidence": "4;3;4", "wc_review": "392;500;532", "wc_reply_reviewers": "0;91;0", "wc_reply_authors": "1286;1490;1063", "reply_reviewers": "0;2;0", "reply_authors": "2;6;2", "rating_avg": [ 6.333333333333333, 0.9428090415820634 ], "confidence_avg": [ 3.6666666666666665, 0.4714045207910317 ], "wc_review_avg": [ 474.6666666666667, 59.896206520576534 ], "wc_reply_reviewers_avg": [ 30.333333333333332, 42.897811391983886 ], "wc_reply_authors_avg": [ 1279.6666666666667, 174.37953498682757 ], "reply_reviewers_avg": [ 0.6666666666666666, 0.9428090415820634 ], "reply_authors_avg": [ 3.3333333333333335, 1.8856180831641267 ], "replies_avg": [ 17, 0 ], "authors#_avg": [ 5, 0 ], "corr_rating_confidence": -0.49999999999999983, "gs_citation": 47, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=10820552692700202554&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 5, "pdf": "https://openreview.net/pdf?id=AICNpd8ke-m", "email": "de.bosch.com;de.bosch.com;uni-stuttgart.de;bosch.com;de.bosch.com", "author_num": 5, "aff_unique_index": "0;0;1;2;0", "aff_unique_norm": "Robert Bosch GmbH;University of Stuttgart;Bosch Center for Artificial Intelligence", "aff_unique_dep": ";;Center for Artificial Intelligence", "aff_unique_url": "https://www.bosch.com;https://www.uni-stuttgart.de;https://www.bosch-ai.com", "aff_unique_abbr": "Bosch;USTuttgart;BCAI", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0;0;0;0", "aff_country_unique": "Germany" }, { "id": "AJTAcS7SZzf", "title": "AUTOSAMPLING: SEARCH FOR EFFECTIVE DATA SAMPLING SCHEDULES", "track": "main", "status": "Reject", "tldr": "", "abstract": "Data sampling acts as a pivotal role in training deep learning models. However, an effective sampling schedule is difficult to learn due to its inherent high-dimension as a hyper-parameter. 
In this paper, we propose the AutoSampling method to automatically learn sampling schedules for model training, which consists of the multi-exploitation step aiming for optimal local sampling schedules and the exploration step for the ideal sampling distribution. More specifically, we achieve sampling schedule search with shortened exploitation cycle to provide enough supervision. In addition, we periodically estimate the sampling distribution from the learned sampling schedules and perturb it to search in the distribution space. The combination of two searches allows us to learn a robust sampling schedule. We apply our AutoSampling method to a variety of image classification tasks illustrating the effectiveness of the proposed method.", "keywords": "Hyper-parameter Learning;AutoML;Computer Vision", "primary_area": "", "supplementary_material": "", "author": "Ming Sun;Haoxuan Dou;Baopu Li;Junjie Yan;Wanli Ouyang", "authorids": "~Ming_Sun4;~Haoxuan_Dou1;~Baopu_Li1;~Junjie_Yan4;~Wanli_Ouyang1", "gender": "M;M;;M;", "homepage": "https://msunming.github.io/;;;https://yan-junjie.github.io/;", "dblp": "39/1471-8.html;;;115/9656;", "google_scholar": "https://scholar.google.com.hk/citations?hl=zh-CN;;;rEYarG0AAAAJ;", "orcid": ";;;;", "linkedin": ";haoxuan-dou-3a502281;;;", "or_profile": "~Ming_Sun4;~Haoxuan_Dou1;~Baopu_Li1;~Junjie_Yan4;~Wanli_Ouyang1", "aff": "Sensetime Tech;Sensetime;;;", "aff_domain": "sensetime.com;sensetime.com;;;", "position": "Researcher;Researcher;;;", "bibtex": "@misc{\nsun2021autosampling,\ntitle={{\\{}AUTOSAMPLING{\\}}: {\\{}SEARCH{\\}} {\\{}FOR{\\}} {\\{}EFFECTIVE{\\}} {\\{}DATA{\\}} {\\{}SAMPLING{\\}} {\\{}SCHEDULES{\\}}},\nauthor={Ming Sun and Haoxuan Dou and Baopu Li and Junjie Yan and Wanli Ouyang},\nyear={2021},\nurl={https://openreview.net/forum?id=AJTAcS7SZzf}\n}", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer4;AnonReviewer3", "site": "https://openreview.net/forum?id=AJTAcS7SZzf", "pdf_size": 0, "rating": "3;5;6", "confidence": "5;4;4", "wc_review": "301;241;242", "wc_reply_reviewers": "0;0;0", "wc_reply_authors": "535;613;192", "reply_reviewers": "0;0;0", "reply_authors": "1;1;1", "rating_avg": [ 4.666666666666667, 1.247219128924647 ], "confidence_avg": [ 4.333333333333333, 0.4714045207910317 ], "wc_review_avg": [ 261.3333333333333, 28.05153986662566 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 446.6666666666667, 182.87032442568574 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 7, 0 ], "authors#_avg": [ 5, 0 ], "corr_rating_confidence": -0.944911182523068, "gs_citation": 8, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=3746069801375272827&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 5, "aff_unique_index": "0;0", "aff_unique_norm": "SenseTime", "aff_unique_dep": "", "aff_unique_url": "https://www.sensetime.com", "aff_unique_abbr": "SenseTime", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0", "aff_country_unique": "China" }, { "id": "AJY3fGPF1DC", "title": "Selecting Treatment Effects Models for Domain Adaptation Using Causal Knowledge", "track": "main", "status": "Reject", "tldr": "", "abstract": "Selecting causal inference models for estimating individualized treatment effects (ITE) from observational data presents a unique challenge since the counterfactual outcomes are never observed. 
The problem is challenged further in the unsupervised domain adaptation (UDA) setting where we only have access to labeled samples in the source domain, but desire selecting a model that achieves good performance on a target domain for which only unlabeled samples are available. Existing techniques for UDA model selection are designed for the predictive setting. These methods examine discriminative density ratios between the input covariates in the source and target domain and do not factor in the model's predictions in the target domain. Because of this, two models with identical performance on the source domain would receive the same risk score by existing methods, but in reality, have significantly different performance in the test domain. We leverage the invariance of causal structures across domains to propose a novel model selection metric specifically designed for ITE methods under the UDA setting. In particular, we propose selecting models whose predictions of interventions' effects satisfy known causal structures in the target domain. Experimentally, our method selects ITE models that are more robust to covariate shifts on several healthcare datasets, including estimating the effect of ventilation in COVID-19 patients from different geographic locations.", "keywords": "causal inference;treatment effects;healthcare", "primary_area": "", "supplementary_material": "", "author": "Trent Kyono;Ioana Bica;Zhaozhi Qian;Mihaela van der Schaar", "authorids": "~Trent_Kyono1;~Ioana_Bica1;~Zhaozhi_Qian1;~Mihaela_van_der_Schaar2", "gender": "M;F;;F", "homepage": ";https://ioanabica.github.io/;;https://www.vanderschaar-lab.com", "dblp": "https://dblp.uni-trier.de/pers/hd/k/Kyono:Trent;;194/2443;", "google_scholar": "vJxuKwgAAAAJ;;PuTDB5gAAAAJ;DZ3S--MAAAAJ", "orcid": ";;0000-0002-4561-0342;", "linkedin": ";;;", "or_profile": "~Trent_Kyono1;~Ioana_Bica1;~Zhaozhi_Qian1;~Mihaela_van_der_Schaar2", "aff": "University of California, Los Angeles;University of Oxford;University of Cambridge;University of California, Los Angeles", "aff_domain": "ucla.edu;ox.ac.uk;cam.ac.uk;ucla.edu", "position": "PhD student;PhD student;PhD student;Full Professor", "bibtex": "@misc{\nkyono2021selecting,\ntitle={Selecting Treatment Effects Models for Domain Adaptation Using Causal Knowledge},\nauthor={Trent Kyono and Ioana Bica and Zhaozhi Qian and Mihaela van der Schaar},\nyear={2021},\nurl={https://openreview.net/forum?id=AJY3fGPF1DC}\n}", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer2;AnonReviewer4;AnonReviewer3", "site": "https://openreview.net/forum?id=AJY3fGPF1DC", "pdf_size": 0, "rating": "4;6;6;8", "confidence": "4;3;5;4", "wc_review": "676;349;218;538", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "1306;446;565;550", "reply_reviewers": "0;0;0;0", "reply_authors": "3;1;1;1", "rating_avg": [ 6.0, 1.4142135623730951 ], "confidence_avg": [ 4.0, 0.7071067811865476 ], "wc_review_avg": [ 445.25, 175.1818697810935 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 716.75, 343.2764010240145 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.5, 0.8660254037844386 ], "replies_avg": [ 12, 0 ], "authors#_avg": [ 4, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 13, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=3987934023917646009&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 6, "aff_unique_index": "0;1;2;0", "aff_unique_norm": "University of California, Los Angeles;University of Oxford;University of Cambridge", "aff_unique_dep": ";;", 
"aff_unique_url": "https://www.ucla.edu;https://www.ox.ac.uk;https://www.cam.ac.uk", "aff_unique_abbr": "UCLA;Oxford;Cambridge", "aff_campus_unique_index": "0;2;0", "aff_campus_unique": "Los Angeles;;Cambridge", "aff_country_unique_index": "0;1;1;0", "aff_country_unique": "United States;United Kingdom" }, { "id": "ALSupSRaBH", "title": "Deep Goal-Oriented Clustering", "track": "main", "status": "Reject", "tldr": "", "abstract": "Clustering and prediction are two primary tasks in the fields of unsupervised and supervised learning, respectively. Although much of the recent advances in machine learning have been centered around those two tasks, the interdependent, mutually beneficial relationship between them is rarely explored. One could reasonably expect appropriately clustering the data would aid the downstream prediction task and, conversely, a better prediction performance for the downstream task could potentially inform a more appropriate clustering strategy. In this work, we focus on the latter part of this mutually beneficial relationship. To this end, we introduce Deep Goal-Oriented Clustering (DGC), a probabilistic framework that clusters the data by jointly using supervision via side-information and unsupervised modeling of the inherent data structure in an end-to-end fashion. We show the effectiveness of our model on a range of datasets by achieving prediction accuracies comparable to the state-of-the-art, while, more importantly in our setting, simultaneously learning congruent clustering strategies.", "keywords": "clustering;variational inference", "primary_area": "", "supplementary_material": "/attachment/399702691d5a646e03185116f0d380db2a3a267f.zip", "author": "Yifeng Shi;Christopher M Bender;Linnea Olsson;Melissa Troester;Katherine A Hoadley;Junier Oliva;Marc Niethammer", "authorids": "~Yifeng_Shi3;~Christopher_M_Bender1;lolsson@live.unc.edu;troester@unc.edu;hoadley@med.unc.edu;~Junier_Oliva1;~Marc_Niethammer1", "gender": "M;M;;;;M;M", "homepage": ";;;;;http://lupalab.com;http://wwwx.cs.unc.edu/~mn/", "dblp": ";;;;;137/8390;88/3304", "google_scholar": "u9mELXIAAAAJ;;;;;;https://scholar.google.com.au/citations?user=KqtBi6MAAAAJ", "orcid": ";;;;;;", "linkedin": "https://www.linkedin.com/feed/;;;;;;", "or_profile": "~Yifeng_Shi3;~Christopher_M_Bender1;lolsson@live.unc.edu;troester@unc.edu;hoadley@med.unc.edu;~Junier_Oliva1;~Marc_Niethammer1", "aff": "Department of Computer Science, University of North Carolina, Chapel Hill;Department of Computer Science, University of North Carolina, Chapel Hill;;;;;The University of North Carolina at Chapel Hill", "aff_domain": "cs.unc.edu;cs.unc.edu;;;;;unc.edu", "position": "PhD student;PhD student;;;;;Full Professor", "bibtex": "@misc{\nshi2021deep,\ntitle={Deep Goal-Oriented Clustering},\nauthor={Yifeng Shi and Christopher M Bender and Linnea Olsson and Melissa Troester and Katherine A Hoadley and Junier Oliva and Marc Niethammer},\nyear={2021},\nurl={https://openreview.net/forum?id=ALSupSRaBH}\n}", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer2;AnonReviewer3;AnonReviewer4", "site": "https://openreview.net/forum?id=ALSupSRaBH", "pdf_size": 0, "rating": "3;4;5;6", "confidence": "4;4;3;4", "wc_review": "116;269;474;1126", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "376;686;393;509", "reply_reviewers": "0;0;0;0", "reply_authors": "1;1;1;1", "rating_avg": [ 4.5, 1.118033988749895 ], "confidence_avg": [ 3.75, 0.4330127018922193 ], "wc_review_avg": [ 496.25, 385.1339864255036 ], "wc_reply_reviewers_avg": [ 0, 0 
], "wc_reply_authors_avg": [ 491.0, 123.67093433786292 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 11, 0 ], "authors#_avg": [ 7, 0 ], "corr_rating_confidence": -0.2581988897471611, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:t5iVnPb_LScJ:scholar.google.com/&scioq=Deep+Goal-Oriented+Clustering&hl=en&as_sdt=0,5", "gs_version_total": 3, "aff_unique_index": "0;0;1", "aff_unique_norm": "University of North Carolina;University of North Carolina at Chapel Hill", "aff_unique_dep": "Department of Computer Science;", "aff_unique_url": "https://www.unc.edu;https://www.unc.edu", "aff_unique_abbr": "UNC;UNC Chapel Hill", "aff_campus_unique_index": "0;0;0", "aff_campus_unique": "Chapel Hill", "aff_country_unique_index": "0;0;0", "aff_country_unique": "United States" }, { "id": "AM0PBmqmojH", "title": "Warpspeed Computation of Optimal Transport, Graph Distances, and Embedding Alignment", "track": "main", "status": "Reject", "tldr": "", "abstract": "Optimal transport (OT) is a cornerstone of many machine learning tasks. The current best practice for computing OT is via entropy regularization and Sinkhorn iterations. This algorithm runs in quadratic time and requires calculating the full pairwise cost matrix, which is prohibitively expensive for large sets of objects. To alleviate this limitation we propose to instead use a sparse approximation of the cost matrix based on locality sensitive hashing (LSH). Moreover, we fuse this sparse approximation with the Nystr\u00f6m method, resulting in the locally corrected Nystr\u00f6m method (LCN). These approximations enable general log-linear time algorithms for entropy-regularized OT that perform well even in complex, high-dimensional spaces. We thoroughly demonstrate these advantages via a theoretical analysis and by evaluating multiple approximations both directly and as a component of two real-world models. Using approximate Sinkhorn for unsupervised word embedding alignment enables us to train the model full-batch in a fraction of the time while improving upon the original on average by 3.1 percentage points without any model changes. For graph distance regression we propose the graph transport network (GTN), which combines graph neural networks (GNNs) with enhanced Sinkhorn and outcompetes previous models by 48%. 
LCN-Sinkhorn enables GTN to achieve this while still scaling log-linearly in the number of nodes.", "keywords": "Optimal transport;sinkhorn distance;locality sensitive hashing;nystr\u00f6m method;graph neural networks;embedding alignment", "primary_area": "", "supplementary_material": "/attachment/c79b4e1d32703ddeffd0da7752812896b93485f9.zip", "author": "Johannes Klicpera;Marten Lienen;Stephan G\u00fcnnemann", "authorids": "~Johannes_Klicpera1;marten.lienen@in.tum.de;~Stephan_G\u00fcnnemann1", "gender": "M;;M", "homepage": ";;http://www.daml.in.tum.de", "dblp": "228/7897;;43/3011", "google_scholar": ";;", "orcid": ";;", "linkedin": ";;", "or_profile": "~Johannes_Klicpera1;marten.lienen@in.tum.de;~Stephan_G\u00fcnnemann1", "aff": "Meta Facebook;;Technical University Munich", "aff_domain": "fb.com;;tum.de", "position": "Intern;;Professor", "bibtex": "@misc{\nklicpera2021warpspeed,\ntitle={Warpspeed Computation of Optimal Transport, Graph Distances, and Embedding Alignment},\nauthor={Johannes Klicpera and Marten Lienen and Stephan G{\\\"u}nnemann},\nyear={2021},\nurl={https://openreview.net/forum?id=AM0PBmqmojH}\n}", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer2;AnonReviewer4;AnonReviewer3", "site": "https://openreview.net/forum?id=AM0PBmqmojH", "pdf_size": 0, "rating": "6;6;6;7", "confidence": "4;3;3;4", "wc_review": "477;807;232;528", "wc_reply_reviewers": "0;0;151;47", "wc_reply_authors": "399;417;1061;275", "reply_reviewers": "0;0;1;1", "reply_authors": "2;2;2;1", "rating_avg": [ 6.25, 0.4330127018922193 ], "confidence_avg": [ 3.5, 0.5 ], "wc_review_avg": [ 511.0, 204.26820604293758 ], "wc_reply_reviewers_avg": [ 49.5, 61.662387238899534 ], "wc_reply_authors_avg": [ 538.0, 306.86316168611705 ], "reply_reviewers_avg": [ 0.5, 0.5 ], "reply_authors_avg": [ 1.75, 0.4330127018922193 ], "replies_avg": [ 24, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": 0.5773502691896257, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:djV5aBzC_C0J:scholar.google.com/&scioq=Warpspeed+Computation+of+Optimal+Transport,+Graph+Distances,+and+Embedding+Alignment&hl=en&as_sdt=0,5", "gs_version_total": 0, "aff_unique_index": "0;1", "aff_unique_norm": "Meta;Technical University of Munich", "aff_unique_dep": "Meta Platforms, Inc.;", "aff_unique_url": "https://meta.com;https://www.tum.de", "aff_unique_abbr": "Meta;TUM", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;1", "aff_country_unique": "United States;Germany" }, { "id": "AMoDLAx6GCC", "title": "Uncertainty Prediction for Deep Sequential Regression Using Meta Models", "track": "main", "status": "Reject", "tldr": "", "abstract": "Generating high quality uncertainty estimates for sequential regression, particularly deep recurrent networks, remains a challenging and open problem.\nExisting approaches often make restrictive assumptions (such as stationarity) yet still perform poorly in practice, particularly in presence of real world non-stationary signals and drift. 
\nThis paper describes a flexible method that can generate symmetric and asymmetric uncertainty estimates, makes no assumptions about stationarity, and outperforms competitive baselines on both drift and non drift scenarios.\nThis work helps make sequential regression more effective and practical for use in real-world applications, and is a powerful new addition to the modeling toolbox for sequential uncertainty quantification in general.", "keywords": "Uncertainty Quantification;Uncertainty Prediction;Deep Learning;Regression;Meta Modeling", "primary_area": "", "supplementary_material": "/attachment/a72863e9c101cdf93a642966ffadd373527096c9.zip", "author": "Jiri Navratil;Matthew Arnold;Benjamin Elder", "authorids": "~Jiri_Navratil1;marnold@us.ibm.com;benjamin.elder@ibm.com", "gender": ";;", "homepage": "https://researcher.watson.ibm.com/researcher/view.php?person=us-jiri;;", "dblp": "00/680-1.html;;", "google_scholar": "H41S5AgAAAAJ;;", "orcid": "0009-0007-5230-7679;;", "linkedin": "jiri-navratil-62641497/;;", "or_profile": "~Jiri_Navratil1;marnold@us.ibm.com;benjamin.elder@ibm.com", "aff": "International Business Machines;;", "aff_domain": "ibm.com;;", "position": "Principal Research Staff Member;;", "bibtex": "@misc{\nnavratil2021uncertainty,\ntitle={Uncertainty Prediction for Deep Sequential Regression Using Meta Models},\nauthor={Jiri Navratil and Matthew Arnold and Benjamin Elder},\nyear={2021},\nurl={https://openreview.net/forum?id=AMoDLAx6GCC}\n}", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer2;AnonReviewer4;AnonReviewer3", "site": "https://openreview.net/forum?id=AMoDLAx6GCC", "pdf_size": 0, "rating": "5;5;6;7", "confidence": "3;4;4;3", "wc_review": "329;486;696;226", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "0;0;0;0", "reply_reviewers": "0;0;0;0", "reply_authors": "0;0;0;0", "rating_avg": [ 5.75, 0.82915619758885 ], "confidence_avg": [ 3.5, 0.5 ], "wc_review_avg": [ 434.25, 177.22637360167363 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 0, 0 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 0, 0 ], "replies_avg": [ 5, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": -0.30151134457776363, "gs_citation": 10, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=6343925781534768498&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 4, "aff_unique_index": "0", "aff_unique_norm": "International Business Machines Corporation", "aff_unique_dep": "", "aff_unique_url": "https://www.ibm.com", "aff_unique_abbr": "IBM", "aff_country_unique_index": "0", "aff_country_unique": "United States" }, { "id": "ANednkwrr8s", "title": "Indirect Supervision to Mitigate Perturbations", "track": "main", "status": "Withdraw", "tldr": "", "abstract": "Vulnerability of state-of-the-art computer vision models to image perturbations has drawn considerable attention recently. Often these perturbations are imperceptible to humans because they target the perception of deep neural networks (DNNs) employed in the corresponding computer vision task. Recent studies have revealed that DNNs, which are unable to handle targeted perturbation often fail to handle untargeted perturbations as well such as Gaussian noise. Various techniques in past have been explored to mitigate both these types of perturbations ranging from classical preprocessing to current supervised and self-supervised deep discriminative and generative models. 
However, a common challenge with most of these techniques is that they approach the problem from a quality enhancement point of view, which is primarily driven by human perception. In addition, the supervised models require a large volume of gold standard unperturbed data, whereas others fail to take into account the feedback of the targeted downstream DNN. We propose to model this problem in indirect supervision framework, where we assume that the gold standard data is missing, however, a variable dependent on it is available and the dependency of the observed variable is stated by the considered downstream DNN. The proposed method maintains the advantages of supervised models while relaxing the requirement of gold standard unperturbed data. To prove its utility, we conduct several experiments on various network architectures for downstream tasks of classification and medical image segmentation. We used MNIST, CIFAR-10-C and ISIC skin lesion dataset in our experiments. In all the experiments, a considerable restoration in the performance of the considered downstream model is observed.", "keywords": "Indirect supervision;Perturbation;Downstream models;Image enhancement", "primary_area": "", "supplementary_material": "", "author": "Mayank Kumar Kundalwal;Azad Singh;Deepak Mishra", "authorids": "~Mayank_Kumar_Kundalwal1;singh.63@iitj.ac.in;~Deepak_Mishra5", "gender": "M;;M", "homepage": ";;http://home.iitj.ac.in/~dmishra/", "dblp": ";;65/6758-3", "google_scholar": "V017_oQAAAAJ;;-rOCu6sAAAAJ", "orcid": ";;", "linkedin": "hostingshades/;;", "or_profile": "~Mayank_Kumar_Kundalwal1;singh.63@iitj.ac.in;~Deepak_Mishra5", "aff": "Indian Institute of Technology Jodhpur, India, Dhirubhai Ambani Institute Of Information and Communication Technology;;Indian Institute of Technology Jodhpur, India", "aff_domain": "iitj.ac.in;;iitj.ac.in", "position": "PhD student;;Assistant Professor", "bibtex": "", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer2;AnonReviewer3;AnonReviewer4", "site": "https://openreview.net/forum?id=ANednkwrr8s", "pdf_size": 0, "rating": "2;3;4;4", "confidence": "4;4;5;4", "wc_review": "382;442;677;539", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "0;0;0;0", "reply_reviewers": "0;0;0;0", "reply_authors": "0;0;0;0", "rating_avg": [ 3.25, 0.82915619758885 ], "confidence_avg": [ 4.25, 0.4330127018922193 ], "wc_review_avg": [ 510.0, 111.5100892296298 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 0, 0 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 0, 0 ], "replies_avg": [ 5, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": 0.5222329678670935, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:KcfERLjYuZIJ:scholar.google.com/&scioq=Indirect+Supervision+to+Mitigate+Perturbations&hl=en&as_sdt=0,5", "gs_version_total": 0, "aff_unique_index": "0;0", "aff_unique_norm": "Indian Institute of Technology Jodhpur", "aff_unique_dep": "", "aff_unique_url": "https://www.iitj.ac.in", "aff_unique_abbr": "IIT Jodhpur", "aff_campus_unique_index": "1", "aff_campus_unique": ";Jodhpur", "aff_country_unique_index": "0;0", "aff_country_unique": "India" }, { "id": "ARFshOO1Iu", "title": "Adaptive Self-training for Neural Sequence Labeling with Few Labels", "track": "main", "status": "Reject", "tldr": "", "abstract": "Neural sequence labeling is an important technique employed for many Natural Language Processing (NLP) tasks, such as Named Entity Recognition (NER), slot tagging for dialog systems and semantic 
parsing. Large-scale pre-trained language models obtain very good performance on these tasks when fine-tuned on large amounts of task-specific labeled data. However, such large-scale labeled datasets are difficult to obtain for several tasks and domains due to the high cost of human annotation as well as privacy and data access constraints for sensitive user applications. This is exacerbated for sequence labeling tasks requiring such annotations at token-level. In this work, we develop techniques to address the label scarcity challenge for neural sequence labeling models. Specifically, we develop self-training and meta-learning techniques for training neural sequence taggers with few labels. While self-training serves as an effective mechanism to learn from large amounts of unlabeled data -- meta-learning helps in adaptive sample re-weighting to mitigate error propagation from noisy pseudo-labels. Extensive experiments on six benchmark datasets including two for massive multilingual NER and four slot tagging datasets for task-oriented dialog systems demonstrate the effectiveness of our method. With only 10 labeled examples for each class for each task, our method obtains 10% improvement over state-of-the-art systems demonstrating its effectiveness for the low-resource setting. ", "keywords": "Self-training;Neural Sequence Labeling;Meta Learning", "primary_area": "", "supplementary_material": "/attachment/331c5ababd872522161f370c49535630ca4d512e.zip", "author": "Yaqing Wang;Subhabrata Mukherjee;Haoda Chu;Yuancheng Tu;Ming Wu;Jing Gao;Ahmed Hassan Awadallah", "authorids": "~Yaqing_Wang1;~Subhabrata_Mukherjee2;haochu@microsoft.com;yuantu@microsoft.com;mingwu@microsoft.com;~Jing_Gao1;~Ahmed_Hassan_Awadallah1", "gender": "M;;;;;;M", "homepage": "https://yaqingwang.github.io/;https://subhomukherjee.com/;;;;;https://www.microsoft.com/en-us/research/people/hassanam/publications/", "dblp": "147/1393;37/11030.html;;;;;147/9148", "google_scholar": "_Rfg2CAAAAAJ;T4iBN5cAAAAJ;;;;;sNGk-9MAAAAJ", "orcid": ";;;;;;", "linkedin": ";subho87;;;;;ahmed-hassan-awadallah-a355a27/", "or_profile": "~Yaqing_Wang1;~Subhabrata_Mukherjee2;haochu@microsoft.com;yuantu@microsoft.com;mingwu@microsoft.com;~Jing_Gao1;~Ahmed_Hassan_Awadallah1", "aff": "Purdue University;Microsoft;;;;;Microsoft Research", "aff_domain": "purdue.edu;microsoft.com;;;;;microsoft.com", "position": "PhD student;Principal Researcher;;;;;Principal Researcher", "bibtex": "@misc{\nwang2021adaptive,\ntitle={Adaptive Self-training for Neural Sequence Labeling with Few Labels},\nauthor={Yaqing Wang and Subhabrata Mukherjee and Haoda Chu and Yuancheng Tu and Ming Wu and Jing Gao and Ahmed Hassan Awadallah},\nyear={2021},\nurl={https://openreview.net/forum?id=ARFshOO1Iu}\n}", "github": "", "project": "", "reviewers": "AnonReviewer2;AnonReviewer1;AnonReviewer3", "site": "https://openreview.net/forum?id=ARFshOO1Iu", "pdf_size": 0, "rating": "4;7;7", "confidence": "4;3;2", "wc_review": "254;257;378", "wc_reply_reviewers": "0;0;0", "wc_reply_authors": "1916;331;1085", "reply_reviewers": "0;0;0", "reply_authors": "3;1;2", "rating_avg": [ 6.0, 1.4142135623730951 ], "confidence_avg": [ 3.0, 0.816496580927726 ], "wc_review_avg": [ 296.3333333333333, 57.76004001229762 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 1110.6666666666667, 647.3280123365245 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 2.0, 0.816496580927726 ], "replies_avg": [ 12, 0 ], "authors#_avg": [ 7, 0 ], "corr_rating_confidence": -0.8660254037844387, "gs_citation": 
1, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=11053630532779213478&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 0, "aff_unique_index": "0;1;1", "aff_unique_norm": "Purdue University;Microsoft", "aff_unique_dep": ";Microsoft Corporation", "aff_unique_url": "https://www.purdue.edu;https://www.microsoft.com", "aff_unique_abbr": "Purdue;Microsoft", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0;0", "aff_country_unique": "United States" }, { "id": "ARQAdp7F8OQ", "title": "Brain-like approaches to unsupervised learning of hidden representations - a comparative study", "track": "main", "status": "Reject", "tldr": "", "abstract": "Unsupervised learning of hidden representations has been one of the most vibrant research directions in machine learning in recent years. In this work we study the brain-like Bayesian Confidence Propagating Neural Network (BCPNN) model, recently extended to extract sparse distributed high-dimensional representations. The saliency and separability of the hidden representations when trained on MNIST dataset is studied using an external linear classifier and compared with other unsupervised learning methods that include restricted Boltzmann machines and autoencoders. ", "keywords": "neural networks;bio-inspired;brain-like;unsupervised learning;structural plasticity", "primary_area": "", "supplementary_material": "", "author": "Naresh Balaji;Anders Lansner;Pawel Herman", "authorids": "~Naresh_Balaji1;ala@kth.se;paherman@kth.se", "gender": ";;", "homepage": "https://www.kth.se/profile/nbrav?l=en;;", "dblp": ";;", "google_scholar": ";;", "orcid": ";;", "linkedin": ";;", "or_profile": "~Naresh_Balaji1;ala@kth.se;paherman@kth.se", "aff": "KTH Royal Institute of Technology, Stockholm, Sweden;;", "aff_domain": "kth.se;;", "position": "PhD student;;", "bibtex": "@misc{\nbalaji2021brainlike,\ntitle={Brain-like approaches to unsupervised learning of hidden representations - a comparative study },\nauthor={Naresh Balaji and Anders Lansner and Pawel Herman},\nyear={2021},\nurl={https://openreview.net/forum?id=ARQAdp7F8OQ}\n}", "github": "", "project": "", "reviewers": "AnonReviewer4;AnonReviewer5;AnonReviewer3;AnonReviewer2", "site": "https://openreview.net/forum?id=ARQAdp7F8OQ", "pdf_size": 0, "rating": "4;5;6;7", "confidence": "4;4;3;3", "wc_review": "314;659;356;396", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "637;700;468;64", "reply_reviewers": "0;0;0;0", "reply_authors": "1;2;1;1", "rating_avg": [ 5.5, 1.118033988749895 ], "confidence_avg": [ 3.5, 0.5 ], "wc_review_avg": [ 431.25, 134.65024136628944 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 467.25, 247.78960329279354 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.25, 0.4330127018922193 ], "replies_avg": [ 10, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": -0.8944271909999159, "gs_citation": 21, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=14732388538289976626&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 4, "aff_unique_index": "0", "aff_unique_norm": "KTH Royal Institute of Technology", "aff_unique_dep": "", "aff_unique_url": "https://www.kth.se", "aff_unique_abbr": "KTH", "aff_campus_unique_index": "0", "aff_campus_unique": "Stockholm", "aff_country_unique_index": "0", "aff_country_unique": "Sweden" }, { "id": "ARaF-70QBJ1", "title": "Pyramidal Convolution: Rethinking Convolutional Neural Networks for Visual Recognition", "track": "main", "status": "Withdraw", "tldr": "", "abstract": 
"This work introduces pyramidal convolution (PyConv), which is capable of processing the input at multiple filter scales. PyConv contains a pyramid of kernels, where each level involves different types of filters with varying size and depth, which are able to capture different levels of details in the scene. On top of these improved recognition capabilities, PyConv is also efficient and, with our formulation, it does not increase the computational cost and parameters compared to standard convolution. Moreover, it is very flexible and extensible, providing a large space of potential network architectures for different applications. PyConv has the potential to impact nearly every computer vision task and, in this work, we present different architectures based on PyConv for four main tasks on visual recognition: image classification, video action classification/recognition, object detection and semantic image segmentation/parsing. Our approach shows significant improvements over all these core tasks in comparison with the baselines. For instance, on image recognition, our 50-layers network outperforms in terms of recognition performance on ImageNet dataset its counterpart baseline ResNet with 152 layers, while having 2.39 times less parameters, 2.52 times lower computational complexity and more than 3 times less layers. On image segmentation, our novel framework sets a new state-of-the-art on the challenging ADE20K benchmark for scene parsing. We will make the code and models publicly available.", "keywords": "", "primary_area": "", "supplementary_material": "/attachment/78649fbb41b835f01bfa4fcd2c268b7beb98f284.zip", "author": "Ionut Cosmin Duta;Li Liu;Fan Zhu;Ling Shao", "authorids": "~Ionut_Cosmin_Duta2;~Li_Liu12;~Fan_Zhu5;~Ling_Shao1", "gender": "M;M;M;M", "homepage": ";;;", "dblp": ";;;http://dblp.org/pid/144/5577", "google_scholar": "https://scholar.google.com/citations?hl=en;vD-ezyQAAAAJ;z84rLjoAAAAJ;o-BR-2IAAAAJ", "orcid": ";;;", "linkedin": ";;;", "or_profile": "~Li_Liu12;~Fan_Zhu5;~Ling_Shao1;~Ionut_Cosmin_Duta1", "aff": ";Inception Institute of Artificial Intelligence;Inception Institute of Artificial Intelligence;IIAI", "aff_domain": ";inceptioniai.org;inceptioniai.org;inceptioniai.org", "position": ";Director;CEO and Chief Scientist;Research Scientist", "bibtex": "", "github": "", "project": "", "reviewers": "AnonReviewer3;AnonReviewer2;AnonReviewer4;AnonReviewer1", "site": "https://openreview.net/forum?id=ARaF-70QBJ1", "pdf_size": 0, "rating": "3;5;5;6", "confidence": "5;4;5;4", "wc_review": "117;224;431;292", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "0;0;0;0", "reply_reviewers": "0;0;0;0", "reply_authors": "0;0;0;0", "rating_avg": [ 4.75, 1.0897247358851685 ], "confidence_avg": [ 4.5, 0.5 ], "wc_review_avg": [ 266.0, 113.87054052739013 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 0, 0 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 0, 0 ], "replies_avg": [ 6, 0 ], "authors#_avg": [ 4, 0 ], "corr_rating_confidence": -0.6882472016116854, "gs_citation": 230, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=13415109536324055849&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 3, "aff_unique_index": "0;0;1", "aff_unique_norm": "Inception Institute of Artificial Intelligence;International Institute of Artificial Intelligence", "aff_unique_dep": ";", "aff_unique_url": "https://www.inceptioniai.org;https://www.iiai.org", "aff_unique_abbr": ";IIAI", "aff_campus_unique_index": "", "aff_campus_unique": "", 
"aff_country_unique_index": "0;0;1", "aff_country_unique": "United Arab Emirates;United States" }, { "id": "ASAJvUPWaDI", "title": "A Near-Optimal Recipe for Debiasing Trained Machine Learning Models", "track": "main", "status": "Reject", "tldr": "", "abstract": "We present an efficient and scalable algorithm for debiasing trained models, including deep neural networks (DNNs), which we prove to be near-optimal by bounding its excess Bayes risk. Unlike previous black-box reduction methods to cost-sensitive classification rules, the proposed algorithm operates on models that have been trained without having to retrain the model. Furthermore, as the algorithm is based on projected stochastic gradient descent (SGD), it is particularly attractive for deep learning applications. We empirically validate the proposed algorithm on standard benchmark datasets across both classical algorithms and modern DNN architectures and demonstrate that it outperforms previous post-processing approaches for unbiased classification.", "keywords": "Fairness;Classification;Statistical Parity;Deep Learning", "primary_area": "", "supplementary_material": "", "author": "Ibrahim Alabdulmohsin;Mario Lucic", "authorids": "~Ibrahim_Alabdulmohsin1;~Mario_Lucic1", "gender": "M;M", "homepage": "http://ibomohsin.com;http://lucic.ai", "dblp": "153/5393;155/1945", "google_scholar": "8WNMsPYAAAAJ;SzZRlcMAAAAJ", "orcid": ";", "linkedin": ";", "or_profile": "~Ibrahim_Alabdulmohsin1;~Mario_Lucic1", "aff": "Google;Google", "aff_domain": "google.com;deepmind.com", "position": "Research Scientist;Senior Staff Research Scientist", "bibtex": "@misc{\nalabdulmohsin2021a,\ntitle={A Near-Optimal Recipe for Debiasing Trained Machine Learning Models},\nauthor={Ibrahim Alabdulmohsin and Mario Lucic},\nyear={2021},\nurl={https://openreview.net/forum?id=ASAJvUPWaDI}\n}", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer4;AnonReviewer2", "site": "https://openreview.net/forum?id=ASAJvUPWaDI", "pdf_size": 0, "rating": "4;6;7", "confidence": "4;4;4", "wc_review": "678;173;396", "wc_reply_reviewers": "683;0;0", "wc_reply_authors": "1072;179;170", "reply_reviewers": "3;0;0", "reply_authors": "2;1;1", "rating_avg": [ 5.666666666666667, 1.247219128924647 ], "confidence_avg": [ 4.0, 0.0 ], "wc_review_avg": [ 415.6666666666667, 206.6338683651728 ], "wc_reply_reviewers_avg": [ 227.66666666666666, 321.96928770027466 ], "wc_reply_authors_avg": [ 473.6666666666667, 423.10151132900586 ], "reply_reviewers_avg": [ 1.0, 1.4142135623730951 ], "reply_authors_avg": [ 1.3333333333333333, 0.4714045207910317 ], "replies_avg": [ 12, 0 ], "authors#_avg": [ 2, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:tTLPIkjdlqsJ:scholar.google.com/&scioq=A+Near-Optimal+Recipe+for+Debiasing+Trained+Machine+Learning+Models&hl=en&as_sdt=0,5", "gs_version_total": 2, "aff_unique_index": "0;0", "aff_unique_norm": "Google", "aff_unique_dep": "Google", "aff_unique_url": "https://www.google.com", "aff_unique_abbr": "Google", "aff_campus_unique_index": "0;0", "aff_campus_unique": "Mountain View", "aff_country_unique_index": "0;0", "aff_country_unique": "United States" }, { "id": "AT7jak63NNK", "title": "Meta-Reinforcement Learning Robust to Distributional Shift via Model Identification and Experience Relabeling", "track": "main", "status": "Reject", "tldr": "", "abstract": "Reinforcement learning algorithms can acquire policies for complex tasks autonomously. 
However, the number of samples required to learn a diverse set of skills can be prohibitively large. While meta-reinforcement learning methods have enabled agents to leverage prior experience to adapt quickly to new tasks, their performance depends crucially on how close the new task is to the previously experienced tasks. Current approaches are either not able to extrapolate well, or can do so at the expense of requiring extremely large amounts of data for on-policy meta-training. In this work, we present model identification and experience relabeling (MIER), a meta-reinforcement learning algorithm that is both efficient and extrapolates well when faced with out-of-distribution tasks at test time. Our method is based on a simple insight: we recognize that dynamics models can be adapted efficiently and consistently with off-policy data, more easily than policies and value functions. These dynamics models can then be used to continue training policies and value functions for out-of-distribution tasks without using meta-reinforcement learning at all, by generating synthetic experience for the new task.", "keywords": "Meta-Reinforcement Learning;Meta Learning;Reinforcement Learning", "primary_area": "", "supplementary_material": "/attachment/0d87590519c412b004658444aae848c3ad0f8f91.zip", "author": "Russell Mendonca;Xinyang Geng;Chelsea Finn;Sergey Levine", "authorids": "~Russell_Mendonca1;~Xinyang_Geng1;~Chelsea_Finn1;~Sergey_Levine1", "gender": "M;M;F;M", "homepage": "https://russellmendonca.github.io/;http://young-geng.xyz/;https://ai.stanford.edu/~cbfinn/;https://people.eecs.berkeley.edu/~svlevine/", "dblp": "215/5062;186/8221;131/1783;80/7594", "google_scholar": "Uly5spMAAAAJ;vYougn0AAAAJ;vfPE6hgAAAAJ;8R35rCwAAAAJ", "orcid": ";;;", "linkedin": ";;;", "or_profile": "~Russell_Mendonca1;~Xinyang_Geng1;~Chelsea_Finn1;~Sergey_Levine1", "aff": "Carnegie Mellon University;University of California, Berkeley;Google;Google", "aff_domain": "cmu.edu;berkeley.edu;google.com;google.com", "position": "PhD student;PhD student;Research Scientist;Research Scientist", "bibtex": "@misc{\nmendonca2021metareinforcement,\ntitle={Meta-Reinforcement Learning Robust to Distributional Shift via Model Identification and Experience Relabeling},\nauthor={Russell Mendonca and Xinyang Geng and Chelsea Finn and Sergey Levine},\nyear={2021},\nurl={https://openreview.net/forum?id=AT7jak63NNK}\n}", "github": "", "project": "", "reviewers": "AnonReviewer4;AnonReviewer3;AnonReviewer2;AnonReviewer1", "site": "https://openreview.net/forum?id=AT7jak63NNK", "pdf_size": 0, "rating": "4;5;5;6", "confidence": "4;3;5;4", "wc_review": "920;1813;598;644", "wc_reply_reviewers": "45;0;0;0", "wc_reply_authors": "782;745;758;612", "reply_reviewers": "1;0;0;0", "reply_authors": "2;1;1;1", "rating_avg": [ 5.0, 0.7071067811865476 ], "confidence_avg": [ 4.0, 0.7071067811865476 ], "wc_review_avg": [ 993.75, 488.7618924384347 ], "wc_reply_reviewers_avg": [ 11.25, 19.48557158514987 ], "wc_reply_authors_avg": [ 724.25, 66.15275882380114 ], "reply_reviewers_avg": [ 0.25, 0.4330127018922193 ], "reply_authors_avg": [ 1.25, 0.4330127018922193 ], "replies_avg": [ 11, 0 ], "authors#_avg": [ 4, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 51, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=1955125557605636048&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 4, "aff_unique_index": "0;1;2;2", "aff_unique_norm": "Carnegie Mellon University;University of California, Berkeley;Google", "aff_unique_dep": ";;Google", "aff_unique_url": 
"https://www.cmu.edu;https://www.berkeley.edu;https://www.google.com", "aff_unique_abbr": "CMU;UC Berkeley;Google", "aff_campus_unique_index": "1;2;2", "aff_campus_unique": ";Berkeley;Mountain View", "aff_country_unique_index": "0;0;0;0", "aff_country_unique": "United States" }, { "id": "ATgKbzY1UPh", "title": "On the Capability of CNNs to Generalize to Unseen Category-Viewpoint Combinations", "track": "main", "status": "Reject", "tldr": "", "abstract": "Object recognition and viewpoint estimation lie at the heart of visual understanding. Recent works suggest that convolutional neural networks (CNNs) fail to generalize to category-viewpoint combinations not seen during training. However, it is unclear when and how such generalization may be possible. Does the number of combinations seen during training impact generalization? What architectures better enable generalization in the multi-task setting of simultaneous category and viewpoint classification? Furthermore, what are the underlying mechanisms that drive the network\u2019s generalization? In this paper, we answer these questions by analyzing state-of-the-art CNNs trained to classify both object category and 3D viewpoint, with quantitative control over the number of category-viewpoint combinations seen during training. We also investigate the emergence of two types of specialized neurons that can explain generalization to unseen combinations\u2014neurons selective to category and invariant to viewpoint, and vice versa. We perform experiments on MNIST extended with position or scale, the iLab dataset with vehicles at different viewpoints, and a challenging new dataset for car model recognition and viewpoint estimation that we introduce in this paper - the Biased-Cars dataset. Our results demonstrate that as the number of combinations seen during training increase, networks generalize better to unseen category-viewpoint combinations, facilitated by an increase in the selectivity and invariance of individual neurons. We find that learning category and viewpoint in separate networks compared to a shared one leads to an increase in selectivity and invariance, as separate networks are not forced to preserve information about both category and viewpoint. 
This enables separate networks to significantly outperform shared ones at classifying unseen category-viewpoint combinations.", "keywords": "systematic generalization;category-viewpoint classification;multi-task learning", "primary_area": "", "supplementary_material": "", "author": "Spandan Madan;Timothy Henry;Jamell Arthur Dozier;Helen Ho;Nishchal Bhandari;Tomotake Sasaki;Fredo Durand;Hanspeter Pfister;Xavier Boix", "authorids": "~Spandan_Madan1;timhenry@mit.edu;~Jamell_Arthur_Dozier1;helenwh@mit.edu;nishchalb@alum.mit.edu;~Tomotake_Sasaki1;~Fredo_Durand1;~Hanspeter_Pfister1;~Xavier_Boix1", "gender": "M;;M;;;;M;M;", "homepage": ";;;;;;http://people.csail.mit.edu/fredo/;https://vcg.seas.harvard.edu;", "dblp": "205/2937;;;;;;87/2617;p/HanspeterPfister;", "google_scholar": "QY5OAIMAAAAJ;;;;;;https://scholar.google.com.tw/citations?user=NJ9c4ygAAAAJ;tvBEoaMAAAAJ;", "orcid": ";;;;;;0000-0001-9919-069X;0000-0002-3620-2582;", "linkedin": ";;jamell-dozier-a06b5619b/;;;;;hpfister/;", "or_profile": "~Spandan_Madan1;timhenry@mit.edu;~Jamell_Arthur_Dozier1;helenwh@mit.edu;nishchalb@alum.mit.edu;~Tomotake_Sasaki1;~Fredo_Durand1;~Hanspeter_Pfister1;~Xavier_Boix1", "aff": "Harvard University;;;;;;Massachusetts Institute of Technology;Harvard University;", "aff_domain": "harvard.edu;;;;;;mit.edu;harvard.edu;", "position": "PhD student;;;;;;Full Professor;Full Professor;", "bibtex": "", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer4;AnonReviewer2;AnonReviewer3", "site": "https://openreview.net/forum?id=ATgKbzY1UPh", "pdf_size": 0, "rating": "4;6;6;7", "confidence": "4;3;3;5", "wc_review": "504;492;606;721", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "255;227;230;203", "reply_reviewers": "0;0;0;0", "reply_authors": "1;1;1;1", "rating_avg": [ 5.75, 1.0897247358851685 ], "confidence_avg": [ 3.75, 0.82915619758885 ], "wc_review_avg": [ 580.75, 92.29673612864109 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 228.75, 18.417043736713012 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 13, 0 ], "authors#_avg": [ 9, 0 ], "corr_rating_confidence": 0.20751433915982243, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:uJQXCL0KkVgJ:scholar.google.com/&scioq=On+the+Capability+of+CNNs+to+Generalize+to+Unseen+Category-Viewpoint+Combinations&hl=en&as_sdt=0,33", "gs_version_total": 0, "aff_unique_index": "0;1;0", "aff_unique_norm": "Harvard University;Massachusetts Institute of Technology", "aff_unique_dep": ";", "aff_unique_url": "https://www.harvard.edu;https://web.mit.edu", "aff_unique_abbr": "Harvard;MIT", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0;0", "aff_country_unique": "United States" }, { "title": "Neural Learning of One-of-Many Solutions for Combinatorial Problems in Structured Output Spaces", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/2941", "id": "ATp1nW2FuZL", "poster": "", "openreview": "https://openreview.net/forum?id=ATp1nW2FuZL", "slides": "https://iclr.cc/virtual/2021/poster/2941", "video": "https://iclr.cc/virtual/2021/poster/2941", "author_site": "Yatin Nandwani, Deepanshu Jindal, Mausam ., Parag Singla", "tldr": "", "abstract": "Recent research has proposed neural architectures for solving combinatorial problems in structured output spaces. In many such problems, there may exist multiple solutions for a given input, e.g. 
a partially filled Sudoku puzzle may have many completions satisfying all constraints. Further, we are often interested in finding any \"one\" of the possible solutions, without any preference between them. Existing approaches completely ignore this solution multiplicity. In this paper, we argue that being oblivious to the presence of multiple solutions can severely hamper their training ability. Our contribution is two-fold. First, we formally define the task of learning one-of-many solutions for combinatorial problems in structured output spaces, which is applicable for solving several problems of interest such as N-Queens, and Sudoku. Second, we present a generic learning framework that adapts an existing prediction network for a combinatorial problem to handle solution multiplicity. Our framework uses a selection module, whose goal is to dynamically determine, for every input, the solution that is most effective for training the network parameters in any given learning iteration. We propose an RL based approach to jointly train the selection module with the prediction network. Experiments on three different domains, and using two different prediction networks, demonstrate that our framework significantly improves the accuracy in our setting, obtaining up to 21 pt gain over the baselines.\n", "keywords": "Neuro symbolic;constraint satisfaction;reasoning", "primary_area": "", "supplementary_material": "", "author": "Yatin Nandwani;Deepanshu Jindal;Mausam .;Parag Singla", "authorids": "~Yatin_Nandwani1;deepanshujindal.99@gmail.com;~Mausam_.1;~Parag_Singla1", "gender": "M;;;M", "homepage": "http://www.cse.iitd.ac.in/~yatin;;;http://www.cse.iitd.ac.in/~parags", "dblp": "255/7046;;;14/167", "google_scholar": "https://scholar.google.com/citations?hl=en;;;https://scholar.google.co.in/citations?user=V49BsgMAAAAJ", "orcid": ";;;", "linkedin": "yatin-nandwani-0804ba9/;;;", "or_profile": "~Yatin_Nandwani1;deepanshujindal.99@gmail.com;~Mausam_.1;~Parag_Singla1", "aff": "Indian Institute of Technology Delhi;;;Indian Institute of Technology, Delhi", "aff_domain": "iitd.ac.in;;;iitd.ac.in", "position": "PhD student;;;Associate Professor", "bibtex": "@inproceedings{\nnandwani2021neural,\ntitle={Neural Learning of One-of-Many Solutions for Combinatorial Problems in Structured Output Spaces},\nauthor={Yatin Nandwani and Deepanshu Jindal and Mausam . 
and Parag Singla},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=ATp1nW2FuZL}\n}", "github": "", "project": "", "reviewers": "AnonReviewer4;AnonReviewer3;AnonReviewer1;AnonReviewer2", "pdf_size": 0, "rating": "5;5;6;8", "confidence": "3;3;3;4", "wc_review": "1002;245;295;442", "wc_reply_reviewers": "604;0;0;0", "wc_reply_authors": "2100;685;794;709", "reply_reviewers": "1;0;0;0", "reply_authors": "3;1;1;1", "rating_avg": [ 6.0, 1.224744871391589 ], "confidence_avg": [ 3.25, 0.4330127018922193 ], "wc_review_avg": [ 496.0, 300.97923516415545 ], "wc_reply_reviewers_avg": [ 151.0, 261.5396719429005 ], "wc_reply_authors_avg": [ 1072.0, 594.8962094348896 ], "reply_reviewers_avg": [ 0.25, 0.4330127018922193 ], "reply_authors_avg": [ 1.5, 0.8660254037844386 ], "replies_avg": [ 15, 0 ], "authors#_avg": [ 4, 0 ], "corr_rating_confidence": 0.9428090415820632, "gs_citation": 7, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=15472093108260435552&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 3, "pdf": "https://openreview.net/pdf?id=ATp1nW2FuZL", "email": "iitd.ac.in;;;iitd.ac.in", "author_num": 4, "aff_unique_index": "0;0", "aff_unique_norm": "Indian Institute of Technology Delhi", "aff_unique_dep": "", "aff_unique_url": "https://www.iitd.ac.in", "aff_unique_abbr": "IIT Delhi", "aff_campus_unique_index": "0;0", "aff_campus_unique": "Delhi", "aff_country_unique_index": "0;0", "aff_country_unique": "India" }, { "id": "AVKFuhH1Fo4", "title": "Transformers are Deep Infinite-Dimensional Non-Mercer Binary Kernel Machines", "track": "main", "status": "Reject", "tldr": "", "abstract": "Despite their ubiquity in core AI fields like natural language processing, the mechanics of deep attention-based neural networks like the ``Transformer'' model are not fully understood. In this article, we present a new perspective towards understanding how Transformers work. In particular, we show that the ``\"dot-product attention\" that is the core of the Transformer's operation can be characterized as a kernel learning method on a pair of Banach spaces. In particular, the Transformer's kernel is characterized as having an infinite feature dimension. Along the way we generalize the standard kernel learning problem to what we term a \"binary\" kernel learning problem, where data come from two input domains and a response is defined for every cross-domain pair. We prove a new representer theorem for these binary kernel machines with non-Mercer (indefinite, asymmetric) kernels (implying that the functions learned are elements of reproducing kernel Banach spaces rather than Hilbert spaces), and also prove a new universal approximation theorem showing that the Transformer calculation can learn any binary non-Mercer reproducing kernel Banach space pair. We experiment with new kernels in Transformers, and obtain results that suggest the infinite dimensionality of the standard Transformer kernel is partially responsible for its performance. This paper's results provide a new theoretical understanding of a very important but poorly understood model in modern machine learning.", "keywords": "Transformer models;attention models;kernel methods;reproducing kernel Banach spaces", "primary_area": "", "supplementary_material": "", "author": "Matthew A Wright;Joseph E. 
Gonzalez", "authorids": "~Matthew_A_Wright1;~Joseph_E._Gonzalez1", "gender": "M;M", "homepage": ";http://eecs.berkeley.edu/~jegonzal", "dblp": ";61/8262", "google_scholar": "AYPlwA0AAAAJ;https://scholar.google.com.tw/citations?user=gM2WW9UAAAAJ", "orcid": "0000-0002-9973-8800;0000-0003-2921-956X", "linkedin": "mattawright/;", "or_profile": "~Matthew_A_Wright1;~Joseph_E._Gonzalez1", "aff": "University of California, Berkeley;University of California, Berkeley", "aff_domain": "berkeley.edu;berkeley.edu", "position": "Postdoc;Assistant Professor", "bibtex": "@misc{\nwright2021transformers,\ntitle={Transformers are Deep Infinite-Dimensional Non-Mercer Binary Kernel Machines},\nauthor={Matthew A Wright and Joseph E. Gonzalez},\nyear={2021},\nurl={https://openreview.net/forum?id=AVKFuhH1Fo4}\n}", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer4;AnonReviewer3;AnonReviewer2", "site": "https://openreview.net/forum?id=AVKFuhH1Fo4", "pdf_size": 0, "rating": "4;6;6;7", "confidence": "4;4;4;4", "wc_review": "500;221;140;335", "wc_reply_reviewers": "184;0;0;0", "wc_reply_authors": "685;75;251;183", "reply_reviewers": "1;0;0;0", "reply_authors": "2;1;1;1", "rating_avg": [ 5.75, 1.0897247358851685 ], "confidence_avg": [ 4.0, 0.0 ], "wc_review_avg": [ 299.0, 135.14991675913086 ], "wc_reply_reviewers_avg": [ 46.0, 79.67433714816836 ], "wc_reply_authors_avg": [ 298.5, 231.80325709532212 ], "reply_reviewers_avg": [ 0.25, 0.4330127018922193 ], "reply_authors_avg": [ 1.25, 0.4330127018922193 ], "replies_avg": [ 14, 0 ], "authors#_avg": [ 2, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 29, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=8099896661072471858&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 4, "aff_unique_index": "0;0", "aff_unique_norm": "University of California, Berkeley", "aff_unique_dep": "", "aff_unique_url": "https://www.berkeley.edu", "aff_unique_abbr": "UC Berkeley", "aff_campus_unique_index": "0;0", "aff_campus_unique": "Berkeley", "aff_country_unique_index": "0;0", "aff_country_unique": "United States" }, { "title": "Local Convergence Analysis of Gradient Descent Ascent with Finite Timescale Separation", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/3036", "id": "AWOSz_mMAPx", "poster": "", "openreview": "https://openreview.net/forum?id=AWOSz_mMAPx", "slides": "https://iclr.cc/virtual/2021/poster/3036", "video": "https://iclr.cc/virtual/2021/poster/3036", "author_site": "Tanner Fiez, Lillian J Ratliff", "tldr": "", "abstract": "We study the role that a finite timescale separation parameter $\\tau$ has on gradient descent-ascent in non-convex, non-concave zero-sum games where the learning rate of player 1 is denoted by $\\gamma_1$ and the learning rate of player 2 is defined to be $\\gamma_2=\\tau\\gamma_1$. We provide a non-asymptotic construction of the finite timescale separation parameter $\\tau^{\\ast}$ such that gradient descent-ascent locally converges to $x^{\\ast}$ for all $\\tau \\in (\\tau^{\\ast}, \\infty)$ if and only if it is a strict local minmax equilibrium. Moreover, we provide explicit local convergence rates given the finite timescale separation. 
The convergence results we present are complemented by a non-convergence result: given a critical point $x^{\\ast}$ that is not a strict local minmax equilibrium, we present a non-asymptotic construction of a finite timescale separation $\\tau_{0}$ such that gradient descent-ascent with timescale separation $\\tau\\in (\\tau_0, \\infty)$ does not converge to $x^{\\ast}$. Finally, we extend the results to gradient penalty regularization methods for generative adversarial networks and empirically demonstrate on CIFAR-10 and CelebA the significant impact timescale separation has on training performance. ", "keywords": "game theory;continuous games;generative adversarial networks;theory;gradient descent-ascent;equilibrium;convergence", "primary_area": "", "supplementary_material": "/attachment/e5e6e558c5bfd0b74b76d3820fd5d31d08f24ad4.zip", "author": "Tanner Fiez;Lillian J Ratliff", "authorids": "~Tanner_Fiez1;~Lillian_J_Ratliff1", "gender": ";F", "homepage": ";https://faculty.washington.edu/ratliffl/", "dblp": "195/5645;127/7426", "google_scholar": "_B6SVAcAAAAJ;https://scholar.google.com/citations?hl=en", "orcid": ";0000-0001-8936-0229", "linkedin": "tannerfiez/;", "or_profile": "~Tanner_Fiez1;~Lillian_Ratliff1", "aff": "University of Washington;University of Washington, Seattle", "aff_domain": "washington.edu;uw.edu", "position": "PhD student;Assistant Professor", "bibtex": "@inproceedings{\nfiez2021local,\ntitle={Local Convergence Analysis of Gradient Descent Ascent with Finite Timescale Separation},\nauthor={Tanner Fiez and Lillian J Ratliff},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=AWOSz_mMAPx}\n}", "github": "[![github](/images/github_icon.svg) fiezt/Finite-Learning-Ratio](https://github.com/fiezt/Finite-Learning-Ratio)", "project": "", "reviewers": "AnonReviewer4;AnonReviewer3;AnonReviewer1;AnonReviewer2", "pdf_size": 0, "rating": "6;6;6;7", "confidence": "3;4;4;4", "wc_review": "414;617;845;1082", "wc_reply_reviewers": "0;0;0;26", "wc_reply_authors": "770;1589;28;1138", "reply_reviewers": "0;0;0;1", "reply_authors": "1;3;1;3", "rating_avg": [ 6.25, 0.4330127018922193 ], "confidence_avg": [ 3.75, 0.4330127018922193 ], "wc_review_avg": [ 739.5, 249.6963155515115 ], "wc_reply_reviewers_avg": [ 6.5, 11.258330249197702 ], "wc_reply_authors_avg": [ 881.25, 571.6735847491993 ], "reply_reviewers_avg": [ 0.25, 0.4330127018922193 ], "reply_authors_avg": [ 2.0, 1.0 ], "replies_avg": [ 57, 0 ], "authors#_avg": [ 2, 0 ], "corr_rating_confidence": 0.3333333333333333, "gs_citation": 39, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=15691048292207002780&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 4, "pdf": "https://openreview.net/pdf?id=AWOSz_mMAPx", "email": "washington.edu;uw.edu", "author_num": 2, "aff_unique_index": "0;0", "aff_unique_norm": "University of Washington", "aff_unique_dep": "", "aff_unique_url": "https://www.washington.edu", "aff_unique_abbr": "UW", "aff_campus_unique_index": "1", "aff_campus_unique": ";Seattle", "aff_country_unique_index": "0;0", "aff_country_unique": "United States" }, { "title": "Randomized Ensembled Double Q-Learning: Learning Fast Without a Model", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/3320", "id": "AY8zfZm0tDd", "poster": "", "openreview": "https://openreview.net/forum?id=AY8zfZm0tDd", "slides": "https://iclr.cc/virtual/2021/poster/3320", "video": "https://iclr.cc/virtual/2021/poster/3320", "author_site": "Xinyue 
Chen, Che Wang, Zijian Zhou, Keith Ross", "tldr": "", "abstract": "Using a high Update-To-Data (UTD) ratio, model-based methods have recently achieved much higher sample efficiency than previous model-free methods for continuous-action DRL benchmarks. In this paper, we introduce a simple model-free algorithm, Randomized Ensembled Double Q-Learning (REDQ), and show that its performance is just as good as, if not better than, a state-of-the-art model-based algorithm for the MuJoCo benchmark. Moreover, REDQ can achieve this performance using fewer parameters than the model-based method, and with less wall-clock run time. REDQ has three carefully integrated ingredients which allow it to achieve its high performance: (i) a UTD ratio $\\gg 1$; (ii) an ensemble of Q functions; (iii) in-target minimization across a random subset of Q functions from the ensemble. Through carefully designed experiments, we provide a detailed analysis of REDQ and related model-free algorithms. To our knowledge, REDQ is the first successful model-free DRL algorithm for continuous-action spaces using a UTD ratio $\\gg 1$. ", "keywords": "Artificial Integlligence;Machine Learning;Deep Reinforcement Learning", "primary_area": "", "supplementary_material": "/attachment/60d908843f7fdbb8868156ff3c93985657859fb9.zip", "author": "Xinyue Chen;Che Wang;Zijian Zhou;Keith W. Ross", "authorids": "~Xinyue_Chen1;~Che_Wang1;~Zijian_Zhou1;~Keith_W._Ross1", "gender": "F;M;M;M", "homepage": ";https://watchernyu.github.io/me/;;http://www.nyu.edu/projects/keithwross/", "dblp": "124/5261;130/6621;;r/KWRoss", "google_scholar": "83MbL0IAAAAJ;cx_Kg8MAAAAJ;KjC2xroAAAAJ;https://scholar.google.com.tw/citations?user=RhUcYmQAAAAJ", "orcid": ";;;", "linkedin": ";;;", "or_profile": "~Xinyue_Chen1;~Che_Wang1;~Zijian_Zhou1;~Keith_W._Ross1", "aff": "New York University;New York University;Carnegie Mellon University;New York University", "aff_domain": "nyu.edu;nyu.edu;andrew.cmu.edu;nyu.edu", "position": "Undergrad student;PhD student;MS student;Full Professor", "bibtex": "@inproceedings{\nchen2021randomized,\ntitle={Randomized Ensembled Double Q-Learning: Learning Fast Without a Model},\nauthor={Xinyue Chen and Che Wang and Zijian Zhou and Keith W. 
Ross},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=AY8zfZm0tDd}\n}", "github": "[![github](/images/github_icon.svg) watchernyu/REDQ](https://github.com/watchernyu/REDQ) + [![Papers with Code](/images/pwc_icon.svg) 5 community implementations](https://paperswithcode.com/paper/?openreview=AY8zfZm0tDd)", "project": "", "reviewers": "AnonReviewer1;AnonReviewer4;AnonReviewer3;AnonReviewer2", "pdf_size": 0, "rating": "6;7;7;7", "confidence": "3;3;3;3", "wc_review": "225;249;355;577", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "196;289;263;421", "reply_reviewers": "0;0;0;0", "reply_authors": "1;1;1;1", "rating_avg": [ 6.75, 0.4330127018922193 ], "confidence_avg": [ 3.0, 0.0 ], "wc_review_avg": [ 351.5, 139.0782154041387 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 292.25, 81.71099987149833 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 11, 0 ], "authors#_avg": [ 4, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 323, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=14970286903447223266&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 8, "pdf": "https://openreview.net/pdf?id=AY8zfZm0tDd", "email": "nyu.edu;nyu.edu;andrew.cmu.edu;nyu.edu", "author_num": 4, "aff_unique_index": "0;0;1;0", "aff_unique_norm": "New York University;Carnegie Mellon University", "aff_unique_dep": ";", "aff_unique_url": "https://www.nyu.edu;https://www.cmu.edu", "aff_unique_abbr": "NYU;CMU", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0;0;0", "aff_country_unique": "United States" }, { "id": "AZ4vmLoJft", "title": "(Updated submission 11/20/2020) MISIM: A Novel Code Similarity System", "track": "main", "status": "Reject", "tldr": "", "abstract": "Semantic code similarity systems are integral to a range of applications from code recommendation to automated software defect correction. Yet, these systems still lack the maturity in accuracy for general and reliable wide-scale usage. To help address this, we present Machine Inferred Code Similarity (MISIM), a novel end-to-end code similarity system that consists of two core components. First, MISIM uses a novel context-aware semantic structure (CASS), which is designed to aid in lifting semantic meaning from code syntax. We compare CASS with the abstract syntax tree (AST) and show CASS is more accurate than AST by up to 1.67x. Second, MISIM provides a neural-based code similarity scoring algorithm, which can be implemented with various neural network architectures with learned parameters. We compare MISIM to four state-of-the-art systems: (i) Aroma, (ii) code2seq, (iii) code2vec, and (iv) Neural Code Comprehension. 
In our experimental evaluation across 328,155 programs (over 18 million lines of code), MISIM has 1.5x to 43.4x better accuracy across all four systems.\n", "keywords": "Machine Programming;Machine Learning;Code Similarity;Code Representation", "primary_area": "", "supplementary_material": "/attachment/2f8606bd6e2a36b00bb26e222898947b50547f42.zip", "author": "Fangke Ye;Shengtian Zhou;Anand Venkat;Ryan Marcus;Nesime Tatbul;Jesmin Jahan Tithi;Niranjan Hasabnis;Paul Petersen;Timothy G Mattson;Tim Kraska;Pradeep Dubey;Vivek Sarkar;Justin Gottschlich", "authorids": "yefangke@gatech.edu;~Shengtian_Zhou1;anand.venkat@intel.com;~Ryan_Marcus1;~Nesime_Tatbul1;jesmin.jahan.tithi@intel.com;niranjan.hasabnis@intel.com;paul.petersen@intel.com;timothy.g.mattson@intel.com;~Tim_Kraska1;~Pradeep_Dubey1;~Vivek_Sarkar2;~Justin_Gottschlich1", "gender": ";M;;M;;;;;;M;M;;", "homepage": ";;;https://rmarcus.info;https://people.csail.mit.edu/tatbul/;;;;;;https://newsroom.intel.com/biography/pradeep-k-dubey/;;", "dblp": ";;;https://dblp.uni-trier.de/pid/175/1473.html;t/NesimeTatbul;;;;;26/6037;https://dblp.uni-trier.de/pers/d/Dubey:Pradeep.html;;", "google_scholar": ";2z2FiKAAAAAJ;;vPOl-IwAAAAJ;YlsHgYQAAAAJ;;;;;;-ad5RSQAAAAJ;;", "orcid": ";;;0000-0002-1279-1124;0000-0002-0416-7022;;;;;;;;", "linkedin": ";shengtian-zhou/;;;nesime-tatbul-0724964;;;;;;pradeep-dubey-a5592a53/;;", "or_profile": "yefangke@gatech.edu;~Shengtian_Zhou1;anand.venkat@intel.com;~Ryan_Marcus1;~Nesime_Tatbul1;jesmin.jahan.tithi@intel.com;niranjan.hasabnis@intel.com;paul.petersen@intel.com;timothy.g.mattson@intel.com;~Tim_Kraska1;~Pradeep_Dubey1;~Vivek_Sarkar2;~Justin_Gottschlich1", "aff": ";Intel;;Computer Science and Artificial Intelligence Laboratory, Electrical Engineering & Computer Science;Massachusetts Institute of Technology;;;;;Massachusetts Institute of Technology;;Georgia Tech Research Corporation;", "aff_domain": ";intel.com;;csail.mit.edu;mit.edu;;;;;mit.edu;;;", "position": ";Researcher;;Postdoc;Sr. 
Research Scientist;;;;;Associate Professor;;;", "bibtex": "@misc{\nye2021updated,\ntitle={(Updated submission 11/20/2020) {\\{}MISIM{\\}}: A Novel Code Similarity System},\nauthor={Fangke Ye and Shengtian Zhou and Anand Venkat and Ryan Marcus and Nesime Tatbul and Jesmin Jahan Tithi and Niranjan Hasabnis and Paul Petersen and Timothy G Mattson and Tim Kraska and Pradeep Dubey and Vivek Sarkar and Justin Gottschlich},\nyear={2021},\nurl={https://openreview.net/forum?id=AZ4vmLoJft}\n}", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer4;AnonReviewer3;AnonReviewer2", "site": "https://openreview.net/forum?id=AZ4vmLoJft", "pdf_size": 0, "rating": "4;5;5;7", "confidence": "5;5;4;3", "wc_review": "236;424;488;187", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "726;736;692;268", "reply_reviewers": "0;0;0;0", "reply_authors": "1;1;1;1", "rating_avg": [ 5.25, 1.0897247358851685 ], "confidence_avg": [ 4.25, 0.82915619758885 ], "wc_review_avg": [ 333.75, 125.52763639932044 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 605.5, 195.53708088237383 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 10, 0 ], "authors#_avg": [ 13, 0 ], "corr_rating_confidence": -0.899228803025897, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:HFPeszlNKQMJ:scholar.google.com/&scioq=(Updated+submission+11/20/2020)+MISIM:+A+Novel+Code+Similarity+System&hl=en&as_sdt=0,5", "gs_version_total": 0, "aff_unique_index": "0;1;1;1;2", "aff_unique_norm": "Intel;Massachusetts Institute of Technology;Georgia Institute of Technology", "aff_unique_dep": "Intel Corporation;Computer Science and Artificial Intelligence Laboratory;Research Corporation", "aff_unique_url": "https://www.intel.com;https://www.csail.mit.edu;https://www.gatech.edu", "aff_unique_abbr": "Intel;CSAIL;Georgia Tech", "aff_campus_unique_index": "1", "aff_campus_unique": ";Cambridge", "aff_country_unique_index": "0;0;0;0;0", "aff_country_unique": "United States" }, { "id": "AZWHo-jkA_Q", "title": "Generative modeling with one recursive network", "track": "main", "status": "Withdraw", "tldr": "", "abstract": "We propose to train a multilayer perceptron simultaneously as an encoder and a decoder in order to create a high quality generative model. In one call a network is optimized as either an encoder or decoder, and in a second recursive call the network uses its own outputs to learn the remaining corresponding function, allowing for the minimization of popular statistical divergence measures over a single feed-forward function. This new approach derives from a simple reformulation of variational bayes and extends naturally to the domain of Generative Adversarial Nets. Here we demonstrate a single network which learns a generative model via an adversarial minimax game played against itself. 
Experiments demonstrate comparable efficacy for the single-network approach versus corresponding multi-network formulations.", "keywords": "Generative model;GAN;VAE;Recursive Neural Network;self-play", "primary_area": "", "supplementary_material": "/attachment/0095b9dd74d0153ccd0f8d5db69b55b6d893298f.zip", "author": "Benjamin Lincoln Brimacombe", "authorids": "~Benjamin_Lincoln_Brimacombe1", "gender": "M", "homepage": "", "dblp": "", "google_scholar": "", "orcid": "", "linkedin": "benjamin-brimacombe/", "or_profile": "~Benjamin_Lincoln_Brimacombe1", "aff": "", "aff_domain": "", "position": "", "bibtex": "", "github": "", "project": "", "reviewers": "AnonReviewer3;AnonReviewer2;AnonReviewer4;AnonReviewer1", "site": "https://openreview.net/forum?id=AZWHo-jkA_Q", "pdf_size": 0, "rating": "2;2;4;4", "confidence": "5;4;3;4", "wc_review": "368;706;243;357", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "0;0;0;0", "reply_reviewers": "0;0;0;0", "reply_authors": "0;0;0;0", "rating_avg": [ 3.0, 1.0 ], "confidence_avg": [ 4.0, 0.7071067811865476 ], "wc_review_avg": [ 418.5, 173.05273762642415 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 0, 0 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 0, 0 ], "replies_avg": [ 6, 0 ], "authors#_avg": [ 1, 0 ], "corr_rating_confidence": -0.7071067811865475, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:3N3Ou9BGm6oJ:scholar.google.com/&scioq=Generative+modeling+with+one+recursive+network&hl=en&as_sdt=0,5", "gs_version_total": 0 }, { "id": "A_MbFRk3qT", "title": "Complex neural networks have no spurious local minima", "track": "main", "status": "Withdraw", "tldr": "", "abstract": "Most non-linear neural networks are known to have poor local minima (Yun et al. (2019)) and it is shown that training a neural network is NP-hard (Blum & Rivest (1988)). A line of work has studied the global optimality of neural networks in various settings but unfortunately all previous networks without spurious local minima are linear networks or networks with unrealistic assumptions. In this work we demonstrate for the first time that a non-linear neural network can have no poor local minima under no assumptions.\nRecently, a number of papers considered complex-valued neural networks (CVNNs) in various settings and suggest that CVNNs have competitive or even preferable behaviour compared to real-valued networks. Unfortunately, there is currently no theoretical analysis on the optimization of complex-valued networks, given that complex functions usually have a disparate optimization landscape. This is the first work towards analysing the optimization landscape of CVNNs. We prove a surprising result that no spurious local minima exist for one hidden layer complex-valued neural networks with quadratic activation. Since CVNNs can have real-valued datasets and there are no assumptions, our results are applicable to practical networks. Along the way, we develop a novel set of tools and techniques for analyzing the optimization of CVNNs, which may be useful in other contexts. 
Lastly, we prove spurious local minima exist for CVNNs with non-analytic CReLU activation.", "keywords": "Deep learning;Non-convex optimization;Complex-valued neural networks;Optimization landscape;Wirtinger calculus", "primary_area": "", "supplementary_material": "", "author": "Xingtu Liu", "authorids": "~Xingtu_Liu1", "gender": "", "homepage": "", "dblp": "", "google_scholar": "", "orcid": "", "linkedin": "", "or_profile": "", "aff": "", "aff_domain": "", "position": "", "bibtex": "", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer2;AnonReviewer3", "site": "https://openreview.net/forum?id=A_MbFRk3qT", "pdf_size": 0, "rating": "4;4;4", "confidence": "4;3;2", "wc_review": "619;769;229", "wc_reply_reviewers": "315;268;0", "wc_reply_authors": "1101;616;550", "reply_reviewers": "1;1;0", "reply_authors": "2;1;1", "rating_avg": [ 4.0, 0.0 ], "confidence_avg": [ 3.0, 0.816496580927726 ], "wc_review_avg": [ 539.0, 227.59613353482084 ], "wc_reply_reviewers_avg": [ 194.33333333333334, 138.7475725513383 ], "wc_reply_authors_avg": [ 755.6666666666666, 245.6696064953 ], "reply_reviewers_avg": [ 0.6666666666666666, 0.4714045207910317 ], "reply_authors_avg": [ 1.3333333333333333, 0.4714045207910317 ], "replies_avg": [ 13, 0 ], "authors#_avg": [ 1, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:rpoZj2kgucIJ:scholar.google.com/&scioq=Complex+neural+networks+have+no+spurious+local+minima&hl=en&as_sdt=0,5", "gs_version_total": 0 }, { "id": "AcH9xD24Hd", "title": "Learning the Step-size Policy for the Limited-Memory Broyden-Fletcher-Goldfarb-Shanno Algorithm", "track": "main", "status": "Reject", "tldr": "", "abstract": "We consider the problem of how to learn a step-size policy for the Limited-Memory Broyden-Fletcher-Goldfarb-Shanno (L-BFGS) algorithm. This is a limited computational memory quasi-Newton method widely used for deterministic unconstrained optimization but currently avoided in large-scale problems for requiring step sizes to be provided at each iteration. Existing methodologies for the step size selection for L-BFGS use heuristic tuning of design parameters and massive re-evaluations of the objective function and gradient to find appropriate step-lengths. We propose a neural network architecture with local information of the current iterate as the input. The step-length policy is learned from data of similar optimization problems, avoids additional evaluations of the objective function, and guarantees that the output step remains inside a pre-defined interval. The corresponding training procedure is formulated as a stochastic optimization problem using the backpropagation through time algorithm. The performance of the proposed method is evaluated on the training of classifiers for the MNIST database for handwritten digits and for CIFAR-10. The results show that the proposed algorithm outperforms heuristically tuned optimizers such as ADAM, RMSprop, L-BFGS with a backtracking line search and L-BFGS with a constant step size. The numerical results also show that a learned policy can be used as a warm-start to train new policies for different problems after a few additional training steps, highlighting its potential use in multiple large-scale optimization problems.", "keywords": "Unconstrained optimization;Step-size policy;L-BFGS;Learned optimizers", "primary_area": "", "supplementary_material": "/attachment/b0358397bb478fe8dca7fefb7ca7e1885643ba00.zip", "author": "Lucas N. 
Egidio;Anders Hansson;Bo Wahlberg", "authorids": "~Lucas_N._Egidio1;anders.g.hansson@liu.se;~Bo_Wahlberg1", "gender": "M;;M", "homepage": ";;https://www.kth.se/profile/bo", "dblp": ";;87/1451", "google_scholar": "_BWreAgAAAAJ;;https://scholar.google.se/citations?user=fDeSgLwAAAAJ", "orcid": "0000-0003-4096-5969;;0000-0002-1927-1690", "linkedin": ";;", "or_profile": "~Lucas_N._Egidio1;anders.g.hansson@liu.se;~Bo_Wahlberg1", "aff": "Link\u00f6ping University;;KTH Royal Institute of Technology, Stockholm, Sweden", "aff_domain": "liu.se;;kth.se", "position": "Postdoc;;Full Professor", "bibtex": "@misc{\negidio2021learning,\ntitle={Learning the Step-size Policy for the Limited-Memory Broyden-Fletcher-Goldfarb-Shanno Algorithm},\nauthor={Lucas N. Egidio and Anders Hansson and Bo Wahlberg},\nyear={2021},\nurl={https://openreview.net/forum?id=AcH9xD24Hd}\n}", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer2;AnonReviewer4;AnonReviewer3", "site": "https://openreview.net/forum?id=AcH9xD24Hd", "pdf_size": 0, "rating": "4;4;5;5", "confidence": "3;4;4;3", "wc_review": "510;818;390;208", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "775;1251;763;663", "reply_reviewers": "0;0;0;0", "reply_authors": "1;2;1;1", "rating_avg": [ 4.5, 0.5 ], "confidence_avg": [ 3.5, 0.5 ], "wc_review_avg": [ 481.5, 222.04672931615093 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 863.0, 228.19290085364182 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.25, 0.4330127018922193 ], "replies_avg": [ 10, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 16, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=8997035435900193474&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 6, "aff_unique_index": "0;1", "aff_unique_norm": "Link\u00f6ping University;KTH Royal Institute of Technology", "aff_unique_dep": ";", "aff_unique_url": "https://www.liu.se;https://www.kth.se", "aff_unique_abbr": "LiU;KTH", "aff_campus_unique_index": "1", "aff_campus_unique": ";Stockholm", "aff_country_unique_index": "0;0", "aff_country_unique": "Sweden" }, { "title": "On InstaHide, Phase Retrieval, and Sparse Matrix Factorization", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/2969", "id": "AhElGnhU2BV", "poster": "", "openreview": "https://openreview.net/forum?id=AhElGnhU2BV", "slides": "https://iclr.cc/virtual/2021/poster/2969", "video": "https://iclr.cc/virtual/2021/poster/2969", "author_site": "Sitan Chen, Xiaoxiao Li, Zhao Song, Danyang Zhuo", "tldr": "", "abstract": "In this work, we examine the security of InstaHide, a scheme recently proposed by \\cite{hsla20} for preserving the security of private datasets in the context of distributed learning. To generate a synthetic training example to be shared among the distributed learners, InstaHide takes a convex combination of private feature vectors and randomly flips the sign of each entry of the resulting vector with probability 1/2. A salient question is whether this scheme is secure in any provable sense, perhaps under a plausible complexity-theoretic assumption. \n\nThe answer to this turns out to be quite subtle and closely related to the average-case complexity of a multi-task, missing-data version of the classic problem of phase retrieval that is interesting in its own right. 
Motivated by this connection, under the standard distributional assumption that the public/private feature vectors are isotropic Gaussian, we design an algorithm that can actually recover a private vector using only the public vectors and a sequence of synthetic vectors generated by InstaHide.", "keywords": "Distributed learning;InstaHide;phase retrieval;matrix factorization", "primary_area": "", "supplementary_material": "/attachment/9ec530bd63b5fbc55077903ecec080592e41d77f.zip", "author": "Sitan Chen;Xiaoxiao Li;Zhao Song;Danyang Zhuo", "authorids": "~Sitan_Chen1;~Xiaoxiao_Li1;~Zhao_Song3;~Danyang_Zhuo1", "gender": "M;Unspecified;M;M", "homepage": "https://sitanchen.com;https://xxlya.github.io/;https://www.youtube.com/@zhaosong2031;https://danyangzhuo.com/", "dblp": "141/7670;71/8042;76/4051-2;151/7537", "google_scholar": "YnJVsp4AAAAJ;sdENOQ4AAAAJ;yDZct7UAAAAJ;E3yOuvEAAAAJ", "orcid": ";;;", "linkedin": ";;;", "or_profile": "~Sitan_Chen1;~Xiaoxiao_Li1;~Zhao_Song3;~Danyang_Zhuo1", "aff": "Massachusetts Institute of Technology;Princeton University;Princeton University;Duke University", "aff_domain": "mit.edu;princeton.edu;princeton.edu;duke.edu", "position": "PhD student;Postdoc;Postdoc;Assistant Professor", "bibtex": "@inproceedings{\nchen2021on,\ntitle={On InstaHide, Phase Retrieval, and Sparse Matrix Factorization},\nauthor={Sitan Chen and Xiaoxiao Li and Zhao Song and Danyang Zhuo},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=AhElGnhU2BV}\n}", "github": "", "project": "", "reviewers": "AnonReviewer4;AnonReviewer2;AnonReviewer3;AnonReviewer1", "pdf_size": 0, "rating": "4;7;7;8", "confidence": "4;3;3;4", "wc_review": "1553;183;295;414", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "1137;22;111;402", "reply_reviewers": "0;0;0;0", "reply_authors": "2;1;1;1", "rating_avg": [ 6.5, 1.5 ], "confidence_avg": [ 3.5, 0.5 ], "wc_review_avg": [ 611.25, 549.8210504336843 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 418.0, 438.25848537136164 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.25, 0.4330127018922193 ], "replies_avg": [ 11, 0 ], "authors#_avg": [ 4, 0 ], "corr_rating_confidence": -0.33333333333333337, "gs_citation": 14, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=16277894630562390734&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 5, "pdf": "https://openreview.net/pdf?id=AhElGnhU2BV", "email": "mit.edu;princeton.edu;princeton.edu;duke.edu", "author_num": 4, "aff_unique_index": "0;1;1;2", "aff_unique_norm": "Massachusetts Institute of Technology;Princeton University;Duke University", "aff_unique_dep": ";;", "aff_unique_url": "https://web.mit.edu;https://www.princeton.edu;https://www.duke.edu", "aff_unique_abbr": "MIT;Princeton;Duke", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0;0;0", "aff_country_unique": "United States" }, { "id": "AhLeNin_5sh", "title": "GenAD: General Representations of Multivariate Time Series for Anomaly Detection", "track": "main", "status": "Reject", "tldr": "", "abstract": "Anomaly Detection(AD) for multivariate time series is an active area in machine learning, with critical applications in Information Technology system management, Spacecraft Health monitoring, Multi-Robot Systems detection, etc.. 
However, due to complex correlations and various temporal patterns of large-scale multivariate time series, a general unsupervised anomaly detection model with higher F1-score and Timeliness remains a challenging task. In this paper, We propose a General representations of multivariate time series for Anomaly Detection(GenAD). First, we apply Time-Series Attention to represent the various temporal patterns of each time series. Second, we employ Multi-Correlation Attention to represent the complex correlations of multivariate time series. With the above innovations, GenAD improves F1-scores of AD by 0.3% to 5% over state-of-the-art model in public datasets, while detecting anomalies more rapidly in anomaly segments. Moreover, we propose a general pre-training algorithm on large-scale multivariate time series, which can be easily transferred to a specific AD tasks with only a few fine-tuning steps. Extensive experiments show that GenAD is able to outperform state-of-the-art model with only 10% of the training data. ", "keywords": "Anomaly Detection;Multivariate Time Series;General Representations", "primary_area": "", "supplementary_material": "", "author": "Xiaolei Hua;Su Wang;Lin Zhu;Dong Zhou;Junlan Feng;Yiting Wang;Chao Deng;Shuo Wang;Mingtao Mei", "authorids": "~Xiaolei_Hua1;~Su_Wang1;~Lin_Zhu6;~Dong_Zhou1;~Junlan_Feng2;~Yiting_Wang1;~Chao_Deng3;~Shuo_Wang12;~Mingtao_Mei1", "gender": "M;M;F;M;F;;;M;F", "homepage": ";;;;http://cmri.hq.cmcc/;https://www.bupt.edu.cn/;;;", "dblp": ";37/5976;;15/2101;;;;;36/3948", "google_scholar": ";;;;;;;https://scholar.google.com/citations?hl=en;https://scholar.google.es/citations?user=rBjPtmQAAAAJ", "orcid": "0000-0002-0251-5484;;0000-0003-1167-1953;;;;;0000-0003-4449-5247;0000-0001-5292-2945", "linkedin": ";;;;;;;https://www.linkedin.cn/incareer/in/ACoAAB5sppAB_Da2tlvgSyM7NFTWl6d1DhZZe1o;junlan-feng-8968ba11/", "or_profile": "~Xiaolei_Hua1;~Su_Wang1;~Lin_Zhu6;~Dong_Zhou1;~Yiting_Wang1;~Shuo_Wang12;~Mingtao_Mei1;~Chao_Deng4;~Junlan_Feng3", "aff": ";Beijing University of Post and Telecommunication;;Beijing University of Post and Telecommunication;;Beijing University of Posts and Telecommunications;;China Mobile Research Institute;China Mobile", "aff_domain": ";bupt.edu.cn;;bupt.edu.cn;;bupt.edu.cn;;jiutian.10086.cn;ioa.ac.cn", "position": ";MS student;;Undergrad student;;Assistant Professor;;Researcher;Principal Researcher", "bibtex": "", "github": "", "project": "", "reviewers": "AnonReviewer3;AnonReviewer1;AnonReviewer4", "site": "https://openreview.net/forum?id=AhLeNin_5sh", "pdf_size": 0, "rating": "3;4;5", "confidence": "4;3;4", "wc_review": "272;141;346", "wc_reply_reviewers": "0;0;0", "wc_reply_authors": "834;786;757", "reply_reviewers": "0;0;0", "reply_authors": "1;1;1", "rating_avg": [ 4.0, 0.816496580927726 ], "confidence_avg": [ 3.6666666666666665, 0.4714045207910317 ], "wc_review_avg": [ 253.0, 84.76241305358565 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 792.3333333333334, 31.752515210959622 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 8, 0 ], "authors#_avg": [ 9, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 2, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=1569733284903894291&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 3, "aff_unique_index": "0;0;0;1;1", "aff_unique_norm": "Beijing University of Posts and Telecommunications;China Mobile", "aff_unique_dep": ";Research Institute", "aff_unique_url": "http://www.bupt.edu.cn/;https://www.chinamobile.com/", 
"aff_unique_abbr": "BUPT;CMRI", "aff_campus_unique_index": "0;0;0", "aff_campus_unique": "Beijing;", "aff_country_unique_index": "0;0;0;0;0", "aff_country_unique": "China" }, { "id": "AhgxLhJvbpS", "title": "Effective Training of Sparse Neural Networks under Global Sparsity Constraint", "track": "main", "status": "Withdraw", "tldr": "", "abstract": "Weight pruning is an effective technique to reduce the model size and inference time for deep neural networks in real-world deployments. However, since magnitudes and relative importance of weights are very different for different layers of a neural network, existing methods rely on either manual tuning or handcrafted heuristic rules to find appropriate pruning rates individually for each layer. This approach general leads to suboptimal performance. In this paper, by directly working on the probability space, we propose an effective network sparsification method called {\\it probabilistic masking} (ProbMask), which solves a natural sparsification formulation under global sparsity constraint. The key idea is to use probability as a global criterion for all layers to measure the weight importance. An appealing feature of ProbMask is that the amounts of weight redundancy can be learned automatically via our constraint and thus we avoid the problem of tuning pruning rates individually for different layers in a network. Extensive experimental results on CIFAR-10/100 and ImageNet demonstrate that our method is highly effective, and can outperform previous state-of-the-art methods by a significant margin, especially in the high pruning rate situation. Notably, the gap of Top-1 accuracy between our ProbMask and existing methods can be up to 10\\%. As a by-product, we show ProbMask is also highly effective in identifying supermasks, which are subnetworks with high performance in a randomly weighted dense neural network.", "keywords": "", "primary_area": "", "supplementary_material": "", "author": "Xiao Zhou;Weizhong Zhang;Tong Zhang", "authorids": "~Xiao_Zhou4;~Weizhong_Zhang1;~Tong_Zhang2", "gender": ";M;M", "homepage": "https://x-zho14.github.io;https://facultyprofiles.ust.hk/profiles.php?profile=weizhong-zhang-weizhong;http://tongzhang-ml.org", "dblp": ";39/2330.html;07/4227-1", "google_scholar": "https://scholar.google.com.hk/citations?user=mNiFF4wAAAAJ;qd06pUgAAAAJ;LurWtuYAAAAJ", "orcid": ";0000-0001-7311-0698;0000-0002-5511-2558", "linkedin": ";;", "or_profile": "~Xiao_Zhou4;~Weizhong_Zhang1;~Tong_Zhang2", "aff": "Hong Kong University of Science and Technology;Hong Kong University of Science and Technology;Hong Kong University of Science and Technology", "aff_domain": "ust.hk;ust.hk;ust.hk", "position": "PhD student;Research Assistant Professor;Full Professor", "bibtex": "", "github": "", "project": "", "reviewers": "AnonReviewer3;AnonReviewer1;AnonReviewer2;AnonReviewer4", "site": "https://openreview.net/forum?id=AhgxLhJvbpS", "pdf_size": 0, "rating": "4;5;5;5", "confidence": "5;4;5;4", "wc_review": "805;463;310;385", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "650;626;340;351", "reply_reviewers": "0;0;0;0", "reply_authors": "1;1;1;1", "rating_avg": [ 4.75, 0.4330127018922193 ], "confidence_avg": [ 4.5, 0.5 ], "wc_review_avg": [ 490.75, 189.32561237191337 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 491.75, 146.54756053923245 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 10, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": -0.5773502691896257, "gs_citation": 0, 
"gs_cited_by_link": "https://scholar.google.com/scholar?q=related:ybV9Wp1NGNEJ:scholar.google.com/&scioq=Effective+Training+of+Sparse+Neural+Networks+under+Global+Sparsity+Constraint&hl=en&as_sdt=0,5", "gs_version_total": 0, "aff_unique_index": "0;0;0", "aff_unique_norm": "Hong Kong University of Science and Technology", "aff_unique_dep": "", "aff_unique_url": "https://www.ust.hk", "aff_unique_abbr": "HKUST", "aff_campus_unique_index": "0;0;0", "aff_campus_unique": "Hong Kong SAR", "aff_country_unique_index": "0;0;0", "aff_country_unique": "China" }, { "id": "Aj4_e50nB8", "title": "Contextual Knowledge Distillation for Transformer Compression", "track": "main", "status": "Reject", "tldr": "", "abstract": "A computationally expensive and memory intensive neural network lies behind the recent success of language representation learning. Knowledge distillation, a major technique for deploying such a vast language model in resource-scarce environments, transfers the knowledge on individual word representations learned without restrictions. In this paper, inspired by the recent observations that language representations are relatively positioned and have more semantic knowledge as a whole, we present a new knowledge distillation strategy for language representation learning that transfers the contextual knowledge via two types of relationships across representations: Word Relation and Layer Transforming Relation. We validate the effectiveness of our method on challenging benchmarks of language understanding tasks. The code will be released.", "keywords": "Knowledge Distillation;Transformer Compression;BERT", "primary_area": "", "supplementary_material": "", "author": "Geondo Park;Gyeongman Kim;Eunho Yang", "authorids": "~Geondo_Park1;~Gyeongman_Kim1;~Eunho_Yang1", "gender": "M;;M", "homepage": ";;https://sites.google.com/site/hleehome2/", "dblp": "256/5123;302/0097;96/2621", "google_scholar": ";31D801cAAAAJ;", "orcid": ";;", "linkedin": ";gyeongman-kim-592257225/;", "or_profile": "~Geondo_Park1;~Gyeongman_Kim1;~Eunho_Yang1", "aff": "Korea Advanced Institute of Science & Technology;Korea Advanced Institute of Science & Technology;Korea Advanced Institute of Science & Technology", "aff_domain": "kaist.ac.kr;kaist.ac.kr;kaist.ac.kr", "position": "PhD student;MS student;Associate Professor", "bibtex": "@misc{\npark2021contextual,\ntitle={Contextual Knowledge Distillation for Transformer Compression},\nauthor={Geondo Park and Gyeongman Kim and Eunho Yang},\nyear={2021},\nurl={https://openreview.net/forum?id=Aj4_e50nB8}\n}", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer2;AnonReviewer3;AnonReviewer4", "site": "https://openreview.net/forum?id=Aj4_e50nB8", "pdf_size": 0, "rating": "5;5;6;6", "confidence": "4;3;5;4", "wc_review": "368;266;243;183", "wc_reply_reviewers": "165;63;12;0", "wc_reply_authors": "2037;1672;730;853", "reply_reviewers": "1;1;1;0", "reply_authors": "3;3;2;1", "rating_avg": [ 5.5, 0.5 ], "confidence_avg": [ 4.0, 0.7071067811865476 ], "wc_review_avg": [ 265.0, 66.74204072396948 ], "wc_reply_reviewers_avg": [ 60.0, 65.07303589045158 ], "wc_reply_authors_avg": [ 1323.0, 548.66793235982 ], "reply_reviewers_avg": [ 0.75, 0.4330127018922193 ], "reply_authors_avg": [ 2.25, 0.82915619758885 ], "replies_avg": [ 19, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": 0.7071067811865475, "gs_citation": 0, "gs_cited_by_link": 
"https://scholar.google.com/scholar?q=related:kh1yUbxhAZMJ:scholar.google.com/&scioq=Contextual+Knowledge+Distillation+for+Transformer+Compression&hl=en&as_sdt=0,33", "gs_version_total": 0, "aff_unique_index": "0;0;0", "aff_unique_norm": "Korea Advanced Institute of Science and Technology", "aff_unique_dep": "", "aff_unique_url": "https://www.kaist.ac.kr", "aff_unique_abbr": "KAIST", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0;0", "aff_country_unique": "South Korea" }, { "id": "AjrRA6WYSW", "title": "Estimation of Number of Communities in Assortative Sparse Networks", "track": "main", "status": "Reject", "tldr": "", "abstract": "Most community detection algorithms assume the number of communities, $K$, to be known \\textit{a priori}. Among various approaches that have been proposed to estimate $K$, the non-parametric methods based on the spectral properties of the Bethe Hessian matrices have garnered much popularity for their simplicity, computational efficiency, and robust performance irrespective of the sparsity of the input data. Recently, one such method has been shown to estimate $K$ consistently if the input network is generated from the (semi-dense) stochastic block model, when the average of the expected degrees ($\\tilde{d}$) of all the nodes in the network satisfies $\\tilde{d} \\gg \\log(N)$ ($N$ being the number of nodes in the network). In this paper, we prove some finite sample results that hold for $\\tilde{d} = o(\\log(N))$, which in turn show that the estimation of $K$ based on the spectra of the Bethe Hessian matrices is consistent not only for the semi-dense regime, but also for the sub-logarithmic sparse regime when $1 \\ll \\tilde{d} \\ll \\log(N)$. Thus, our estimation procedure is a robust method for a wide range of problem settings, regardless of the sparsity of the network input.", "keywords": "networks;number of communities;Bethe-Hessian;sparse networks;stochastic block model", "primary_area": "", "supplementary_material": "/attachment/1d6e7ec6d8dd05f177f71265a2686784ec3393f7.zip", "author": "Neil Hwang;Jiarui Xu;Shirshendu Chatterjee;Sharmodeep Bhattacharyya", "authorids": "neilhwang@gmail.com;xujiar@oregonstate.edu;shirshendu@ccny.cuny.edu;~Sharmodeep_Bhattacharyya1", "gender": ";;;M", "homepage": ";;;https://stat.oregonstate.edu/people/bhattacharyya-sharmodeep", "dblp": ";;;140/7437", "google_scholar": ";;;1WHKFAsAAAAJ", "orcid": ";;;", "linkedin": ";;;", "or_profile": "neilhwang@gmail.com;xujiar@oregonstate.edu;shirshendu@ccny.cuny.edu;~Sharmodeep_Bhattacharyya1", "aff": ";;;Oregon State University", "aff_domain": ";;;oregonstate.edu", "position": ";;;Assistant Professor", "bibtex": "@misc{\nhwang2021estimation,\ntitle={Estimation of Number of Communities in Assortative Sparse Networks},\nauthor={Neil Hwang and Jiarui Xu and Shirshendu Chatterjee and Sharmodeep Bhattacharyya},\nyear={2021},\nurl={https://openreview.net/forum?id=AjrRA6WYSW}\n}", "github": "", "project": "", "reviewers": "AnonReviewer4;AnonReviewer1;AnonReviewer3;AnonReviewer2", "site": "https://openreview.net/forum?id=AjrRA6WYSW", "pdf_size": 0, "rating": "5;6;6;7", "confidence": "2;4;3;4", "wc_review": "729;1223;262;1187", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "0;0;0;0", "reply_reviewers": "0;0;0;0", "reply_authors": "0;0;0;0", "rating_avg": [ 6.0, 0.7071067811865476 ], "confidence_avg": [ 3.25, 0.82915619758885 ], "wc_review_avg": [ 850.25, 391.49800446490144 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 0, 0 ], 
"reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 0, 0 ], "replies_avg": [ 5, 0 ], "authors#_avg": [ 4, 0 ], "corr_rating_confidence": 0.8528028654224418, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:pSgALUfqWLQJ:scholar.google.com/&scioq=Estimation+of+Number+of+Communities+in+Assortative+Sparse+Networks&hl=en&as_sdt=0,5", "gs_version_total": 0, "aff_unique_index": "0", "aff_unique_norm": "Oregon State University", "aff_unique_dep": "", "aff_unique_url": "https://oregonstate.edu", "aff_unique_abbr": "OSU", "aff_country_unique_index": "0", "aff_country_unique": "United States" }, { "id": "Al7Wpsy49g", "title": "Noisy Agents: Self-supervised Exploration by Predicting Auditory Events", "track": "main", "status": "Reject", "tldr": "", "abstract": "Humans integrate multiple sensory modalities (e.g., visual and audio) to build a causal understanding of the physical world. In this work, we propose a novel type of intrinsic motivation for Reinforcement Learning (RL) that encourages the agent to understand the causal effect of its actions through auditory event prediction. First, we allow the agent to collect a small amount of acoustic data and use K-means to discover underlying auditory event clusters. We then train a neural network to predict the auditory events and use the prediction errors as intrinsic rewards to guide RL exploration. We first conduct an in-depth analysis of our module using a set of Atari games. We then apply our model to audio-visual exploration using the Habitat simulator and active learning using the TDW simulator. Experimental results demonstrate the advantages of using audio signals over vision-based models as intrinsic rewards to guide RL explorations.", "keywords": "Audio Curiosity;RL exploration", "primary_area": "", "supplementary_material": "/attachment/37a2b9637cc72465bfa7cff47087a1cece31e9b4.zip", "author": "Chuang Gan;Xiaoyu Chen;Phillip Isola;Antonio Torralba;Joshua B. Tenenbaum", "authorids": "~Chuang_Gan1;~Xiaoyu_Chen4;~Phillip_Isola1;~Antonio_Torralba1;~Joshua_B._Tenenbaum1", "gender": "M;;M;M;", "homepage": "http://people.csail.mit.edu/ganchuang/;https://github.com/Cospui;http://web.mit.edu/phillipi/;http://web.mit.edu/torralba/www//;", "dblp": "139/6993;;36/9988;t/AntonioBTorralba;t/JoshuaBTenenbaum", "google_scholar": "PTeSCbIAAAAJ;;ROILf3EAAAAJ;https://scholar.google.com.tw/citations?user=8cxDHS4AAAAJ;", "orcid": ";;0000-0002-1411-6704;;", "linkedin": ";;phillip-isola-a9955b20/;;", "or_profile": "~Chuang_Gan1;~Xiaoyu_Chen4;~Phillip_Isola1;~Antonio_Torralba1;~Joshua_B._Tenenbaum1", "aff": "MIT-IBM Watson AI Lab;Tsinghua University;Massachusetts Institute of Technology;Massachusetts Institute of Technology;Massachusetts Institute of Technology", "aff_domain": "ibm.com;tsinghua.edu.cn;mit.edu;mit.edu;mit.edu", "position": "PhD student;Undergrad student;Assistant Professor;Full Professor;Professor", "bibtex": "@misc{\ngan2021noisy,\ntitle={Noisy Agents: Self-supervised Exploration by Predicting Auditory Events},\nauthor={Chuang Gan and Xiaoyu Chen and Phillip Isola and Antonio Torralba and Joshua B. 
Tenenbaum},\nyear={2021},\nurl={https://openreview.net/forum?id=Al7Wpsy49g}\n}", "github": "", "project": "", "reviewers": "AnonReviewer6;AnonReviewer3;AnonReviewer2;AnonReviewer4;AnonReviewer5;AnonReviewer1", "site": "https://openreview.net/forum?id=Al7Wpsy49g", "pdf_size": 0, "rating": "2;4;4;5;5;6", "confidence": "4;4;3;3;4;4", "wc_review": "3162;665;492;623;497;1430", "wc_reply_reviewers": "0;0;18;0;78;215", "wc_reply_authors": "0;703;210;289;711;644", "reply_reviewers": "0;0;1;0;1;1", "reply_authors": "0;2;2;1;3;2", "rating_avg": [ 4.333333333333333, 1.247219128924647 ], "confidence_avg": [ 3.6666666666666665, 0.4714045207910317 ], "wc_review_avg": [ 1144.8333333333333, 957.3148936246399 ], "wc_reply_reviewers_avg": [ 51.833333333333336, 78.00943675108942 ], "wc_reply_authors_avg": [ 426.1666666666667, 274.5841805753241 ], "reply_reviewers_avg": [ 0.5, 0.5 ], "reply_authors_avg": [ 1.6666666666666667, 0.9428090415820634 ], "replies_avg": [ 24, 0 ], "authors#_avg": [ 5, 0 ], "corr_rating_confidence": -0.0944911182523068, "gs_citation": 8, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=16431875766027666090&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 6, "aff_unique_index": "0;1;0;0;0", "aff_unique_norm": "Massachusetts Institute of Technology;Tsinghua University", "aff_unique_dep": "IBM Watson AI Lab;", "aff_unique_url": "https://www.mitibmwatsonailab.org;https://www.tsinghua.edu.cn", "aff_unique_abbr": "MIT-IBM AI Lab;THU", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;1;0;0;0", "aff_country_unique": "United States;China" }, { "id": "Ao2-JgYxuQf", "title": "Active Tuning", "track": "main", "status": "Reject", "tldr": "", "abstract": "We introduce Active Tuning, a novel paradigm for optimizing the internal dynamics of recurrent neural networks (RNNs) on the fly. In contrast to the conventional sequence-to-sequence mapping scheme, Active Tuning decouples the RNN's recurrent neural activities from the input stream, using the unfolding temporal gradient signal to tune the internal dynamics into the data stream. As a consequence, the model output depends only on its internal hidden dynamics and the closed-loop feedback of its own predictions; its hidden state is continuously adapted by means of the temporal gradient resulting from backpropagating the discrepancy between the signal observations and the model outputs through time. In this way, Active Tuning infers the signal actively but indirectly based on the originally learned temporal patterns, fitting the most plausible hidden state sequence into the observations. We demonstrate the effectiveness of Active Tuning on several time series prediction benchmarks, including multiple super-imposed sine waves, a chaotic double pendulum, and spatiotemporal wave dynamics. Active Tuning consistently improves the robustness, accuracy, and generalization abilities of all evaluated models. Moreover, networks trained for signal prediction and denoising can be successfully applied to a much larger range of noise conditions with the help of Active Tuning. Thus, given a capable time series predictor, Active Tuning enhances its online signal filtering, denoising, and reconstruction abilities without the need for additional training.", "keywords": "Signal Filtering;Recurrent Neural Network;Time Series;Denoising;Temporal Gradients", "primary_area": "", "supplementary_material": "/attachment/1b3d7fa8bdd9d5ced40b54007498539eba6b456c.zip", "author": "Sebastian Otte;Matthias Karlbauer;Martin V. 
Butz", "authorids": "~Sebastian_Otte1;~Matthias_Karlbauer1;martin.butz@uni-tuebingen.de", "gender": ";M;", "homepage": ";https://uni-tuebingen.de/fakultaeten/mathematisch-naturwissenschaftliche-fakultaet/fachbereiche/informatik/lehrstuehle/cognitive-modeling/staff/matthias-karlbauer/;", "dblp": ";;", "google_scholar": ";;", "orcid": ";0000-0002-4509-7921;", "linkedin": ";;", "or_profile": "~Sebastian_Otte1;~Matthias_Karlbauer1;martin.butz@uni-tuebingen.de", "aff": ";Max-Planck Institute;", "aff_domain": ";mpg.de;", "position": ";PhD student;", "bibtex": "@misc{\notte2021active,\ntitle={Active Tuning},\nauthor={Sebastian Otte and Matthias Karlbauer and Martin V. Butz},\nyear={2021},\nurl={https://openreview.net/forum?id=Ao2-JgYxuQf}\n}", "github": "", "project": "", "reviewers": "AnonReviewer2;AnonReviewer1;AnonReviewer4", "site": "https://openreview.net/forum?id=Ao2-JgYxuQf", "pdf_size": 0, "rating": "3;5;8", "confidence": "4;3;3", "wc_review": "253;178;387", "wc_reply_reviewers": "0;0;0", "wc_reply_authors": "280;346;155", "reply_reviewers": "0;0;0", "reply_authors": "1;1;1", "rating_avg": [ 5.333333333333333, 2.0548046676563256 ], "confidence_avg": [ 3.3333333333333335, 0.4714045207910317 ], "wc_review_avg": [ 272.6666666666667, 86.44972848746002 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 260.3333333333333, 79.20577981154882 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 9, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": -0.8029550685469661, "gs_citation": 11, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=1552964637093593991&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 4, "aff_unique_index": "0", "aff_unique_norm": "Max-Planck-Gesellschaft zur F\u00f6rderung der Wissenschaften e.V.", "aff_unique_dep": "", "aff_unique_url": "https://www.mpg.de", "aff_unique_abbr": "MPG", "aff_country_unique_index": "0", "aff_country_unique": "Germany" }, { "id": "Aoq37n5bhpJ", "title": "Federated learning using mixture of experts", "track": "main", "status": "Reject", "tldr": "", "abstract": "Federated learning has received attention for its efficiency and privacy benefits,in settings where data is distributed among devices. Although federated learning shows significant promise as a key approach when data cannot be shared or centralized, current incarnations show limited privacy properties and have short-comings when applied to common real-world scenarios. One such scenario is heterogeneous data among devices, where data may come from different generating distributions. In this paper, we propose a federated learning framework using a mixture of experts to balance the specialist nature of a locally trained model with the generalist knowledge of a global model in a federated learning setting. Our results show that the mixture of experts model is better suited as a personalized model for devices when data is heterogeneous, outperforming both global and lo-cal models. Furthermore, our framework gives strict privacy guarantees, which allows clients to select parts of their data that may be excluded from the federation. The evaluation shows that the proposed solution is robust to the setting where some users require a strict privacy setting and do not disclose their models to a central server at all, opting out from the federation partially or entirely. 
The proposed framework is general enough to include any kind of machine learning models, and can even use combinations of different kinds.", "keywords": "federated learning;mixture of experts", "primary_area": "", "supplementary_material": "/attachment/8f7e644e14bb85b1de3b92ff649434199e6d2a35.zip", "author": "Edvin Listo Zec;John Martinsson;Olof Mogren;Leon Ren\u00e9 S\u00fctfeld;Daniel Gillblad", "authorids": "~Edvin_Listo_Zec1;~John_Martinsson1;~Olof_Mogren1;~Leon_Ren\u00e9_S\u00fctfeld1;~Daniel_Gillblad1", "gender": "M;M;M;M;", "homepage": "https://edvinli.github.io/;https://johnmartinsson.org;http://mogren.one/;http://www.leonsuetfeld.com;", "dblp": ";224/2647;;;48/5973", "google_scholar": "https://scholar.google.se/citations?user=Ft52aSsAAAAJ;https://scholar.google.se/citations?user=sAMIwlMAAAAJ;https://scholar.google.com/citations?hl=en;;", "orcid": ";0000-0002-5032-4367;;0000-0003-3995-8833;", "linkedin": "edvin-listo-zec/;john-martinsson-2541b772/;;leonsuetfeld/;", "or_profile": "~Edvin_Listo_Zec1;~John_Martinsson1;~Olof_Mogren1;~Leon_Ren\u00e9_S\u00fctfeld1;~Daniel_Gillblad1", "aff": "KTH Royal Institute of Technology;RISE Research Institutes of Sweden;RISE Research Institutes of Sweden;;AI Sweden", "aff_domain": "kth.se;ri.se;ri.se;;ai.se", "position": "PhD student;Researcher;Researcher;;co-Director", "bibtex": "@misc{\nzec2021federated,\ntitle={Federated learning using mixture of experts},\nauthor={Edvin Listo Zec and John Martinsson and Olof Mogren and Leon Ren{\\'e} S{\\\"u}tfeld and Daniel Gillblad},\nyear={2021},\nurl={https://openreview.net/forum?id=Aoq37n5bhpJ}\n}", "github": "", "project": "", "reviewers": "AnonReviewer4;AnonReviewer1;AnonReviewer3;AnonReviewer2", "site": "https://openreview.net/forum?id=Aoq37n5bhpJ", "pdf_size": 0, "rating": "3;3;3;6", "confidence": "4;5;4;5", "wc_review": "384;265;343;575", "wc_reply_reviewers": "0;0;0;108", "wc_reply_authors": "509;359;275;696", "reply_reviewers": "0;0;0;2", "reply_authors": "1;1;1;2", "rating_avg": [ 3.75, 1.299038105676658 ], "confidence_avg": [ 4.5, 0.5 ], "wc_review_avg": [ 391.75, 114.10822713546995 ], "wc_reply_reviewers_avg": [ 27.0, 46.76537180435969 ], "wc_reply_authors_avg": [ 459.75, 160.0958697156176 ], "reply_reviewers_avg": [ 0.5, 0.8660254037844386 ], "reply_authors_avg": [ 1.25, 0.4330127018922193 ], "replies_avg": [ 12, 0 ], "authors#_avg": [ 5, 0 ], "corr_rating_confidence": 0.5773502691896257, "gs_citation": 16, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=2130165274801561676&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 0, "aff_unique_index": "0;1;1;2", "aff_unique_norm": "KTH Royal Institute of Technology;RISE Research Institutes of Sweden;AI Sweden", "aff_unique_dep": ";;", "aff_unique_url": "https://www.kth.se;https://www.rise.se;https://www.aisweden.org", "aff_unique_abbr": "KTH;RISE;AI Sweden", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0;0;0", "aff_country_unique": "Sweden" }, { "id": "Atpv9GUhRt6", "title": "Learning from multiscale wavelet superpixels using GNN with spatially heterogeneous pooling", "track": "main", "status": "Reject", "tldr": "", "abstract": "Neural networks have become the standard for image classification tasks. On one hand, convolutional neural networks (CNNs) achieve state-of-the-art performance by learning from a regular grid representation of images. On the other hand, graph neural networks (GNNs) have shown promise in learning image classification from an embedded superpixel graph. 
However, in the latter, studies have been restricted to SLIC superpixels, where 1) a single target number of superpixels is arbitrarily defined for an entire dataset irrespective of differences across images and 2) the superpixels in a given image are of similar size despite intrinsic multiscale structure. In this study, we investigate learning from a new principled representation in which individual images are represented by an image-specific number of multiscale superpixels. We propose WaveMesh, a wavelet-based superpixeling algorithm, where the number and sizes of superpixels in an image are systematically computed based on the image content. We also present WavePool, a spatially heterogeneous pooling scheme tailored to WaveMesh superpixels. We study the feasibility of learning from the WaveMesh superpixel representation using SplineCNN, a state-of-the-art network for image graph classification. We show that under the same network architecture and training settings, SplineCNN with original Graclus-based pooling learns from WaveMesh superpixels on-par with SLIC superpixels. Additionally, we observe that the best performance is achieved when replacing Graclus-based pooling with WavePool while using WaveMesh superpixels.", "keywords": "Image classification;GNN;superpixel;SLIC;wavelet", "primary_area": "", "supplementary_material": "", "author": "Maxime Bassenne;Varun Vasudevan;Lei Xing", "authorids": "~Maxime_Bassenne1;devan@stanford.edu;~Lei_Xing1", "gender": "M;;M", "homepage": ";;http://med.stanford.edu/xinglab.html", "dblp": ";;", "google_scholar": "bG9qt-QAAAAJ;;", "orcid": ";;", "linkedin": ";;", "or_profile": "~Maxime_Bassenne1;devan@stanford.edu;~Lei_Xing1", "aff": ";;Stanford University", "aff_domain": ";;stanford.edu", "position": ";;Professor, Dept of Radiation Oncology,", "bibtex": "@misc{\nbassenne2021learning,\ntitle={Learning from multiscale wavelet superpixels using {\\{}GNN{\\}} with spatially heterogeneous pooling},\nauthor={Maxime Bassenne and Varun Vasudevan and Lei Xing},\nyear={2021},\nurl={https://openreview.net/forum?id=Atpv9GUhRt6}\n}", "github": "", "project": "", "reviewers": "AnonReviewer4;AnonReviewer2;AnonReviewer3;AnonReviewer1", "site": "https://openreview.net/forum?id=Atpv9GUhRt6", "pdf_size": 0, "rating": "2;5;5;7", "confidence": "4;4;3;4", "wc_review": "364;271;155;327", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "1301;787;305;619", "reply_reviewers": "0;0;0;0", "reply_authors": "2;1;1;1", "rating_avg": [ 4.75, 1.7853571071357126 ], "confidence_avg": [ 3.75, 0.4330127018922193 ], "wc_review_avg": [ 279.25, 79.00751546530242 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 753.0, 360.59672766124766 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.25, 0.4330127018922193 ], "replies_avg": [ 10, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": -0.08084520834544431, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:RabBMFkY_aEJ:scholar.google.com/&scioq=Learning+from+multiscale+wavelet+superpixels+using+GNN+with+spatially+heterogeneous+pooling&hl=en&as_sdt=0,5", "gs_version_total": 0, "aff_unique_index": "0", "aff_unique_norm": "Stanford University", "aff_unique_dep": "", "aff_unique_url": "https://www.stanford.edu", "aff_unique_abbr": "Stanford", "aff_campus_unique_index": "0", "aff_campus_unique": "Stanford", "aff_country_unique_index": "0", "aff_country_unique": "United States" }, { "id": "Au1gNqq4brw", "title": "SEQUENCE-LEVEL FEATURES: HOW GRU AND LSTM CELLS CAPTURE N-GRAMS", 
"track": "main", "status": "Reject", "tldr": "", "abstract": "Modern recurrent neural networks (RNN) such as Gated Recurrent Units (GRU) and Long Short-term Memory (LSTM) have demonstrated impressive results on tasks involving sequential data in practice. Despite continuous efforts on interpreting their behaviors, the exact mechanism underlying their successes in capturing sequence-level information have not been thoroughly understood. In this work, we present a study on understanding the essential features captured by GRU/LSTM cells by mathematically expanding and unrolling the hidden states. Based on the expanded and unrolled hidden states, we find there was a type of sequence-level representations brought in by the gating mechanism, which enables the cells to encode sequence-level features along with token-level features. Specifically, we show that the cells would consist of such sequence-level features similar to those of N-grams. Based on such a finding, we also found that replacing the hidden states of the standard cells with N-gram representations does not necessarily degrade performance on the sentiment analysis and language modeling tasks, indicating such features may play a significant role for GRU/LSTM cells.", "keywords": "GRU;LSTM;Sequence-level;Features;N-grams", "primary_area": "", "supplementary_material": "", "author": "Xiaobing Sun;Wei Lu", "authorids": "~Xiaobing_Sun1;luwei@sutd.edu.sg", "gender": "M;", "homepage": ";", "dblp": "30/4077-2;", "google_scholar": "https://scholar.google.com/citations?hl=en;", "orcid": ";", "linkedin": ";", "or_profile": "~Xiaobing_Sun1;luwei@sutd.edu.sg", "aff": "Singapore University of Technology and Design;", "aff_domain": "sutd.edu.sg;", "position": "PhD student;", "bibtex": "@misc{\nsun2021sequencelevel,\ntitle={{\\{}SEQUENCE{\\}}-{\\{}LEVEL{\\}} {\\{}FEATURES{\\}}: {\\{}HOW{\\}} {\\{}GRU{\\}} {\\{}AND{\\}} {\\{}LSTM{\\}} {\\{}CELLS{\\}} {\\{}CAPTURE{\\}} N-{\\{}GRAMS{\\}}},\nauthor={Xiaobing Sun and Wei Lu},\nyear={2021},\nurl={https://openreview.net/forum?id=Au1gNqq4brw}\n}", "github": "", "project": "", "reviewers": "AnonReviewer2;AnonReviewer1;AnonReviewer5;AnonReviewer4;AnonReviewer3", "site": "https://openreview.net/forum?id=Au1gNqq4brw", "pdf_size": 0, "rating": "3;4;4;5;6", "confidence": "3;2;4;4;4", "wc_review": "407;471;947;450;168", "wc_reply_reviewers": "0;0;0;0;0", "wc_reply_authors": "362;296;654;86;111", "reply_reviewers": "0;0;0;0;0", "reply_authors": "1;1;1;1;1", "rating_avg": [ 4.4, 1.0198039027185568 ], "confidence_avg": [ 3.4, 0.8 ], "wc_review_avg": [ 488.6, 253.52443669200804 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 301.8, 205.26899424900975 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 11, 0 ], "authors#_avg": [ 2, 0 ], "corr_rating_confidence": 0.5393193716300062, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:b_nzoNefgZUJ:scholar.google.com/&scioq=SEQUENCE-LEVEL+FEATURES:+HOW+GRU+AND+LSTM+CELLS+CAPTURE+N-GRAMS&hl=en&as_sdt=0,5", "gs_version_total": 0, "aff_unique_index": "0", "aff_unique_norm": "Singapore University of Technology and Design", "aff_unique_dep": "", "aff_unique_url": "https://www.sutd.edu.sg", "aff_unique_abbr": "SUTD", "aff_country_unique_index": "0", "aff_country_unique": "Singapore" }, { "id": "AwPGPgExiYA", "title": "Differentiable Learning of Graph-like Logical Rules from Knowledge Graphs", "track": "main", "status": "Reject", "tldr": "", "abstract": "Logical rules inside a knowledge graph 
(KG) are essential for reasoning, logical inference, and rule mining. However, existing works can only handle simple, i.e., chain-like and tree-like, rules and cannot capture KG's complex semantics, which can be better captured by graph-like rules. Besides, learning graph-like rules is very difficult because the graph structure exhibits a huge discrete search space. To address these issues, observing that the plausibility of logical rules can be explained by how frequently it appears in a KG, we propose a score function that represents graph-like rules with learnable parameters. The score also helps relax the discrete space into a continuous one and can be uniformly transformed into matrix form by the Einstein summation convention. Thus, it allows us to learn graph-like rules in an efficient, differentiable, and end-to-end training manner by optimizing the normalized score. We conduct extensive experiments on real-world datasets to show that our method outperforms previous works due to logical rules' better expressive ability. Furthermore, we demonstrate that our method can learn high-quality and interpretable graph-like logical rules.", "keywords": "knowledge graph;logical rules;logical query", "primary_area": "", "supplementary_material": "", "author": "Hongzhi Shi;quanming yao;Yong Li", "authorids": "~Hongzhi_Shi1;~quanming_yao1;~Yong_Li3", "gender": "M;M;", "homepage": ";https://lars-group.github.io/;", "dblp": "187/3029;158/1014;93/2334-8", "google_scholar": "7y9YCqwAAAAJ;https://scholar.google.com/schhp?hl=en;", "orcid": "0000-0002-5209-7540;;", "linkedin": ";;", "or_profile": "~Hongzhi_Shi1;~quanming_yao1;~Yong_Li3", "aff": "Tsinghua University;4Paradigm Inc.;", "aff_domain": "tsinghua.edu.cn;4paradigm.com;", "position": "PhD student;Senior Scientist;", "bibtex": "@misc{\nshi2021differentiable,\ntitle={Differentiable Learning of Graph-like Logical Rules from Knowledge Graphs},\nauthor={Hongzhi Shi and quanming yao and Yong Li},\nyear={2021},\nurl={https://openreview.net/forum?id=AwPGPgExiYA}\n}", "github": "", "project": "", "reviewers": "AnonReviewer3;AnonReviewer4;AnonReviewer2;AnonReviewer1", "site": "https://openreview.net/forum?id=AwPGPgExiYA", "pdf_size": 0, "rating": "3;4;5;6", "confidence": "4;3;4;2", "wc_review": "686;611;314;261", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "1480;626;523;537", "reply_reviewers": "0;0;0;0", "reply_authors": "2;1;1;1", "rating_avg": [ 4.5, 1.118033988749895 ], "confidence_avg": [ 3.25, 0.82915619758885 ], "wc_review_avg": [ 468.0, 183.39711011899834 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 791.5, 399.46370298188543 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.25, 0.4330127018922193 ], "replies_avg": [ 11, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": -0.674199862463242, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:wK4nINgUkwsJ:scholar.google.com/&scioq=Differentiable+Learning+of+Graph-like+Logical+Rules+from+Knowledge+Graphs&hl=en&as_sdt=0,5", "gs_version_total": 0, "aff_unique_index": "0;1", "aff_unique_norm": "Tsinghua University;4Paradigm", "aff_unique_dep": ";", "aff_unique_url": "https://www.tsinghua.edu.cn;https://www.4paradigm.com/", "aff_unique_abbr": "THU;4Paradigm", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0", "aff_country_unique": "China" }, { "id": "B4SHgqe1kvX", "title": "Accurate Word Representations with Universal Visual Guidance", "track": "main", "status": "Withdraw", "tldr": "", 
"abstract": "Word representation is a fundamental component in neural language understanding models. Recently, pre-trained language models (PrLMs) offer a new performant method of contextualized word representations by leveraging the sequence-level context for modeling. Although the PrLMs generally give more accurate contextualized word representations than non-contextualized models do, they are still subject to a sequence of text contexts without diverse hints for word representation from multimodality. This paper thus proposes a visual representation method to explicitly enhance conventional word embedding with multiple-aspect senses from visual guidance. In detail, we build a small-scale word-image dictionary from a multimodal seed dataset where each word corresponds to diverse related images. The texts and paired images are encoded in parallel, followed by an attention layer to integrate the multimodal representations. We show that the method substantially improves the accuracy of disambiguation. Experiments on 12 natural language understanding and machine translation tasks further verify the effectiveness and the generalization capability of the proposed approach.", "keywords": "Natural Language Processing;Visual Representation;Multimodal Language Representation;Natural Language Understanding", "primary_area": "", "supplementary_material": "", "author": "Haojie Yu;Zhuosheng Zhang;hai zhao;Rui Wang;Masao Utiyama", "authorids": "hudiefeiafei@sjtu.edu.cn;~Zhuosheng_Zhang1;~hai_zhao1;~Rui_Wang10;~Masao_Utiyama2", "gender": ";M;M;;M", "homepage": ";https://bcmi.sjtu.edu.cn/~zhangzs/;http://bcmi.sjtu.edu.cn/~zhaohai/;;http://www2.nict.go.jp/astrec-att/member/mutiyama/", "dblp": ";06/9708;25/1145-1.html;;76/5745.html", "google_scholar": ";https://scholar.google.co.jp/citations?user=63LTQhgAAAAJ;https://scholar.google.com.tw/citations?user=4dU5KS0AAAAJ;;artIO6gAAAAJ", "orcid": ";0000-0002-4183-3645;;;", "linkedin": ";;;;", "or_profile": "hudiefeiafei@sjtu.edu.cn;~Zhuosheng_Zhang1;~hai_zhao1;~Rui_Wang10;~Masao_Utiyama2", "aff": ";Shanghai Jiaotong University;Shanghai Jiaotong University;;National Institute of Information and Communications Technology (NICT), National Institute of Advanced Industrial Science and Technology", "aff_domain": ";sjtu.edu.cn;sjtu.edu.cn;;nict.go.jp", "position": ";PhD student;Full Professor;;Researcher", "bibtex": "", "github": "", "project": "", "reviewers": "AnonReviewer2;AnonReviewer3;AnonReviewer4;AnonReviewer1", "site": "https://openreview.net/forum?id=B4SHgqe1kvX", "pdf_size": 0, "rating": "3;4;4;4", "confidence": "4;4;5;4", "wc_review": "449;674;1246;1141", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "390;346;573;452", "reply_reviewers": "0;0;0;0", "reply_authors": "1;1;1;1", "rating_avg": [ 3.75, 0.4330127018922193 ], "confidence_avg": [ 4.25, 0.4330127018922193 ], "wc_review_avg": [ 877.5, 327.9668428362843 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 440.25, 85.39430601626785 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 12, 0 ], "authors#_avg": [ 5, 0 ], "corr_rating_confidence": 0.3333333333333333, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:mOB_HFqIVosJ:scholar.google.com/&scioq=Accurate+Word+Representations+with+Universal+Visual+Guidance&hl=en&as_sdt=0,33", "gs_version_total": 3, "aff_unique_index": "0;0;1", "aff_unique_norm": "Shanghai Jiao Tong University;National Institute of Information and Communications Technology", "aff_unique_dep": ";", 
"aff_unique_url": "https://www.sjtu.edu.cn;https://www.nict.go.jp/", "aff_unique_abbr": "SJTU;NICT", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0;1", "aff_country_unique": "China;Japan" }, { "title": "Nonseparable Symplectic Neural Networks", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/3136", "id": "B5VvQrI49Pa", "poster": "", "openreview": "https://openreview.net/forum?id=B5VvQrI49Pa", "slides": "https://iclr.cc/virtual/2021/poster/3136", "video": "https://iclr.cc/virtual/2021/poster/3136", "author_site": "Shiying Xiong, Yunjin Tong, Xingzhe He, Shuqi Yang, Cheng Yang, Bo Zhu", "tldr": "", "abstract": "Predicting the behaviors of Hamiltonian systems has been drawing increasing attention in scientific machine learning. However, the vast majority of the literature was focused on predicting separable Hamiltonian systems with their kinematic and potential energy terms being explicitly decoupled, while building data-driven paradigms to predict nonseparable Hamiltonian systems that are ubiquitous in fluid dynamics and quantum mechanics were rarely explored. The main computational challenge lies in the effective embedding of symplectic priors to describe the inherently coupled evolution of position and momentum, which typically exhibits intricate dynamics. To solve the problem, we propose a novel neural network architecture, Nonseparable Symplectic Neural Networks (NSSNNs), to uncover and embed the symplectic structure of a nonseparable Hamiltonian system from limited observation data. The enabling mechanics of our approach is an augmented symplectic time integrator to decouple the position and momentum energy terms and facilitate their evolution. We demonstrated the efficacy and versatility of our method by predicting a wide range of Hamiltonian systems, both separable and nonseparable, including chaotic vortical flows. 
We showed the unique computational merits of our approach to yield long-term, accurate, and robust predictions for large-scale Hamiltonian systems by rigorously enforcing symplectomorphism.", "keywords": "Data-driven modeling;nonseparable Hailtonian system;symplectic networks", "primary_area": "", "supplementary_material": "/attachment/8857e32eb7b29ff14b92fb95f1e7f9ba3c5b4260.zip", "author": "Shiying Xiong;Yunjin Tong;Xingzhe He;Shuqi Yang;Cheng Yang;Bo Zhu", "authorids": "~Shiying_Xiong1;yunjin.tong.22@dartmouth.edu;~Xingzhe_He1;~Shuqi_Yang2;yangcheng.iron@bytedance.com;~Bo_Zhu2", "gender": "M;;M;F;;M", "homepage": ";;https://xingzhehe.github.io/;https://y-sq.github.io/;;https://faculty.cc.gatech.edu/~bozhu/", "dblp": ";;258/0493;243/3873;;", "google_scholar": "https://scholar.google.com.hk/citations?user=eq5bc5oAAAAJ;;25tDZpwAAAAJ;l-F-0cEAAAAJ;;atNjbs0AAAAJ", "orcid": "0000-0002-0468-4249;;;;;", "linkedin": ";;;shuqiyang1998/;;", "or_profile": "~Shiying_Xiong1;yunjin.tong.22@dartmouth.edu;~Xingzhe_He1;~Shuqi_Yang2;yangcheng.iron@bytedance.com;~Bo_Zhu2", "aff": "Dartmouth College;;University of British Columbia;Dartmouth College;;Dartmouth College", "aff_domain": "dartmouth.edu;;cs.ubc.ca;dartmouth.edu;;dartmouth.edu", "position": "Postdoc;;PhD student;MS student;;Assistant Professor", "bibtex": "@inproceedings{\nxiong2021nonseparable,\ntitle={Nonseparable Symplectic Neural Networks},\nauthor={Shiying Xiong and Yunjin Tong and Xingzhe He and Shuqi Yang and Cheng Yang and Bo Zhu},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=B5VvQrI49Pa}\n}", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer4;AnonReviewer3;AnonReviewer2", "pdf_size": 0, "rating": "6;6;6;7", "confidence": "4;3;3;3", "wc_review": "1069;497;413;378", "wc_reply_reviewers": "250;0;0;0", "wc_reply_authors": "296;85;105;128", "reply_reviewers": "1;0;0;0", "reply_authors": "1;1;1;1", "rating_avg": [ 6.25, 0.4330127018922193 ], "confidence_avg": [ 3.25, 0.4330127018922193 ], "wc_review_avg": [ 589.25, 280.3394148171106 ], "wc_reply_reviewers_avg": [ 62.5, 108.25317547305482 ], "wc_reply_authors_avg": [ 153.5, 83.66749667582985 ], "reply_reviewers_avg": [ 0.25, 0.4330127018922193 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 11, 0 ], "authors#_avg": [ 6, 0 ], "corr_rating_confidence": -0.3333333333333333, "gs_citation": 42, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=11638626394982085887&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 6, "pdf": "https://openreview.net/pdf?id=B5VvQrI49Pa", "email": "dartmouth.edu;;cs.ubc.ca;dartmouth.edu;;dartmouth.edu", "author_num": 6, "aff_unique_index": "0;1;0;0", "aff_unique_norm": "Dartmouth College;University of British Columbia", "aff_unique_dep": ";", "aff_unique_url": "https://www.dartmouth.edu;https://www.ubc.ca", "aff_unique_abbr": "Dartmouth;UBC", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;1;0;0", "aff_country_unique": "United States;Canada" }, { "id": "B5bZp0m7jZd", "title": "Beyond Prioritized Replay: Sampling States in Model-Based RL via Simulated Priorities", "track": "main", "status": "Reject", "tldr": "", "abstract": "The prioritized Experience Replay (ER) method has attracted great attention; however, there is little theoretical understanding of such prioritization strategy and why they help. 
In this work, we revisit prioritized ER and, in an ideal setting, show equivalence to minimizing cubic loss, providing theoretical insight into why it improves upon uniform sampling. This theoretical equivalence highlights two limitations of current prioritized experience replay methods: insufficient coverage of the sample space and outdated priorities of training samples. This motivates our model-based approach, which does not suffer from these limitations. Our key idea is to actively search for high priority states using gradient ascent. Under certain conditions, we prove that the hypothetical experiences generated from these states are sampled proportionally to approximately true priorities. We also characterize the distance between the sampling distribution of our method and the true prioritized sampling distribution. Our experiments on both benchmark and application-oriented domains show that our approach achieves superior performance over baselines. ", "keywords": "Experience replay;prioritized sampling;model-based reinforcement learning;Dyna architecture", "primary_area": "", "supplementary_material": "", "author": "Jincheng Mei;Yangchen Pan;Martha White;Amir-massoud Farahmand;Hengshuai Yao", "authorids": "~Jincheng_Mei1;~Yangchen_Pan2;~Martha_White1;~Amir-massoud_Farahmand1;~Hengshuai_Yao2", "gender": "M;M;F;M;M", "homepage": "https://jinchengmei.github.io;https://yannickycpan.github.io/yangchenpan/;http://marthawhite.ca;http://academic.sologen.net/;https://hengshuaiyao.github.io/", "dblp": "149/1408;183/0925;60/7057;17/671;25/4960", "google_scholar": ";4M4pOp4AAAAJ;t5zdD_IAAAAJ;https://scholar.google.ca/citations?user=G5SAV7gAAAAJ;R_wcnUgAAAAJ", "orcid": ";;0000-0002-5356-2950;;", "linkedin": ";;;amir-massoud-farahmand/;", "or_profile": "~Jincheng_Mei1;~Yangchen_Pan2;~Martha_White1;~Amir-massoud_Farahmand1;~hengshuai_yao1", "aff": "University of Alberta;University of Alberta;University of Alberta;Vector Institute;Huawei Technologies Ltd.", "aff_domain": "ualberta.ca;ualberta.ca;ualberta.ca;vectorinstitute.ai;huawei.com", "position": "PhD student;PhD student;Associate Professor;Faculty Member;Principal Researcher", "bibtex": "@misc{\nmei2021beyond,\ntitle={Beyond Prioritized Replay: Sampling States in Model-Based {\\{}RL{\\}} via Simulated Priorities},\nauthor={Jincheng Mei and Yangchen Pan and Martha White and Amir-massoud Farahmand and Hengshuai Yao},\nyear={2021},\nurl={https://openreview.net/forum?id=B5bZp0m7jZd}\n}", "github": "", "project": "", "reviewers": "AnonReviewer4;AnonReviewer2;AnonReviewer3", "site": "https://openreview.net/forum?id=B5bZp0m7jZd", "pdf_size": 0, "rating": "4;5;6", "confidence": "3;3;2", "wc_review": "803;482;487", "wc_reply_reviewers": "0;0;0", "wc_reply_authors": "485;513;356", "reply_reviewers": "0;0;0", "reply_authors": "1;1;1", "rating_avg": [ 5.0, 0.816496580927726 ], "confidence_avg": [ 2.6666666666666665, 0.4714045207910317 ], "wc_review_avg": [ 590.6666666666666, 150.1562149525916 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 451.3333333333333, 68.37315912614314 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 9, 0 ], "authors#_avg": [ 5, 0 ], "corr_rating_confidence": -0.8660254037844385, "gs_citation": 2, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=4045581742204449565&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 0, "aff_unique_index": "0;0;0;1;2", "aff_unique_norm": "University of Alberta;Vector Institute;Huawei", "aff_unique_dep": ";;Huawei Technologies", 
"aff_unique_url": "https://www.ualberta.ca;https://vectorinstitute.ai/;https://www.huawei.com", "aff_unique_abbr": "UAlberta;Vector Institute;Huawei", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0;0;0;1", "aff_country_unique": "Canada;China" }, { "title": "Federated Learning Based on Dynamic Regularization", "status": "Oral", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/2748", "id": "B7v4QMR6Z9w", "poster": "", "openreview": "https://openreview.net/forum?id=B7v4QMR6Z9w", "slides": "https://iclr.cc/virtual/2021/poster/2748", "video": "https://iclr.cc/virtual/2021/poster/2748", "author_site": "Durmus Alp Emre Acar, Yue Zhao, Ramon Matas, Matthew Mattina, Paul Whatmough, Venkatesh Saligrama", "tldr": "", "abstract": "We propose a novel federated learning method for distributively training neural network models, where the server orchestrates cooperation between a subset of randomly chosen devices in each round. We view Federated Learning problem primarily from a communication perspective and allow more device level computations to save transmission costs. We point out a fundamental dilemma, in that the minima of the local-device level empirical loss are inconsistent with those of the global empirical loss. Different from recent prior works, that either attempt inexact minimization or utilize devices for parallelizing gradient computation, we propose a dynamic regularizer for each device at each round, so that in the limit the global and device solutions are aligned. We demonstrate both through empirical results on real and synthetic data as well as analytical results that our scheme leads to efficient training, in both convex and non-convex settings, while being fully agnostic to device heterogeneity and robust to large number of devices, partial participation and unbalanced data.", "keywords": "Federated Learning;Deep Neural Networks;Distributed Optimization", "primary_area": "", "supplementary_material": "", "author": "Durmus Alp Emre Acar;Yue Zhao;Ramon Matas;Matthew Mattina;Paul Whatmough;Venkatesh Saligrama", "authorids": "~Durmus_Alp_Emre_Acar1;~Yue_Zhao12;~Ramon_Matas1;~Matthew_Mattina1;~Paul_Whatmough1;~Venkatesh_Saligrama1", "gender": ";M;M;;M;", "homepage": ";https://www.linkedin.com/in/yzhao-washu/;;;;https://venkatesh-saligrama.github.io/", "dblp": ";;;;87/9432;67/4721", "google_scholar": "https://scholar.google.com/citations?hl=en;nfILaSYAAAAJ;;;hu3x-LoAAAAJ;S4z3uzMAAAAJ", "orcid": ";;;;;0000-0002-0675-2268", "linkedin": ";;ramon-matas-2585658/;matthewmattina;paul-whatmough-2062729/;venkatesh-saligrama-91175a16/", "or_profile": "~Durmus_Alp_Emre_Acar1;~Yue_Zhao12;~Ramon_Matas1;~Matthew_Mattina1;~Paul_Whatmough1;~Venkatesh_Saligrama1", "aff": "Boston University;;Arm Ltd;arm;Arm Inc;Boston University", "aff_domain": "bu.edu;;arm.com;arm.com;arm.com;bu.edu", "position": "PhD student;;Researcher;Senior Director;Senior Principal Research Engineer;Full Professor", "bibtex": "@inproceedings{\nacar2021federated,\ntitle={Federated Learning Based on Dynamic Regularization},\nauthor={Durmus Alp Emre Acar and Yue Zhao and Ramon Matas and Matthew Mattina and Paul Whatmough and Venkatesh Saligrama},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=B7v4QMR6Z9w}\n}", "github": "[![github](/images/github_icon.svg) alpemreacar/FedDyn](https://github.com/alpemreacar/FedDyn) + [![Papers with Code](/images/pwc_icon.svg) 2 community 
implementations](https://paperswithcode.com/paper/?openreview=B7v4QMR6Z9w)", "project": "", "reviewers": "AnonReviewer4;AnonReviewer2;AnonReviewer1;AnonReviewer3", "pdf_size": 0, "rating": "7;7;7;8", "confidence": "5;3;3;4", "wc_review": "926;314;254;572", "wc_reply_reviewers": "118;0;0;0", "wc_reply_authors": "1540;532;835;1433", "reply_reviewers": "1;0;0;0", "reply_authors": "3;1;2;3", "rating_avg": [ 7.25, 0.4330127018922193 ], "confidence_avg": [ 3.75, 0.82915619758885 ], "wc_review_avg": [ 516.5, 264.89762173337834 ], "wc_reply_reviewers_avg": [ 29.5, 51.09549882328188 ], "wc_reply_authors_avg": [ 1085.0, 417.26430472783073 ], "reply_reviewers_avg": [ 0.25, 0.4330127018922193 ], "reply_authors_avg": [ 2.25, 0.82915619758885 ], "replies_avg": [ 24, 0 ], "authors#_avg": [ 6, 0 ], "corr_rating_confidence": 0.17407765595569782, "gs_citation": 1035, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=10329355946947839611&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 3, "pdf": "https://openreview.net/pdf?id=B7v4QMR6Z9w", "email": "bu.edu;;arm.com;arm.com;arm.com;bu.edu", "author_num": 6, "aff_unique_index": "0;1;1;1;0", "aff_unique_norm": "Boston University;Arm Limited", "aff_unique_dep": ";", "aff_unique_url": "https://www.bu.edu;https://www.arm.com", "aff_unique_abbr": "BU;Arm", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;1;1;1;0", "aff_country_unique": "United States;United Kingdom" }, { "id": "B8fp0LVMHa", "title": "EMaQ: Expected-Max Q-Learning Operator for Simple Yet Effective Offline and Online RL", "track": "main", "status": "Reject", "tldr": "", "abstract": "Off-policy reinforcement learning (RL) holds the promise of sample-efficient learning of decision-making policies by leveraging past experience. However, in the offline RL setting -- where a fixed collection of interactions are provided and no further interactions are allowed -- it has been shown that standard off-policy RL methods can significantly underperform. Recently proposed methods often aim to address this shortcoming by constraining learned policies to remain close to the given dataset of interactions. In this work, we closely investigate an important simplification of BCQ~\\citep{fujimoto2018off} -- a prior approach for offline RL -- which removes a heuristic design choice and naturally restrict extracted policies to remain \\emph{exactly} within the support of a given behavior policy. Importantly, in contrast to their original theoretical considerations, we derive this simplified algorithm through the introduction of a novel backup operator, Expected-Max Q-Learning (EMaQ), which is more closely related to the resulting practical algorithm. Specifically, in addition to the distribution support, EMaQ explicitly considers the number of samples and the proposal distribution, allowing us to derive new sub-optimality bounds which can serve as a novel measure of complexity for offline RL problems. In the offline RL setting -- the main focus of this work -- EMaQ matches and outperforms prior state-of-the-art in the D4RL benchmarks~\\citep{fu2020d4rl}. In the online RL setting, we demonstrate that EMaQ is competitive with Soft Actor Critic (SAC). The key contributions of our empirical findings are demonstrating the importance of careful generative model design for estimating behavior policies, and an intuitive notion of complexity for offline RL problems. 
With its simple interpretation and fewer moving parts, such as no explicit function approximator representing the policy, EMaQ serves as a strong yet easy to implement baseline for future work.", "keywords": "Offline Reinforcement Learning;Off-Policy Reinforcement Learning", "primary_area": "", "supplementary_material": "/attachment/c63af23cd8ac2daaf1071fde77b6fedbf94ff694.zip", "author": "Seyed Kamyar Seyed Ghasemipour;Dale Schuurmans;Shixiang Gu", "authorids": "~Seyed_Kamyar_Seyed_Ghasemipour1;~Dale_Schuurmans1;~Shixiang_Gu1", "gender": "M;;M", "homepage": "http://www.cs.utoronto.ca/~kamyar/;;https://sites.google.com/view/gugurus/home", "dblp": "238/2555;;121/0550", "google_scholar": "LHvso9QAAAAJ;;B8wslVsAAAAJ", "orcid": ";;", "linkedin": ";;", "or_profile": "~Seyed_Kamyar_Seyed_Ghasemipour1;~Dale_Schuurmans1;~Shixiang_Gu1", "aff": "Google DeepMind Robotics;;Google", "aff_domain": "google.com;;google.com", "position": "Student Researcher;;Senior Research Scientist", "bibtex": "@misc{\nghasemipour2021emaq,\ntitle={{\\{}EM{\\}}aQ: Expected-Max Q-Learning Operator for Simple Yet Effective Offline and Online {\\{}RL{\\}}},\nauthor={Seyed Kamyar Seyed Ghasemipour and Dale Schuurmans and Shixiang Gu},\nyear={2021},\nurl={https://openreview.net/forum?id=B8fp0LVMHa}\n}", "github": "", "project": "", "reviewers": "AnonReviewer2;AnonReviewer1;AnonReviewer4;AnonReviewer5", "site": "https://openreview.net/forum?id=B8fp0LVMHa", "pdf_size": 0, "rating": "4;6;6;6", "confidence": "4;4;3;3", "wc_review": "616;809;290;1011", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "869;355;465;860", "reply_reviewers": "0;0;0;0", "reply_authors": "2;1;1;2", "rating_avg": [ 5.5, 0.8660254037844386 ], "confidence_avg": [ 3.5, 0.5 ], "wc_review_avg": [ 681.5, 265.7014301805694 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 637.25, 230.57577387921742 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.5, 0.5 ], "replies_avg": [ 12, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": -0.5773502691896257, "gs_citation": 143, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=5481875536186180603&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 6, "aff_unique_index": "0;0", "aff_unique_norm": "Google", "aff_unique_dep": "DeepMind Robotics", "aff_unique_url": "https://deepmind.com", "aff_unique_abbr": "DeepMind", "aff_campus_unique_index": "1", "aff_campus_unique": ";Mountain View", "aff_country_unique_index": "0;1", "aff_country_unique": "United Kingdom;United States" }, { "id": "B9nDuDeanHK", "title": "Weights Having Stable Signs Are Important: Finding Primary Subnetworks and Kernels to Compress Binary Weight Networks", "track": "main", "status": "Reject", "tldr": "", "abstract": "Binary Weight Networks (BWNs) have significantly lower computational and memory costs compared to their full-precision counterparts. To address the non-differentiable issue of BWNs, existing methods usually use the Straight-Through-Estimator (STE). In the optimization, they learn optimal binary weight outputs represented as a combination of scaling factors and weight signs to approximate 32-bit floating-point weight values, usually with a layer-wise quantization scheme. In this paper, we begin with an empirical study of training BWNs with STE under the settings of using common techniques and tricks. 
We show that in the context of using batch normalization after convolutional layers, adapting scaling factors with either hand-crafted or learnable methods brings marginal or no accuracy gain to the final model, while the change of weight signs is crucial in the training of BWNs. Furthermore, we observe two astonishing training phenomena. Firstly, the training of BWNs demonstrates the process of seeking primary binary sub-networks whose weight signs are determined and fixed at the early training stage, which is akin to recent findings on the lottery ticket hypothesis for efficient learning of sparse neural networks. Secondly, we find that binary kernels in the convolutional layers of final models tend to be centered on a limited number of the most frequent binary kernels, showing that binary weight networks may have the potential to be further compressed, which breaks the common wisdom that representing each weight with a single bit pushes quantization to the extreme of compression. To test this hypothesis, we additionally propose a binary kernel quantization method, and we call the resulting models Quantized Binary-Kernel Networks (QBNs). We hope these new experimental observations would provide new design insights to improve the training and broaden the usage of BWNs.", "keywords": "", "primary_area": "", "supplementary_material": "", "author": "Zhaole Sun;Anbang Yao", "authorids": "~Zhaole_Sun1;~Anbang_Yao1", "gender": "M;", "homepage": "https://sunzhaole.github.io/;https://yaoanbang.github.io/", "dblp": ";http://dblp.uni-trier.de/pers/hd/y/Yao:Anbang", "google_scholar": "onTsdhYAAAAJ;b9hCmPYAAAAJ", "orcid": ";0000-0002-3878-8679", "linkedin": ";anbang-yao-1805b712a/", "or_profile": "~Zhaole_Sun1;~Anbang_Yao1", "aff": "School of Informatics, University of Edinburgh;Intel", "aff_domain": "ed.ac.uk;intel.com", "position": "PhD student;Principal Researcher", "bibtex": "@misc{\nsun2021weights,\ntitle={Weights Having Stable Signs Are Important: Finding Primary Subnetworks and Kernels to Compress Binary Weight Networks},\nauthor={Zhaole Sun and Anbang Yao},\nyear={2021},\nurl={https://openreview.net/forum?id=B9nDuDeanHK}\n}", "github": "", "project": "", "reviewers": "AnonReviewer2;AnonReviewer1;AnonReviewer4;AnonReviewer3", "site": "https://openreview.net/forum?id=B9nDuDeanHK", "pdf_size": 0, "rating": "3;5;5;6", "confidence": "5;2;4;4", "wc_review": "609;314;433;340", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "1501;617;618;391", "reply_reviewers": "0;0;0;0", "reply_authors": "2;1;1;1", "rating_avg": [ 4.75, 1.0897247358851685 ], "confidence_avg": [ 3.75, 1.0897247358851685 ], "wc_review_avg": [ 424.0, 115.60925568482828 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 781.75, 425.43000305573185 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.25, 0.4330127018922193 ], "replies_avg": [ 12, 0 ], "authors#_avg": [ 2, 0 ], "corr_rating_confidence": -0.4736842105263159, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:nz0etX0NHrsJ:scholar.google.com/&scioq=Weights+Having+Stable+Signs+Are+Important:+Finding+Primary+Subnetworks+and+Kernels+to+Compress+Binary+Weight+Networks&hl=en&as_sdt=0,5", "gs_version_total": 0, "aff_unique_index": "0;1", "aff_unique_norm": "University of Edinburgh;Intel", "aff_unique_dep": "School of Informatics;Intel Corporation", "aff_unique_url": "https://www.ed.ac.uk;https://www.intel.com", "aff_unique_abbr": "Edinburgh;Intel", "aff_campus_unique_index": "0", "aff_campus_unique": "Edinburgh;",
"aff_country_unique_index": "0;1", "aff_country_unique": "United Kingdom;United States" }, { "id": "B9t708KMr9d", "title": "Masked Label Prediction: Unified Message Passing Model for Semi-Supervised Classification", "track": "main", "status": "Reject", "tldr": "", "abstract": "Graph neural network (GNN) and label propagation algorithm (LPA) are both message passing algorithms, which have achieved superior performance in semi-supervised classification. GNN performs \\emph{feature propagation} by a neural network to make predictions, while LPA uses \\emph{label propagation} across graph adjacency matrix to get results. However, there is still no good way to combine these two kinds of algorithms. In this paper, we proposed a new {\\bf Uni}fied {\\bf M}essage {\\bf P}assaging Model (UniMP) that can incorporate \\emph{feature propagation} and \\emph{label propagation} with a shared message passing network, providing a better performance in semi-supervised classification. First, we adopt a Graph Transformer jointly label embedding to propagate both the feature and label information. Second, to train UniMP without overfitting in self-loop label information, we propose a masked label prediction strategy, in which some percentage of training labels are simply masked at random, and then predicted. UniMP conceptually unifies feature propagation and label propagation and be empirically powerful. It obtains new state-of-the-art semi-supervised classification results in Open Graph Benchmark (OGB). ", "keywords": "Unified Message Passing Model;Graph Neural Network;Label Propagation Algorithm;Semi-Supervised Classification.", "primary_area": "", "supplementary_material": "", "author": "Yunsheng Shi;Zhengjie Huang;shikun feng;Hui Zhong;Wenjin Wang;Yu Sun", "authorids": "shiyunsheng01@baidu.com;huangzhengjie@baidu.com;~shikun_feng1;zhonghui03@baidu.com;wangwenjin02@baidu.com;sunyu02@baidu.com", "gender": ";;M;;;", "homepage": ";;;;;", "dblp": ";;26/7906;;;", "google_scholar": ";;u9CYmnAAAAAJ;;;", "orcid": ";;;;;", "linkedin": ";;;;;", "or_profile": "shiyunsheng01@baidu.com;huangzhengjie@baidu.com;~shikun_feng1;zhonghui03@baidu.com;wangwenjin02@baidu.com;sunyu02@baidu.com", "aff": ";;Baidu;;;", "aff_domain": ";;baidu.com;;;", "position": ";;Principal Architect;;;", "bibtex": "@misc{\nshi2021masked,\ntitle={Masked Label Prediction: Unified Message Passing Model for Semi-Supervised Classification},\nauthor={Yunsheng Shi and Zhengjie Huang and shikun feng and Hui Zhong and Wenjin Wang and Yu Sun},\nyear={2021},\nurl={https://openreview.net/forum?id=B9t708KMr9d}\n}", "github": "", "project": "", "reviewers": "AnonReviewer3;AnonReviewer1;AnonReviewer4;AnonReviewer2", "site": "https://openreview.net/forum?id=B9t708KMr9d", "pdf_size": 0, "rating": "4;5;6;7", "confidence": "4;4;4;4", "wc_review": "365;199;217;299", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "723;188;610;76", "reply_reviewers": "0;0;0;0", "reply_authors": "1;1;1;1", "rating_avg": [ 5.5, 1.118033988749895 ], "confidence_avg": [ 4.0, 0.0 ], "wc_review_avg": [ 270.0, 66.55073252789934 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 399.25, 273.105634324889 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 10, 0 ], "authors#_avg": [ 6, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 988, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=7324136299367753679&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 6, "aff_unique_index": "0", "aff_unique_norm": "Baidu", 
"aff_unique_dep": "Baidu, Inc.", "aff_unique_url": "https://www.baidu.com", "aff_unique_abbr": "Baidu", "aff_country_unique_index": "0", "aff_country_unique": "China" }, { "id": "BEs-Q1ggdwT", "title": "Policy Gradient with Expected Quadratic Utility Maximization: A New Mean-Variance Approach in Reinforcement Learning", "track": "main", "status": "Reject", "tldr": "", "abstract": "In real-world decision-making problems, risk management is critical. Among various risk management approaches, the mean-variance criterion is one of the most widely used in practice. In this paper, we suggest expected quadratic utility maximization (EQUM) as a new framework for policy gradient style reinforcement learning (RL) algorithms with mean-variance control. The quadratic utility function is a common objective of risk management in finance and economics. The proposed EQUM framework has several interpretations, such as reward-constrained variance minimization and regularization, as well as agent utility maximization. In addition, the computation of the EQUM framework is easier than that of existing mean-variance RL methods, which require double sampling. In experiments, we demonstrate the effectiveness of the proposed framework in benchmark setting of RL and financial data.", "keywords": "mean-variance reinforcement learning;finance", "primary_area": "", "supplementary_material": "/attachment/bd2a285b390cb048625eeb03c61ecbfe791e6187.zip", "author": "Masahiro Kato;Kei Nakagawa", "authorids": "~Masahiro_Kato1;kei.nak.0315@gmail.com", "gender": "M;", "homepage": "https://masakat0.github.io/;", "dblp": ";", "google_scholar": "https://scholar.google.co.jp/schhp?hl=ja;", "orcid": ";", "linkedin": ";", "or_profile": "~Masahiro_Kato1;kei.nak.0315@gmail.com", "aff": "Cyberagent;", "aff_domain": "cyberagent.co.jp;", "position": "Researcher;", "bibtex": "@misc{\nkato2021policy,\ntitle={Policy Gradient with Expected Quadratic Utility Maximization: A New Mean-Variance Approach in Reinforcement Learning},\nauthor={Masahiro Kato and Kei Nakagawa},\nyear={2021},\nurl={https://openreview.net/forum?id=BEs-Q1ggdwT}\n}", "github": "", "project": "", "reviewers": "AnonReviewer4;AnonReviewer3;AnonReviewer2", "site": "https://openreview.net/forum?id=BEs-Q1ggdwT", "pdf_size": 0, "rating": "4;5;6", "confidence": "5;4;4", "wc_review": "922;578;339", "wc_reply_reviewers": "296;0;0", "wc_reply_authors": "1497;698;659", "reply_reviewers": "2;0;0", "reply_authors": "3;1;1", "rating_avg": [ 5.0, 0.816496580927726 ], "confidence_avg": [ 4.333333333333333, 0.4714045207910317 ], "wc_review_avg": [ 613.0, 239.29201128885742 ], "wc_reply_reviewers_avg": [ 98.66666666666667, 139.53573815414538 ], "wc_reply_authors_avg": [ 951.3333333333334, 386.1729606737835 ], "reply_reviewers_avg": [ 0.6666666666666666, 0.9428090415820634 ], "reply_authors_avg": [ 1.6666666666666667, 0.9428090415820634 ], "replies_avg": [ 11, 0 ], "authors#_avg": [ 2, 0 ], "corr_rating_confidence": -0.8660254037844385, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:A-h3ENjYR8oJ:scholar.google.com/&scioq=Policy+Gradient+with+Expected+Quadratic+Utility+Maximization:+A+New+Mean-Variance+Approach+in+Reinforcement+Learning&hl=en&as_sdt=0,33", "gs_version_total": 2, "aff_unique_index": "0", "aff_unique_norm": "CyberAgent Inc.", "aff_unique_dep": "", "aff_unique_url": "https://www.cyberagent.co.jp", "aff_unique_abbr": "CyberAgent", "aff_country_unique_index": "0", "aff_country_unique": "Japan" }, { "id": "BG9hcZ-wIEq", "title": "Detecting 
Adversarial Examples by Additional Evidence from Noise Domain", "track": "main", "status": "Withdraw", "tldr": "", "abstract": "Deep neural networks are widely adopted powerful tools for perceptual tasks. However, recent research has indicated that they are easily fooled by adversarial examples, which are produced by adding imperceptible adversarial perturbations to clean examples. In this paper, we utilize the steganalysis rich model (SRM) to generate noise feature maps, and combine them with RGB images to discover the difference between adversarial examples and clean examples. In particular, we propose a two-stream pseudo-siamese network and train it end-to-end to detect adversarial examples. Our approach fuses the subtle difference in RGB images with the noise inconsistency in noise features. The proposed method has strong detection capability and transferability, and can be combined with any classifier without modifying its architecture or training procedure. Our extensive empirical experiments show that, compared with the state-of-the-art detection methods, the proposed method achieves excellent performance in distinguishing adversarial samples generated by popular attack methods on different real datasets. Moreover, our method generalizes well: trained against a specific adversary, it can generalize to other adversaries.", "keywords": "", "primary_area": "", "supplementary_material": "/attachment/d95d7b5eb2e81008611e748dac0ef2115b1c6e49.zip", "author": "Song Gao;Shui Yu;Shaowen Yao", "authorids": "~Song_Gao2;~Shui_Yu1;yaosw@ynu.edu.cn", "gender": "M;M;", "homepage": "https://github.com/Gaoyitu/;;", "dblp": ";90/3575-1.html;", "google_scholar": ";_WbktxMAAAAJ;", "orcid": ";;", "linkedin": ";;", "or_profile": "~Song_Gao2;~Shui_Yu1;yaosw@ynu.edu.cn", "aff": "Yunnan University;University of Technology Sydney;", "aff_domain": "ynu.edu.cn;uts.edu.au;", "position": "PhD student;Full Professor;", "bibtex": "", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer4;AnonReviewer3;AnonReviewer2", "site": "https://openreview.net/forum?id=BG9hcZ-wIEq", "pdf_size": 0, "rating": "3;4;4;4", "confidence": "4;4;4;4", "wc_review": "328;430;795;299", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "0;0;0;0", "reply_reviewers": "0;0;0;0", "reply_authors": "0;0;0;0", "rating_avg": [ 3.75, 0.4330127018922193 ], "confidence_avg": [ 4.0, 0.0 ], "wc_review_avg": [ 463.0, 197.75869133871208 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 0, 0 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 0, 0 ], "replies_avg": [ 5, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 7, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=17185295605057346386&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 7, "aff_unique_index": "0;1", "aff_unique_norm": "Yunnan University;University of Technology Sydney", "aff_unique_dep": ";", "aff_unique_url": "http://www.ynu.edu.cn;https://www.uts.edu.au", "aff_unique_abbr": "YNU;UTS", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;1", "aff_country_unique": "China;Australia" }, { "id": "BHBb-QVVkNS", "title": "Efficiently labelling sequences using semi-supervised active learning", "track": "main", "status": "Withdraw", "tldr": "", "abstract": "In natural language processing, deep learning methods are popular for sequence labelling tasks but training them usually requires large amounts of labelled data.
Active learning can reduce the amount of labelled training data required by iteratively acquiring labels for the data points a model is most uncertain about. However, active learning methods usually use supervised training and ignore the data points which have not yet been labelled. We propose an approach to sequence labelling using active learning which incorporates both labelled and unlabelled data. We train a locally-contextual conditional random field with deep nonlinear potentials in a semi-supervised manner, treating the missing labels of the unlabelled sentences as latent variables. Our semi-supervised active learning method is able to leverage the sentences which have not yet been labelled to improve on the performance of purely supervised active learning. We also find that using an additional, larger pool of unlabelled data provides further improvements. Across a variety of sequence labelling tasks, our method is consistently able to match 97% of the performance of state of the art models while using less than 30% of the amount of training data.", "keywords": "", "primary_area": "", "supplementary_material": "", "author": "Harshil Shah;David Barber", "authorids": "~Harshil_Shah1;~David_Barber1", "gender": ";M", "homepage": ";http://www.cs.ucl.ac.uk/staff/D.Barber/", "dblp": ";", "google_scholar": ";https://scholar.google.com.tw/citations?user=Nej1FcgAAAAJ", "orcid": ";", "linkedin": ";", "or_profile": "~Harshil_Shah1;~David_Barber1", "aff": ";University College London", "aff_domain": ";", "position": ";Full Professor", "bibtex": "", "github": "", "project": "", "reviewers": "AnonReviewer2;AnonReviewer3;AnonReviewer1;AnonReviewer4", "site": "https://openreview.net/forum?id=BHBb-QVVkNS", "pdf_size": 0, "rating": "3;4;5;5", "confidence": "4;4;3;4", "wc_review": "463;383;161;531", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "0;0;0;0", "reply_reviewers": "0;0;0;0", "reply_authors": "0;0;0;0", "rating_avg": [ 4.25, 0.82915619758885 ], "confidence_avg": [ 3.75, 0.4330127018922193 ], "wc_review_avg": [ 384.5, 139.26503509495842 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 0, 0 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 0, 0 ], "replies_avg": [ 5, 0 ], "authors#_avg": [ 2, 0 ], "corr_rating_confidence": -0.5222329678670935, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:WiX8aarVhrsJ:scholar.google.com/&scioq=Efficiently+labelling+sequences+using+semi-supervised+active+learning&hl=en&as_sdt=0,5", "gs_version_total": 0, "aff_unique_index": "0", "aff_unique_norm": "University College London", "aff_unique_dep": "", "aff_unique_url": "https://www.ucl.ac.uk", "aff_unique_abbr": "UCL", "aff_country_unique_index": "0", "aff_country_unique": "United Kingdom" }, { "id": "BIIwfP55pp", "title": "PERIL: Probabilistic Embeddings for hybrid Meta-Reinforcement and Imitation Learning", "track": "main", "status": "Reject", "tldr": "", "abstract": "Imitation learning is a natural way for a human to describe a task to an agent, and it can be combined with reinforcement learning to enable the agent to solve that task through exploration. However, traditional methods which combine imitation learning and reinforcement learning require a very large amount of interaction data to learn each new task, even when bootstrapping from a demonstration. One solution to this is to use meta reinforcement learning (meta-RL) to enable an agent to quickly adapt to new tasks at test time. 
In this work, we introduce a new method to combine imitation learning with meta reinforcement learning, Probabilistic Embeddings for hybrid meta-Reinforcement and Imitation Learning (PERIL). Dual inference strategies allow PERIL to precondition exploration policies on demonstrations, which greatly improves adaptation rates in unseen tasks. In contrast to pure imitation learning, our approach is capable of exploring beyond the demonstration, making it robust to task alterations and uncertainties. By exploiting the flexibility of meta-RL, we show how PERIL is capable of interpolating from within previously learnt dynamics to adapt to unseen tasks, as well as unseen task families, within a set of meta-RL benchmarks under sparse rewards.", "keywords": "Meta-learning;Imitation Learning;Reinforcement Learning", "primary_area": "", "supplementary_material": "", "author": "Alvaro Prat;Edward Johns", "authorids": "~Alvaro_Prat1;~Edward_Johns1", "gender": "M;M", "homepage": "https://github.com/alvaroprat97;https://www.robot-learning.uk", "dblp": ";68/9968", "google_scholar": "6fk2fwMAAAAJ;https://scholar.google.co.uk/citations?user=sMIUkiQAAAAJ", "orcid": ";0000-0002-8914-8786", "linkedin": "alvaro-prat-34839b132/;https://uk.linkedin.com/in/edward-johns-1b24845a", "or_profile": "~Alvaro_Prat1;~Edward_Johns1", "aff": ";Imperial College London", "aff_domain": ";imperial.ac.uk", "position": ";Assistant Professor", "bibtex": "@misc{\nprat2021peril,\ntitle={{\\{}PERIL{\\}}: Probabilistic Embeddings for hybrid Meta-Reinforcement and Imitation Learning},\nauthor={Alvaro Prat and Edward Johns},\nyear={2021},\nurl={https://openreview.net/forum?id=BIIwfP55pp}\n}", "github": "", "project": "", "reviewers": "AnonReviewer4;AnonReviewer3;AnonReviewer2;AnonReviewer5", "site": "https://openreview.net/forum?id=BIIwfP55pp", "pdf_size": 0, "rating": "3;4;4;4", "confidence": "5;4;4;3", "wc_review": "1651;1056;960;1819", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "550;807;844;153", "reply_reviewers": "0;0;0;0", "reply_authors": "1;1;1;1", "rating_avg": [ 3.75, 0.4330127018922193 ], "confidence_avg": [ 4.0, 0.7071067811865476 ], "wc_review_avg": [ 1371.5, 369.8813999108363 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 588.5, 275.7557796311802 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 9, 0 ], "authors#_avg": [ 2, 0 ], "corr_rating_confidence": -0.816496580927726, "gs_citation": 2, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=3952019151654348429&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 0, "aff_unique_index": "0", "aff_unique_norm": "Imperial College London", "aff_unique_dep": "", "aff_unique_url": "https://www.imperial.ac.uk", "aff_unique_abbr": "ICL", "aff_country_unique_index": "0", "aff_country_unique": "United Kingdom" }, { "id": "BIwkgTsSp_8", "title": "Learning to Noise: Application-Agnostic Data Sharing with Local Differential Privacy", "track": "main", "status": "Reject", "tldr": "", "abstract": "In recent years, the collection and sharing of individuals\u2019 private data has become commonplace in many industries. Local differential privacy (LDP) is a rigorous approach which uses a randomized algorithm to preserve privacy even from the database administrator, unlike the more standard central differential privacy. For LDP, when applying noise directly to high-dimensional data, the level of noise required all but entirely destroys data utility. 
In this paper we introduce a novel, application-agnostic privatization mechanism that leverages representation learning to overcome the prohibitive noise requirements of direct methods, while maintaining the strict guarantees of LDP. We further demonstrate that data privatized with this mechanism can be used to train machine learning algorithms. Applications of this model include private data collection, private novel-class classification, and the augmentation of clean datasets with additional privatized features. We achieve significant gains in performance on downstream classification tasks relative to benchmarks that noise the data directly, which are state-of-the-art in the context of application-agnostic LDP mechanisms for high-dimensional data sharing tasks.", "keywords": "Differential Privacy;Representation Learning;Variational Inference;Generative Modelling", "primary_area": "", "supplementary_material": "/attachment/7db94d5e6c3609e457bb466b69da6b0df29110cb.zip", "author": "Alex Mansbridge;Gregory Barbour;Davide Piras;Christopher Frye;Ilya Feige;David Barber", "authorids": "~Alex_Mansbridge1;~Gregory_Barbour1;~Davide_Piras1;~Christopher_Frye1;~Ilya_Feige1;~David_Barber2", "gender": "M;M;M;;;M", "homepage": ";;https://dpiras.github.io/;;;http://www.cs.ucl.ac.uk/staff/D.Barber/", "dblp": ";;;;222/3226;", "google_scholar": ";;nAvm-xYAAAAJ;;;https://scholar.google.com.tw/citations?user=Nej1FcgAAAAJ", "orcid": "0000-0001-7557-9668;0000-0002-6114-7423;0000-0002-9836-2661;;;", "linkedin": ";;davide-piras-412949b4/;christopher-frye/;;", "or_profile": "~Alex_Mansbridge1;~Gregory_Barbour1;~Davide_Piras1;~Christopher_Frye1;~Ilya_Feige1;~David_Barber1", "aff": "University College London;University College London;University College London;Faculty;University College London;University College London", "aff_domain": "ucl.ac.uk;ucl.ac.uk;ucl.ac.uk;faculty.ai;ucl.ac.uk;", "position": "PhD student;PhD student;PhD student;Head of R&D;Postdoc;Full Professor", "bibtex": "@misc{\nmansbridge2021learning,\ntitle={Learning to Noise: Application-Agnostic Data Sharing with Local Differential Privacy},\nauthor={Alex Mansbridge and Gregory Barbour and Davide Piras and Christopher Frye and Ilya Feige and David Barber},\nyear={2021},\nurl={https://openreview.net/forum?id=BIwkgTsSp_8}\n}", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer4;AnonReviewer2;AnonReviewer3", "site": "https://openreview.net/forum?id=BIwkgTsSp_8", "pdf_size": 0, "rating": "3;6;6;6", "confidence": "4;3;2;5", "wc_review": "680;412;231;217", "wc_reply_reviewers": "154;129;0;0", "wc_reply_authors": "1166;963;226;303", "reply_reviewers": "1;1;0;0", "reply_authors": "2;2;1;1", "rating_avg": [ 5.25, 1.299038105676658 ], "confidence_avg": [ 3.5, 1.118033988749895 ], "wc_review_avg": [ 385.0, 186.8783026464014 ], "wc_reply_reviewers_avg": [ 70.75, 71.29998246844104 ], "wc_reply_authors_avg": [ 664.5, 407.2987232977781 ], "reply_reviewers_avg": [ 0.5, 0.5 ], "reply_authors_avg": [ 1.5, 0.5 ], "replies_avg": [ 14, 0 ], "authors#_avg": [ 6, 0 ], "corr_rating_confidence": -0.2581988897471611, "gs_citation": 6, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=5719813925261222342&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 3, "aff_unique_index": "0;0;0;0;0", "aff_unique_norm": "University College London;", "aff_unique_dep": ";Faculty", "aff_unique_url": "https://www.ucl.ac.uk;", "aff_unique_abbr": "UCL;", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0;0;0;0", 
"aff_country_unique": "United Kingdom;" }, { "id": "BKIS2NCUro9", "title": "LATENT OPTIMIZATION VARIATIONAL AUTOENCODER FOR CONDITIONAL MOLECULAR GENERATION", "track": "main", "status": "Reject", "tldr": "", "abstract": "Variational autoencoder (VAE) is a generation algorithm, consisting of an encoder and a decoder, and the latent variable from the encoder is used as the input of the decoder. \nVAE is widely used for image, audio and text generation tasks. In general, the training of VAE is at risk of posterior collapsing especially for long sequential data. To alleviate this, modified evidence lower bounds (ELBOs) were proposed. However, these approaches heuristically control training loss using a hyper-parameter, and it is not way to solve the fundamental problem of vanilla VAE.\nIn this paper, we propose a method to insert an optimization step of the latent variable and alternately update the encoder and decoder of conditional VAE for maximizing ELBOs. \nIn experiments, we applied the latent optimization VAE (LOVAE) on ZINC database, consisting of string representation of molecules, for the inverse molecular design. \nWe showed that the proposed LOVAE achieves better performance than vanilla VAE in terms of ELBOs and molecular generation performance. In addition, the proposed method showed better performance in property satisfaction and property maximization tasks compared to existing works.", "keywords": "latent optimization;Variational Autoencoder;molecular generation", "primary_area": "", "supplementary_material": "", "author": "Kisoo Kwon;Jung-Hyun Park;Kuhwan Jeong;Sunjae Lee;Hoshik Lee", "authorids": "~Kisoo_Kwon1;~Jung-Hyun_Park1;~Kuhwan_Jeong1;~Sunjae_Lee1;~Hoshik_Lee1", "gender": ";M;M;M;M", "homepage": ";;;https://www.samsung.com/sunjae79.lee;", "dblp": ";;;;", "google_scholar": "https://scholar.google.com/citations?hl=ko;;;;https://scholar.google.co.kr/citations?user=wHuOXJEAAAAJ", "orcid": ";;;;", "linkedin": ";https://www.linkedin.com/mwlite/in/jung-hyun-park-3706561b8;kuhwan-jeong-30a6551b8;;", "or_profile": "~Kisoo_Kwon1;~Jung-Hyun_Park1;~Kuhwan_Jeong1;~Sunjae_Lee1;~Hoshik_Lee1", "aff": "Samsung;Samsung;;;", "aff_domain": "samsung.com;samsung.com;;;", "position": "Principal Researcher;Staff researcher;;;", "bibtex": "@misc{\nkwon2021latent,\ntitle={{\\{}LATENT{\\}} {\\{}OPTIMIZATION{\\}} {\\{}VARIATIONAL{\\}} {\\{}AUTOENCODER{\\}} {\\{}FOR{\\}} {\\{}CONDITIONAL{\\}} {\\{}MOLECULAR{\\}} {\\{}GENERATION{\\}}},\nauthor={Kisoo Kwon and Jung-Hyun Park and Kuhwan Jeong and Sunjae Lee and Hoshik Lee},\nyear={2021},\nurl={https://openreview.net/forum?id=BKIS2NCUro9}\n}", "github": "", "project": "", "reviewers": "AnonReviewer2;AnonReviewer4;AnonReviewer3;AnonReviewer1", "site": "https://openreview.net/forum?id=BKIS2NCUro9", "pdf_size": 0, "rating": "3;4;4;5", "confidence": "4;4;2;4", "wc_review": "551;256;180;360", "wc_reply_reviewers": "43;27;0;19", "wc_reply_authors": "639;875;1015;513", "reply_reviewers": "1;1;0;1", "reply_authors": "1;2;2;1", "rating_avg": [ 4.0, 0.7071067811865476 ], "confidence_avg": [ 3.5, 0.8660254037844386 ], "wc_review_avg": [ 336.75, 139.2253119946226 ], "wc_reply_reviewers_avg": [ 22.25, 15.481844205391036 ], "wc_reply_authors_avg": [ 760.5, 196.1498151923677 ], "reply_reviewers_avg": [ 0.75, 0.4330127018922193 ], "reply_authors_avg": [ 1.5, 0.5 ], "replies_avg": [ 14, 0 ], "authors#_avg": [ 5, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 0, "gs_cited_by_link": 
"https://scholar.google.com/scholar?q=related:74cuIeBfhIUJ:scholar.google.com/&scioq=LATENT+OPTIMIZATION+VARIATIONAL+AUTOENCODER+FOR+CONDITIONAL+MOLECULAR+GENERATION&hl=en&as_sdt=0,5", "gs_version_total": 0, "aff_unique_index": "0;0", "aff_unique_norm": "Samsung", "aff_unique_dep": "Samsung", "aff_unique_url": "https://www.samsung.com", "aff_unique_abbr": "Samsung", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0", "aff_country_unique": "South Korea" }, { "id": "BL4FZG2bCR7", "title": "RNA Alternative Splicing Prediction with Discrete Compositional Energy Network", "track": "main", "status": "Withdraw", "tldr": "", "abstract": "A single gene can encode for different protein versions through a process called alternative splicing. Since proteins play major roles in cellular functions, aberrant splicing profiles can result in a variety of diseases, including cancers. Alternative splicing is determined by the gene's primary sequence and other regulatory factors such as RNA-binding protein levels. With these as input, we formulate the prediction of RNA splicing as a regression task and build a new training dataset (CAPD) to benchmark learned models. We propose discrete compositional energy network (DCEN) which leverages the compositionality of key components to approach this task. In the case of alternative splicing prediction, DCEN models mRNA transcript probabilities through its constituent splice junctions' energy values. These transcript probabilities are subsequently mapped to relative abundance values of key nucleotides and trained with ground-truth experimental measurements. Through our experiments on CAPD, we show that DCEN outperforms baselines and its ablation variants.", "keywords": "RNA splicing;Computational Biology;RNA", "primary_area": "", "supplementary_material": "", "author": "Alvin Chan;Anna Korsakova;Yew-Soon Ong;Fernaldo Richtia Winnerdy;Kah Wai Lim;Anh Tuan Phan", "authorids": "~Alvin_Chan1;kors0001@e.ntu.edu.sg;~Yew-Soon_Ong1;fernaldo.winnerdy@ntu.edu.sg;kwlim@ntu.edu.sg;phantuan@ntu.edu.sg", "gender": "M;;;;;", "homepage": "https://www.alvinchan.io/;;;;;", "dblp": "163/6518.html;;;;;", "google_scholar": "SP4eIUYAAAAJ;;;;;", "orcid": ";;;;;", "linkedin": ";;;;;", "or_profile": "~Alvin_Chan1;kors0001@e.ntu.edu.sg;~Yew-Soon_Ong1;fernaldo.winnerdy@ntu.edu.sg;kwlim@ntu.edu.sg;phantuan@ntu.edu.sg", "aff": "Nanyang Technological University;;;;;", "aff_domain": "ntu.edu.sg;;;;;", "position": "PhD student;;;;;", "bibtex": "", "github": "", "project": "", "reviewers": "AnonReviewer4;AnonReviewer3;AnonReviewer2;AnonReviewer1", "site": "https://openreview.net/forum?id=BL4FZG2bCR7", "pdf_size": 0, "rating": "3;4;4;4", "confidence": "4;5;5;4", "wc_review": "456;424;960;439", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "0;0;0;0", "reply_reviewers": "0;0;0;0", "reply_authors": "0;0;0;0", "rating_avg": [ 3.75, 0.4330127018922193 ], "confidence_avg": [ 4.5, 0.5 ], "wc_review_avg": [ 569.75, 225.59518501067348 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 0, 0 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 0, 0 ], "replies_avg": [ 5, 0 ], "authors#_avg": [ 6, 0 ], "corr_rating_confidence": 0.5773502691896257, "gs_citation": 2, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=557878363073009089&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 5, "aff_unique_index": "0", "aff_unique_norm": "Nanyang Technological University", "aff_unique_dep": "", "aff_unique_url": "https://www.ntu.edu.sg", 
"aff_unique_abbr": "NTU", "aff_country_unique_index": "0", "aff_country_unique": "Singapore" }, { "title": "UMEC: Unified model and embedding compression for efficient recommendation systems", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/2976", "id": "BM---bH_RSh", "poster": "", "openreview": "https://openreview.net/forum?id=BM---bH_RSh", "slides": "https://iclr.cc/virtual/2021/poster/2976", "video": "https://iclr.cc/virtual/2021/poster/2976", "author_site": "Jiayi Shen, Haotao Wang, Shupeng Gui, Jianchao Tan, Zhangyang Wang, Ji Liu", "tldr": "", "abstract": "The recommendation system (RS) plays an important role in the content recommendation and retrieval scenarios. The core part of the system is the Ranking neural network, which is usually a bottleneck of whole system performance during online inference. In this work, we propose a unified model and embedding compression (UMEC) framework to hammer an efficient neural network-based recommendation system. Our framework jointly learns input feature selection and neural network compression together, and solve them as an end-to-end resource-constrained optimization problem using ADMM. Our method outperforms other baselines in terms of neural network Flops, sparse embedding feature size and the number of sparse embedding features. We evaluate our method on the public benchmark of DLRM, trained over the Kaggle Criteo dataset. The codes can be found at https://github.com/VITA-Group/UMEC.", "keywords": "recommendation system;model compression;ADMM;resource constrained", "primary_area": "", "supplementary_material": "", "author": "Jiayi Shen;Haotao Wang;Shupeng Gui;Jianchao Tan;Zhangyang Wang;Ji Liu", "authorids": "~Jiayi_Shen1;~Haotao_Wang1;~Shupeng_Gui1;~Jianchao_Tan1;~Zhangyang_Wang1;~Ji_Liu1", "gender": ";;M;M;M;M", "homepage": "https://jiayishen.netlify.app/;;;https://jianchaotan.github.io/;https://vita-group.github.io;http://jiliu-ml.org", "dblp": ";236/5090;194/3519;165/9938;119/4026;51/4433-2.html", "google_scholar": ";aMIJhlEAAAAJ;ms9yTZkAAAAJ;1Gywy80AAAAJ;pxFyKAIAAAAJ;RRzVwKkAAAAJ", "orcid": ";;;;;", "linkedin": ";;;jianchao-tan-b58a96a7/;;", "or_profile": "~Jiayi_Shen1;~Haotao_Wang1;~Shupeng_Gui1;~Jianchao_Tan1;~Zhangyang_Wang1;~Ji_Liu1", "aff": "Texas A&M;University of Texas, Austin;Meta;Kuaishou;University of Texas, Austin;", "aff_domain": "tamu.edu;utexas.edu;meta.com;kuaishou.com;utexas.edu;", "position": "PhD student;PhD student;Researcher;Researcher;Assistant Professor;", "bibtex": "@inproceedings{\nshen2021umec,\ntitle={{\\{}UMEC{\\}}: Unified model and embedding compression for efficient recommendation systems},\nauthor={Jiayi Shen and Haotao Wang and Shupeng Gui and Jianchao Tan and Zhangyang Wang and Ji Liu},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=BM---bH_RSh}\n}", "github": "", "project": "", "reviewers": "AnonReviewer4;AnonReviewer3;AnonReviewer2;AnonReviewer1", "pdf_size": 0, "rating": "6;7;7;7", "confidence": "5;4;3;5", "wc_review": "421;315;327;261", "wc_reply_reviewers": "29;13;0;32", "wc_reply_authors": "390;49;274;319", "reply_reviewers": "1;1;0;1", "reply_authors": "1;1;2;1", "rating_avg": [ 6.75, 0.4330127018922193 ], "confidence_avg": [ 4.25, 0.82915619758885 ], "wc_review_avg": [ 331.0, 57.60208329565867 ], "wc_reply_reviewers_avg": [ 18.5, 12.893796958227627 ], "wc_reply_authors_avg": [ 258.0, 127.5558701118847 ], "reply_reviewers_avg": [ 0.75, 0.4330127018922193 ], "reply_authors_avg": [ 1.25, 
0.4330127018922193 ], "replies_avg": [ 13, 0 ], "authors#_avg": [ 6, 0 ], "corr_rating_confidence": -0.5222329678670935, "gs_citation": 7, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=17737729637037657087&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 2, "pdf": "https://openreview.net/pdf?id=BM---bH_RSh", "email": "tamu.edu;utexas.edu;meta.com;kuaishou.com;utexas.edu;", "author_num": 6, "aff_unique_index": "0;1;2;3;1", "aff_unique_norm": "Texas A&M University;University of Texas at Austin;Meta;Kuaishou Technology", "aff_unique_dep": ";;Meta Platforms, Inc.;", "aff_unique_url": "https://www.tamu.edu;https://www.utexas.edu;https://meta.com;https://www.kuaishou.com", "aff_unique_abbr": "TAMU;UT Austin;Meta;Kuaishou", "aff_campus_unique_index": "1;1", "aff_campus_unique": ";Austin", "aff_country_unique_index": "0;0;0;1;0", "aff_country_unique": "United States;China" }, { "id": "BMua55nUyyt", "title": "Median DC for Sign Recovery: Privacy can be Achieved by Deterministic Algorithms", "track": "main", "status": "Reject", "tldr": "", "abstract": "Privacy-preserving data analysis has become prevalent in recent years. It is common wisdom in the privacy literature that strict differential privacy can only be obtained by imposing additional randomness in the algorithm. In this paper, we study the problem of private sign recovery for sparse mean estimation and sparse linear regression in a distributed setup. By taking a coordinate-wise median among the reported local sign vectors, which can be referred to as a median divide-and-conquer (Med-DC) approach, we can recover the signs of the true parameter with a provable consistency guarantee. Moreover, without adding any extra randomness to the algorithm, our Med-DC method can protect data privacy with high probability. 
Simulation studies are conducted to demonstrate the effectiveness of our proposed method.", "keywords": "Median-of-means;divide-and-conquer;privacy;sign recovery", "primary_area": "", "supplementary_material": "", "author": "Jiyuan Tu;Weidong Liu;Xiaojun Mao", "authorids": "~Jiyuan_Tu1;~Weidong_Liu2;~Xiaojun_Mao1", "gender": "M;;", "homepage": "https://ins.sjtu.edu.cn/graduate/tujiyuan;http://www.math.sjtu.edu.cn/faculty/weidongl/;https://mxjki.github.io/", "dblp": ";;232/4239", "google_scholar": ";;f6KvrMYAAAAJ", "orcid": ";;0000-0002-9362-508X", "linkedin": ";;", "or_profile": "~Jiyuan_Tu1;~Weidong_Liu2;~Mao_Xiaojun1", "aff": "Shanghai Jiaotong University;Shanghai Jiaotong University;Fudan University", "aff_domain": "sjtu.edu.cn;sjtu.edu.cn;fudan.edu.cn", "position": "PhD student;Full Professor;Assistant Professor", "bibtex": "@misc{\ntu2021median,\ntitle={Median {\\{}DC{\\}} for Sign Recovery: Privacy can be Achieved by Deterministic Algorithms},\nauthor={Jiyuan Tu and Weidong Liu and Xiaojun Mao},\nyear={2021},\nurl={https://openreview.net/forum?id=BMua55nUyyt}\n}", "github": "", "project": "", "reviewers": "AnonReviewer4;AnonReviewer2;AnonReviewer1;AnonReviewer3", "site": "https://openreview.net/forum?id=BMua55nUyyt", "pdf_size": 0, "rating": "4;4;4;7", "confidence": "5;4;5;3", "wc_review": "252;414;255;605", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "0;0;0;0", "reply_reviewers": "0;0;0;0", "reply_authors": "0;0;0;0", "rating_avg": [ 4.75, 1.299038105676658 ], "confidence_avg": [ 4.25, 0.82915619758885 ], "wc_review_avg": [ 381.5, 144.72473872838742 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 0, 0 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 0, 0 ], "replies_avg": [ 5, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": -0.8703882797784892, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:l0mC2RUDisMJ:scholar.google.com/&scioq=Median+DC+for+Sign+Recovery:+Privacy+can+be+Achieved+by+Deterministic+Algorithms&hl=en&as_sdt=0,5", "gs_version_total": 0, "aff_unique_index": "0;0;1", "aff_unique_norm": "Shanghai Jiao Tong University;Fudan University", "aff_unique_dep": ";", "aff_unique_url": "https://www.sjtu.edu.cn;https://www.fudan.edu.cn", "aff_unique_abbr": "SJTU;Fudan", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0;0", "aff_country_unique": "China" }, { "id": "BUCHknhWq8D", "title": "Sparse Recovery via Bootstrapping: Collaborative or Independent?", "track": "main", "status": "Desk Reject", "tldr": "", "abstract": "Sparse regression problems have traditionally been solved using all available measurements simultaneously. However, this approach fails in challenging scenarios such as when the noise level is high or there are missing data / adversarial samples. \nWe propose JOBS (Joint-Sparse Optimization via Bootstrap Samples) -- a \\emph{collaborative} sparse-regression framework on bootstrapped samples from the pool of available measurements via a joint-sparse constraint to ensure support consistency. 
In comparison to traditional bagging which solves sub-problems in an \\emph{independent} fashion across bootstrapped samples, JOBS achieves state-of-the-art performance with the added advantage of having a sparser solution while requiring a lower number of observation samples.\n\nAnalysis of theoretical performance limits is employed to determine critical optimal parameters: the number of bootstrap samples $K$ and the number of elements $L$ in each bootstrap sample. Theoretical results indicate a better bound than Bagging (i.e. higher probability of achieving the same or better performance). Simulation results are used to validate this parameter selection. JOBS is robust to adversarial samples that fool the baseline method, as shown by better generalization in an image reconstruction task where the adversary has similar occlusions or alignment as the test sample. Furthermore, JOBS also improves discriminative performance in a facial recognition task in a sparse-representation-based classification setting.", "keywords": "LASSO;Bootstrapping;Bagging;sparsity;group sparsity", "primary_area": "", "supplementary_material": "/attachment/b88f8cb178475580f9cba34a422c4328a7c780f3.zip", "author": "Anonymous", "authorids": "ICLR.cc/2021/Conference/Paper3763/Authors", "gender": "", "homepage": "", "dblp": "", "google_scholar": "", "orcid": "", "linkedin": "", "or_profile": "", "aff": "", "aff_domain": "", "position": "", "bibtex": "@inproceedings{\nanonymous2021sparse,\ntitle={Sparse Recovery via Bootstrapping: Collaborative or Independent?},\nauthor={Anonymous},\nbooktitle={Submitted to International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=BUCHknhWq8D},\nnote={under review}\n}", "github": "", "project": "", "reviewers": "", "site": "https://openreview.net/forum?id=BUCHknhWq8D", "pdf_size": 0, "rating": "", "confidence": "", "wc_review": "", "wc_reply_reviewers": "", "wc_reply_authors": "", "reply_reviewers": "", "reply_authors": "", "rating_avg": [ 0, 0 ], "confidence_avg": [ 0, 0 ], "wc_review_avg": [ 0, 0 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 0, 0 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 0, 0 ], "replies_avg": [ 1, 0 ], "authors#_avg": [ 1, 0 ], "corr_rating_confidence": 0, "gs_citation": -1, "gs_cited_by_link": "", "gs_version_total": -1 }, { "id": "BUPIRa1D2J", "title": "Trans-Caps: Transformer Capsule Networks with Self-attention Routing", "track": "main", "status": "Reject", "tldr": "", "abstract": "Capsule Networks (CapsNets) have shown to be a promising alternative to Convolutional Neural Networks (CNNs) in many computer vision tasks, due to their ability to encode object viewpoint variations. The high computational complexity and numerical instability of iterative routing mechanisms stem from the challenging nature of the part-object encoding process. This hinders CapsNets from being utilized effectively in large-scale image tasks. In this paper, we propose a novel non-iterative routing strategy named self-attention routing (SAR) that computes the agreement between the capsules in one forward pass. SAR accomplishes this by utilizing a learnable inducing mixture of Gaussians (IMoG) to reduce the cost of computing pairwise attention values from quadratic to linear time complexity. Our observations show that our Transformer Capsule Network (Trans-Caps) is better suited for complex image tasks including CIFAR-10/100, Tiny-ImageNet, and ImageNet when compared to other prominent CapsNet architectures. 
We also show that Trans-Caps yields a dramatic improvement over its competitors when presented with novel viewpoints on the SmallNORB dataset, outperforming EM-Caps by 5.77% and 3.25% on the novel azimuth and elevation experiments, respectively. Our observations suggest that our routing mechanism is able to capture complex part-whole relationships which allow Trans-Caps to construct reliable geometrical representations of the objects.", "keywords": "capsule network;self-attention", "primary_area": "", "supplementary_material": "/attachment/278ea9feb252915aea9f24295527dcd2d6080ae3.zip", "author": "Aryan Mobiny;Pietro Antonio Cicalese;Hien Van Nguyen", "authorids": "~Aryan_Mobiny1;~Pietro_Antonio_Cicalese1;~Hien_Van_Nguyen1", "gender": "M;;M", "homepage": "https://amobiny.github.io/;;https://www.hvnguyen.com/", "dblp": ";;59/9550", "google_scholar": "PFf8g_8AAAAJ;;e5Gbt20AAAAJ", "orcid": ";;", "linkedin": ";;", "or_profile": "~Aryan_Mobiny1;~Pietro_Antonio_Cicalese1;~Hien_Van_Nguyen1", "aff": ";;University of Houston", "aff_domain": ";;uh.edu", "position": ";;Associate Professor", "bibtex": "@misc{\nmobiny2021transcaps,\ntitle={Trans-Caps: Transformer Capsule Networks with Self-attention Routing},\nauthor={Aryan Mobiny and Pietro Antonio Cicalese and Hien Van Nguyen},\nyear={2021},\nurl={https://openreview.net/forum?id=BUPIRa1D2J}\n}", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer3;AnonReviewer4;AnonReviewer2", "site": "https://openreview.net/forum?id=BUPIRa1D2J", "pdf_size": 0, "rating": "4;6;6;7", "confidence": "4;4;3;3", "wc_review": "561;371;459;330", "wc_reply_reviewers": "839;0;0;0", "wc_reply_authors": "1566;423;224;184", "reply_reviewers": "7;0;0;0", "reply_authors": "5;1;1;1", "rating_avg": [ 5.75, 1.0897247358851685 ], "confidence_avg": [ 3.5, 0.5 ], "wc_review_avg": [ 430.25, 88.71689523422243 ], "wc_reply_reviewers_avg": [ 209.75, 363.29765688757203 ], "wc_reply_authors_avg": [ 599.25, 565.4455654614333 ], "reply_reviewers_avg": [ 1.75, 3.031088913245535 ], "reply_authors_avg": [ 2.0, 1.7320508075688772 ], "replies_avg": [ 21, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": -0.6882472016116854, "gs_citation": 1, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=7156818923554495849&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 0, "aff_unique_index": "0", "aff_unique_norm": "University of Houston", "aff_unique_dep": "", "aff_unique_url": "https://www.uh.edu", "aff_unique_abbr": "UH", "aff_country_unique_index": "0", "aff_country_unique": "United States" }, { "title": "Improved Estimation of Concentration Under $\\ell_p$-Norm Distance Metrics Using Half Spaces", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/2690", "id": "BUlyHkzjgmA", "poster": "", "openreview": "https://openreview.net/forum?id=BUlyHkzjgmA", "slides": "https://iclr.cc/virtual/2021/poster/2690", "video": "https://iclr.cc/virtual/2021/poster/2690", "author_site": "Jack Prescott, Xiao Zhang, David Evans", "tldr": "", "abstract": "Concentration of measure has been argued to be the fundamental cause of adversarial vulnerability. Mahloujifar et al. (2019) presented an empirical way to measure the concentration of a data distribution using samples, and employed it to find lower bounds on intrinsic robustness for several benchmark datasets. However, it remains unclear whether these lower bounds are tight enough to provide a useful approximation for the intrinsic robustness of a dataset. 
To gain a deeper understanding of the concentration of measure phenomenon, we first extend the Gaussian Isoperimetric Inequality to non-spherical Gaussian measures and arbitrary $\\ell_p$-norms ($p \\geq 2$). We leverage these theoretical insights to design a method that uses half-spaces to estimate the concentration of any empirical dataset under $\\ell_p$-norm distance metrics. Our proposed algorithm is more efficient than Mahloujifar et al. (2019)'s, and experiments on synthetic datasets and image benchmarks demonstrate that it is able to find much tighter intrinsic robustness bounds. These tighter estimates provide further evidence that rules out intrinsic dataset concentration as a possible explanation for the adversarial vulnerability of state-of-the-art classifiers.", "keywords": "Adversarial Examples;Concentration of Measure;Gaussian Isoperimetric Inequality", "primary_area": "", "supplementary_material": "/attachment/ce73489d6434eb82d45c4723f8f4b249cf0fa880.zip", "author": "Jack Prescott;Xiao Zhang;David Evans", "authorids": "jbp2jn@virginia.edu;~Xiao_Zhang2;~David_Evans1", "gender": ";M;Not Specified", "homepage": ";https://xiao-zhang.net;https://www.cs.virginia.edu/evans/", "dblp": ";;https://dblp.uni-trier.de/pid/e/DavidEvans", "google_scholar": ";L-lz7CUAAAAJ;DsR4PucAAAAJ", "orcid": ";0009-0008-1837-7670;", "linkedin": ";;", "or_profile": "jbp2jn@virginia.edu;~Xiao_Zhang2;~David_Evans1", "aff": ";University of Virginia;University of Virginia", "aff_domain": ";cs.virginia.edu;virginia.edu", "position": ";PhD student;Professor", "bibtex": "@inproceedings{\nprescott2021improved,\ntitle={Improved Estimation of Concentration Under {\\$}{\\textbackslash}ell{\\_}p{\\$}-Norm Distance Metrics Using Half Spaces},\nauthor={Jack Prescott and Xiao Zhang and David Evans},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=BUlyHkzjgmA}\n}", "github": "[![github](/images/github_icon.svg) jackbprescott/EMC_HalfSpaces](https://github.com/jackbprescott/EMC_HalfSpaces)", "project": "", "reviewers": "AnonReviewer3;AnonReviewer4;AnonReviewer1;AnonReviewer2", "pdf_size": 0, "rating": "6;6;7;7", "confidence": "4;4;3;2", "wc_review": "571;419;290;184", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "358;514;420;0", "reply_reviewers": "0;0;0;0", "reply_authors": "1;1;1;0", "rating_avg": [ 6.5, 0.5 ], "confidence_avg": [ 3.25, 0.82915619758885 ], "wc_review_avg": [ 366.0, 144.68413872985525 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 323.0, 194.57903278616635 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 0.75, 0.4330127018922193 ], "replies_avg": [ 9, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": -0.9045340337332909, "gs_citation": 7, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=4011457988977306291&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 7, "pdf": "https://openreview.net/pdf?id=BUlyHkzjgmA", "email": ";cs.virginia.edu;virginia.edu", "author_num": 3, "aff_unique_index": "0;0", "aff_unique_norm": "University of Virginia", "aff_unique_dep": "", "aff_unique_url": "https://www.virginia.edu", "aff_unique_abbr": "UVA", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0", "aff_country_unique": "United States" }, { "id": "BVPowUU1cR", "title": "Assisting the Adversary to Improve GAN Training", "track": "main", "status": "Reject", "tldr": "", "abstract": "Some of the most popular methods for improving the stability and performance 
of GANs involve constraining or regularizing the discriminator. In this paper we consider a largely overlooked regularization technique which we refer to as the Adversary's Assistant (AdvAs). We motivate this using a different perspective to that of prior work. Specifically, we consider a common mismatch between theoretical analysis and practice: analysis often assumes that the discriminator reaches its optimum on each iteration. In practice, this is essentially never true, often leading to poor gradient estimates for the generator. To address this, AdvAs is a theoretically motivated penalty imposed on the generator based on the norm of the gradients used to train the discriminator. This encourages the generator to move towards points where the discriminator is optimal. We demonstrate the effect of applying AdvAs to several GAN objectives, datasets and network architectures. The results indicate a reduction in the mismatch between theory and practice and that AdvAs can lead to improvement of GAN training, as measured by FID scores.\n", "keywords": "Generative Adversarial Networks;GANs", "primary_area": "", "supplementary_material": "/attachment/b5fb0a999be989514622d29c02ab49eed4827a01.zip", "author": "Andreas Munk;William Harvey;Frank Wood", "authorids": "~Andreas_Munk1;~William_Harvey1;~Frank_Wood2", "gender": "M;M;M", "homepage": "https://ammunk.com/;https://www.cs.ubc.ca/~wsgh/;http://www.robots.ox.ac.uk/~fwood/", "dblp": ";26/8210-2;44/4750", "google_scholar": "MnrFp5AAAAAJ;https://scholar.google.co.uk/citations?user=kDd7nBkAAAAJ;d4yNzXIAAAAJ", "orcid": ";;", "linkedin": ";;frank-wood-43529114?trk=hp-identity-name", "or_profile": "~Andreas_Munk1;~William_Harvey1;~Frank_Wood2", "aff": "University of British Columbia;University of British Columbia;MILA", "aff_domain": "cs.ubc.ca;cs.ubc.ca;mila.quebec", "position": "PhD student;PhD student;Associate Professor", "bibtex": "@misc{\nmunk2021assisting,\ntitle={Assisting the Adversary to Improve {\\{}GAN{\\}} Training},\nauthor={Andreas Munk and William Harvey and Frank Wood},\nyear={2021},\nurl={https://openreview.net/forum?id=BVPowUU1cR}\n}", "github": "", "project": "", "reviewers": "AnonReviewer4;AnonReviewer3;AnonReviewer2;AnonReviewer1", "site": "https://openreview.net/forum?id=BVPowUU1cR", "pdf_size": 0, "rating": "3;4;4;6", "confidence": "5;4;4;2", "wc_review": "294;514;513;376", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "48;87;72;48", "reply_reviewers": "0;0;0;0", "reply_authors": "1;1;1;1", "rating_avg": [ 4.25, 1.0897247358851685 ], "confidence_avg": [ 3.75, 1.0897247358851685 ], "wc_review_avg": [ 424.25, 93.84128888714179 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 63.75, 16.618889854620253 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 10, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": -1.0, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:HlZJ8sg6neYJ:scholar.google.com/&scioq=Assisting+the+Adversary+to+Improve+GAN+Training&hl=en&as_sdt=0,33", "gs_version_total": 6, "aff_unique_index": "0;0;1", "aff_unique_norm": "University of British Columbia;Mila", "aff_unique_dep": ";", "aff_unique_url": "https://www.ubc.ca;https://mila.quebec", "aff_unique_abbr": "UBC;MILA", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0;0", "aff_country_unique": "Canada" }, { "title": "Robust and Generalizable Visual Representation Learning via Random Convolutions", "status": "Poster", "track": 
"main", "site": "https://iclr.cc/virtual/2021/poster/2877", "id": "BVSM0x3EDK6", "poster": "", "openreview": "https://openreview.net/forum?id=BVSM0x3EDK6", "slides": "https://iclr.cc/virtual/2021/poster/2877", "video": "https://iclr.cc/virtual/2021/poster/2877", "author_site": "Zhenlin Xu, Deyi Liu, Junlin Yang, Colin Raffel, Marc Niethammer", "tldr": "", "abstract": "While successful for various computer vision tasks, deep neural networks have shown to be vulnerable to texture style shifts and small perturbations to which humans are robust. In this work, we show that the robustness of neural networks can be greatly improved through the use of random convolutions as data augmentation. Random convolutions are approximately shape-preserving and may distort local textures. Intuitively, randomized convolutions create an infinite number of new domains with similar global shapes but random local texture. Therefore, we explore using outputs of multi-scale random convolutions as new images or mixing them with the original images during training. When applying a network trained with our approach to unseen domains, our method consistently improves the performance on domain generalization benchmarks and is scalable to ImageNet. In particular, in the challenging scenario of generalizing to the sketch domain in PACS and to ImageNet-Sketch, our method outperforms state-of-art methods by a large margin. More interestingly, our method can benefit downstream tasks by providing a more robust pretrained visual representation.", "keywords": "domain generalization;robustness;representation learning;data augmentation", "primary_area": "", "supplementary_material": "", "author": "Zhenlin Xu;Deyi Liu;Junlin Yang;Colin Raffel;Marc Niethammer", "authorids": "~Zhenlin_Xu1;~Deyi_Liu1;~Junlin_Yang1;~Colin_Raffel1;~Marc_Niethammer1", "gender": "M;M;M;;M", "homepage": "http://wildphoton.github.io/;https://deyiopt.github.io/;;http://colinraffel.com;http://wwwx.cs.unc.edu/~mn/", "dblp": "66/5350;259/3220.html;;149/0082;88/3304", "google_scholar": "RPGduXAAAAAJ;m4b1TScAAAAJ;QYkscc4AAAAJ;I66ZBYwAAAAJ;https://scholar.google.com.au/citations?user=KqtBi6MAAAAJ", "orcid": ";;;;", "linkedin": ";;;;", "or_profile": "~Zhenlin_Xu1;~Deyi_Liu1;~Junlin_Yang1;~Colin_Raffel1;~Marc_Niethammer1", "aff": "University of North Carolina, Chapel Hill;University of North Carolina, Chapel Hill;Yale University;Google;The University of North Carolina at Chapel Hill", "aff_domain": "unc.edu;unc.edu;yale.edu;google.com;unc.edu", "position": "PhD student;PhD student;PhD student;Research Scientist;Full Professor", "bibtex": "@inproceedings{\nxu2021robust,\ntitle={Robust and Generalizable Visual Representation Learning via Random Convolutions},\nauthor={Zhenlin Xu and Deyi Liu and Junlin Yang and Colin Raffel and Marc Niethammer},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=BVSM0x3EDK6}\n}", "github": "[![Papers with Code](/images/pwc_icon.svg) 2 community implementations](https://paperswithcode.com/paper/?openreview=BVSM0x3EDK6)", "project": "", "reviewers": "AnonReviewer4;AnonReviewer1;AnonReviewer3;AnonReviewer2", "pdf_size": 0, "rating": "6;6;6;7", "confidence": "4;4;3;3", "wc_review": "662;798;262;266", "wc_reply_reviewers": "416;194;0;0", "wc_reply_authors": "1130;1003;738;392", "reply_reviewers": "1;1;0;0", "reply_authors": "2;3;1;1", "rating_avg": [ 6.25, 0.4330127018922193 ], "confidence_avg": [ 3.5, 0.5 ], "wc_review_avg": [ 497.0, 237.91384995413782 ], 
"wc_reply_reviewers_avg": [ 152.5, 171.5131190317522 ], "wc_reply_authors_avg": [ 815.75, 282.58837113370396 ], "reply_reviewers_avg": [ 0.5, 0.5 ], "reply_authors_avg": [ 1.75, 0.82915619758885 ], "replies_avg": [ 17, 0 ], "authors#_avg": [ 5, 0 ], "corr_rating_confidence": -0.5773502691896257, "gs_citation": 267, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=5351165524460933489&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 4, "pdf": "https://openreview.net/pdf?id=BVSM0x3EDK6", "email": "unc.edu;unc.edu;yale.edu;google.com;unc.edu", "author_num": 5, "aff_unique_index": "0;0;1;2;3", "aff_unique_norm": "University of North Carolina;Yale University;Google;University of North Carolina at Chapel Hill", "aff_unique_dep": ";;Google;", "aff_unique_url": "https://www.unc.edu;https://www.yale.edu;https://www.google.com;https://www.unc.edu", "aff_unique_abbr": "UNC;Yale;Google;UNC Chapel Hill", "aff_campus_unique_index": "0;0;2;0", "aff_campus_unique": "Chapel Hill;;Mountain View", "aff_country_unique_index": "0;0;0;0;0", "aff_country_unique": "United States" }, { "id": "BW5PuV4V-rL", "title": "Gradient-based training of Gaussian Mixture Models for High-Dimensional Streaming Data", "track": "main", "status": "Reject", "tldr": "", "abstract": "We present an approach for efficiently training Gaussian Mixture Models by SGD on non-stationary, high-dimensional streaming data.\nOur training scheme does not require data-driven parameter initialization (e.g., k-means) and has the ability to process high-dimensional samples without numerical problems.\nFurthermore, the approach allows mini-batch sizes as low as 1, typical for streaming-data settings, and it is possible to react and adapt to changes in data statistics (concept drift/shift) without catastrophic forgetting.\nMajor problems in such streaming-data settings are undesirable local optima during early training phases and numerical instabilities due to high data dimensionalities.%, and catastrophic forgetting when encountering concept drift.\nWe introduce an adaptive annealing procedure to address the first problem,%, which additionally plays a decisive role in controlling the \\acp{GMM}' reaction to concept drift.\nwhereas numerical instabilities are eliminated by using an exponential-free approximation to the standard \\ac{GMM} log-likelihood.\nExperiments on a variety of visual and non-visual benchmarks show that our SGD approach can be trained completely without, for instance, k-means based centroid initialization, and compares favorably to sEM, an online variant of EM.", "keywords": "Gaussian Mixture Models;Stochastic Gradient Descent;Unsupervised Representation Learning;Continual Learning", "primary_area": "", "supplementary_material": "", "author": "Alexander Gepperth;Benedikt Pf\u00fclb", "authorids": "~Alexander_Gepperth1;~Benedikt_Pf\u00fclb1", "gender": "M;M", "homepage": "http://www.gepperth.net;https://www.hs-fulda.de/", "dblp": "05/11166;", "google_scholar": "QR2zb3IAAAAJ;", "orcid": "0000-0003-2216-7808;", "linkedin": ";", "or_profile": "~Alexander_Gepperth1;~Benedikt_Pf\u00fclb1", "aff": "HAW Fulda;University of Applied Sciences Fulda", "aff_domain": "informatik.hs-fulda.de;cs.hs-fulda.de", "position": "Full Professor;PhD student", "bibtex": "@misc{\ngepperth2021gradientbased,\ntitle={Gradient-based training of Gaussian Mixture Models for High-Dimensional Streaming Data},\nauthor={Alexander Gepperth and Benedikt Pf{\\\"u}lb},\nyear={2021},\nurl={https://openreview.net/forum?id=BW5PuV4V-rL}\n}", "github": "", 
"project": "", "reviewers": "AnonReviewer3;AnonReviewer1;AnonReviewer4;AnonReviewer2;AnonReviewer5", "site": "https://openreview.net/forum?id=BW5PuV4V-rL", "pdf_size": 0, "rating": "5;5;5;5;5", "confidence": "3;4;3;4;2", "wc_review": "243;188;491;298;81", "wc_reply_reviewers": "0;0;0;0;0", "wc_reply_authors": "377;406;462;352;145", "reply_reviewers": "0;0;0;0;0", "reply_authors": "1;1;1;1;1", "rating_avg": [ 5.0, 0.0 ], "confidence_avg": [ 3.2, 0.7483314773547882 ], "wc_review_avg": [ 260.2, 135.88141889162037 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 348.4, 108.09736352011551 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 11, 0 ], "authors#_avg": [ 2, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 31, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=11623994680380924907&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 25, "aff_unique_index": "0;1", "aff_unique_norm": "Fulda University of Applied Sciences;University of Applied Sciences Fulda", "aff_unique_dep": ";", "aff_unique_url": "https://www.haw-fulda.de;https://www.hs-fulda.de", "aff_unique_abbr": "HAW Fulda;", "aff_campus_unique_index": "0", "aff_campus_unique": "Fulda;", "aff_country_unique_index": "0;0", "aff_country_unique": "Germany" }, { "title": "Counterfactual Generative Networks", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/2892", "id": "BXewfAYMmJw", "poster": "", "openreview": "https://openreview.net/forum?id=BXewfAYMmJw", "slides": "https://iclr.cc/virtual/2021/poster/2892", "video": "https://iclr.cc/virtual/2021/poster/2892", "author_site": "Axel Sauer, Andreas Geiger", "tldr": "", "abstract": "Neural networks are prone to learning shortcuts -- they often model simple correlations, ignoring more complex ones that potentially generalize better. Prior works on image classification show that instead of learning a connection to object shape, deep classifiers tend to exploit spurious correlations with low-level texture or the background for solving the classification task. In this work, we take a step towards more robust and interpretable classifiers that explicitly expose the task's causal structure. Building on current advances in deep generative modeling, we propose to decompose the image generation process into independent causal mechanisms that we train without direct supervision. By exploiting appropriate inductive biases, these mechanisms disentangle object shape, object texture, and background; hence, they allow for generating counterfactual images. We demonstrate the ability of our model to generate such images on MNIST and ImageNet. Further, we show that the counterfactual images can improve out-of-distribution robustness with a marginal drop in performance on the original classification task, despite being synthetic. 
Lastly, our generative model can be trained efficiently on a single GPU, exploiting common pre-trained models as inductive biases.", "keywords": "Causality;Counterfactuals;Generative Models;Robustness;Image Classification;Data Augmentation", "primary_area": "", "supplementary_material": "", "author": "Axel Sauer;Andreas Geiger", "authorids": "~Axel_Sauer1;~Andreas_Geiger3", "gender": "M;M", "homepage": "https://axelsauer.com/;http://www.cvlibs.net", "dblp": ";40/5825-1", "google_scholar": "https://scholar.google.de/citations?user=ZsDn16sAAAAJ;https://scholar.google.ca/citations?hl=en", "orcid": ";0000-0002-8151-3726", "linkedin": ";", "or_profile": "~Axel_Sauer1;~Andreas_Geiger3", "aff": "Max Planck Institute for Intelligent Systems, Max-Planck Institute;University of Tuebingen", "aff_domain": "tue.mpg.de;uni-tuebingen.de", "position": "PhD student;Professor", "bibtex": "@inproceedings{\nsauer2021counterfactual,\ntitle={Counterfactual Generative Networks},\nauthor={Axel Sauer and Andreas Geiger},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=BXewfAYMmJw}\n}", "github": "[![github](/images/github_icon.svg) autonomousvision/counterfactual_generative_networks](https://github.com/autonomousvision/counterfactual_generative_networks)", "project": "", "reviewers": "AnonReviewer4;AnonReviewer2;AnonReviewer3;AnonReviewer1", "pdf_size": 0, "rating": "5;5;7;8", "confidence": "3;5;3;4", "wc_review": "421;624;558;350", "wc_reply_reviewers": "0;287;235;0", "wc_reply_authors": "969;2574;1312;706", "reply_reviewers": "0;1;1;0", "reply_authors": "3;5;3;1", "rating_avg": [ 6.25, 1.299038105676658 ], "confidence_avg": [ 3.75, 0.82915619758885 ], "wc_review_avg": [ 488.25, 108.31522284517537 ], "wc_reply_reviewers_avg": [ 130.5, 131.78865656800664 ], "wc_reply_authors_avg": [ 1390.25, 716.4210964928378 ], "reply_reviewers_avg": [ 0.5, 0.5 ], "reply_authors_avg": [ 3.0, 1.4142135623730951 ], "replies_avg": [ 20, 0 ], "authors#_avg": [ 2, 0 ], "corr_rating_confidence": -0.17407765595569782, "gs_citation": 156, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=445809661981357040&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 6, "pdf": "https://openreview.net/pdf?id=BXewfAYMmJw", "email": "tue.mpg.de;uni-tuebingen.de", "author_num": 2, "aff_unique_index": "0;1", "aff_unique_norm": "Max Planck Institute for Intelligent Systems;University of Tuebingen", "aff_unique_dep": "Intelligent Systems;", "aff_unique_url": "https://www.mpi-is.mpg.de;https://www.uni-tuebingen.de/", "aff_unique_abbr": "MPI-IS;Uni T\u00fcbingen", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0", "aff_country_unique": "Germany" }, { "id": "BZeewPpFBI6", "title": "Dual Adversarial Training for Unsupervised Domain Adaptation", "track": "main", "status": "Withdraw", "tldr": "", "abstract": "Deep neural networks obtain remarkable achievements in diverse real-world applications. However, their success relies on the availability of large amounts of labeled data. A trained model may fail to generalize well on a domain whose distribution differs from the training data distribution. Collecting abundant labeled data for all domains of interest are expensive and time-consuming, sometimes even impossible. Domain adaptation sets out to address this problem, aiming to leverage labeled data in the source domain to learn a good predictive model for the target domain whose labels are scarce or unavailable. 
A mainstream approach is adversarial domain adaptation, which learns domain-invariant features by performing alignment across different distributions. Most domain adaptation methods focus on reducing the divergence between the two domains to improve performance. A prerequisite of domain adaptation is adaptability, measured by the expected error of the ideal joint hypothesis on the source and target domains, which should be kept small during domain alignment. However, adversarial learning may degrade the adaptability, since it distorts the original distributions by suppressing the domain-specific information. In this paper, we propose a domain adaptation approach that focuses on strengthening the model's adaptability. Our proposed dual adversarial training (DAT) method introduces class-invariant features to enhance the discriminability of the latent space without sacrificing the transferability. The class-invariant features, extracted from the source domain, can play a positive role in the classification on the target domain. We demonstrate the effectiveness of our method by yielding state-of-the-art results on several benchmarks.", "keywords": "Domain Adaptation;Class-Invariant Features;Adversarial Learning", "primary_area": "", "supplementary_material": "", "author": "Yuan Wu;Diana Inkpen;Ahmed El-Roby", "authorids": "~Yuan_Wu2;~Diana_Inkpen1;~Ahmed_El-Roby1", "gender": "M;F;M", "homepage": ";http://www.site.uottawa.ca/~diana/;http://people.scs.carleton.ca/~ahmedelroby", "dblp": "41/5176-2;i/DianaInkpen;https://dblp.org/pers/e/El=Roby:Ahmed.html", "google_scholar": "KVeRu2QAAAAJ;66pwIBcAAAAJ;https://scholar.google.ca/citations?user=DA68vjUAAAAJ", "orcid": ";0000-0002-0202-2444;", "linkedin": "yuan-wu-953208150/;https://ca.linkedin.com/in/diana-inkpen-a877353;", "or_profile": "~Yuan_Wu2;~Diana_Inkpen1;~Ahmed_El-Roby1", "aff": "Carleton University;University of Ottawa;Carleton University", "aff_domain": "carleton.ca;uottawa.ca;carleton.ca", "position": "PhD student;Full Professor;Assistant Professor", "bibtex": "", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer3;AnonReviewer2;AnonReviewer4", "site": "https://openreview.net/forum?id=BZeewPpFBI6", "pdf_size": 0, "rating": "2;3;3;5", "confidence": "5;5;5;5", "wc_review": "224;614;279;159", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "1006;2300;1425;399", "reply_reviewers": "0;0;0;0", "reply_authors": "2;4;2;1", "rating_avg": [ 3.25, 1.0897247358851685 ], "confidence_avg": [ 5.0, 0.0 ], "wc_review_avg": [ 319.0, 175.5348968154196 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 1282.5, 691.4906000807241 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 2.25, 1.0897247358851685 ], "replies_avg": [ 14, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:8b_x1B5mNNYJ:scholar.google.com/&scioq=Dual+Adversarial+Training+for+Unsupervised+Domain+Adaptation&hl=en&as_sdt=0,5", "gs_version_total": 0, "aff_unique_index": "0;1;0", "aff_unique_norm": "Carleton University;University of Ottawa", "aff_unique_dep": ";", "aff_unique_url": "https://carleton.ca;https://www.uottawa.ca", "aff_unique_abbr": "Carleton;U Ottawa", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0;0", "aff_country_unique": "Canada" }, { "title": "The Risks of Invariant Risk Minimization", "status": "Poster", "track": "main", "site":
"https://iclr.cc/virtual/2021/poster/2752", "id": "BbNIbVPJ-42", "poster": "", "openreview": "https://openreview.net/forum?id=BbNIbVPJ-42", "slides": "https://iclr.cc/virtual/2021/poster/2752", "video": "https://iclr.cc/virtual/2021/poster/2752", "author_site": "Elan Rosenfeld, Pradeep K Ravikumar, Andrej Risteski", "tldr": "", "abstract": "Invariant Causal Prediction (Peters et al., 2016) is a technique for out-of-distribution generalization which assumes that some aspects of the data distribution vary across the training set but that the underlying causal mechanisms remain constant. Recently, Arjovsky et al. (2019) proposed Invariant Risk Minimization (IRM), an objective based on this idea for learning deep, invariant features of data which are a complex function of latent variables; many alternatives have subsequently been suggested. However, formal guarantees for all of these works are severely lacking. In this paper, we present the first analysis of classification under the IRM objective\u2014as well as these recently proposed alternatives\u2014under a fairly natural and general model. In the linear case, we show simple conditions under which the optimal solution succeeds or, more often, fails to recover the optimal invariant predictor. We furthermore present the very first results in the non-linear regime: we demonstrate that IRM can fail catastrophically unless the test data is sufficiently similar to the training distribution\u2014this is precisely the issue that it was intended to solve. Thus, in this setting we find that IRM and its alternatives fundamentally do not improve over standard Empirical Risk Minimization.", "keywords": "out-of-distribution generalization;causality;representation learning;deep learning", "primary_area": "", "supplementary_material": "/attachment/1b3d0d96755bc44df6a34f05a649652ca955d934.zip", "author": "Elan Rosenfeld;Pradeep Kumar Ravikumar;Andrej Risteski", "authorids": "~Elan_Rosenfeld1;~Pradeep_Kumar_Ravikumar1;~Andrej_Risteski2", "gender": "M;M;M", "homepage": ";http://www.cs.cmu.edu/~pradeepr/;", "dblp": "236/4508;94/3594;63/11143", "google_scholar": "f0j0K8QAAAAJ;https://scholar.google.com.tw/citations?user=Q4DTPw4AAAAJ;", "orcid": ";;", "linkedin": ";;", "or_profile": "~Elan_Rosenfeld1;~Pradeep_Kumar_Ravikumar1;~Andrej_Risteski2", "aff": "Carnegie Mellon University;School of Computer Science, Carnegie Mellon University;Carnegie Mellon University", "aff_domain": "andrew.cmu.edu;cs.cmu.edu;cmu.edu", "position": "PhD student;Associate Professor;Assistant Professor", "bibtex": "@inproceedings{\nrosenfeld2021the,\ntitle={The Risks of Invariant Risk Minimization},\nauthor={Elan Rosenfeld and Pradeep Kumar Ravikumar and Andrej Risteski},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=BbNIbVPJ-42}\n}", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer2;AnonReviewer4;AnonReviewer3", "pdf_size": 0, "rating": "6;7;7;7", "confidence": "4;2;2;3", "wc_review": "1859;239;133;270", "wc_reply_reviewers": "2217;153;0;0", "wc_reply_authors": "3591;571;260;336", "reply_reviewers": "4;2;0;0", "reply_authors": "6;2;1;1", "rating_avg": [ 6.75, 0.4330127018922193 ], "confidence_avg": [ 2.75, 0.82915619758885 ], "wc_review_avg": [ 625.25, 714.1149679848477 ], "wc_reply_reviewers_avg": [ 592.5, 939.9831115504151 ], "wc_reply_authors_avg": [ 1189.5, 1391.238387193223 ], "reply_reviewers_avg": [ 1.5, 1.6583123951777 ], "reply_authors_avg": [ 2.5, 2.0615528128088303 ], "replies_avg": 
[ 21, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": -0.8703882797784891, "gs_citation": 367, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=16629143072867731417&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 9, "pdf": "https://openreview.net/pdf?id=BbNIbVPJ-42", "email": "andrew.cmu.edu;cs.cmu.edu;cmu.edu", "author_num": 3, "aff_unique_index": "0;0;0", "aff_unique_norm": "Carnegie Mellon University", "aff_unique_dep": "", "aff_unique_url": "https://www.cmu.edu", "aff_unique_abbr": "CMU", "aff_campus_unique_index": "1", "aff_campus_unique": ";Pittsburgh", "aff_country_unique_index": "0;0;0", "aff_country_unique": "United States" }, { "id": "Be7Z5EfYp-Q", "title": "Practical Phase Retrieval: Low-Photon Holography with Untrained Priors", "track": "main", "status": "Withdraw", "tldr": "", "abstract": "Phase retrieval is the inverse problem of recovering a signal from magnitude-only Fourier measurements, and underlies numerous imaging modalities, such as Coherent Diffraction Imaging (CDI). A variant of this setup, known as holography, includes a reference object that is placed adjacent to the specimen of interest before measurements are collected. The resulting inverse problem, known as holographic phase retrieval, is well-known to have improved problem conditioning relative to the original. This innovation, i.e. Holographic CDI, becomes crucial at the nanoscale, where imaging specimens such as viruses, proteins, and crystals require low-photon measurements. This data is highly corrupted by Poisson shot noise, and often lacks low-frequency content as well. In this work, we introduce a dataset-free deep learning framework for holographic phase retrieval adapted to these challenges. The key ingredients of our approach are the explicit and flexible incorporation of the physical forward model into the automatic differentiation procedure, the Poisson log-likelihood objective function, and an optional untrained deep image prior. We perform extensive evaluation under realistic conditions. Compared to competing classical methods, our method recovers signal from higher noise levels and is more resilient to suboptimal reference design, as well as to large missing regions of low-frequencies in the observations. 
To the best of our knowledge, this is the first work to consider a dataset-free machine learning approach for holographic phase retrieval.", "keywords": "inverse problems;phase retrieval;generative priors;holography;coherent diffraction imaging", "primary_area": "", "supplementary_material": "/attachment/3d4975e4c4a701f9ff7073f4411319d9248177b8.zip", "author": "Hannah Lawrence;David Barmherzig;Henry Li;Michael Eickenberg;Marylou Gabri\u00e9", "authorids": "hanlaw@mit.edu;dbarmherzig@flatironinstitute.org;~Henry_Li2;~Michael_Eickenberg3;~Marylou_Gabri\u00e91", "gender": ";;;;F", "homepage": ";;https://hnry.li;;https://marylou-gabrie.github.io/", "dblp": ";;31/6498;;164/5772", "google_scholar": ";;o7-TIlcAAAAJ;;5m1DvLwAAAAJ", "orcid": ";;;;", "linkedin": ";;;;", "or_profile": "hanlaw@mit.edu;dbarmherzig@flatironinstitute.org;~Henry_Li2;~Michael_Eickenberg3;~Marylou_Gabri\u00e91", "aff": ";;Yale University;;Flatiron Institute", "aff_domain": ";;yale.edu;;flatiroininstitute.org", "position": ";;PhD student;;Postdoc", "bibtex": "", "github": "", "project": "", "reviewers": "AnonReviewer2;AnonReviewer1;AnonReviewer3;AnonReviewer4", "site": "https://openreview.net/forum?id=Be7Z5EfYp-Q", "pdf_size": 0, "rating": "3;4;5;7", "confidence": "5;1;4;3", "wc_review": "318;81;719;341", "wc_reply_reviewers": "0;0;373;0", "wc_reply_authors": "678;706;1393;603", "reply_reviewers": "0;0;1;0", "reply_authors": "1;1;3;1", "rating_avg": [ 4.75, 1.479019945774904 ], "confidence_avg": [ 3.25, 1.479019945774904 ], "wc_review_avg": [ 364.75, 228.44952943702904 ], "wc_reply_reviewers_avg": [ 93.25, 161.5137378057978 ], "wc_reply_authors_avg": [ 845.0, 318.6212485067498 ], "reply_reviewers_avg": [ 0.25, 0.4330127018922193 ], "reply_authors_avg": [ 1.5, 0.8660254037844386 ], "replies_avg": [ 13, 0 ], "authors#_avg": [ 5, 0 ], "corr_rating_confidence": -0.19999999999999998, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:3ZUgLNM0ywgJ:scholar.google.com/&scioq=Practical+Phase+Retrieval:+Low-Photon+Holography+with+Untrained+Priors&hl=en&as_sdt=0,5", "gs_version_total": 0, "aff_unique_index": "0;1", "aff_unique_norm": "Yale University;Flatiron Institute", "aff_unique_dep": ";", "aff_unique_url": "https://www.yale.edu;https://flatironinstitute.org", "aff_unique_abbr": "Yale;Flatiron", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0", "aff_country_unique": "United States" }, { "id": "BeDgBhZP7S", "title": "Catching the Long Tail in Deep Neural Networks", "track": "main", "status": "Withdraw", "tldr": "", "abstract": "Learning dynamics in deep neural networks are still a subject of debate. In particular, the identification of eventual differences regarding how deep models learn from frequent versus rare training examples is still an area of active research. In this work, we focus on studying the dynamics of memorization in deep neural networks, where we understand memorization as the process of learning from rare or unusual training examples that are part of the long-tail of a dataset. As a working hypothesis, we speculate that during learning some weights focus on mining patterns from frequent examples while others are in charge of memorizing rare long-tail samples. Using this idea, we develop a method for uncovering which weights focus on mining frequent patterns and which ones focus on memorization. 
Following previous studies, we empirically verify that deep neural networks learn frequent patterns first and then focus on memorizing long-tail examples. Furthermore, our results show that during training a small proportion of the total weights present an early convergence to model frequent patterns, while the vast majority of the weights present a slow convergence to model long-tail examples. We also find that memorization happens mostly at the first layers of a network and not at the level of classification. Finally, by analyzing performance differences for models trained with varying levels of long-tail samples, we find that a larger number of long-tail samples has a negative impact on learning frequent patterns, by a process we conjecture to force the model to learn frequent patterns as memorization.", "keywords": "Deep Learning;Memorization;Long Tail", "primary_area": "", "supplementary_material": "/attachment/ba125572d639de164d5cea361b1c2652d1fe9e62.zip", "author": "Julio Hurtado;Alain Raymond;Alvaro Soto", "authorids": "~Julio_Hurtado1;~Alain_Raymond1;~Alvaro_Soto1", "gender": "M;M;M", "homepage": "https://warwick.ac.uk/fac/sci/camacs/people/hurtado;https://ialab.ing.puc.cl/;http://asoto.ing.puc.cl", "dblp": "178/4255;294/8655;25/3682", "google_scholar": "https://scholar.google.com/citations?hl=es;j8KhaCIAAAAJ;https://scholar.google.com/citations?hl=en", "orcid": ";;", "linkedin": ";;", "or_profile": "~Julio_Hurtado1;~Alain_Raymond1;~Alvaro_Soto1", "aff": "Pontificia Universidad Cat\u00f3lica;Pontificia Universidad Catolica de Chile;Universidad Cat\u00f3lica de Chile", "aff_domain": "uc.cl;uc.cl;uc.cl", "position": "PhD student;PhD student;Associate Professor", "bibtex": "", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer4;AnonReviewer2", "site": "https://openreview.net/forum?id=BeDgBhZP7S", "pdf_size": 0, "rating": "4;5;5", "confidence": "4;3;4", "wc_review": "488;420;715", "wc_reply_reviewers": "0;0;0", "wc_reply_authors": "0;0;0", "reply_reviewers": "0;0;0", "reply_authors": "0;0;0", "rating_avg": [ 4.666666666666667, 0.4714045207910317 ], "confidence_avg": [ 3.6666666666666665, 0.4714045207910317 ], "wc_review_avg": [ 541.0, 126.12956301623608 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 0, 0 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 0, 0 ], "replies_avg": [ 4, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": -0.4999999999999999, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:GVFN4UT_0cUJ:scholar.google.com/&scioq=Catching+the+Long+Tail+in+Deep+Neural+Networks&hl=en&as_sdt=0,5", "gs_version_total": 0, "aff_unique_index": "0;1;2", "aff_unique_norm": "Pontificia Universidad Cat\u00f3lica;Pontificia Universidad Catolica de Chile;Universidad Cat\u00f3lica de Chile", "aff_unique_dep": ";;", "aff_unique_url": "https://www.puc.cl;https://www.puc.cl;https://www.uc.cl", "aff_unique_abbr": "PUC;PUC;PUC", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0;0", "aff_country_unique": "Chile" }, { "id": "BfayGoTV4iQ", "title": "SketchEmbedNet: Learning Novel Concepts by Imitating Drawings", "track": "main", "status": "Reject", "tldr": "", "abstract": "Sketch drawings are an intuitive visual domain that appeals to human instinct. Previous work has shown that recurrent neural networks are capable of producing sketch drawings of a single or few classes at a time. 
In this work we investigate representations developed by training a generative model to produce sketches from pixel images across many classes in a sketch domain. We find that the embeddings learned by this sketching model are extremely informative for visual tasks and infer a unique visual understanding. We then use them to exceed state-of-the-art performance in unsupervised few-shot classification on the Omniglot and mini-ImageNet benchmarks. We also leverage the generative capacity of our model to produce high quality sketches of novel classes based on just a single example. ", "keywords": "generative;probabilistic;sketch;drawing;few-shot learning;classification;embedding learning", "primary_area": "", "supplementary_material": "/attachment/80b09a62f150a90d678403290403239d68386a0c.zip", "author": "Alexander Wang;Mengye Ren;Richard Zemel", "authorids": "~Alexander_Wang1;~Mengye_Ren1;~Richard_Zemel1", "gender": "M;;M", "homepage": "https://www.cs.toronto.edu/~alexw/;http://www.cs.toronto.edu/~mren;http://www.cs.columbia.edu/~zemel", "dblp": "367/7251;163/1952;16/6366", "google_scholar": "X0n8TX0AAAAJ;XcQ9WqMAAAAJ;https://scholar.google.ca/citations?user=iBeDoRAAAAAJ", "orcid": ";;", "linkedin": ";;", "or_profile": "~Alexander_Wang1;~Mengye_Ren1;~Richard_Zemel1", "aff": "Department of Computer Science, University of Toronto;University of Toronto;Department of Computer Science, University of Toronto", "aff_domain": "cs.toronto.edu;toronto.edu;cs.toronto.edu", "position": "MS student;PhD student;Full Professor", "bibtex": "@misc{\nwang2021sketchembednet,\ntitle={SketchEmbedNet: Learning Novel Concepts by Imitating Drawings},\nauthor={Alexander Wang and Mengye Ren and Richard Zemel},\nyear={2021},\nurl={https://openreview.net/forum?id=BfayGoTV4iQ}\n}", "github": "", "project": "", "reviewers": "AnonReviewer4;AnonReviewer2;AnonReviewer1;AnonReviewer3", "site": "https://openreview.net/forum?id=BfayGoTV4iQ", "pdf_size": 0, "rating": "4;6;6;9", "confidence": "3;3;4;4", "wc_review": "400;465;650;165", "wc_reply_reviewers": "0;0;174;0", "wc_reply_authors": "839;768;1226;158", "reply_reviewers": "0;0;1;0", "reply_authors": "2;1;2;1", "rating_avg": [ 6.25, 1.7853571071357126 ], "confidence_avg": [ 3.5, 0.5 ], "wc_review_avg": [ 420.0, 173.45748758701654 ], "wc_reply_reviewers_avg": [ 43.5, 75.34421012924616 ], "wc_reply_authors_avg": [ 747.75, 382.51298997550396 ], "reply_reviewers_avg": [ 0.25, 0.4330127018922193 ], "reply_authors_avg": [ 1.5, 0.5 ], "replies_avg": [ 13, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": 0.7001400420140049, "gs_citation": 34, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=8136668361593221793&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 7, "aff_unique_index": "0;0;0", "aff_unique_norm": "University of Toronto", "aff_unique_dep": "Department of Computer Science", "aff_unique_url": "https://www.utoronto.ca", "aff_unique_abbr": "U of T", "aff_campus_unique_index": "0;0", "aff_campus_unique": "Toronto;", "aff_country_unique_index": "0;0;0", "aff_country_unique": "Canada" }, { "id": "BgEGeFRGof", "title": "Anomaly detection and regime searching in fitness-tracker data", "track": "main", "status": "Desk Reject", "tldr": "", "abstract": "In our project, we solve the problem of human activity monitoring based on data from sensors attached to the hands of various workers. First of all, the recognition results help to increase labor productivity and optimize production processes at a building site. 
Also, the analysis of the behavior of workers allows us to track a person's well-being and compliance with safety measures, and supports accident prevention. \nData collected from the fitness tracker require careful preprocessing. The Gaussian Process model was applied to fill in the gaps in the time series and to extract outliers, which increases the metrics of the models. The comparison of several models for activity recognition was performed in the form of supervised learning. An anomaly detection approach was applied and provided useful results for activity monitoring during construction work. In addition, a neural network based on the variational autoencoder architecture allowed us to extract the main work regimes.\nThe fitness tracker time series data set was collected, tagged and published for further research.", "keywords": "time series analysis;neural networks;variational autoencoders;anomaly detection", "primary_area": "", "supplementary_material": "", "author": "Anonymous", "authorids": "ICLR.cc/2021/Conference/Paper1800/Authors", "gender": "", "homepage": "", "dblp": "", "google_scholar": "", "orcid": "", "linkedin": "", "or_profile": "", "aff": "", "aff_domain": "", "position": "", "bibtex": "@inproceedings{\nanonymous2021anomaly,\ntitle={Anomaly detection and regime searching in fitness-tracker data},\nauthor={Anonymous},\nbooktitle={Submitted to International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=BgEGeFRGof},\nnote={under review}\n}", "github": "", "project": "", "reviewers": "", "site": "https://openreview.net/forum?id=BgEGeFRGof", "pdf_size": 0, "rating": "", "confidence": "", "wc_review": "", "wc_reply_reviewers": "", "wc_reply_authors": "", "reply_reviewers": "", "reply_authors": "", "rating_avg": [ 0, 0 ], "confidence_avg": [ 0, 0 ], "wc_review_avg": [ 0, 0 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 0, 0 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 0, 0 ], "replies_avg": [ 1, 0 ], "authors#_avg": [ 1, 0 ], "corr_rating_confidence": 0, "gs_citation": -1, "gs_cited_by_link": "", "gs_version_total": -1 }, { "id": "Bi2OvVf1KPn", "title": "Provable Robust Learning for Deep Neural Networks under Agnostic Corrupted Supervision", "track": "main", "status": "Reject", "tldr": "", "abstract": "Training deep neural models in the presence of corrupted supervisions is challenging as the corrupted data points may significantly impact the generalization performance. To alleviate this problem, we present an efficient robust algorithm that achieves strong guarantees without any assumption on the type of corruption and provides a unified framework for both classification and regression problems. Different from many existing approaches that quantify the quality of individual data points (e.g., loss values) and filter out data points accordingly, the proposed algorithm focuses on controlling the collective impact of data points on the averaged gradient. Even when a corrupted data point fails to be excluded by the proposed algorithm, it has very limited impact on the overall loss, compared with state-of-the-art methods that filter data points based on loss values.
Extensive empirical results on multiple benchmark datasets have demonstrated the robustness of the proposed method under different types of corruption.", "keywords": "Noisy Label;Corrupted Supervision;Robustness;Optimization", "primary_area": "", "supplementary_material": "/attachment/0b34fc7642d25cc282cefc0d01e5629a95e9ab17.zip", "author": "Boyang Liu;Mengying Sun;Ding Wang;Pang-Ning Tan;Jiayu Zhou", "authorids": "~Boyang_Liu1;~Mengying_Sun1;wangdin1@msu.edu;~Pang-Ning_Tan1;~Jiayu_Zhou1", "gender": "M;F;;M;M", "homepage": ";;;http://www.cse.msu.edu/~ptan;http://jiayuzhou.github.io/", "dblp": "165/8466;203/9353;;t/PangNingTan.html;73/1353", "google_scholar": ";ga9roa8AAAAJ;;https://scholar.google.com.tw/citations?user=xNs4D2QAAAAJ;https://scholar.google.com.tw/citations?user=yQKlLTQAAAAJ", "orcid": ";;;;0000-0003-4336-6777", "linkedin": ";;;;jiayuzhou/", "or_profile": "~Boyang_Liu1;~Mengying_Sun1;wangdin1@msu.edu;~Pang-Ning_Tan1;~Jiayu_Zhou1", "aff": "Michigan State University;Michigan State University;;Michigan State University;Michigan State University", "aff_domain": "msu.edu;msu.edu;;msu.edu;msu.edu", "position": "PhD student;PhD student;;Assistant Professor;Assistant Professor", "bibtex": "@misc{\nliu2021provable,\ntitle={Provable Robust Learning for Deep Neural Networks under Agnostic Corrupted Supervision},\nauthor={Boyang Liu and Mengying Sun and Ding Wang and Pang-Ning Tan and Jiayu Zhou},\nyear={2021},\nurl={https://openreview.net/forum?id=Bi2OvVf1KPn}\n}", "github": "", "project": "", "reviewers": "AnonReviewer3;AnonReviewer1;AnonReviewer4;AnonReviewer2", "site": "https://openreview.net/forum?id=Bi2OvVf1KPn", "pdf_size": 0, "rating": "3;4;4;5", "confidence": "5;5;4;3", "wc_review": "664;193;285;245", "wc_reply_reviewers": "139;0;0;0", "wc_reply_authors": "1099;436;817;211", "reply_reviewers": "1;0;0;0", "reply_authors": "2;1;1;1", "rating_avg": [ 4.0, 0.7071067811865476 ], "confidence_avg": [ 4.25, 0.82915619758885 ], "wc_review_avg": [ 346.75, 186.04619721993782 ], "wc_reply_reviewers_avg": [ 34.75, 60.188765563018485 ], "wc_reply_authors_avg": [ 640.75, 341.93009153919166 ], "reply_reviewers_avg": [ 0.25, 0.4330127018922193 ], "reply_authors_avg": [ 1.25, 0.4330127018922193 ], "replies_avg": [ 13, 0 ], "authors#_avg": [ 5, 0 ], "corr_rating_confidence": -0.8528028654224418, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:aMDWrNaUYPkJ:scholar.google.com/&scioq=Provable+Robust+Learning+for+Deep+Neural+Networks+under+Agnostic+Corrupted+Supervision&hl=en&as_sdt=0,5", "gs_version_total": 0, "aff_unique_index": "0;0;0;0", "aff_unique_norm": "Michigan State University", "aff_unique_dep": "", "aff_unique_url": "https://www.msu.edu", "aff_unique_abbr": "MSU", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0;0;0", "aff_country_unique": "United States" }, { "id": "BnokSKnhC7F", "title": "Maximum Reward Formulation In Reinforcement Learning", "track": "main", "status": "Reject", "tldr": "", "abstract": "Reinforcement learning (RL) algorithms typically deal with maximizing the expected cumulative return (discounted or undiscounted, finite or infinite horizon). However, several crucial applications in the real world, such as drug discovery, do not fit within this framework because an RL agent only needs to identify states (molecules) that achieve the highest reward within a trajectory and does not need to optimize for the expected cumulative return. 
In this work, we formulate an objective function to maximize the expected maximum reward along a trajectory, derive a novel functional form of the Bellman equation, introduce the corresponding Bellman operators, and provide a proof of convergence. Using this formulation, we achieve state-of-the-art results on the task of molecule generation that mimics a real-world drug discovery pipeline.", "keywords": "Reinforcement Learning;Theoretical Reinforcement Learning;Drug Discovery;Molecule Generation;de novo drug design", "primary_area": "", "supplementary_material": "/attachment/01793fede069ade9fdd719a2ea048daf765d0cec.zip", "author": "SaiKrishna Gottipati;Yashaswi Pathak;Rohan Nuttall;. Sahir;Raviteja Chunduru;Ahmed Touati;Sriram Ganapathi Subramanian;Matthew E. Taylor;Sarath Chandar", "authorids": "~SaiKrishna_Gottipati1;~Yashaswi_Pathak1;rnuttall@ualberta.ca;~._Sahir1;~Raviteja_Chunduru1;~Ahmed_Touati1;~Sriram_Ganapathi_Subramanian1;~Matthew_E._Taylor2;~Sarath_Chandar1", "gender": ";M;;M;M;M;M;;M", "homepage": "https://saikrishna-1996.github.io;https://yp201.github.ip;;;;;https://sriramsubramanian.com;;http://sarathchandar.in/", "dblp": ";;;276/1581;;147/5871;217/9729;;45/8542", "google_scholar": "9syQKRIAAAAJ;;;;;https://scholar.google.fr/citations?user=D4LT5xAAAAAJ;O2jvQAYAAAAJ;;https://scholar.google.co.in/citations?user=yxWtZLAAAAAJ", "orcid": ";;;;;;;;", "linkedin": "saikrishna-1996/;;;sahir-noor-ali-25943141/;ravitej310/;ahmed-touati-4a132a76/;sriram-ganapathi-subramanian-7518a9a2/;;", "or_profile": "~SaiKrishna_Gottipati1;~Yashaswi_Pathak1;rnuttall@ualberta.ca;~._Sahir1;~Raviteja_Chunduru1;~Ahmed_Touati1;~Sriram_Ganapathi_Subramanian1;~Matthew_E._Taylor2;~Sarath_Chandar1", "aff": "99andBeyond;International Institute of Information Technology Hyderabad;;University of Alberta;McGill University;;University of Waterloo;;\u00c9cole Polytechnique de Montr\u00e9al", "aff_domain": "99andbeyond.com;iiit.ac.in;;ualberta.ca;mcgill.ca;;uwaterloo.ca;;polymtl.ca", "position": "Machine Learning Researcher;MS student;;MS student;MS student;;PhD student;;Assistant Professor", "bibtex": "@misc{\ngottipati2021maximum,\ntitle={Maximum Reward Formulation In Reinforcement Learning},\nauthor={SaiKrishna Gottipati and Yashaswi Pathak and Rohan Nuttall and . Sahir and Raviteja Chunduru and Ahmed Touati and Sriram Ganapathi Subramanian and Matthew E. 
Taylor and Sarath Chandar},\nyear={2021},\nurl={https://openreview.net/forum?id=BnokSKnhC7F}\n}", "github": "", "project": "", "reviewers": "AnonReviewer4;AnonReviewer2;AnonReviewer3;AnonReviewer5;AnonReviewer1", "site": "https://openreview.net/forum?id=BnokSKnhC7F", "pdf_size": 0, "rating": "3;4;5;5;6", "confidence": "4;3;3;3;3", "wc_review": "144;728;303;404;1097", "wc_reply_reviewers": "165;291;0;0;318", "wc_reply_authors": "568;1086;413;464;759", "reply_reviewers": "1;1;0;0;1", "reply_authors": "3;2;1;1;2", "rating_avg": [ 4.6, 1.0198039027185568 ], "confidence_avg": [ 3.2, 0.39999999999999997 ], "wc_review_avg": [ 535.2, 339.6936266696801 ], "wc_reply_reviewers_avg": [ 154.8, 136.53922513329275 ], "wc_reply_authors_avg": [ 658.0, 244.60008176613513 ], "reply_reviewers_avg": [ 0.6, 0.48989794855663565 ], "reply_authors_avg": [ 1.8, 0.7483314773547883 ], "replies_avg": [ 18, 0 ], "authors#_avg": [ 9, 0 ], "corr_rating_confidence": -0.7844645405527362, "gs_citation": 15, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=3507506340145776803&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 3, "aff_unique_index": "0;1;2;3;4;5", "aff_unique_norm": "99andBeyond;International Institute of Information Technology;University of Alberta;McGill University;University of Waterloo;\u00c9cole Polytechnique de Montr\u00e9al", "aff_unique_dep": ";;;;;", "aff_unique_url": ";https://iiit Hyderabad.ac.in;https://www.ualberta.ca;https://www.mcgill.ca;https://uwaterloo.ca;https://www.polymtl.ca", "aff_unique_abbr": ";IIIT Hyderabad;UAlberta;McGill;UW;Polytechnique Montr\u00e9al", "aff_campus_unique_index": "1;2", "aff_campus_unique": ";Hyderabad;Montr\u00e9al", "aff_country_unique_index": "1;2;2;2;2", "aff_country_unique": ";India;Canada" }, { "id": "BntruCi1uvF", "title": "Truly Deterministic Policy Optimization", "track": "main", "status": "Reject", "tldr": "", "abstract": "In this paper, we present a policy gradient method that avoids exploratory noise injection and performs policy search over the deterministic landscape. By avoiding noise injection all sources of estimation variance can be eliminated in systems with deterministic dynamics (up to the initial state distribution). Since deterministic policy regularization is impossible using traditional non-metric measures such as the KL divergence, we derive a Wasserstein-based quadratic model for our purposes. We state conditions on the system model under which it is possible to establish a monotonic policy improvement guarantee, propose a surrogate function for policy gradient estimation, and show that it is possible to compute exact advantage estimates if both the state transition model and the policy are deterministic. 
Finally, we describe two novel robotic control environments---one with non-local rewards in the frequency domain and the other with a long horizon (8000 time-steps)---for which our policy gradient method (TDPO) significantly outperforms existing methods (PPO, TRPO, DDPG, and TD3).", "keywords": "Deterministic Policy Gradient;Deterministic Exploration;Reinforcement Learning", "primary_area": "", "supplementary_material": "", "author": "Ehsan Saleh;Saba Ghaffari;Matthew West;Tim Bretl", "authorids": "~Ehsan_Saleh1;sabag2@illinois.edu;~Matthew_West1;~Tim_Bretl1", "gender": ";;;M", "homepage": ";;http://lagrange.mechse.illinois.edu;http://bretl.csl.illinois.edu/", "dblp": ";;;29/2834", "google_scholar": ";;;https://scholar.google.com.tw/citations?user=ab_0lGcAAAAJ", "orcid": ";;0000-0002-7605-0050;", "linkedin": ";;;", "or_profile": "~Ehsan_Saleh1;sabag2@illinois.edu;~Matthew_West1;~Tim_Bretl1", "aff": ";;;University of Illinois, Urbana Champaign", "aff_domain": ";;;illinois.edu", "position": ";;;Associate Professor", "bibtex": "@misc{\nsaleh2021truly,\ntitle={Truly Deterministic Policy Optimization},\nauthor={Ehsan Saleh and Saba Ghaffari and Matthew West and Tim Bretl},\nyear={2021},\nurl={https://openreview.net/forum?id=BntruCi1uvF}\n}", "github": "", "project": "", "reviewers": "AnonReviewer3;AnonReviewer1;AnonReviewer4;AnonReviewer2", "site": "https://openreview.net/forum?id=BntruCi1uvF", "pdf_size": 0, "rating": "5;5;6;6", "confidence": "3;3;4;3", "wc_review": "296;379;421;716", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "931;1006;486;1541", "reply_reviewers": "0;0;0;0", "reply_authors": "2;2;1;3", "rating_avg": [ 5.5, 0.5 ], "confidence_avg": [ 3.25, 0.4330127018922193 ], "wc_review_avg": [ 453.0, 158.36508453570187 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 991.0, 374.6164705402046 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 2.0, 0.7071067811865476 ], "replies_avg": [ 13, 0 ], "authors#_avg": [ 4, 0 ], "corr_rating_confidence": 0.5773502691896257, "gs_citation": 6, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=11328055735791293135&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 7, "aff_unique_index": "0", "aff_unique_norm": "University of Illinois Urbana-Champaign", "aff_unique_dep": "", "aff_unique_url": "https://illinois.edu", "aff_unique_abbr": "UIUC", "aff_campus_unique_index": "0", "aff_campus_unique": "Urbana-Champaign", "aff_country_unique_index": "0", "aff_country_unique": "United States" }, { "id": "Bpw_O132lWT", "title": "Dynamic of Stochastic Gradient Descent with State-dependent Noise", "track": "main", "status": "Reject", "tldr": "", "abstract": "Stochastic gradient descent (SGD) and its variants are mainstream methods to train deep neural networks. Since neural networks are non-convex, more and more works study the dynamic behavior of SGD and its impact to generalization, especially the escaping efficiency from local minima. However, these works make the over-simplified assumption that the distribution of gradient noise is state-independent, although it is state-dependent. In this work, we propose a novel power-law dynamic with state-dependent diffusion to approximate the dynamic of SGD. Then, we prove that the stationary distribution of power-law dynamic is heavy-tailed, which matches the existing empirical observations. 
Next, we study the escaping efficiency from local minimum of power-law dynamic and prove that the mean escaping time is in polynomial order of the barrier height of the basin, much faster than exponential order of previous dynamics. It indicates that SGD can escape deep sharp minima efficiently and tends to stop at flat minima that have lower generalization error. Finally, we conduct experiments to compare SGD and power-law dynamic, and the results verify our theoretical findings.", "keywords": "state-dependent noise;power-law dynamic;stochastic gradient descent;generalization;deep neural network;heavy-tailed;escape time", "primary_area": "", "supplementary_material": "", "author": "Qi Meng;Shiqi Gong;Wei Chen;Zhi-Ming Ma;Tie-Yan Liu", "authorids": "~Qi_Meng1;~Shiqi_Gong1;~Wei_Chen1;~Zhi-Ming_Ma1;~Tie-Yan_Liu1", "gender": "F;M;F;;M", "homepage": ";;https://weichen-cas.github.io/;http://homepage.amss.ac.cn/research/homePage/8eb59241e2e74d828fb84eec0efadba5/myHomePage.html;http://member.acm.org/~tieyanliu", "dblp": ";;;;l/TieYanLiu", "google_scholar": "t-z3K34AAAAJ;;https://scholar.google.com/citations?hl=en;;Nh832fgAAAAJ", "orcid": ";;;;0000-0002-0476-8020", "linkedin": ";https://www.linkedin.com/public-profile/in/shiqi-gong-081847129;;;", "or_profile": "~Qi_Meng1;~Shiqi_Gong1;~Wei_Chen1;~Zhi-Ming_Ma1;~Tie-Yan_Liu1", "aff": "Microsoft;Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Chinese Academy of Sciences;;Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Chinese Academy of Sciences;Microsoft", "aff_domain": "microsoft.com;amss.ac.cn;;amss.ac.cn;microsoft.com", "position": "associate researcher;PhD student;;Full Professor;Distinguished Scientist", "bibtex": "@misc{\nmeng2021dynamic,\ntitle={Dynamic of Stochastic Gradient Descent with State-dependent Noise},\nauthor={Qi Meng and Shiqi Gong and Wei Chen and Zhi-Ming Ma and Tie-Yan Liu},\nyear={2021},\nurl={https://openreview.net/forum?id=Bpw_O132lWT}\n}", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer4;AnonReviewer2;AnonReviewer3", "site": "https://openreview.net/forum?id=Bpw_O132lWT", "pdf_size": 0, "rating": "5;5;6;6", "confidence": "5;3;4;3", "wc_review": "414;151;553;421", "wc_reply_reviewers": "0;0;45;0", "wc_reply_authors": "631;405;813;347", "reply_reviewers": "0;0;1;0", "reply_authors": "1;1;2;1", "rating_avg": [ 5.5, 0.5 ], "confidence_avg": [ 3.75, 0.82915619758885 ], "wc_review_avg": [ 384.75, 145.87387531700116 ], "wc_reply_reviewers_avg": [ 11.25, 19.48557158514987 ], "wc_reply_authors_avg": [ 549.0, 185.71483516402236 ], "reply_reviewers_avg": [ 0.25, 0.4330127018922193 ], "reply_authors_avg": [ 1.25, 0.4330127018922193 ], "replies_avg": [ 12, 0 ], "authors#_avg": [ 5, 0 ], "corr_rating_confidence": -0.30151134457776363, "gs_citation": 15, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=14465769628376600927&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 3, "aff_unique_index": "0;1;1;0", "aff_unique_norm": "Microsoft;Chinese Academy of Sciences", "aff_unique_dep": "Microsoft Corporation;Academy of Mathematics and Systems Science", "aff_unique_url": "https://www.microsoft.com;http://www.cas.cn", "aff_unique_abbr": "Microsoft;CAS", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;1;1;0", "aff_country_unique": "United States;China" }, { "id": "BqC9lL-hzY_", "title": "Disentanglement, Visualization and Analysis of Complex Features in DNNs", "track": "main", "status": "Withdraw", "tldr": "", "abstract": 
"This paper aims to define, visualize, and analyze the feature complexity that is learned by a DNN. We propose a generic definition for the feature complexity. Given the feature of a certain layer in the DNN, our method disentangles and visualizes feature components of different complexity orders from the feature. The disentanglement of feature components enables us to evaluate the reliability, the effectiveness, and the significance of over-fitting of these feature components. Furthermore, such analysis helps to improve the performance of DNNs. As a generic method, the feature complexity also provides new insights into existing deep-learning techniques, such as network compression and knowledge distillation. We will release the code when the paper is accepted.", "keywords": "Interpretability", "primary_area": "", "supplementary_material": "", "author": "Jie Ren;Mingjie Li;Zexu Liu;Quanshi Zhang", "authorids": "~Jie_Ren1;~Mingjie_Li3;~Zexu_Liu2;~Quanshi_Zhang1", "gender": "F;M;M;M", "homepage": "https://jie-ren.github.io/;http://lmjjjjjj.github.io;https://i.sjtu.edu.cn/xsxxxggl/xsgrxxwh_cxXsgrxx.html?gnmkdm=N100801&layout=default&su=518021910581;http://qszhang.com", "dblp": "r/JieRen-18;48/10103;;http://dblp.uni-trier.de/pers/hd/z/Zhang:Quanshi", "google_scholar": "https://scholar.google.com/citations?hl=zh-CN;7dXDygoAAAAJ;;iFFhHK0AAAAJ", "orcid": "0000-0001-9918-3000;;;", "linkedin": ";;;", "or_profile": "~Jie_Ren1;~Mingjie_Li3;~Zexu_Liu2;~Quanshi_Zhang1", "aff": "Shanghai Jiaotong University;Shanghai Jiaotong University;Shanghai Jiaotong University;Shanghai Jiaotong University", "aff_domain": "sjtu.edu.cn;sjtu.edu.cn;sjtu.edu.cn;sjtu.edu.cn", "position": "PhD student;Undergrad student;Undergrad student;Associate Professor", "bibtex": "", "github": "", "project": "", "reviewers": "AnonReviewer3;AnonReviewer1;AnonReviewer2;AnonReviewer4", "site": "https://openreview.net/forum?id=BqC9lL-hzY_", "pdf_size": 0, "rating": "3;3;4;6", "confidence": "3;2;4;4", "wc_review": "385;53;459;521", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "0;0;0;0", "reply_reviewers": "0;0;0;0", "reply_authors": "0;0;0;0", "rating_avg": [ 4.0, 1.224744871391589 ], "confidence_avg": [ 3.25, 0.82915619758885 ], "wc_review_avg": [ 354.5, 180.60661671157013 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 0, 0 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 0, 0 ], "replies_avg": [ 5, 0 ], "authors#_avg": [ 4, 0 ], "corr_rating_confidence": 0.7385489458759963, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:1E1t5CiLOqcJ:scholar.google.com/&scioq=Disentanglement,+Visualization+and+Analysis+of+Complex+Features+in+DNNs&hl=en&as_sdt=0,33", "gs_version_total": 0, "aff_unique_index": "0;0;0;0", "aff_unique_norm": "Shanghai Jiao Tong University", "aff_unique_dep": "", "aff_unique_url": "https://www.sjtu.edu.cn", "aff_unique_abbr": "SJTU", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0;0;0", "aff_country_unique": "China" }, { "title": "Coping with Label Shift via Distributionally Robust Optimisation", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/3242", "id": "BtZhsSGNRNi", "poster": "", "openreview": "https://openreview.net/forum?id=BtZhsSGNRNi", "slides": "https://iclr.cc/virtual/2021/poster/3242", "video": "https://iclr.cc/virtual/2021/poster/3242", "author_site": "Jingzhao Zhang, Aditya Krishna Menon, Andreas Veit, Srinadh Bhojanapalli, Sanjiv Kumar, Suvrit Sra", "tldr": "", 
"abstract": "The label shift problem refers to the supervised learning setting where the train and test label distributions do not match. Existing work addressing label shift usually assumes access to an unlabelled test sample. This sample may be used to estimate the test label distribution, and to then train a suitably re-weighted classifier. While approaches using this idea have proven effective, their scope is limited as it is not always feasible to access the target domain; further, they require repeated retraining if the model is to be deployed in multiple test environments. Can one instead learn a single classifier that is robust to arbitrary label shifts from a broad family? In this paper, we answer this question by proposing a model that minimises an objective based on distributionally robust optimisation (DRO). We then design and analyse a gradient descent-proximal mirror ascent algorithm tailored for large-scale problems to optimise the proposed objective. Finally, through experiments on CIFAR-100 and ImageNet, we show that our technique can significantly improve performance over a number of baselines in settings where label shift is present.", "keywords": "Label shift;distributional robust optimization", "primary_area": "", "supplementary_material": "", "author": "Jingzhao Zhang;Aditya Krishna Menon;Andreas Veit;Srinadh Bhojanapalli;Sanjiv Kumar;Suvrit Sra", "authorids": "~Jingzhao_Zhang2;~Aditya_Krishna_Menon1;~Andreas_Veit1;~Srinadh_Bhojanapalli1;~Sanjiv_Kumar1;~Suvrit_Sra1", "gender": "M;;M;;;M", "homepage": "https://sites.google.com/view/jingzhao/home;http://andreasveit.eu/;https://bsrinadh.github.io/;http://www.sanjivk.com/;https://optml.mit.edu;https://akmenon.github.io/", "dblp": "220/5559;133/1801;131/6700;;90/930;89/3514", "google_scholar": "8NudxYsAAAAJ;UA9Hb2EAAAAJ;bpSF_9EAAAAJ;https://scholar.google.com/citations?hl=en;eyCw9goAAAAJ;", "orcid": ";;;;;", "linkedin": ";;;;;", "or_profile": "~Jingzhao_Zhang2;~Andreas_Veit1;~Srinadh_Bhojanapalli1;~Sanjiv_Kumar1;~Suvrit_Sra1;~Aditya_Menon1", "aff": "Massachusetts Institute of Technology;Google;Google;Google;Massachusetts Institute of Technology;Australian National University", "aff_domain": "mit.edu;google.com;google.com;google.com;mit.edu;anu.edu.au", "position": "PhD student;Senior Research Scientist;Research Scientist;Research Scientist;Associate Professor;Fellow", "bibtex": "@inproceedings{\nzhang2021coping,\ntitle={Coping with Label Shift via Distributionally Robust Optimisation},\nauthor={Jingzhao Zhang and Aditya Krishna Menon and Andreas Veit and Srinadh Bhojanapalli and Sanjiv Kumar and Suvrit Sra},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=BtZhsSGNRNi}\n}", "github": "", "project": "", "reviewers": "AnonReviewer3;AnonReviewer4;AnonReviewer1", "pdf_size": 0, "rating": "4;6;7", "confidence": "4;4;4", "wc_review": "409;1074;463", "wc_reply_reviewers": "0;368;88", "wc_reply_authors": "657;1319;687", "reply_reviewers": "0;1;2", "reply_authors": "1;2;2", "rating_avg": [ 5.666666666666667, 1.247219128924647 ], "confidence_avg": [ 4.0, 0.0 ], "wc_review_avg": [ 648.6666666666666, 301.56296560125253 ], "wc_reply_reviewers_avg": [ 152.0, 156.90336728912695 ], "wc_reply_authors_avg": [ 887.6666666666666, 305.2445285704925 ], "reply_reviewers_avg": [ 1.0, 0.816496580927726 ], "reply_authors_avg": [ 1.6666666666666667, 0.4714045207910317 ], "replies_avg": [ 12, 0 ], "authors#_avg": [ 6, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 91, 
"gs_cited_by_link": "https://scholar.google.com/scholar?cites=4399458559204203801&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 7, "pdf": "https://openreview.net/pdf?id=BtZhsSGNRNi", "email": "mit.edu;google.com;google.com;google.com;mit.edu;anu.edu.au", "author_num": 6, "aff_unique_index": "0;1;1;1;0;2", "aff_unique_norm": "Massachusetts Institute of Technology;Google;Australian National University", "aff_unique_dep": ";Google;", "aff_unique_url": "https://web.mit.edu;https://www.google.com;https://www.anu.edu.au", "aff_unique_abbr": "MIT;Google;ANU", "aff_campus_unique_index": "1;1;1", "aff_campus_unique": ";Mountain View", "aff_country_unique_index": "0;0;0;0;0;1", "aff_country_unique": "United States;Australia" }, { "id": "BvrKnFq_454", "title": "Expectigrad: Fast Stochastic Optimization with Robust Convergence Properties", "track": "main", "status": "Reject", "tldr": "", "abstract": "Many popular adaptive gradient methods such as Adam and RMSProp rely on an exponential moving average (EMA) to normalize their stepsizes. While the EMA makes these methods highly responsive to new gradient information, recent research has shown that it also causes divergence on at least one convex optimization problem. We propose a novel method called Expectigrad, which adjusts stepsizes according to a per-component unweighted mean of all historical gradients and computes a bias-corrected momentum term jointly between the numerator and denominator. We prove that Expectigrad cannot diverge on every instance of the optimization problem known to cause Adam to diverge. We also establish a regret bound in the general stochastic nonconvex setting that suggests Expectigrad is less susceptible to gradient variance than existing methods are. Testing Expectigrad on several high-dimensional machine learning tasks, we find it often performs favorably to state-of-the-art methods with little hyperparameter tuning.", "keywords": "deep learning;gradient descent;optimization", "primary_area": "", "supplementary_material": "/attachment/55f1e2c404997c8e3bd906be9038915a50c974bb.zip", "author": "Brett Daley;Christopher Amato", "authorids": "~Brett_Daley1;~Christopher_Amato1", "gender": "M;M", "homepage": "https://brett-daley.github.io/;http://www.ccs.neu.edu/home/camato/index.html", "dblp": "157/3749;10/3254", "google_scholar": "PP2_bZ8AAAAJ;-8-sD-sAAAAJ", "orcid": "0000-0002-6402-0751;", "linkedin": "brettdaley/;", "or_profile": "~Brett_Daley1;~Christopher_Amato1", "aff": "Northeastern University;Northeastern University", "aff_domain": "northeastern.edu;neu.edu", "position": "PhD student;Assistant Professor", "bibtex": "@misc{\ndaley2021expectigrad,\ntitle={Expectigrad: Fast Stochastic Optimization with Robust Convergence Properties},\nauthor={Brett Daley and Christopher Amato},\nyear={2021},\nurl={https://openreview.net/forum?id=BvrKnFq_454}\n}", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer3;AnonReviewer2;AnonReviewer4", "site": "https://openreview.net/forum?id=BvrKnFq_454", "pdf_size": 0, "rating": "3;4;5;5", "confidence": "4;3;5;3", "wc_review": "637;293;483;518", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "170;361;295;338", "reply_reviewers": "0;0;0;0", "reply_authors": "1;1;1;1", "rating_avg": [ 4.25, 0.82915619758885 ], "confidence_avg": [ 3.75, 0.82915619758885 ], "wc_review_avg": [ 482.75, 123.53213144765212 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 291.0, 73.76652357268844 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], 
"replies_avg": [ 9, 0 ], "authors#_avg": [ 2, 0 ], "corr_rating_confidence": 0.0909090909090909, "gs_citation": 1, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=10735131716613613053&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 3, "aff_unique_index": "0;0", "aff_unique_norm": "Northeastern University", "aff_unique_dep": "", "aff_unique_url": "https://www.northeastern.edu", "aff_unique_abbr": "NEU", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0", "aff_country_unique": "United States" }, { "id": "Bw7VC-DJUM", "title": "Learning Spatiotemporal Features via Video and Text Pair Discrimination", "track": "main", "status": "Reject", "tldr": "", "abstract": "Current video representations heavily rely on learning from manually annotated video datasets which are time-consuming and expensive to acquire. We observe videos are naturally accompanied by abundant text information such as YouTube titles and Instagram captions. In this paper, we leverage this visual-textual connection to learn spatiotemporal features in an efficient weakly-supervised manner. We present a general cross-modal pair discrimination (CPD) framework to capture this correlation between a video and its associated text. We train our CPD models on both standard video dataset (Kinetics-210k) and uncurated web video dataset (Instagram-300k) to demonstrate its effectiveness. Without further fine-tuning, the learnt models obtain competitive results for action classification on Kinetics under the linear classification protocol. Moreover, our visual model provides an effective initialization to fine-tune on downstream tasks, which yields a remarkable performance gain for action recognition on UCF101 and HMDB51, compared with the existing state-of-the-art self-supervised training methods. 
In addition, our CPD demonstrates that pre-training on a relatively small dataset can yield performance comparable to methods that use an order of magnitude more data, which is meaningful and practical for scenarios with limited computational resources.", "keywords": "Spatiotemporal Feature Learning;Video and Text Pair Discrimination;Self-/Weakly Supervised Learning", "primary_area": "", "supplementary_material": "", "author": "Tianhao Li;Limin Wang", "authorids": "~Tianhao_Li1;~Limin_Wang1", "gender": "M;M", "homepage": "https://github.com/LLLLLLI;https://wanglimin.github.io", "dblp": "69/2238;68/6610-2", "google_scholar": ";HEuN8PcAAAAJ", "orcid": ";", "linkedin": ";", "or_profile": "~Tianhao_Li1;~Limin_Wang2", "aff": "Nanjing University;Nanjing University", "aff_domain": "nju.edu.cn;nju.edu.cn", "position": "MS student;Full Professor", "bibtex": "@misc{\nli2021learning,\ntitle={Learning Spatiotemporal Features via Video and Text Pair Discrimination},\nauthor={Tianhao Li and Limin Wang},\nyear={2021},\nurl={https://openreview.net/forum?id=Bw7VC-DJUM}\n}", "github": "", "project": "", "reviewers": "AnonReviewer3;AnonReviewer2;AnonReviewer4;AnonReviewer1", "site": "https://openreview.net/forum?id=Bw7VC-DJUM", "pdf_size": 0, "rating": "4;4;5;6", "confidence": "4;5;5;3", "wc_review": "313;485;210;706", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "292;387;323;359", "reply_reviewers": "0;0;0;0", "reply_authors": "1;1;1;1", "rating_avg": [ 4.75, 0.82915619758885 ], "confidence_avg": [ 4.25, 0.82915619758885 ], "wc_review_avg": [ 428.5, 187.93682449163603 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 340.25, 35.92613950871983 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 9, 0 ], "authors#_avg": [ 2, 0 ], "corr_rating_confidence": -0.6363636363636364, "gs_citation": 53, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=720027539974116232&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 3, "aff_unique_index": "0;0", "aff_unique_norm": "Nanjing University", "aff_unique_dep": "", "aff_unique_url": "https://www.nju.edu.cn", "aff_unique_abbr": "Nanjing U", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0", "aff_country_unique": "China" }, { "id": "Bx05YH2W8bE", "title": "DyHCN: Dynamic Hypergraph Convolutional Networks", "track": "main", "status": "Reject", "tldr": "", "abstract": "Hypergraph Convolutional Network (HCN) has become a default choice for capturing high-order relations among nodes, i.e., encoding the structure of a hypergraph. However, existing HCN models ignore the dynamic evolution of hypergraphs in real-world scenarios, i.e., nodes and hyperedges in a hypergraph change dynamically over time. To capture the evolution of high-order relations and facilitate relevant analytic tasks, we formulate the dynamic hypergraph and devise Dynamic Hypergraph Convolutional Networks (DyHCN). In general, DyHCN consists of a Hypergraph Convolution (HC) to encode the hypergraph structure at a time point and a Temporal Evolution module (TE) to capture the variation of the relations. The HC is delicately designed with inner attention and outer attention, which adaptively aggregate node features into hyperedges and estimate the importance of each hyperedge connected to the centroid node, respectively.
Extensive experiments on the Tiigo and Stocktwits datasets show that DyHCN achieves superior performance over existing methods, which implies the effectiveness of capturing the property of dynamic hypergraphs by HC and TE modules.", "keywords": "", "primary_area": "", "supplementary_material": "", "author": "Nan Yin;zhigang luo;wenjie wang;Fuli Feng;Xiang Zhang", "authorids": "~Nan_Yin1;zgluo@nudt.edu.cn;wenjiewang96@gmail.com;~Fuli_Feng1;~Xiang_Zhang7", "gender": ";;;M;", "homepage": ";;;https://fulifeng.github.io/;", "dblp": ";;;183/9198;", "google_scholar": ";;;https://scholar.google.com.sg/citations?user=QePM4u8AAAAJ;", "orcid": ";;;0000-0002-5828-9842;", "linkedin": ";;;;", "or_profile": "~Nan_Yin1;zgluo@nudt.edu.cn;wenjiewang96@gmail.com;~Fuli_Feng1;~Xiang_Zhang7", "aff": ";;;National University of Singapore;", "aff_domain": ";;;nus.edu.sg;", "position": ";;;Postdoc;", "bibtex": "@misc{\nyin2021dyhcn,\ntitle={Dy{\\{}HCN{\\}}: Dynamic Hypergraph Convolutional Networks},\nauthor={Nan Yin and zhigang luo and wenjie wang and Fuli Feng and Xiang Zhang},\nyear={2021},\nurl={https://openreview.net/forum?id=Bx05YH2W8bE}\n}", "github": "", "project": "", "reviewers": "AnonReviewer2;AnonReviewer1;AnonReviewer3;AnonReviewer4", "site": "https://openreview.net/forum?id=Bx05YH2W8bE", "pdf_size": 0, "rating": "4;5;6;6", "confidence": "4;3;4;3", "wc_review": "419;292;192;248", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "925;742;482;621", "reply_reviewers": "0;0;0;0", "reply_authors": "2;2;1;1", "rating_avg": [ 5.25, 0.82915619758885 ], "confidence_avg": [ 3.5, 0.5 ], "wc_review_avg": [ 287.75, 83.65517019288168 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 692.5, 162.73367813701012 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.5, 0.5 ], "replies_avg": [ 11, 0 ], "authors#_avg": [ 5, 0 ], "corr_rating_confidence": -0.30151134457776363, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:xkrckZnhnboJ:scholar.google.com/&scioq=DyHCN:+Dynamic+Hypergraph+Convolutional+Networks&hl=en&as_sdt=0,5", "gs_version_total": 0, "aff_unique_index": "0", "aff_unique_norm": "National University of Singapore", "aff_unique_dep": "", "aff_unique_url": "https://www.nus.edu.sg", "aff_unique_abbr": "NUS", "aff_country_unique_index": "0", "aff_country_unique": "Singapore" }, { "title": "Neural networks with late-phase weights", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/2930", "id": "C0qJUx5dxFb", "poster": "", "openreview": "https://openreview.net/forum?id=C0qJUx5dxFb", "slides": "https://iclr.cc/virtual/2021/poster/2930", "video": "https://iclr.cc/virtual/2021/poster/2930", "author_site": "Johannes von Oswald, Seijin Kobayashi, Joao Sacramento, Alexander Meulemans, Christian Henning, Benjamin F Grewe", "tldr": "", "abstract": "The largely successful method of training neural networks is to learn their weights using some variant of stochastic gradient descent (SGD). Here, we show that the solutions found by SGD can be further improved by ensembling a subset of the weights in late stages of learning. At the end of learning, we obtain back a single model by taking a spatial average in weight space. To avoid incurring increased computational costs, we investigate a family of low-dimensional late-phase weight models which interact multiplicatively with the remaining parameters. 
Our results show that augmenting standard models with late-phase weights improves generalization in established benchmarks such as CIFAR-10/100, ImageNet and enwik8. These findings are complemented with a theoretical analysis of a noisy quadratic problem which provides a simplified picture of the late phases of neural network learning.", "keywords": "", "primary_area": "", "supplementary_material": "/attachment/92f433c3960cdfafa364cbe3782305746f5b858a.zip", "author": "Johannes von Oswald;Seijin Kobayashi;Joao Sacramento;Alexander Meulemans;Christian Henning;Benjamin F Grewe", "authorids": "~Johannes_von_Oswald2;seijink@ethz.ch;~Joao_Sacramento1;~Alexander_Meulemans1;~Christian_Henning1;~Benjamin_F_Grewe1", "gender": "Not Specified;;M;M;M;M", "homepage": "https://as.inf.ethz.ch/people/members/voswaldj/index.html;;http://www.joaosacramento.com;http://alexandermeulemans.com/;https://www.ini.uzh.ch/en/institute/people?uname=christian;https://www.ini.uzh.ch/en/institute/people?uname=bgrewe", "dblp": "242/8029;;59/9214;267/9546;;", "google_scholar": "https://scholar.google.ch/citations?user=jdnL-PgAAAAJ;;9hpcmYUAAAAJ;https://scholar.google.ch/citations?user=nnMccw4AAAAJ;u6QSFrsAAAAJ;https://scholar.google.de/citations?user=ZA-1rh8AAAAJ", "orcid": ";;;;;0000-0001-8560-2120", "linkedin": "johswald/?originalSubdomain=de;;;alexander-meulemans-72589b146/;christian-henning/;", "or_profile": "~Johannes_von_Oswald2;seijink@ethz.ch;~Joao_Sacramento1;~Alexander_Meulemans1;~Christian_Henning1;~Benjamin_F_Grewe1", "aff": "Swiss Federal Institute of Technology;;Department of Computer Science, ETHZ - ETH Zurich;Swiss Federal Institute of Technology;Swiss Federal Institute of Technology;ETHZ - ETH Zurich", "aff_domain": "ethz.ch;;inf.ethz.ch;ethz.ch;ethz.ch;ethz.ch", "position": "PhD student;;Principal Researcher;PhD student;PhD student;Assistant Professor", "bibtex": "@inproceedings{\noswald2021neural,\ntitle={Neural networks with late-phase weights},\nauthor={Johannes von Oswald and Seijin Kobayashi and Joao Sacramento and Alexander Meulemans and Christian Henning and Benjamin F Grewe},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=C0qJUx5dxFb}\n}", "github": "[![github](/images/github_icon.svg) google/uncertainty-baselines](https://github.com/google/uncertainty-baselines) + [![Papers with Code](/images/pwc_icon.svg) 1 community implementation](https://paperswithcode.com/paper/?openreview=C0qJUx5dxFb)", "project": "", "reviewers": "AnonReviewer3;AnonReviewer2;AnonReviewer4;AnonReviewer1", "pdf_size": 0, "rating": "6;6;7;7", "confidence": "4;2;4;4", "wc_review": "343;647;355;332", "wc_reply_reviewers": "60;0;106;146", "wc_reply_authors": "953;1075;1067;1000", "reply_reviewers": "1;0;1;1", "reply_authors": "3;2;3;3", "rating_avg": [ 6.5, 0.5 ], "confidence_avg": [ 3.5, 0.8660254037844386 ], "wc_review_avg": [ 419.25, 131.74288405830504 ], "wc_reply_reviewers_avg": [ 78.0, 54.35071296680477 ], "wc_reply_authors_avg": [ 1023.75, 50.16659745288692 ], "reply_reviewers_avg": [ 0.75, 0.4330127018922193 ], "reply_authors_avg": [ 2.75, 0.4330127018922193 ], "replies_avg": [ 20, 0 ], "authors#_avg": [ 6, 0 ], "corr_rating_confidence": 0.5773502691896257, "gs_citation": 40, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=18006401618845787633&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 9, "pdf": "https://openreview.net/pdf?id=C0qJUx5dxFb", "email": "ethz.ch;;inf.ethz.ch;ethz.ch;ethz.ch;ethz.ch", "author_num": 6, 
"aff_unique_index": "0;1;0;0;1", "aff_unique_norm": "Swiss Federal Institute of Technology;ETH Zurich", "aff_unique_dep": ";Department of Computer Science", "aff_unique_url": "https://www.ethz.ch;https://www.ethz.ch", "aff_unique_abbr": "ETH Zurich;ETHZ", "aff_campus_unique_index": "1", "aff_campus_unique": ";Zurich", "aff_country_unique_index": "0;0;0;0;0", "aff_country_unique": "Switzerland" }, { "id": "C1VUD8RZ5wq", "title": "A Closer Look at Codistillation for Distributed Training", "track": "main", "status": "Reject", "tldr": "", "abstract": "Codistillation has been proposed as a mechanism to share knowledge among concurrently trained models by encouraging them to represent the same function through an auxiliary loss. This contrasts with the more commonly used fully-synchronous data-parallel stochastic gradient descent methods, where different model replicas average their gradients (or parameters) at every iteration and thus maintain identical parameters. We investigate codistillation in a distributed training setup, complementing previous work which focused on extremely large batch sizes. Surprisingly, we find that even at moderate batch sizes, models trained with codistillation can perform as well as models trained with synchronous data-parallel methods, despite using a much weaker synchronization mechanism. These findings hold across a range of batch sizes and learning rate schedules, as well as different kinds of models and datasets. Obtaining this level of accuracy, however, requires properly accounting for the regularization effect of codistillation, which we highlight through several empirical observations. Overall, this work contributes to a better understanding of codistillation and how to best take advantage of it in a distributed computing environment.", "keywords": "Distributed Training;Distillation;Neural Networks;Deep Learning;Large-scale Learning", "primary_area": "", "supplementary_material": "/attachment/af1c1b3346a61b58220787fa1cc444198e2c3494.zip", "author": "Shagun Sodhani;Olivier Delalleau;Mido Assran;Koustuv Sinha;Nicolas Ballas;Michael Rabbat", "authorids": "~Shagun_Sodhani1;~Olivier_Delalleau1;~Mido_Assran1;~Koustuv_Sinha1;~Nicolas_Ballas1;~Michael_Rabbat1", "gender": "M;M;M;;M;M", "homepage": "https://shagunsodhani.com;;https://koustuvsinha.com/;;;http://www.midoassran.ca/", "dblp": "http://dblp.uni-trier.de/pers/hd/s/Sodhani:Shagun;68/2192;210/0890;120/9066;47/1744;216/2717", "google_scholar": "ixp-vqMAAAAJ;https://scholar.google.ca/citations?user=zqLpO2QAAAAJ;9P9QcckAAAAJ;euUV4iUAAAAJ;https://scholar.google.ch/citations?user=cMPKe9UAAAAJ;gcQTTvkAAAAJ", "orcid": ";0000-0002-0610-7226;;;;0000-0001-9159-8447", "linkedin": "shagun-sodhani-b2239879;odelalleau;;;;", "or_profile": "~Shagun_Sodhani1;~Olivier_Delalleau1;~Koustuv_Sinha1;~Nicolas_Ballas1;~Michael_Rabbat1;~Mahmoud_Assran1", "aff": "Meta Facebook;Meta AI (FAIR);McGill University / Mila;Meta;Mila;Meta Facebook", "aff_domain": "fb.com;fb.com;mcgill.ca;meta.com;mila.quebec;fb.com", "position": "Researcher;Research Engineering Manager;PhD student;Researcher;Associate Member;Researcher", "bibtex": "@misc{\nsodhani2021a,\ntitle={A Closer Look at Codistillation for Distributed Training},\nauthor={Shagun Sodhani and Olivier Delalleau and Mido Assran and Koustuv Sinha and Nicolas Ballas and Michael Rabbat},\nyear={2021},\nurl={https://openreview.net/forum?id=C1VUD8RZ5wq}\n}", "github": "", "project": "", "reviewers": "AnonReviewer2;AnonReviewer4;AnonReviewer1;AnonReviewer3", "site": 
"https://openreview.net/forum?id=C1VUD8RZ5wq", "pdf_size": 0, "rating": "4;4;4;5", "confidence": "4;4;4;2", "wc_review": "572;491;371;772", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "930;749;1165;782", "reply_reviewers": "0;0;0;0", "reply_authors": "3;3;4;3", "rating_avg": [ 4.25, 0.4330127018922193 ], "confidence_avg": [ 3.5, 0.8660254037844386 ], "wc_review_avg": [ 551.5, 146.01455406910642 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 906.5, 164.0739162694668 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 3.25, 0.4330127018922193 ], "replies_avg": [ 18, 0 ], "authors#_avg": [ 6, 0 ], "corr_rating_confidence": -1.0, "gs_citation": 10, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=9688134754845472471&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 5, "aff_unique_index": "0;0;1;0;2;0", "aff_unique_norm": "Meta;McGill University;Mila", "aff_unique_dep": "Meta Platforms, Inc.;Mila;Quebec Artificial Intelligence Institute", "aff_unique_url": "https://meta.com;https://www.mcgill.ca;https://mila.quebec", "aff_unique_abbr": "Meta;McGill;Mila", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0;1;0;1;0", "aff_country_unique": "United States;Canada" }, { "title": "Understanding Over-parameterization in Generative Adversarial Networks", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/3216", "id": "C3qvk5IQIJY", "poster": "", "openreview": "https://openreview.net/forum?id=C3qvk5IQIJY", "slides": "https://iclr.cc/virtual/2021/poster/3216", "video": "https://iclr.cc/virtual/2021/poster/3216", "author_site": "Yogesh Balaji, Mohammadmahdi Sajedi, Neha Kalibhat, Mucong Ding, Dominik St\u00f6ger, Mahdi Soltanolkotabi, Soheil Feizi", "tldr": "", "abstract": "A broad class of unsupervised deep learning methods such as Generative Adversarial Networks (GANs) involve training of overparameterized models where the number of parameters of the model exceeds a certain threshold. Indeed, most successful GANs used in practice are trained using overparameterized generator and discriminator networks, both in terms of depth and width. A large body of work in supervised learning have shown the importance of model overparameterization in the convergence of the gradient descent (GD) to globally optimal solutions. In contrast, the unsupervised setting and GANs in particular involve non-convex concave mini-max optimization problems that are often trained using Gradient Descent/Ascent (GDA).\nThe role and benefits of model overparameterization in the convergence of GDA to a global saddle point in non-convex concave problems is far less understood. In this work, we present a comprehensive analysis of the importance of model overparameterization in GANs both theoretically and empirically. We theoretically show that in an overparameterized GAN model with a $1$-layer neural network generator and a linear discriminator, GDA converges to a global saddle point of the underlying non-convex concave min-max problem. To the best of our knowledge, this is the first result for global convergence of GDA in such settings. Our theory is based on a more general result that holds for a broader class of nonlinear generators and discriminators that obey certain assumptions (including deeper generators and random feature discriminators). 
Our theory utilizes and builds upon a novel connection with the convergence analysis of linear time-varying dynamical systems which may have broader implications for understanding the convergence behavior of GDA for non-convex concave problems involving overparameterized models. We also empirically study the role of model overparameterization in GANs using several large-scale experiments on CIFAR-10 and Celeb-A datasets. Our experiments show that overparameterization improves the quality of generated samples across various model architectures and datasets. Remarkably, we observe that overparameterization leads to faster and more stable convergence behavior of GDA across the board.", "keywords": "GAN;Over-parameterization;min-max optimization", "primary_area": "", "supplementary_material": "", "author": "Yogesh Balaji;Mohammadmahdi Sajedi;Neha Mukund Kalibhat;Mucong Ding;Dominik St\u00f6ger;Mahdi Soltanolkotabi;Soheil Feizi", "authorids": "~Yogesh_Balaji1;sajedi@usc.edu;~Neha_Mukund_Kalibhat1;~Mucong_Ding1;~Dominik_St\u00f6ger1;~Mahdi_Soltanolkotabi1;~Soheil_Feizi2", "gender": "M;;F;M;M;M;M", "homepage": "https://yogeshbalaji.github.io/;;https://sites.google.com/view/nehakalibhat;http://www.cs.umd.edu/~mcding/;;http://www-bcf.usc.edu/~soltanol/;https://www.cs.umd.edu/~sfeizi/", "dblp": "185/6906;;276/0300;232/1754.html;199/2106;75/6691;57/2132", "google_scholar": "0I2qH0oAAAAJ;;HYT-q5MAAAAJ;_bVao2MAAAAJ;https://scholar.google.de/citations?user=-aLITVUAAAAJ;narJyMAAAAAJ;lptAmrMAAAAJ", "orcid": ";;;0000-0002-6173-8055;;;", "linkedin": ";;neha-kalibhat/;mucong-ding-489296104;;;", "or_profile": "~Yogesh_Balaji1;sajedi@usc.edu;~Neha_Mukund_Kalibhat1;~Mucong_Ding1;~Dominik_St\u00f6ger1;~Mahdi_Soltanolkotabi1;~Soheil_Feizi2", "aff": "Department of Computer Science, University of Maryland, College Park;;University of Maryland, College Park;Department of Computer Science, University of Maryland, College Park;University of Southern California;University of Southern California;University of Maryland, College Park", "aff_domain": "cs.umd.edu;;umd.edu;cs.umd.edu;usc.edu;usc.edu;umd.edu", "position": "PhD student;;PhD student;PhD student;Postdoc;Assistant Professor;Assistant Professor", "bibtex": "@inproceedings{\nbalaji2021understanding,\ntitle={Understanding Over-parameterization in Generative Adversarial Networks},\nauthor={Yogesh Balaji and Mohammadmahdi Sajedi and Neha Mukund Kalibhat and Mucong Ding and Dominik St{\\\"o}ger and Mahdi Soltanolkotabi and Soheil Feizi},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=C3qvk5IQIJY}\n}", "github": "", "project": "", "reviewers": "AnonReviewer3;AnonReviewer4;AnonReviewer2;AnonReviewer1", "pdf_size": 0, "rating": "4;6;6;7", "confidence": "5;2;3;3", "wc_review": "499;328;742;392", "wc_reply_reviewers": "355;439;0;0", "wc_reply_authors": "962;1585;732;235", "reply_reviewers": "1;1;0;0", "reply_authors": "2;3;1;1", "rating_avg": [ 5.75, 1.0897247358851685 ], "confidence_avg": [ 3.25, 1.0897247358851685 ], "wc_review_avg": [ 490.25, 157.6647947387114 ], "wc_reply_reviewers_avg": [ 198.5, 200.7093669961619 ], "wc_reply_authors_avg": [ 878.5, 485.1981553963288 ], "reply_reviewers_avg": [ 0.5, 0.5 ], "reply_authors_avg": [ 1.75, 0.82915619758885 ], "replies_avg": [ 14, 0 ], "authors#_avg": [ 7, 0 ], "corr_rating_confidence": -0.7894736842105263, "gs_citation": 37, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=14372289851354274809&as_sdt=5,33&sciodt=0,33&hl=en", 
"gs_version_total": 8, "pdf": "https://openreview.net/pdf?id=C3qvk5IQIJY", "email": "cs.umd.edu;;umd.edu;cs.umd.edu;usc.edu;usc.edu;umd.edu", "author_num": 7, "aff_unique_index": "0;1;0;2;2;1", "aff_unique_norm": "University of Maryland, College Park;University of Maryland;University of Southern California", "aff_unique_dep": "Department of Computer Science;;", "aff_unique_url": "https://www/umd.edu;https://www/umd.edu;https://www.usc.edu", "aff_unique_abbr": "UMD;UMD;USC", "aff_campus_unique_index": "0;0;0;1;1;0", "aff_campus_unique": "College Park;Los Angeles", "aff_country_unique_index": "0;0;0;0;0;0", "aff_country_unique": "United States" }, { "id": "C4-QQ1EHNcI", "title": "Expressive yet Tractable Bayesian Deep Learning via Subnetwork Inference", "track": "main", "status": "Reject", "tldr": "", "abstract": "The Bayesian paradigm has the potential to solve some of the core issues in modern deep learning, such as poor calibration, data inefficiency, and catastrophic forgetting. However, scaling Bayesian inference to the high-dimensional parameter spaces of deep neural networks requires restrictive approximations. In this paper, we propose performing inference over only a small subset of the model parameters while keeping all others as point estimates. This enables us to use expressive posterior approximations that would otherwise be intractable for the full model. In particular, we develop a practical and scalable Bayesian deep learning method that first trains a point estimate, and then infers a full covariance Gaussian posterior approximation over a subnetwork. We propose a subnetwork selection procedure which aims to maximally preserve posterior uncertainty. We empirically demonstrate the effectiveness of our approach compared to point-estimated networks and methods that use less expressive posterior approximations over the full network.", "keywords": "", "primary_area": "", "supplementary_material": "/attachment/58d642a42455eb2a510c3017158d73ff67fd84f9.zip", "author": "Erik Daxberger;Eric Nalisnick;James Allingham;Javier Antoran;Jos\u00e9 Miguel Hern\u00e1ndez-Lobato", "authorids": "~Erik_Daxberger1;~Eric_Nalisnick1;jua23@cam.ac.uk;~Javier_Antoran1;~Jos\u00e9_Miguel_Hern\u00e1ndez-Lobato1", "gender": "M;M;;Unspecified;", "homepage": ";https://enalisnick.github.io;;https://javierantoran.github.io/about/;", "dblp": ";136/4057;;234/8818.html;", "google_scholar": "7L4W8KwAAAAJ;cb1ZN7AAAAAJ;;_b-Cs2cAAAAJ;", "orcid": ";;;0000-0003-2877-2689;", "linkedin": "edaxberger;;;javier-antoran/;", "or_profile": "~Erik_Daxberger1;~Eric_Nalisnick1;jua23@cam.ac.uk;~Javier_Antoran1;~Jos\u00e9_Miguel_Hern\u00e1ndez-Lobato1", "aff": "Max-Planck Institute for Intelligent Systems;University of Amsterdam;;University of Cambridge;", "aff_domain": "mpg.de;uva.nl;;cam.ac.uk;", "position": "PhD student;Assistant Professor;;PhD student;", "bibtex": "@misc{\ndaxberger2021expressive,\ntitle={Expressive yet Tractable Bayesian Deep Learning via Subnetwork Inference},\nauthor={Erik Daxberger and Eric Nalisnick and James Allingham and Javier Antoran and Jos{\\'e} Miguel Hern{\\'a}ndez-Lobato},\nyear={2021},\nurl={https://openreview.net/forum?id=C4-QQ1EHNcI}\n}", "github": "", "project": "", "reviewers": "AnonReviewer3;AnonReviewer2;AnonReviewer1;AnonReviewer4", "site": "https://openreview.net/forum?id=C4-QQ1EHNcI", "pdf_size": 0, "rating": "5;5;6;6", "confidence": "5;4;4;4", "wc_review": "431;1169;595;624", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "683;868;964;454", "reply_reviewers": "0;0;0;0", 
"reply_authors": "1;2;2;1", "rating_avg": [ 5.5, 0.5 ], "confidence_avg": [ 4.25, 0.4330127018922193 ], "wc_review_avg": [ 704.75, 277.9535707631762 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 742.25, 194.6694313445231 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.5, 0.5 ], "replies_avg": [ 12, 0 ], "authors#_avg": [ 5, 0 ], "corr_rating_confidence": -0.5773502691896257, "gs_citation": 15, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=17876529081418563176&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 2, "aff_unique_index": "0;1;2", "aff_unique_norm": "Max-Planck Institute for Intelligent Systems;University of Amsterdam;University of Cambridge", "aff_unique_dep": ";;", "aff_unique_url": "https://www.mpi-is.mpg.de;https://www.uva.nl;https://www.cam.ac.uk", "aff_unique_abbr": "MPI-IS;UvA;Cambridge", "aff_campus_unique_index": "1", "aff_campus_unique": ";Cambridge", "aff_country_unique_index": "0;1;2", "aff_country_unique": "Germany;Netherlands;United Kingdom" }, { "id": "C5kn825mU19", "title": "A Coach-Player Framework for Dynamic Team Composition", "track": "main", "status": "Reject", "tldr": "", "abstract": "In real-world multi-agent teams, agents with different capabilities may join or leave \"on the fly\" without altering the team's overarching goals. Coordinating teams with such dynamic composition remains a challenging problem: the optimal team strategy may vary with its composition. Inspired by real-world team sports, we propose a coach-player framework to tackle this problem. We assume that the players only have a partial view of the environment, while the coach has a complete view. The coach coordinates the players by distributing individual strategies. Specifically, we 1) propose an attention mechanism for both the players and the coach; 2) incorporate a variational objective to regularize learning; and 3) design an adaptive communication method to let the coach decide when to communicate with different players. Our attention mechanism on the players and the coach allows for a varying number of heterogeneous agents, and can thus tackle the dynamic team composition. We validate our methods on resource collection tasks in multi-agent particle environment. We demonstrate zero-shot generalization to new team compositions with varying numbers of heterogeneous agents. The performance of our method is comparable or even better than the setting where all players have a full view of the environment, but no coach. Moreover, we see that the performance stays nearly the same even when the coach communicates as little as 13% of the time using our adaptive communication strategy. 
These results demonstrate the significance of a coach to coordinate players in dynamic teams.", "keywords": "Multiagent reinforcement learning", "primary_area": "", "supplementary_material": "/attachment/654ca78b1fcb903345f14493809d6b6dab9f2c2b.zip", "author": "Bo Liu;qiang liu;Peter Stone;Animesh Garg;Yuke Zhu;Anima Anandkumar", "authorids": "~Bo_Liu13;~qiang_liu4;~Peter_Stone1;~Animesh_Garg1;~Yuke_Zhu1;~Anima_Anandkumar1", "gender": "M;M;M;M;F;M", "homepage": "https://cranial-xix.github.io/;http://www.cs.utexas.edu/~pstone;http://animesh.garg.tech;https://cs.utexas.edu/~yukez/;http://tensorlab.cms.caltech.edu/users/anima/;https://www.cs.utexas.edu/~lqiang/", "dblp": ";s/PeterStone;123/5728;133/1772;;61/3234-1", "google_scholar": "https://scholar.google.com/citations?hl=en;qnwjcfAAAAAJ;zp8V7ZMAAAAJ;mWGyYMsAAAAJ;bEcLezcAAAAJ;https://scholar.google.com.tw/citations?user=2qDh4WUAAAAJ", "orcid": ";0000-0002-6795-420X;0000-0003-0482-4296;;;", "linkedin": ";;animeshgarg/;;anima-anandkumar-35171b1/;", "or_profile": "~Bo_Liu13;~Peter_Stone1;~Animesh_Garg1;~Yuke_Zhu1;~anima_anandkumar1;~Qiang_Liu1", "aff": "University of Texas, Austin;University of Texas, Austin;University of Toronto;Computer Science Department, University of Texas, Austin;California Institute of Technology;University of Texas, Austin", "aff_domain": "cs.utexas.edu;utexas.edu;toronto.edu;cs.utexas.edu;caltech.edu;utexas.edu", "position": "PhD student;Full Professor;Assistant Professor;Assistant Professor;Full Professor;Assistant Professor", "bibtex": "@misc{\nliu2021a,\ntitle={A Coach-Player Framework for Dynamic Team Composition},\nauthor={Bo Liu and qiang liu and Peter Stone and Animesh Garg and Yuke Zhu and Anima Anandkumar},\nyear={2021},\nurl={https://openreview.net/forum?id=C5kn825mU19}\n}", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer4;AnonReviewer2;AnonReviewer3", "site": "https://openreview.net/forum?id=C5kn825mU19", "pdf_size": 0, "rating": "4;5;6;7", "confidence": "3;3;3;2", "wc_review": "629;395;466;312", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "384;350;434;395", "reply_reviewers": "0;0;0;0", "reply_authors": "1;1;1;1", "rating_avg": [ 5.5, 1.118033988749895 ], "confidence_avg": [ 2.75, 0.4330127018922193 ], "wc_review_avg": [ 450.5, 116.58151654529118 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 390.75, 29.978117018918983 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 10, 0 ], "authors#_avg": [ 6, 0 ], "corr_rating_confidence": -0.7745966692414834, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:1uWx9ijfrCQJ:scholar.google.com/&scioq=A+Coach-Player+Framework+for+Dynamic+Team+Composition&hl=en&as_sdt=0,33", "gs_version_total": 0, "aff_unique_index": "0;0;1;0;2;0", "aff_unique_norm": "University of Texas at Austin;University of Toronto;California Institute of Technology", "aff_unique_dep": ";;", "aff_unique_url": "https://www.utexas.edu;https://www.utoronto.ca;https://www.caltech.edu", "aff_unique_abbr": "UT Austin;U of T;Caltech", "aff_campus_unique_index": "0;0;0;2;0", "aff_campus_unique": "Austin;;Pasadena", "aff_country_unique_index": "0;0;1;0;0;0", "aff_country_unique": "United States;Canada" }, { "id": "C5th0zC9NPQ", "title": "Sensory Resilience based on Synesthesia", "track": "main", "status": "Reject", "tldr": "", "abstract": "Situated cognition depends on accessing environmental state through sensors. 
Engineering and cost constraints usually lead to limited \u201cpathways\u201d where, for example, a vision sub-system only includes a camera and the software to deal with it. This traditional and rational design style entails any hardware defect on the pathway causes the system to grind to a halt until repair. We propose a \u201csensoriplexer\u201d as drop-in neural component architecture to address this issue, under the common scenario of multiple sensors availability. This component architecture learns to mix and relate pathways, such that an agent facing failure in a sensory sub-system can degrade gracefully and coherently by relying on its other sub- systems. The architecture is inspired by the concept of synesthesia, and relies on statistical coupling between sensor signals. We show the benefit and limitation of the architecture on a simple shape recognition and a more complex emotion recognition scenarios.", "keywords": "perception;resilience;robotics;synesthesia", "primary_area": "", "supplementary_material": "/attachment/35a29e349d4b703a2289f4fca517cbe4880c6469.zip", "author": "Eric Platon;Tom Sonoda", "authorids": "~Eric_Platon1;~Tom_Sonoda1", "gender": ";M", "homepage": ";https://github.com/tomsonoda", "dblp": ";", "google_scholar": ";", "orcid": "0000-0002-1999-3335;", "linkedin": ";tomsonoda/", "or_profile": "~Eric_Platon1;~Tom_Sonoda1", "aff": "Cosmos Times;", "aff_domain": "cosmosx.ai;", "position": "Founder;", "bibtex": "@misc{\nplaton2021sensory,\ntitle={Sensory Resilience based on Synesthesia},\nauthor={Eric Platon and Tom Sonoda},\nyear={2021},\nurl={https://openreview.net/forum?id=C5th0zC9NPQ}\n}", "github": "", "project": "", "reviewers": "AnonReviewer3;AnonReviewer1;AnonReviewer4", "site": "https://openreview.net/forum?id=C5th0zC9NPQ", "pdf_size": 0, "rating": "2;3;5", "confidence": "5;4;4", "wc_review": "235;303;353", "wc_reply_reviewers": "0;0;0", "wc_reply_authors": "779;444;836", "reply_reviewers": "0;0;0", "reply_authors": "1;1;1", "rating_avg": [ 3.3333333333333335, 1.247219128924647 ], "confidence_avg": [ 4.333333333333333, 0.4714045207910317 ], "wc_review_avg": [ 297.0, 48.359762888859024 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 686.3333333333334, 172.92837309771411 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 7, 0 ], "authors#_avg": [ 2, 0 ], "corr_rating_confidence": -0.7559289460184545, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:T2w1I05gfIQJ:scholar.google.com/&scioq=Sensory+Resilience+based+on+Synesthesia&hl=en&as_sdt=0,33", "gs_version_total": 0, "aff_unique_index": "0", "aff_unique_norm": "Cosmos Times", "aff_unique_dep": "", "aff_unique_url": "", "aff_unique_abbr": "" }, { "title": "Multi-Level Local SGD: Distributed SGD for Heterogeneous Hierarchical Networks", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/3195", "id": "C70cp4Cn32", "poster": "", "openreview": "https://openreview.net/forum?id=C70cp4Cn32", "slides": "https://iclr.cc/virtual/2021/poster/3195", "video": "https://iclr.cc/virtual/2021/poster/3195", "author_site": "Timothy Castiglia, Anirban Das, Stacy Patterson", "tldr": "", "abstract": "We propose Multi-Level Local SGD, a distributed stochastic gradient method for learning a smooth, non-convex objective in a multi-level communication network with heterogeneous workers. 
Our network model consists of a set of disjoint sub-networks, with a single hub and multiple workers; further, workers may have different operating rates. The hubs exchange information with one another via a connected, but not necessarily complete communication network. In our algorithm, sub-networks execute a distributed SGD algorithm, using a hub-and-spoke paradigm, and the hubs periodically average their models with neighboring hubs. We first provide a unified mathematical framework that describes the Multi-Level Local SGD algorithm. We then present a theoretical analysis of the algorithm; our analysis shows the dependence of the convergence error on the worker node heterogeneity, hub network topology, and the number of local, sub-network, and global iterations. We illustrate the effectiveness of our algorithm in a multi-level network with slow workers via simulation-based experiments.", "keywords": "Machine Learning;Stochastic Gradient Descent;Federated Learning;Hierarchical Networks;Distributed;Heterogeneous;Convergence Analysis", "primary_area": "", "supplementary_material": "", "author": "Timothy Castiglia;Anirban Das;Stacy Patterson", "authorids": "~Timothy_Castiglia1;dasa2@rpi.edu;~Stacy_Patterson1", "gender": "M;;", "homepage": ";;https://www.cs.rpi.edu/~pattes3/", "dblp": ";;", "google_scholar": "5zGUUmUAAAAJ;;", "orcid": ";;", "linkedin": ";;", "or_profile": "~Timothy_Castiglia1;dasa2@rpi.edu;~Stacy_Patterson1", "aff": "Rensselaer Polytechnic Institute;;Rensselaer Polytechnic Institute", "aff_domain": "rpi.edu;;rpi.edu", "position": "PhD student;;Associate Professor", "bibtex": "@inproceedings{\ncastiglia2021multilevel,\ntitle={Multi-Level Local {\\{}SGD{\\}}: Distributed {\\{}SGD{\\}} for Heterogeneous Hierarchical Networks},\nauthor={Timothy Castiglia and Anirban Das and Stacy Patterson},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=C70cp4Cn32}\n}", "github": "", "project": "", "reviewers": "AnonReviewer4;AnonReviewer3;AnonReviewer1;AnonReviewer2", "pdf_size": 0, "rating": "6;6;6;7", "confidence": "4;3;4;5", "wc_review": "217;292;379;384", "wc_reply_reviewers": "0;0;145;0", "wc_reply_authors": "418;466;1107;466", "reply_reviewers": "0;0;2;0", "reply_authors": "1;1;4;1", "rating_avg": [ 6.25, 0.4330127018922193 ], "confidence_avg": [ 4.0, 0.7071067811865476 ], "wc_review_avg": [ 318.0, 68.8367634335026 ], "wc_reply_reviewers_avg": [ 36.25, 62.7868417743718 ], "wc_reply_authors_avg": [ 614.25, 285.1634399778485 ], "reply_reviewers_avg": [ 0.5, 0.8660254037844386 ], "reply_authors_avg": [ 1.75, 1.299038105676658 ], "replies_avg": [ 14, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": 0.816496580927726, "gs_citation": 53, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=6037996595799249493&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 2, "pdf": "https://openreview.net/pdf?id=C70cp4Cn32", "email": "rpi.edu;;rpi.edu", "author_num": 3, "aff_unique_index": "0;0", "aff_unique_norm": "Rensselaer Polytechnic Institute", "aff_unique_dep": "", "aff_unique_url": "https://www.rpi.edu", "aff_unique_abbr": "RPI", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0", "aff_country_unique": "United States" }, { "id": "CBYttzvzAS", "title": "Incremental Policy Gradients for Online Reinforcement Learning Control", "track": "main", "status": "Withdraw", "tldr": "", "abstract": "Policy gradient methods are built on the policy gradient theorem, which involves a 
term representing the complete sum of rewards into the future: the return. Due to this, one usually either waits until the end of an episode before performing updates, or learns an estimate of this return--a so-called critic. Our emphasis is on the first approach in this work, detailing an incremental policy gradient update which neither waits until the end of the episode, nor relies on learning estimates of the return. We provide on-policy and off-policy variants of our algorithm, for both the discounted return and average reward settings. Theoretically, we draw a connection between the traces our methods use and the stationary distributions of the discounted and average reward settings. We conclude with an experimental evaluation of our methods on both simple-to-understand and complex domains.", "keywords": "reinforcement learning;policy gradient;incremental;online;eligibility traces", "primary_area": "", "supplementary_material": "/attachment/2726b44a30c1a3f6708cb3ce53f7da9cbf43d500.zip", "author": "Kristopher De Asis;Alan Chan;Yi Wan;Richard S. Sutton", "authorids": "~Kristopher_De_Asis1;~Alan_Chan2;~Yi_Wan1;~Richard_S._Sutton1", "gender": "Unspecified;M;M;M", "homepage": "https://kris.pengy.ca/;https://achan.ca;https://sites.google.com/view/yi-wan/;http://richsutton.com", "dblp": "198/1319;;;48/6070", "google_scholar": "https://scholar.google.ca/citations?user=NCPKYKUAAAAJ;lmQmYPgAAAAJ;zMVstroAAAAJ;https://scholar.google.ca/citations?user=6m4wv6gAAAAJ", "orcid": ";;;0000-0002-3679-3415", "linkedin": "krisdeasis/;alan-chan-51858378/;;richard-sutton-0653545/", "or_profile": "~Kristopher_De_Asis1;~Alan_Chan2;~Yi_Wan1;~Richard_S_Sutton1", "aff": "University of Alberta;University of Montreal;University of Alberta;Google DeepMind", "aff_domain": "ualberta.ca;umontreal.ca;ualberta.ca;deepmind.com", "position": "PhD student;PhD student;PhD student;Research Scientist", "bibtex": "", "github": "", "project": "", "reviewers": "", "site": "https://openreview.net/forum?id=CBYttzvzAS", "pdf_size": 0, "rating": "", "confidence": "", "wc_review": "", "wc_reply_reviewers": "", "wc_reply_authors": "", "reply_reviewers": "", "reply_authors": "", "rating_avg": [ 0, 0 ], "confidence_avg": [ 0, 0 ], "wc_review_avg": [ 0, 0 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 0, 0 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 0, 0 ], "replies_avg": [ 1, 0 ], "authors#_avg": [ 4, 0 ], "corr_rating_confidence": 0, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:dOy3XcMF2j4J:scholar.google.com/&scioq=Incremental+Policy+Gradients+for+Online+Reinforcement+Learning+Control&hl=en&as_sdt=0,5", "gs_version_total": 0, "aff_unique_index": "0;1;0;2", "aff_unique_norm": "University of Alberta;University of Montreal;Google", "aff_unique_dep": ";;Google DeepMind", "aff_unique_url": "https://www.ualberta.ca;https://www.umontreal.ca;https://deepmind.com", "aff_unique_abbr": "UAlberta;UM;DeepMind", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0;0;1", "aff_country_unique": "Canada;United Kingdom" }, { "title": "Optimism in Reinforcement Learning with Generalized Linear Function Approximation", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/2563", "id": "CBmJwzneppz", "poster": "", "openreview": "https://openreview.net/forum?id=CBmJwzneppz", "slides": "https://iclr.cc/virtual/2021/poster/2563", "video": "https://iclr.cc/virtual/2021/poster/2563", "author_site": "Yining Wang, Ruosong Wang,
Simon Du, Akshay Krishnamurthy", "tldr": "", "abstract": "We design a new provably efficient algorithm for episodic reinforcement learning with generalized linear function approximation. We analyze the algorithm under a new expressivity assumption that we call ``optimistic closure,'' which is strictly weaker than assumptions from prior analyses for the linear setting. With optimistic closure, we prove that our algorithm enjoys a regret bound of $\\widetilde{O}\\left(H\\sqrt{d^3 T}\\right)$ where $H$ is the horizon, $d$ is the dimensionality of the state-action features and $T$ is the number of episodes. This is the first statistically and computationally efficient algorithm for reinforcement learning with generalized linear functions.", "keywords": "reinforcement learning;optimism;exploration;function approximation;theory;regret analysis;provable sample efficiency", "primary_area": "", "supplementary_material": "", "author": "Yining Wang;Ruosong Wang;Simon Shaolei Du;Akshay Krishnamurthy", "authorids": "~Yining_Wang1;~Ruosong_Wang1;~Simon_Shaolei_Du1;~Akshay_Krishnamurthy1", "gender": "M;M;M;M", "homepage": "https://yining-wang.com;http://www.cs.cmu.edu/~ruosongw/;http://simonshaoleidu.com;https://www.cics.umass.edu/~akshay/", "dblp": "04/7235;183/6164;176/5602;85/8024", "google_scholar": "HpQGq54AAAAJ;n8ZpnWMAAAAJ;OttawxUAAAAJ;https://scholar.google.com.tw/citations?user=K0kaNvkAAAAJ", "orcid": ";;;", "linkedin": ";;;", "or_profile": "~Yining_Wang1;~Ruosong_Wang1;~Simon_Shaolei_Du1;~Akshay_Krishnamurthy1", "aff": "University of Florida;Carnegie Mellon University;Meta Facebook;Microsoft Research", "aff_domain": "ufl.edu;cmu.edu;fb.com;research.microsoft.com", "position": "Assistant Professor;PhD student;Visiting Professor;Principal Researcher", "bibtex": "@inproceedings{\nwang2021optimism,\ntitle={Optimism in Reinforcement Learning with Generalized Linear Function Approximation},\nauthor={Yining Wang and Ruosong Wang and Simon Shaolei Du and Akshay Krishnamurthy},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=CBmJwzneppz}\n}", "github": "", "project": "", "reviewers": "AnonReviewer5;AnonReviewer2;AnonReviewer3;AnonReviewer1", "pdf_size": 0, "rating": "5;6;6;7", "confidence": "3;4;3;3", "wc_review": "835;888;339;249", "wc_reply_reviewers": "414;234;0;0", "wc_reply_authors": "604;323;131;163", "reply_reviewers": "1;1;0;0", "reply_authors": "2;2;1;1", "rating_avg": [ 6.0, 0.7071067811865476 ], "confidence_avg": [ 3.25, 0.4330127018922193 ], "wc_review_avg": [ 577.75, 286.14277467725793 ], "wc_reply_reviewers_avg": [ 162.0, 174.05171645232343 ], "wc_reply_authors_avg": [ 305.25, 187.19291519713025 ], "reply_reviewers_avg": [ 0.5, 0.5 ], "reply_authors_avg": [ 1.5, 0.5 ], "replies_avg": [ 13, 0 ], "authors#_avg": [ 4, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 186, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=6949434075641133803&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 3, "pdf": "https://openreview.net/pdf?id=CBmJwzneppz", "email": "ufl.edu;cmu.edu;fb.com;research.microsoft.com", "author_num": 4, "aff_unique_index": "0;1;2;3", "aff_unique_norm": "University of Florida;Carnegie Mellon University;Meta;Microsoft", "aff_unique_dep": ";;Meta Platforms, Inc.;Microsoft Research", "aff_unique_url": "https://www.ufl.edu;https://www.cmu.edu;https://meta.com;https://www.microsoft.com/en-us/research", "aff_unique_abbr": "UF;CMU;Meta;MSR", "aff_campus_unique_index": "", "aff_campus_unique": "", 
"aff_country_unique_index": "0;0;0;0", "aff_country_unique": "United States" }, { "title": "Spatio-Temporal Graph Scattering Transform", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/3080", "id": "CF-ZIuSMXRz", "poster": "", "openreview": "https://openreview.net/forum?id=CF-ZIuSMXRz", "slides": "https://iclr.cc/virtual/2021/poster/3080", "video": "https://iclr.cc/virtual/2021/poster/3080", "author_site": "Chao Pan, Siheng Chen, Antonio Ortega", "tldr": "", "abstract": "Although spatio-temporal graph neural networks have achieved great empirical success in handling multiple correlated time series, they may be impractical in some real-world scenarios due to a lack of sufficient high-quality training data. Furthermore, spatio-temporal graph neural networks lack theoretical interpretation. To address these issues, we put forth a novel mathematically designed framework to analyze spatio-temporal data. Our proposed spatio-temporal graph scattering transform (ST-GST) extends traditional scattering transform to the spatio-temporal domain. It performs iterative applications of spatio-temporal graph wavelets and nonlinear activation functions, which can be viewed as a forward pass of spatio-temporal graph convolutional networks without training. Since all the filter coefficients in ST-GST are mathematically designed, it is promising for the real-world scenarios with limited training data, and also allows for a theoretical analysis, which shows that the proposed ST-GST is stable to small perturbations of input signals and structures. Finally, our experiments show that i) ST-GST outperforms spatio-temporal graph convolutional networks by an increase of 35% in accuracy for MSR Action3D dataset; ii) it is better and computationally more efficient to design the transform based on separable spatio-temporal graphs than the joint ones; and iii) nonlinearity in ST-GST is critical to empirical performance.", "keywords": "scattering transform;spatio-temporal graph;graph neural networks;skeleton-based action recognition", "primary_area": "", "supplementary_material": "/attachment/f2a9bd450f0133b73b7f811386e5491fb704e24d.zip", "author": "Chao Pan;Siheng Chen;Antonio Ortega", "authorids": "~Chao_Pan2;~Siheng_Chen1;~Antonio_Ortega1", "gender": "M;M;M", "homepage": ";http://biron.usc.edu/wiki/index.php/Antonio_Ortega;https://siheng-chen.github.io/", "dblp": "06/7730-3;o/AntonioOrtega;136/4945", "google_scholar": "M3T3YPIAAAAJ;K4bCJYcAAAAJ;https://scholar.google.com/citations?hl=en", "orcid": "0000-0002-9275-7072;0000-0001-5403-0940;", "linkedin": "chao-pan-5abb7314b/;ortegaantonio/;", "or_profile": "~Chao_Pan2;~Antonio_Ortega1;~Siheng_Chen2", "aff": "University of Illinois, Urbana Champaign;University of Southern California;Shanghai Jiaotong University", "aff_domain": "illinois.edu;usc.edu;sjtu.edu.cn", "position": "PhD student;Full Professor;Associate Professor", "bibtex": "@inproceedings{\npan2021spatiotemporal,\ntitle={Spatio-Temporal Graph Scattering Transform},\nauthor={Chao Pan and Siheng Chen and Antonio Ortega},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=CF-ZIuSMXRz}\n}", "github": "", "project": "", "reviewers": "AnonReviewer3;AnonReviewer1;AnonReviewer2;AnonReviewer4", "pdf_size": 0, "rating": "6;6;7;9", "confidence": "5;4;3;2", "wc_review": "114;5;71;641", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "340;27;312;1252", "reply_reviewers": "0;0;0;0", "reply_authors": "1;1;1;2", 
"rating_avg": [ 7.0, 1.224744871391589 ], "confidence_avg": [ 3.5, 1.118033988749895 ], "wc_review_avg": [ 207.75, 253.1317591690146 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 482.75, 460.7023849514999 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.25, 0.4330127018922193 ], "replies_avg": [ 11, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": -0.9128709291752768, "gs_citation": 30, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=3511539600898607834&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 6, "pdf": "https://openreview.net/pdf?id=CF-ZIuSMXRz", "email": "illinois.edu;usc.edu;sjtu.edu.cn", "author_num": 3, "aff_unique_index": "0;1;2", "aff_unique_norm": "University of Illinois Urbana-Champaign;University of Southern California;Shanghai Jiao Tong University", "aff_unique_dep": ";;", "aff_unique_url": "https://illinois.edu;https://www.usc.edu;https://www.sjtu.edu.cn", "aff_unique_abbr": "UIUC;USC;SJTU", "aff_campus_unique_index": "0;1", "aff_campus_unique": "Urbana-Champaign;Los Angeles;", "aff_country_unique_index": "0;0;1", "aff_country_unique": "United States;China" }, { "id": "CGFN_nV1ql", "title": "Non-Attentive Tacotron: Robust and controllable neural TTS synthesis including unsupervised duration modeling", "track": "main", "status": "Reject", "tldr": "", "abstract": "This paper presents Non-Attentive Tacotron based on the Tacotron 2 text-to-speech model, replacing the attention mechanism with an explicit duration predictor. This improves robustness significantly as measured by unaligned duration ratio and word deletion rate, two metrics introduced in this paper for large-scale robustness evaluation using a pre-trained speech recognition model. With the use of Gaussian upsampling, Non-Attentive Tacotron achieves a 5-scale mean opinion score for naturalness of 4.41, slightly outperforming Tacotron 2. The duration predictor enables both utterance-wide and per-phoneme control of duration at inference time. 
When accurate target durations are scarce or unavailable in the training data, we propose a method using a fine-grained variational auto-encoder to train the duration predictor in a semi-supervised or unsupervised manner, with results almost as good as supervised training.", "keywords": "tts;text-to-speech", "primary_area": "", "supplementary_material": "/attachment/f0824bc2aa6228482b4187c83193046fb9728447.zip", "author": "Jonathan Shen;Ye Jia;Mike Chrzanowski;Yu Zhang;Isaac Elias;Heiga Zen;Yonghui Wu", "authorids": "~Jonathan_Shen1;~Ye_Jia1;~Mike_Chrzanowski2;~Yu_Zhang2;isaace@google.com;~Heiga_Zen1;~Yonghui_Wu1", "gender": "M;M;;M;;M;M", "homepage": ";;;;;https://research.google/people/heigazen;", "dblp": "192/1539;217/2520;173/5380;50/671-33;;42/7014;26/2189", "google_scholar": "yDonAm4AAAAJ;kaO4R1kAAAAJ;pVJgaD0AAAAJ;;;z3IRvDwAAAAJ;55FnA9wAAAAJ", "orcid": ";;;;;0000-0002-8959-5471;", "linkedin": "jonathanasdf/;;mike-chrzanowski-259640155/;;;heiga-zen-b1a64b3;", "or_profile": "~Jonathan_Shen1;~Ye_Jia1;~Mike_Chrzanowski2;~Yu_Zhang2;isaace@google.com;~Heiga_Zen1;~Yonghui_Wu1", "aff": "Google;Google;NVIDIA;Google;;Google;", "aff_domain": "google.com;google.com;nvidia.com;google.com;;google.com;", "position": "Employee;Researcher;Research Scientist;Research Scientist;;Researcher;", "bibtex": "@misc{\nshen2021nonattentive,\ntitle={Non-Attentive Tacotron: Robust and controllable neural {\\{}TTS{\\}} synthesis including unsupervised duration modeling},\nauthor={Jonathan Shen and Ye Jia and Mike Chrzanowski and Yu Zhang and Isaac Elias and Heiga Zen and Yonghui Wu},\nyear={2021},\nurl={https://openreview.net/forum?id=CGFN_nV1ql}\n}", "github": "", "project": "", "reviewers": "AnonReviewer2;AnonReviewer3;AnonReviewer1;AnonReviewer4", "site": "https://openreview.net/forum?id=CGFN_nV1ql", "pdf_size": 0, "rating": "4;5;6;8", "confidence": "4;3;2;4", "wc_review": "625;379;257;445", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "1526;759;519;135", "reply_reviewers": "0;0;0;0", "reply_authors": "3;1;1;1", "rating_avg": [ 5.75, 1.479019945774904 ], "confidence_avg": [ 3.25, 0.82915619758885 ], "wc_review_avg": [ 426.5, 132.97650168356813 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 734.75, 508.1615761743503 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.5, 0.8660254037844386 ], "replies_avg": [ 13, 0 ], "authors#_avg": [ 7, 0 ], "corr_rating_confidence": 0.050964719143762556, "gs_citation": 107, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=1135826872589262488&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 4, "aff_unique_index": "0;0;1;0;0", "aff_unique_norm": "Google;NVIDIA", "aff_unique_dep": "Google;NVIDIA Corporation", "aff_unique_url": "https://www.google.com;https://www.nvidia.com", "aff_unique_abbr": "Google;NVIDIA", "aff_campus_unique_index": "0;0;0;0", "aff_campus_unique": "Mountain View;", "aff_country_unique_index": "0;0;0;0;0", "aff_country_unique": "United States" }, { "title": "Task-Agnostic Morphology Evolution", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/2987", "id": "CGQ6ENUMX6", "poster": "", "openreview": "https://openreview.net/forum?id=CGQ6ENUMX6", "slides": "https://iclr.cc/virtual/2021/poster/2987", "video": "https://iclr.cc/virtual/2021/poster/2987", "author_site": "Donald Hejna III, Pieter Abbeel, Lerrel Pinto", "tldr": "", "abstract": "Deep reinforcement learning primarily focuses on learning behavior, usually overlooking the fact that an agent's function is largely determined by 
form. So, how should one go about finding a morphology fit for solving tasks in a given environment? Current approaches that co-adapt morphology and behavior use a specific task's reward as a signal for morphology optimization. However, this often requires expensive policy optimization and results in task-dependent morphologies that are not built to generalize. In this work, we propose a new approach, Task-Agnostic Morphology Evolution (TAME), to alleviate both of these issues. Without any task or reward specification, TAME evolves morphologies by only applying randomly sampled action primitives on a population of agents. This is accomplished using an information-theoretic objective that efficiently ranks agents by their ability to reach diverse states in the environment and the causality of their actions. Finally, we empirically demonstrate that across 2D, 3D, and manipulation environments TAME can evolve morphologies that match the multi-task performance of those learned with task supervised algorithms. Our code and videos can be found at https://sites.google.com/view/task-agnostic-evolution .\n", "keywords": "morphology;unsupervised;evolution;information theory;empowerment", "primary_area": "", "supplementary_material": "/attachment/b08bf4bd4cc07ccb0cb85ace3f304e26fb1c0245.zip", "author": "Donald Joseph Hejna III;Pieter Abbeel;Lerrel Pinto", "authorids": "~Donald_Joseph_Hejna_III1;~Pieter_Abbeel2;~Lerrel_Pinto1", "gender": "M;M;M", "homepage": "https://joeyhejna.com;https://people.eecs.berkeley.edu/~pabbeel/;https://www.lerrelpinto.com/", "dblp": "336/3297;;168/8304", "google_scholar": "y_sLoXoAAAAJ;https://scholar.google.com.tw/citations?user=vtwH6GkAAAAJ;pmVPj94AAAAJ", "orcid": ";;", "linkedin": ";;", "or_profile": "~Donald_Joseph_Hejna_III1;~Pieter_Abbeel2;~Lerrel_Pinto1", "aff": "University of California, Berkeley;Covariant;New York University", "aff_domain": "berkeley.edu;covariant.ai;cs.nyu.edu", "position": "Undergrad student;Founder;Assistant Professor", "bibtex": "@inproceedings{\niii2021taskagnostic,\ntitle={Task-Agnostic Morphology Evolution},\nauthor={Donald Joseph Hejna III and Pieter Abbeel and Lerrel Pinto},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=CGQ6ENUMX6}\n}", "github": "[![github](/images/github_icon.svg) jhejna/morphology-opt](https://github.com/jhejna/morphology-opt)", "project": "", "reviewers": "AnonReviewer4;AnonReviewer3;AnonReviewer2;AnonReviewer1", "pdf_size": 0, "rating": "6;6;7;7", "confidence": "3;4;4;4", "wc_review": "314;277;804;410", "wc_reply_reviewers": "110;126;246;18", "wc_reply_authors": "626;629;1235;298", "reply_reviewers": "1;1;2;1", "reply_authors": "1;1;2;1", "rating_avg": [ 6.5, 0.5 ], "confidence_avg": [ 3.75, 0.4330127018922193 ], "wc_review_avg": [ 451.25, 209.3649624459642 ], "wc_reply_reviewers_avg": [ 125.0, 81.11103500757464 ], "wc_reply_authors_avg": [ 697.0, 338.49298367913036 ], "reply_reviewers_avg": [ 1.25, 0.4330127018922193 ], "reply_authors_avg": [ 1.25, 0.4330127018922193 ], "replies_avg": [ 16, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": 0.5773502691896257, "gs_citation": 28, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=14695430945522716780&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 4, "pdf": "https://openreview.net/pdf?id=CGQ6ENUMX6", "email": "berkeley.edu;covariant.ai;cs.nyu.edu", "author_num": 3, "aff_unique_index": "0;1;2", "aff_unique_norm": "University of California, Berkeley;Covariant;New York 
University", "aff_unique_dep": ";;", "aff_unique_url": "https://www.berkeley.edu;;https://www.nyu.edu", "aff_unique_abbr": "UC Berkeley;;NYU", "aff_campus_unique_index": "0", "aff_campus_unique": "Berkeley;", "aff_country_unique_index": "0;0", "aff_country_unique": "United States;" }, { "title": "Single-Photon Image Classification", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/3024", "id": "CHLhSw9pSw8", "poster": "", "openreview": "https://openreview.net/forum?id=CHLhSw9pSw8", "slides": "https://iclr.cc/virtual/2021/poster/3024", "video": "https://iclr.cc/virtual/2021/poster/3024", "author_site": "Thomas Fischbacher, Luciano Sbaiz", "tldr": "", "abstract": "Quantum Computing based Machine Learning mainly focuses on quantum computing hardware that is experimentally challenging to realize due to requiring quantum gates that operate at very low temperature. We demonstrate the existence of a \"quantum computing toy model\" that illustrates key aspects of quantum information processing while being experimentally accessible with room temperature optics. Pondering the question of the theoretical classification accuracy performance limit for MNIST (respectively \"Fashion-MNIST\") classifiers, subject to the constraint that a decision has to be made after detection of the very first photon that passed through an image-filter, we show that a machine learning system that is permitted to use quantum interference on the photon's state can substantially outperform any machine learning system that can not. Specifically, we prove that a \"classical\" MNIST (respectively \"Fashion-MNIST\") classifier cannot achieve an accuracy of better than $21.28\\%$ (respectively $18.28\\%$ for \"Fashion-MNIST\") if it must make a decision after seeing a single photon falling on one of the $28\\times 28$ image pixels of a detector array. We further demonstrate that a classifier that is permitted to employ quantum interference by optically transforming the photon state prior to detection can achieve a classification accuracy of at least $41.27\\%$ for MNIST (respectively $36.14\\%$ for \"Fashion-MNIST\"). 
We show in detail how to train the corresponding quantum state transformation with TensorFlow and also explain how this example can serve as a teaching tool for the measurement process in quantum mechanics.\n", "keywords": "quantum mechanics;image classification;quantum machine learning;theoretical limits", "primary_area": "", "supplementary_material": "/attachment/0a1456645e6659863f50a6fe0070140fe4dce066.zip", "author": "Thomas Fischbacher;Luciano Sbaiz", "authorids": "~Thomas_Fischbacher1;~Luciano_Sbaiz1", "gender": ";M", "homepage": "https://research.google/;", "dblp": ";11/4154", "google_scholar": ";fKBmhcUAAAAJ", "orcid": ";", "linkedin": ";", "or_profile": "~Thomas_Fischbacher1;~Luciano_Sbaiz1", "aff": ";Google", "aff_domain": ";google.com", "position": ";Research Scientist", "bibtex": "@inproceedings{\nfischbacher2021singlephoton,\ntitle={Single-Photon Image Classification},\nauthor={Thomas Fischbacher and Luciano Sbaiz},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=CHLhSw9pSw8}\n}", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer3;AnonReviewer2;AnonReviewer4", "pdf_size": 0, "rating": "3;6;7;8", "confidence": "1;3;3;3", "wc_review": "172;253;562;283", "wc_reply_reviewers": "0;0;92;0", "wc_reply_authors": "614;308;1707;130", "reply_reviewers": "0;0;1;0", "reply_authors": "2;1;4;1", "rating_avg": [ 6.0, 1.8708286933869707 ], "confidence_avg": [ 2.5, 0.8660254037844386 ], "wc_review_avg": [ 317.5, 146.88515922311552 ], "wc_reply_reviewers_avg": [ 23.0, 39.83716857408418 ], "wc_reply_authors_avg": [ 689.75, 612.2884838864765 ], "reply_reviewers_avg": [ 0.25, 0.4330127018922193 ], "reply_authors_avg": [ 2.0, 1.224744871391589 ], "replies_avg": [ 14, 0 ], "authors#_avg": [ 2, 0 ], "corr_rating_confidence": 0.9258200997725515, "gs_citation": 4, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=1731199934323581566&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 3, "pdf": "https://openreview.net/pdf?id=CHLhSw9pSw8", "email": ";google.com", "author_num": 2, "aff_unique_index": "0", "aff_unique_norm": "Google", "aff_unique_dep": "Google", "aff_unique_url": "https://www.google.com", "aff_unique_abbr": "Google", "aff_campus_unique_index": "0", "aff_campus_unique": "Mountain View", "aff_country_unique_index": "0", "aff_country_unique": "United States" }, { "id": "CHTHamtufWN", "title": "Continual Invariant Risk Minimization", "track": "main", "status": "Reject", "tldr": "", "abstract": "Empirical risk minimization can lead to poor generalization behaviour on unseen environments if the learned model does not capture invariant feature representations. Invariant risk minimization (IRM) is a recent proposal for discovering environment-invariant representations. It was introduced by Arjovsky et al. (2019) and extended by Ahuja et al. (2020). The assumption of IRM is that all environments are available to the learning system at the same time. With this work, we generalize the concept of IRM to scenarios where environments are observed sequentially. We show that existing approaches, including those designed for continual learning, fail to identify the invariant features and models across sequentially presented environments. We extend IRM under a variational Bayesian and bilevel framework, creating a general approach to continual invariant risk minimization.
We also describe a strategy to solve the optimization problems using a variant of the alternating direction method of multipliers (ADMM). We show empirically using multiple datasets and with multiple sequential environments that the proposed methods outperform or are competitive with prior approaches.", "keywords": "Supervised Learning;Causal Learning;Invariant Risk Minimization;Continual Learning", "primary_area": "", "supplementary_material": "", "author": "Francesco Alesiani;Shujian Yu;Mathias Niepert", "authorids": "~Francesco_Alesiani1;~Shujian_Yu1;~Mathias_Niepert1", "gender": ";M;M", "homepage": "https://falesiani.github.io/;https://sjyucnel.github.io/;http://www.matlog.net", "dblp": "122/8256;154/5763.html;n/MathiasNiepert", "google_scholar": "0puEQdgAAAAJ;O8kpnMoAAAAJ;https://scholar.google.de/citations?user=p5vLzq0AAAAJ", "orcid": "0000-0003-4413-7247;;", "linkedin": "francesco-alesiani-2b48b74;;", "or_profile": "~Francesco_Alesiani1;~Shujian_Yu1;~Mathias_Niepert1", "aff": "NEC;NEC;NEC", "aff_domain": "neclab.eu;neclab.eu;neclab.eu", "position": "Senior Researcher;Research Scientist;Research Scientist", "bibtex": "@misc{\nalesiani2021continual,\ntitle={Continual Invariant Risk Minimization},\nauthor={Francesco Alesiani and Shujian Yu and Mathias Niepert},\nyear={2021},\nurl={https://openreview.net/forum?id=CHTHamtufWN}\n}", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer4;AnonReviewer3;AnonReviewer2", "site": "https://openreview.net/forum?id=CHTHamtufWN", "pdf_size": 0, "rating": "3;5;6;6", "confidence": "4;5;4;2", "wc_review": "757;1467;271;251", "wc_reply_reviewers": "85;221;0;0", "wc_reply_authors": "371;1181;285;518", "reply_reviewers": "1;2;0;0", "reply_authors": "2;3;2;2", "rating_avg": [ 5.0, 1.224744871391589 ], "confidence_avg": [ 3.75, 1.0897247358851685 ], "wc_review_avg": [ 686.5, 494.0776760793792 ], "wc_reply_reviewers_avg": [ 76.5, 90.35623940824452 ], "wc_reply_authors_avg": [ 588.75, 351.93918153567387 ], "reply_reviewers_avg": [ 0.75, 0.82915619758885 ], "reply_authors_avg": [ 2.25, 0.4330127018922193 ], "replies_avg": [ 18, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": -0.37463432463267754, "gs_citation": 2, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=9432925604530630373&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 3, "aff_unique_index": "0;0;0", "aff_unique_norm": "NEC Corporation", "aff_unique_dep": "", "aff_unique_url": "https://www.nec.com", "aff_unique_abbr": "NEC", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0;0", "aff_country_unique": "Japan" }, { "id": "CJmMqnXthgX", "title": "An Empirical Study of the Expressiveness of Graph Kernels and Graph Neural Networks", "track": "main", "status": "Reject", "tldr": "", "abstract": "Graph neural networks and graph kernels have achieved great success in solving machine learning problems on graphs. Recently, there has been considerable interest in determining the expressive power mainly of graph neural networks and, to a lesser extent, of graph kernels. Most studies have focused on the ability of these approaches to distinguish non-isomorphic graphs or to identify specific graph properties. However, there is often a need for algorithms whose produced graph representations can accurately capture similarity/distance of graphs. This paper studies the expressive power of graph neural networks and graph kernels from an empirical perspective. 
Specifically, we compare the graph representations and similarities produced by these algorithms against those generated by a well-accepted, but intractable graph similarity function. We also investigate the impact of node attributes on the performance of the different models and kernels. Our results reveal interesting findings. For instance, we find that theoretically more powerful models do not necessarily yield higher-quality representations, while graph kernels are shown to be very competitive with graph neural networks.", "keywords": "", "primary_area": "", "supplementary_material": "", "author": "Giannis Nikolentzos;George Panagopoulos;Michalis Vazirgiannis", "authorids": "~Giannis_Nikolentzos1;george.panagopoulos@polytechnique.edu;~Michalis_Vazirgiannis1", "gender": "M;;M", "homepage": "http://users.uop.gr/~nikolentzos/;;", "dblp": "163/6278;;v/MVazirgiannis", "google_scholar": "bdom4I8AAAAJ;;https://scholar.google.gr/citations?user=aWGJYcMAAAAJ", "orcid": ";;", "linkedin": ";;", "or_profile": "~Giannis_Nikolentzos1;george.panagopoulos@polytechnique.edu;~Michalis_Vazirgiannis1", "aff": "Ecole polytechnique;;Ecole Polytechnique, France", "aff_domain": "polytechnique.edu;;polytechnique.fr", "position": "Postdoc;;Full Professor", "bibtex": "@misc{\nnikolentzos2021an,\ntitle={An Empirical Study of the Expressiveness of Graph Kernels and Graph Neural Networks},\nauthor={Giannis Nikolentzos and George Panagopoulos and Michalis Vazirgiannis},\nyear={2021},\nurl={https://openreview.net/forum?id=CJmMqnXthgX}\n}", "github": "", "project": "", "reviewers": "AnonReviewer3;AnonReviewer1;AnonReviewer4;AnonReviewer2", "site": "https://openreview.net/forum?id=CJmMqnXthgX", "pdf_size": 0, "rating": "3;4;4;4", "confidence": "4;5;4;4", "wc_review": "358;298;569;451", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "0;0;0;0", "reply_reviewers": "0;0;0;0", "reply_authors": "0;0;0;0", "rating_avg": [ 3.75, 0.4330127018922193 ], "confidence_avg": [ 4.25, 0.4330127018922193 ], "wc_review_avg": [ 419.0, 102.33034740486323 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 0, 0 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 0, 0 ], "replies_avg": [ 5, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": 0.3333333333333333, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:q3FPaT_mwnQJ:scholar.google.com/&scioq=An+Empirical+Study+of+the+Expressiveness+of+Graph+Kernels+and+Graph+Neural+Networks&hl=en&as_sdt=0,33", "gs_version_total": 5, "aff_unique_index": "0;0", "aff_unique_norm": "Ecole Polytechnique", "aff_unique_dep": "", "aff_unique_url": "https://www.polytechnique.edu", "aff_unique_abbr": "X", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0", "aff_country_unique": "France" }, { "id": "CLYe1Yke1r", "title": "Box-To-Box Transformation for Modeling Joint Hierarchies", "track": "main", "status": "Reject", "tldr": "", "abstract": "Learning representations of entities and relations in knowledge graphs is an active area of research, with much emphasis placed on choosing the appropriate geometry to capture tree-like structures. Box embeddings (Vilnis et al., 2018; Li et al., 2019; Dasgupta et al., 2020), which represent concepts as n-dimensional hyperrectangles, are capable of embedding trees by training on a subset of the transitive closure. In Patel et al. 
(2020), the authors demonstrate that only the transitive reduction is required, and further extend box embeddings to capture joint hierarchies by augmenting the graph with new nodes. While it is possible to represent joint hierarchies with this method, the parameters for each hierarchy are decoupled, making generalization between hierarchies infeasible. In this work, we introduce a learned box-to-box transformation which respects the geometric structure of the box embeddings. We demonstrate that this not only improves the capability of modeling cross-hierarchy compositional edges but is also capable of generalizing from a subset of the transitive reduction.", "keywords": "Box embeddings;Representation Learning;Joint Hierarchy;transitive relations;knowledge graph embedding;relational learning.", "primary_area": "", "supplementary_material": "/attachment/0142bc32e509e2ae778fab77c3a7a22c2d5f95b6.zip", "author": "Shib Sankar Dasgupta;Xiang Li;Michael Boratko;Dongxu Zhang;Andrew McCallum", "authorids": "~Shib_Sankar_Dasgupta2;~Xiang_Li2;~Michael_Boratko1;~Dongxu_Zhang1;~Andrew_McCallum1", "gender": "M;F;M;;M", "homepage": "https://ssdasgupta.github.io/;https://people.cs.pitt.edu/~xianglli/;https://people.cs.umass.edu/~mboratko/;https://zhangdongxu.github.io/;http://www.cs.umass.edu/~mccallum", "dblp": "222/9398;40/1491-69;222/1939;;m/AndrewMcCallum", "google_scholar": "0KpQR94AAAAJ;SRgRwSoAAAAJ;YKZGpnkAAAAJ;M_i8Rr8AAAAJ;yILa1y0AAAAJ", "orcid": ";;;;0009-0004-5487-2848", "linkedin": "shib-sankar-dasgupta-iisc/;;michaelboratko/;;andrew-mccallum-a412", "or_profile": "~Shib_Sankar_Dasgupta2;~Xiang_Li2;~Michael_Boratko1;~Dongxu_Zhang1;~Andrew_McCallum1", "aff": "University of Massachusetts, Amherst;Department of Computer Science, University of Massachusetts, Amherst;University of Massachusetts, Amherst;;University of Massachusetts Amherst", "aff_domain": "umass.edu;cs.umass.edu;umass.edu;;cs.umass.edu", "position": "PhD student;PhD student;Postdoc;;Distinguished Professor", "bibtex": "@misc{\ndasgupta2021boxtobox,\ntitle={Box-To-Box Transformation for Modeling Joint Hierarchies},\nauthor={Shib Sankar Dasgupta and Xiang Li and Michael Boratko and Dongxu Zhang and Andrew McCallum},\nyear={2021},\nurl={https://openreview.net/forum?id=CLYe1Yke1r}\n}", "github": "", "project": "", "reviewers": "AnonReviewer2;AnonReviewer3;AnonReviewer4;AnonReviewer1", "site": "https://openreview.net/forum?id=CLYe1Yke1r", "pdf_size": 0, "rating": "4;4;6;8", "confidence": "4;4;4;4", "wc_review": "527;885;286;56", "wc_reply_reviewers": "406;0;0;0", "wc_reply_authors": "1928;1094;148;19", "reply_reviewers": "2;0;0;0", "reply_authors": "6;3;1;1", "rating_avg": [ 5.5, 1.6583123951777 ], "confidence_avg": [ 4.0, 0.0 ], "wc_review_avg": [ 438.5, 306.90267186846063 ], "wc_reply_reviewers_avg": [ 101.5, 175.80315696824104 ], "wc_reply_authors_avg": [ 797.25, 773.6043481651328 ], "reply_reviewers_avg": [ 0.5, 0.8660254037844386 ], "reply_authors_avg": [ 2.75, 2.0463381929681126 ], "replies_avg": [ 18, 0 ], "authors#_avg": [ 5, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 7, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=1100994274830924323&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 4, "aff_unique_index": "0;0;0;0", "aff_unique_norm": "University of Massachusetts Amherst", "aff_unique_dep": "", "aff_unique_url": "https://www.umass.edu", "aff_unique_abbr": "UMass Amherst", "aff_campus_unique_index": "0;0;0;0", "aff_campus_unique": "Amherst", "aff_country_unique_index": "0;0;0;0", "aff_country_unique": 
"United States" }, { "id": "CLnj31GZ4cI", "title": "K-Adapter: Infusing Knowledge into Pre-Trained Models with Adapters", "track": "main", "status": "Reject", "tldr": "", "abstract": "We study the problem of injecting knowledge into large pre-trained models like BERT and RoBERTa. Existing methods typically update the original parameters of pre-trained models when injecting knowledge. However, when multiple kinds of knowledge are injected, they may suffer from catastrophic forgetting. To address this, we propose K-Adapter, which remains the original parameters of the pre-trained model fixed and supports continual knowledge infusion. Taking RoBERTa as the pre-trained model, K-Adapter has a neural adapter for each kind of infused knowledge, like a plug-in connected to RoBERTa. There is no information flow between different adapters, thus different adapters are efficiently trained in a distributed way. We inject two kinds of knowledge, including factual knowledge obtained from automatically aligned text-triplets on Wikipedia and Wikidata, and linguistic knowledge obtained from dependency parsing. Results on three knowledge-driven tasks (total six datasets) including relation classification, entity typing and question answering demonstrate that each adapter improves the performance, and the combination of both adapters brings further improvements. Probing experiments further indicate that K-Adapter captures richer factual and commonsense knowledge than RoBERTa.", "keywords": "", "primary_area": "", "supplementary_material": "/attachment/05b5859fa9d61295008fa17c838eaf4e0edd031e.zip", "author": "Ruize Wang;Duyu Tang;Nan Duan;zhongyu wei;Xuanjing Huang;Jianshu Ji;Guihong Cao;Daxin Jiang;Ming Zhou", "authorids": "~Ruize_Wang1;~Duyu_Tang1;~Nan_Duan1;~zhongyu_wei1;~Xuanjing_Huang1;jianshuj@microsoft.com;gucao@microsoft.com;djiang@microsoft.com;~Ming_Zhou1", "gender": "M;M;M;M;F;;;;", "homepage": "https://ruizewang.github.io;https://tangduyu.github.io//;https://nanduan.github.io/;http://www.sdspeople.fudan.edu.cn/zywei/;https://xuanjing-huang.github.io/;;;;", "dblp": "254/0867;135/6318;;31/10489;05/6735-1;;;;", "google_scholar": "ojrU9qEAAAAJ;9uz-D-kAAAAJ;Qaa6OxIAAAAJ;AjLDxxgAAAAJ;RGsMgZA4H78C;;;;", "orcid": ";;;;0000-0001-9197-9426;;;;", "linkedin": ";;;;;;;;", "or_profile": "~Ruize_Wang1;~Duyu_Tang1;~Nan_Duan1;~zhongyu_wei1;~Xuanjing_Huang1;jianshuj@microsoft.com;gucao@microsoft.com;djiang@microsoft.com;~Ming_Zhou1", "aff": "Fudan University;;Microsoft Research Asia;Fudan University;Fudan University;;;;", "aff_domain": "fudan.edu.cn;;microsoft.com;fudan.edu.cn;fudan.edu.cn;;;;", "position": "MS student;;Principal Researcher;Associate Professor;Full Professor;;;;", "bibtex": "@misc{\nwang2021kadapter,\ntitle={K-Adapter: Infusing Knowledge into Pre-Trained Models with Adapters},\nauthor={Ruize Wang and Duyu Tang and Nan Duan and zhongyu wei and Xuanjing Huang and Jianshu Ji and Guihong Cao and Daxin Jiang and Ming Zhou},\nyear={2021},\nurl={https://openreview.net/forum?id=CLnj31GZ4cI}\n}", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer4;AnonReviewer3;AnonReviewer2", "site": "https://openreview.net/forum?id=CLnj31GZ4cI", "pdf_size": 0, "rating": "4;6;6;7", "confidence": "4;3;4;3", "wc_review": "780;759;128;497", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "1274;528;145;701", "reply_reviewers": "0;0;0;0", "reply_authors": "2;1;1;1", "rating_avg": [ 5.75, 1.0897247358851685 ], "confidence_avg": [ 3.5, 0.5 ], "wc_review_avg": [ 541.0, 263.2251887642974 ], 
"wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 662.0, 406.6048450277001 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.25, 0.4330127018922193 ], "replies_avg": [ 12, 0 ], "authors#_avg": [ 9, 0 ], "corr_rating_confidence": -0.6882472016116854, "gs_citation": 615, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=4166695802214878222&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 8, "aff_unique_index": "0;1;0;0", "aff_unique_norm": "Fudan University;Microsoft", "aff_unique_dep": ";Research", "aff_unique_url": "https://www.fudan.edu.cn;https://www.microsoft.com/en-us/research/group/asia", "aff_unique_abbr": "Fudan;MSR Asia", "aff_campus_unique_index": "1", "aff_campus_unique": ";Asia", "aff_country_unique_index": "0;0;0;0", "aff_country_unique": "China" }, { "id": "CMsvjAnW1zE", "title": "Spherical Motion Dynamics: Learning Dynamics of Neural Network with Normalization, Weight Decay, and SGD", "track": "main", "status": "Reject", "tldr": "", "abstract": "In this work, we comprehensively reveal the learning dynamics of neural network with normalization, weight decay (WD), and SGD (with momentum), named as Spherical Motion Dynamics (SMD). Most related works study SMD by focusing on \"effective learning rate\" in \"equilibrium\" condition, where weight norm remains unchanged. However, their discussions on why equilibrium condition can be reached in SMD is either absent or less convincing. Our work investigates SMD by directly exploring the cause of equilibrium condition. Specifically, 1) we introduce the assumptions that can lead to equilibrium condition in SMD, and prove that weight norm can converge at linear rate with given assumptions; 2) we propose \"angular update\" as a substitute for effective learning rate to measure the evolving of neural network in SMD, and prove angular update can also converge to its theoretical value at linear rate; 3) we verify our assumptions and theoretical results on various computer vision tasks including ImageNet and MSCOCO with standard settings. 
Experiment results show our theoretical findings agree well with empirical observations.", "keywords": "Normalization;Weight decay;SGD;Momentum", "primary_area": "", "supplementary_material": "", "author": "Ruosi Wan;Zhanxing Zhu;Xiangyu Zhang;Jian Sun", "authorids": "~Ruosi_Wan4;~Zhanxing_Zhu1;~Xiangyu_Zhang1;~Jian_Sun4", "gender": "M;M;M;M", "homepage": ";https://zhanxingzhu.github.io/;;http://www.jiansun.org", "dblp": "222/3350;87/7756.html;95/3760-5.html;68/4942-15", "google_scholar": "TSlNpN4AAAAJ;a2sHceIAAAAJ;yuB-cfoAAAAJ;ALVSZAYAAAAJ", "orcid": ";;0000-0003-2138-4608;", "linkedin": ";;;", "or_profile": "~Ruosi_Wan4;~Zhanxing_Zhu1;~Xiangyu_Zhang1;~Jian_Sun4", "aff": "Megvii Technology Inc.;Peking University;MEGVII Technology;Megvii Technology", "aff_domain": "megvii.com;pku.edu.cn;megvii.com;megvii.com", "position": "Researcher;Assistant Professor;Principal Researcher;Chief Scientist", "bibtex": "@misc{\nwan2021spherical,\ntitle={Spherical Motion Dynamics: Learning Dynamics of Neural Network with Normalization, Weight Decay, and {\\{}SGD{\\}}},\nauthor={Ruosi Wan and Zhanxing Zhu and Xiangyu Zhang and Jian Sun},\nyear={2021},\nurl={https://openreview.net/forum?id=CMsvjAnW1zE}\n}", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer3;AnonReviewer2;AnonReviewer4", "site": "https://openreview.net/forum?id=CMsvjAnW1zE", "pdf_size": 0, "rating": "4;5;6;7", "confidence": "4;3;2;4", "wc_review": "696;445;192;328", "wc_reply_reviewers": "0;0;0;236", "wc_reply_authors": "1778;535;248;1577", "reply_reviewers": "0;0;0;2", "reply_authors": "3;1;1;3", "rating_avg": [ 5.5, 1.118033988749895 ], "confidence_avg": [ 3.25, 0.82915619758885 ], "wc_review_avg": [ 415.25, 185.17474854849945 ], "wc_reply_reviewers_avg": [ 59.0, 102.19099764656376 ], "wc_reply_authors_avg": [ 1034.5, 654.8245948343724 ], "reply_reviewers_avg": [ 0.5, 0.8660254037844386 ], "reply_authors_avg": [ 2.0, 1.0 ], "replies_avg": [ 18, 0 ], "authors#_avg": [ 4, 0 ], "corr_rating_confidence": -0.1348399724926484, "gs_citation": 23, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=11920205074008590314&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 6, "aff_unique_index": "0;1;0;0", "aff_unique_norm": "Megvii Technology;Peking University", "aff_unique_dep": ";", "aff_unique_url": "https://www.megvii.com;http://www.pku.edu.cn", "aff_unique_abbr": "Megvii;Peking U", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0;0;0", "aff_country_unique": "China" }, { "id": "CNA6ZrpNDar", "title": "On the Decision Boundaries of Neural Networks. A Tropical Geometry Perspective", "track": "main", "status": "Reject", "tldr": "", "abstract": "This work tackles the problem of characterizing and understanding the decision boundaries of neural networks with piecewise linear non-linearity activations. We use tropical geometry, a new development in the area of algebraic geometry, to characterize the decision boundaries of a simple network of the form (Affine, ReLU, Affine). Our main finding is that the decision boundaries are a subset of a tropical hypersurface, which is intimately related to a polytope formed by the convex hull of two zonotopes. The generators of these zonotopes are functions of the network parameters. This geometric characterization provides new perspectives to three tasks. 
Specifically, we propose a new tropical perspective to the lottery ticket hypothesis, where we view the effect of different initializations on the tropical geometric representation of a network's decision boundaries. Moreover, we propose new tropical based optimization problems that directly influence the decision boundaries of the network for the tasks of network pruning (removing network parameters not contributing to the tropical geometric representation of the decision boundaries) and the generation of adversarial attacks.", "keywords": "Tropical Geometry;Decision Boundaries;Neural Networks", "primary_area": "", "supplementary_material": "/attachment/b77a9e4606e390615f3a7eca8034d484f4b8d311.zip", "author": "Motasem Alfarra;Adel Bibi;Hasan Abed Al Kader Hammoud;Mohamed Gaafar;Bernard Ghanem", "authorids": "~Motasem_Alfarra1;~Adel_Bibi1;~Hasan_Abed_Al_Kader_Hammoud1;~Mohamed_Gaafar1;~Bernard_Ghanem1", "gender": "M;M;M;;M", "homepage": "https://motasemalfarra.netlify.app/;http://adelbibi.com;https://cemse.kaust.edu.sa/vcc/people/person/hasan-abed-al-kader-hammoud;;https://ivul.kaust.edu.sa", "dblp": "255/5192;176/0964;259/0615;;37/2516", "google_scholar": "https://scholar.google.com/citations?hl=en;Q4j2laYAAAAJ;Plf1JSIAAAAJ;Rb0gj-oAAAAJ;rVsGTeEAAAAJ", "orcid": ";0000-0002-6169-3918;;;0000-0002-5534-587X", "linkedin": ";adel-bibi-ba3671ab/;hasan-abed-al-kader-hammoud-56392a147/;;bernardghanem/", "or_profile": "~Motasem_Alfarra1;~Adel_Bibi1;~Hasan_Abed_Al_Kader_Hammoud1;~Mohamed_Gaafar1;~Bernard_Ghanem1", "aff": "KAUST;University of Oxford;;Zalando SE;King Abdullah University of Science and Technology", "aff_domain": "kaust.edu.sa;ox.ac.uk;;zalando.de;kaust.edu.sa", "position": "PhD student;Postdoc;;Applied Scientist;Associate Professor", "bibtex": "@misc{\nalfarra2021on,\ntitle={On the Decision Boundaries of Neural Networks. 
A Tropical Geometry Perspective},\nauthor={Motasem Alfarra and Adel Bibi and Hasan Abed Al Kader Hammoud and Mohamed Gaafar and Bernard Ghanem},\nyear={2021},\nurl={https://openreview.net/forum?id=CNA6ZrpNDar}\n}", "github": "", "project": "", "reviewers": "AnonReviewer2;AnonReviewer3;AnonReviewer4;AnonReviewer1", "site": "https://openreview.net/forum?id=CNA6ZrpNDar", "pdf_size": 0, "rating": "6;6;6;7", "confidence": "4;3;1;3", "wc_review": "536;656;121;1354", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "514;512;168;1039", "reply_reviewers": "0;0;0;0", "reply_authors": "1;1;1;2", "rating_avg": [ 6.25, 0.4330127018922193 ], "confidence_avg": [ 2.75, 1.0897247358851685 ], "wc_review_avg": [ 666.75, 443.66844320956614 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 558.25, 311.2526104308203 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.25, 0.4330127018922193 ], "replies_avg": [ 10, 0 ], "authors#_avg": [ 5, 0 ], "corr_rating_confidence": 0.13245323570650439, "gs_citation": 31, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=11867777183594277703&as_sdt=400005&sciodt=0,14&hl=en", "gs_version_total": 13, "aff_unique_index": "0;1;2;0", "aff_unique_norm": "King Abdullah University of Science and Technology;University of Oxford;Zalando SE", "aff_unique_dep": ";;", "aff_unique_url": "https://www.kaust.edu.sa;https://www.ox.ac.uk;https://www.zalando.de", "aff_unique_abbr": "KAUST;Oxford;Zalando", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;1;2;0", "aff_country_unique": "Saudi Arabia;United Kingdom;Germany" }, { "id": "CPfjKI8Yzx", "title": "Robust Imitation via Decision-Time Planning", "track": "main", "status": "Reject", "tldr": "", "abstract": "The goal of imitation learning is to mimic expert behavior from demonstrations, without access to an explicit reward signal. A popular class of approaches infers the (unknown) reward function via inverse reinforcement learning (IRL) followed by maximizing this reward function via reinforcement learning (RL). The policies learned via these approaches are however very brittle in practice and deteriorate quickly even with small test-time perturbations due to compounding errors. We propose Imitation with Planning at Test-time (IMPLANT), a new algorithm for imitation learning that utilizes decision-time planning to correct for compounding errors of any base imitation policy. In contrast to existing approaches, we retain both the imitation policy and the rewards model at decision-time, thereby benefiting from the learning signal of the two components. 
Empirically, we demonstrate that IMPLANT significantly outperforms benchmark imitation learning approaches on standard control environments and excels at zero-shot generalization when subject to challenging perturbations in test-time dynamics.", "keywords": "imitation learning;reinforcement learning;inverse reinforcement learning", "primary_area": "", "supplementary_material": "/attachment/fcd83751c6ca5af08fecd3f006b2e95fbbdb268d.zip", "author": "Carl Qi;Pieter Abbeel;Aditya Grover", "authorids": "~Carl_Qi1;~Pieter_Abbeel2;~Aditya_Grover1", "gender": "M;M;M", "homepage": "https://carl-qi.github.io/;https://people.eecs.berkeley.edu/~pabbeel/;https://aditya-grover.github.io", "dblp": ";;162/5052", "google_scholar": "CdmHB_oAAAAJ;https://scholar.google.com.tw/citations?user=vtwH6GkAAAAJ;oOhnPUgAAAAJ", "orcid": ";;", "linkedin": "carlqi/;;", "or_profile": "~Carl_Qi1;~Pieter_Abbeel2;~Aditya_Grover1", "aff": "University of California, Berkeley;Covariant;University of California, Berkeley", "aff_domain": "berkeley.edu;covariant.ai;berkeley.edu", "position": "Undergrad student;Founder;Postdoc", "bibtex": "@misc{\nqi2021robust,\ntitle={Robust Imitation via Decision-Time Planning},\nauthor={Carl Qi and Pieter Abbeel and Aditya Grover},\nyear={2021},\nurl={https://openreview.net/forum?id=CPfjKI8Yzx}\n}", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer3;AnonReviewer4;AnonReviewer2", "site": "https://openreview.net/forum?id=CPfjKI8Yzx", "pdf_size": 0, "rating": "3;4;4;6", "confidence": "5;5;5;3", "wc_review": "837;499;288;468", "wc_reply_reviewers": "139;0;0;0", "wc_reply_authors": "1273;517;530;438", "reply_reviewers": "1;0;0;0", "reply_authors": "2;1;1;1", "rating_avg": [ 4.25, 1.0897247358851685 ], "confidence_avg": [ 4.5, 0.8660254037844386 ], "wc_review_avg": [ 523.0, 198.38220686341808 ], "wc_reply_reviewers_avg": [ 34.75, 60.188765563018485 ], "wc_reply_authors_avg": [ 689.5, 338.71854097465643 ], "reply_reviewers_avg": [ 0.25, 0.4330127018922193 ], "reply_authors_avg": [ 1.25, 0.4330127018922193 ], "replies_avg": [ 12, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": -0.9271726499455306, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:8a_0bkvWPjcJ:scholar.google.com/&scioq=Robust+Imitation+via+Decision-Time+Planning&hl=en&as_sdt=0,5", "gs_version_total": 0, "aff_unique_index": "0;1;0", "aff_unique_norm": "University of California, Berkeley;Covariant", "aff_unique_dep": ";", "aff_unique_url": "https://www.berkeley.edu;", "aff_unique_abbr": "UC Berkeley;", "aff_campus_unique_index": "0;0", "aff_campus_unique": "Berkeley;", "aff_country_unique_index": "0;0", "aff_country_unique": "United States;" }, { "title": "Contrastive Learning with Hard Negative Samples", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/2801", "id": "CR1XOQ0UTh-", "poster": "", "openreview": "https://openreview.net/forum?id=CR1XOQ0UTh-", "slides": "https://iclr.cc/virtual/2021/poster/2801", "video": "https://iclr.cc/virtual/2021/poster/2801", "author_site": "Joshua Robinson, Ching-Yao Chuang, Suvrit Sra, Stefanie Jegelka", "tldr": "", "abstract": "We consider the question: how can you sample good negative examples for contrastive learning? We argue that, as with metric learning, learning contrastive representations benefits from hard negative samples (i.e., points that are difficult to distinguish from an anchor point). 
The key challenge toward using hard negatives is that contrastive methods must remain unsupervised, making it infeasible to adopt existing negative sampling strategies that use label information. In response, we develop a new class of unsupervised methods for selecting hard negative samples where the user can control the amount of hardness. A limiting case of this sampling results in a representation that tightly clusters each class, and pushes different classes as far apart as possible. The proposed method improves downstream performance across multiple modalities, requires only few additional lines of code to implement, and introduces no computational overhead.\n", "keywords": "contrastive learning;unsupervised representation learning;hard negative sampling", "primary_area": "", "supplementary_material": "/attachment/daa02e9bde6fdac4d04e1396f30d0c1a87f80b98.zip", "author": "Joshua David Robinson;Ching-Yao Chuang;Suvrit Sra;Stefanie Jegelka", "authorids": "~Joshua_David_Robinson1;~Ching-Yao_Chuang1;~Suvrit_Sra1;~Stefanie_Jegelka3", "gender": "M;M;;F", "homepage": "https://joshrobinson.mit.edu/;https://chingyaoc.github.io/;https://optml.mit.edu;http://people.csail.mit.edu/stefje/", "dblp": "15/4759;190/7522;90/930;38/7003", "google_scholar": "E02doCkAAAAJ;fpUICd0AAAAJ;eyCw9goAAAAJ;gTWUZlsAAAAJ", "orcid": ";;;", "linkedin": ";;;", "or_profile": "~Joshua_David_Robinson1;~Ching-Yao_Chuang1;~Suvrit_Sra1;~Stefanie_Jegelka3", "aff": "Massachusetts Institute of Technology;Massachusetts Institute of Technology;Massachusetts Institute of Technology;Massachusetts Institute of Technology", "aff_domain": "mit.edu;mit.edu;mit.edu;mit.edu", "position": "PhD student;PhD student;Associate Professor;Associate Professor", "bibtex": "@inproceedings{\nrobinson2021contrastive,\ntitle={Contrastive Learning with Hard Negative Samples},\nauthor={Joshua David Robinson and Ching-Yao Chuang and Suvrit Sra and Stefanie Jegelka},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=CR1XOQ0UTh-}\n}", "github": "[![github](/images/github_icon.svg) joshr17/HCL](https://github.com/joshr17/HCL)", "project": "", "reviewers": "AnonReviewer3;AnonReviewer2;AnonReviewer4;AnonReviewer1", "pdf_size": 0, "rating": "6;6;7;7", "confidence": "4;4;4;3", "wc_review": "747;292;530;291", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "810;347;379;370", "reply_reviewers": "0;0;0;0", "reply_authors": "1;1;1;1", "rating_avg": [ 6.5, 0.5 ], "confidence_avg": [ 3.75, 0.4330127018922193 ], "wc_review_avg": [ 465.0, 189.70635202860234 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 476.5, 192.89958527689996 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 11, 0 ], "authors#_avg": [ 4, 0 ], "corr_rating_confidence": -0.5773502691896257, "gs_citation": 963, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=9395538845107330163&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 4, "pdf": "https://openreview.net/pdf?id=CR1XOQ0UTh-", "email": "mit.edu;mit.edu;mit.edu;mit.edu", "author_num": 4, "aff_unique_index": "0;0;0;0", "aff_unique_norm": "Massachusetts Institute of Technology", "aff_unique_dep": "", "aff_unique_url": "https://web.mit.edu", "aff_unique_abbr": "MIT", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0;0;0", "aff_country_unique": "United States" }, { "title": "NAS-Bench-ASR: Reproducible Neural Architecture Search for Speech Recognition", "status": 
"Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/3034", "id": "CU0APx9LMaL", "poster": "", "openreview": "https://openreview.net/forum?id=CU0APx9LMaL", "slides": "https://iclr.cc/virtual/2021/poster/3034", "video": "https://iclr.cc/virtual/2021/poster/3034", "author_site": "Abhinav Mehrotra, Alberto Gil Couto Pimentel Ramos, Sourav Bhattacharya, \u0141ukasz Dudziak, Ravichander Vipperla, Thomas C Chau, Mohamed Abdelfattah, Samin Ishtiaq, Nicholas Lane", "tldr": "", "abstract": "Powered by innovations in novel architecture design, noise tolerance techniques and increasing model capacity, Automatic Speech Recognition (ASR) has made giant strides in reducing word-error-rate over the past decade. ASR models are often trained with tens of thousand hours of high quality speech data to produce state-of-the-art (SOTA) results. Industry-scale ASR model training thus remains computationally heavy and time-consuming, and consequently has attracted little attention in adopting automatic techniques. On the other hand, Neural Architecture Search (NAS) has gained a lot of interest in the recent years thanks to its successes in discovering efficient architectures, often outperforming handcrafted alternatives. However, by changing the standard training process into a bi-level optimisation problem, NAS approaches often require significantly more time and computational power compared to single-model training, and at the same time increase complexity of the overall process. As a result, NAS has been predominately applied to problems which do not require as extensive training as ASR, and even then reproducibility of NAS algorithms is often problematic. Lately, a number of benchmark datasets has been introduced to address reproducibility issues by pro- viding NAS researchers with information about performance of different models obtained through exhaustive evaluation. However, these datasets focus mainly on computer vision and NLP tasks and thus suffer from limited coverage of application domains. In order to increase diversity in the existing NAS benchmarks, and at the same time provide systematic study of the effects of architectural choices for ASR, we release NAS-Bench-ASR \u2013 the first NAS benchmark for ASR models. The dataset consists of 8, 242 unique models trained on the TIMIT audio dataset for three different target epochs, and each starting from three different initializations. The dataset also includes runtime measurements of all the models on a diverse set of hardware platforms. Lastly, we show that identified good cell structures in our search space for TIMIT transfer well to a much larger LibriSpeech dataset.", "keywords": "NAS;ASR;Benchmark", "primary_area": "", "supplementary_material": "", "author": "Abhinav Mehrotra;Alberto Gil C. P. 
Ramos;Sourav Bhattacharya;\u0141ukasz Dudziak;Ravichander Vipperla;Thomas Chau;Mohamed S Abdelfattah;Samin Ishtiaq;Nicholas Donald Lane", "authorids": "~Abhinav_Mehrotra1;a.gilramos@samsung.com;~Sourav_Bhattacharya1;~\u0141ukasz_Dudziak1;~Ravichander_Vipperla1;thomas.chau@samsung.com;~Mohamed_S_Abdelfattah1;s.ishtiaq@samsung.com;~Nicholas_Donald_Lane1", "gender": "M;;M;M;M;;M;;", "homepage": "https://abhinavmehrotra.github.io/;;;;;;https://mohsaied.github.io/;;", "dblp": "154/4273;;69/3637;228/7987;https://dblp.uni-trier.de/pers/hd/v/Vipperla:Ravichander;;124/7095;;", "google_scholar": "https://scholar.google.co.uk/citations?user=AbeyFKwAAAAJ;;EU-ESvsAAAAJ;R47NvpoAAAAJ;NYG-shkAAAAJ;;https://scholar.google.ca/citations?user=q4wBpWAAAAAJ;;", "orcid": ";;;;;;;;", "linkedin": ";;;;ravichander-vipperla-3827522/?originalSubdomain=uk;;mabdelfattah/;;", "or_profile": "~Abhinav_Mehrotra1;a.gilramos@samsung.com;~Sourav_Bhattacharya1;~\u0141ukasz_Dudziak1;~Ravichander_Vipperla1;thomas.chau@samsung.com;~Mohamed_S_Abdelfattah1;s.ishtiaq@samsung.com;~Nicholas_Donald_Lane1", "aff": "Samsung AI Center;;Samsung AI Center;Samsung;;;Samsung AI Center;;", "aff_domain": "samsung.com;;samsung.com;samsung.com;;;samsung.com;;", "position": "Researcher;;Principal Researcher;Software Engineer;;;Principal Scientist;;", "bibtex": "@inproceedings{\nmehrotra2021nasbenchasr,\ntitle={{\\{}NAS{\\}}-Bench-{\\{}ASR{\\}}: Reproducible Neural Architecture Search for Speech Recognition},\nauthor={Abhinav Mehrotra and Alberto Gil C. P. Ramos and Sourav Bhattacharya and {\\L}ukasz Dudziak and Ravichander Vipperla and Thomas Chau and Mohamed S Abdelfattah and Samin Ishtiaq and Nicholas Donald Lane},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=CU0APx9LMaL}\n}", "github": "", "project": "", "reviewers": "AnonReviewer2;AnonReviewer5;AnonReviewer4;AnonReviewer3;AnonReviewer1", "pdf_size": 0, "rating": "4;5;6;6;7", "confidence": "5;5;4;4;3", "wc_review": "243;312;348;367;241", "wc_reply_reviewers": "0;0;0;0;0", "wc_reply_authors": "1154;1040;298;318;65", "reply_reviewers": "0;0;0;0;0", "reply_authors": "2;2;1;1;1", "rating_avg": [ 5.6, 1.0198039027185568 ], "confidence_avg": [ 4.2, 0.7483314773547882 ], "wc_review_avg": [ 302.2, 52.235620030779764 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 575.0, 436.8853396487458 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.4, 0.4898979485566356 ], "replies_avg": [ 14, 0 ], "authors#_avg": [ 9, 0 ], "corr_rating_confidence": -0.9434563530497265, "gs_citation": 86, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=12117737367325513994&as_sdt=400005&sciodt=0,14&hl=en", "gs_version_total": 2, "pdf": "https://openreview.net/pdf?id=CU0APx9LMaL", "email": "samsung.com;;samsung.com;samsung.com;;;samsung.com;;", "author_num": 9, "aff_unique_index": "0;0;0;0", "aff_unique_norm": "Samsung", "aff_unique_dep": "AI Center", "aff_unique_url": "https://www.samsung.com/global/careers/ai-center/", "aff_unique_abbr": "Samsung AI", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0;0;0", "aff_country_unique": "South Korea" }, { "id": "CVZMcRg_bd", "title": "Differentiable Programming for Piecewise Polynomial Functions", "track": "main", "status": "Withdraw", "tldr": "", "abstract": "The paradigm of differentiable programming has considerably enhanced the scope of machine learning via the judicious use of gradient-based optimization. 
However, standard differentiable programming methods (such as autodiff) typically require that the models be differentiable, limiting their applicability. We introduce a new, principled approach to extend gradient-based optimization to piecewise smooth models, such as k-histograms, splines, and segmentation maps. We derive an accurate form to the weak Jacobian of such functions, and show that it exhibits a block-sparse structure that can be computed implicitly and efficiently. We show that using the redesigned Jacobian leads to improved performance in applications such as denoising with piecewise polynomial regression models, data-free generative model training, and image segmentation.", "keywords": "Differentiable Programming;piecewise polynomial regression;generative models;segmentation", "primary_area": "", "supplementary_material": "", "author": "Minsu Cho;Ameya Joshi;Xian Yeow Lee;Aditya Balu;Adarsh Krishnamurthy;Baskar Ganapathysubramanian;Soumik Sarkar;Chinmay Hegde", "authorids": "~Minsu_Cho2;~Ameya_Joshi2;~Xian_Yeow_Lee1;~Aditya_Balu1;~Adarsh_Krishnamurthy1;~Baskar_Ganapathysubramanian1;~Soumik_Sarkar1;~Chinmay_Hegde1", "gender": "M;M;M;M;M;M;M;M", "homepage": ";;https://web.me.iastate.edu/idealab/p-krishnamurthy.html;;http://web.me.iastate.edu/soumiks/index.html;https://chinmayhegde.github.io/;https://ameya005.github.io;", "dblp": "220/5717;192/1502;;;33/7053;39/2056;148/8731;", "google_scholar": "sjAA1AQAAAAJ;;https://scholar.google.com/citations?hl=en;R1JIs4cAAAAJ;-rmRjqIAAAAJ;eJAV17IAAAAJ;jZgsp_sAAAAJ;1pcqgUYAAAAJ", "orcid": ";;;;;;;", "linkedin": "bernard-leexy/;;;baskar-ganapathysubramanian-5b22a51a6/?original_referer=;;;;", "or_profile": "~Xian_Yeow_Lee1;~Aditya_Balu1;~Adarsh_Krishnamurthy1;~Baskar_Ganapathysubramanian1;~Soumik_Sarkar1;~Chinmay_Hegde1;~Ameya_A_Joshi1;~Minsu_Cho3", "aff": "Iowa State University;Iowa State University;Iowa State University;Iowa State University;Iowa State University;New York University;New York University;New York University", "aff_domain": "iastate.edu;iastate.edu;iastate.edu;iastate.edu;iastate.edu;nyu.edu;nyu.edu;nyu.edu", "position": "PhD student;Postdoc;Associate Professor;Professor;Associate Professor;Assistant Professor;PhD Student;PhD student", "bibtex": "", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer3;AnonReviewer2;AnonReviewer4", "site": "https://openreview.net/forum?id=CVZMcRg_bd", "pdf_size": 0, "rating": "3;4;4;5", "confidence": "3;3;3;4", "wc_review": "324;352;109;325", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "0;0;0;0", "reply_reviewers": "0;0;0;0", "reply_authors": "0;0;0;0", "rating_avg": [ 4.0, 0.7071067811865476 ], "confidence_avg": [ 3.25, 0.4330127018922193 ], "wc_review_avg": [ 277.5, 97.9298218113359 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 0, 0 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 0, 0 ], "replies_avg": [ 5, 0 ], "authors#_avg": [ 8, 0 ], "corr_rating_confidence": 0.816496580927726, "gs_citation": -1, "gs_cited_by_link": "", "gs_version_total": -1, "aff_unique_index": "0;0;0;0;0;1;1;1", "aff_unique_norm": "Iowa State University;New York University", "aff_unique_dep": ";", "aff_unique_url": "https://www.iastate.edu;https://www.nyu.edu", "aff_unique_abbr": "ISU;NYU", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0;0;0;0;0;0;0", "aff_country_unique": "United States" }, { "id": "CYHMIhbuLFl", "title": "Contextual HyperNetworks for Novel Feature Adaptation", "track": "main", "status": "Reject", "tldr": 
"", "abstract": "While deep learning has obtained state-of-the-art results in many applications, the adaptation of neural network architectures to incorporate new features remains a research challenge. This issue is particularly severe in online learning settings, where new features are added continually with few or no associated observations. As such, methods for adapting neural networks to novel features which are both time and data-efficient are desired. To address this, we propose the Contextual HyperNetwork (CHN), which predicts the network weights associated with new features by incorporating information from both existing data as well as the few observations for the new feature and any associated feature metadata. At prediction time, the CHN requires only a single forward pass through a small neural network, yielding a significant speed-up when compared to re-training and fine-tuning approaches. In order to showcase the performance of CHNs, in this work we use a CHN to augment a partial variational autoencoder (P-VAE), a flexible deep generative model which can impute the values of missing features in sparsely-observed data. We show that this system obtains significantly improved performance for novel feature adaptation over existing imputation and meta-learning baselines across recommender systems, e-learning, and healthcare tasks.", "keywords": "Meta learning;few-shot learning;continual learning;recommender systems;deep learning", "primary_area": "", "supplementary_material": "", "author": "Angus Lamb;Evgeny Saveliev;Yingzhen Li;Sebastian Tschiatschek;Camilla Longden;Simon Woodhead;Jos\u00e9 Miguel Hern\u00e1ndez-Lobato;Richard E Turner;Pashmina Cameron;Cheng Zhang", "authorids": "~Angus_Lamb1;e.s.saveliev@gmail.com;~Yingzhen_Li1;~Sebastian_Tschiatschek1;camilla.longden@microsoft.com;simon.woodhead@eedi.co.uk;~Jos\u00e9_Miguel_Hern\u00e1ndez-Lobato1;~Richard_E_Turner1;~Pashmina_Cameron1;~Cheng_Zhang1", "gender": "M;;F;M;;;;M;F;F", "homepage": ";;http://yingzhenli.net/home/en/;https://www.tschiatschek.net;;;;https://rich-turner-group.github.io/;https://www.microsoft.com/en-us/research/people/pcameron/;http://cheng-zhang.org", "dblp": ";;117/9230;33/10810;;;;40/5352;94/8938;82/6384-5", "google_scholar": ";;https://scholar.google.se/citations?hl=en;;;;;https://scholar.google.co.uk/citations?user=DgLEyZgAAAAJ;https://scholar.google.com/citations?hl=en;r40iAwIAAAAJ", "orcid": ";;;;;;;;0009-0009-0444-1755;", "linkedin": "angusjlamb/;;;;;;;;pashmina-cameron-7424b51/;", "or_profile": "~Angus_Lamb1;e.s.saveliev@gmail.com;~Yingzhen_Li1;~Sebastian_Tschiatschek1;camilla.longden@microsoft.com;simon.woodhead@eedi.co.uk;~Jos\u00e9_Miguel_Hern\u00e1ndez-Lobato1;~Richard_E_Turner1;~Pashmina_Cameron1;~Cheng_Zhang1", "aff": ";;Imperial College London;University of Vienna;;;;University of Cambridge;Microsoft;Microsoft", "aff_domain": ";;imperial.ac.uk;univie.ac.at;;;;cam.ac.uk;microsoft.com;microsoft.com", "position": ";;Lecturer;Assistant Professor;;;;Professor;Principal Scientist;Principal Researcher", "bibtex": "@misc{\nlamb2021contextual,\ntitle={Contextual HyperNetworks for Novel Feature Adaptation},\nauthor={Angus Lamb and Evgeny Saveliev and Yingzhen Li and Sebastian Tschiatschek and Camilla Longden and Simon Woodhead and Jos{\\'e} Miguel Hern{\\'a}ndez-Lobato and Richard E Turner and Pashmina Cameron and Cheng Zhang},\nyear={2021},\nurl={https://openreview.net/forum?id=CYHMIhbuLFl}\n}", "github": "", "project": "", "reviewers": "AnonReviewer2;AnonReviewer1;AnonReviewer4;AnonReviewer3", 
"site": "https://openreview.net/forum?id=CYHMIhbuLFl", "pdf_size": 0, "rating": "5;5;5;6", "confidence": "3;3;4;2", "wc_review": "433;387;226;387", "wc_reply_reviewers": "391;0;0;0", "wc_reply_authors": "768;600;344;379", "reply_reviewers": "1;0;0;0", "reply_authors": "1;1;1;1", "rating_avg": [ 5.25, 0.4330127018922193 ], "confidence_avg": [ 3.0, 0.7071067811865476 ], "wc_review_avg": [ 358.25, 78.63006740426972 ], "wc_reply_reviewers_avg": [ 97.75, 169.30796643985775 ], "wc_reply_authors_avg": [ 522.75, 172.28664341730035 ], "reply_reviewers_avg": [ 0.25, 0.4330127018922193 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 11, 0 ], "authors#_avg": [ 10, 0 ], "corr_rating_confidence": -0.816496580927726, "gs_citation": 5, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=2506678688968907294&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 5, "aff_unique_index": "0;1;2;3;3", "aff_unique_norm": "Imperial College London;University of Vienna;University of Cambridge;Microsoft", "aff_unique_dep": ";;;Microsoft Corporation", "aff_unique_url": "https://www.imperial.ac.uk;https://univie.ac.at;https://www.cam.ac.uk;https://www.microsoft.com", "aff_unique_abbr": "ICL;UV;Cambridge;Microsoft", "aff_campus_unique_index": "1", "aff_campus_unique": ";Cambridge", "aff_country_unique_index": "0;1;0;2;2", "aff_country_unique": "United Kingdom;Austria;United States" }, { "title": "Simple Spectral Graph Convolution", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/3377", "id": "CYO5T-YjWZV", "poster": "", "openreview": "https://openreview.net/forum?id=CYO5T-YjWZV", "slides": "https://iclr.cc/virtual/2021/poster/3377", "video": "https://iclr.cc/virtual/2021/poster/3377", "author_site": "Hao Zhu, Piotr Koniusz", "tldr": "", "abstract": "Graph Convolutional Networks (GCNs) are leading methods for learning graph representations. However, without specially designed architectures, the performance of GCNs degrades quickly with increased depth. As the aggregated neighborhood size and neural network depth are two completely orthogonal aspects of graph representation, several methods focus on summarizing the neighborhood by aggregating K-hop neighborhoods of nodes while using shallow neural networks. However, these methods still encounter oversmoothing, and suffer from high computation and storage costs. In this paper, we use a modified Markov Diffusion Kernel to derive a variant of GCN called Simple Spectral Graph Convolution (SSGC). Our spectral analysis shows that our simple spectral graph convolution used in SSGC is a trade-off of low- and high-pass filter bands which capture the global and local contexts of each node. We provide two theoretical claims which demonstrate that we can aggregate over a sequence of increasingly larger neighborhoods compared to competitors while limiting severe oversmoothing. Our experimental evaluations show that SSGC with a linear learner is competitive in text and node classification tasks. 
Moreover, SSGC is comparable to other state-of-the-art methods for node clustering and community prediction tasks.", "keywords": "Graph Convolutional Network;Oversmoothing", "primary_area": "", "supplementary_material": "", "author": "Hao Zhu;Piotr Koniusz", "authorids": "~Hao_Zhu2;~Piotr_Koniusz1", "gender": ";", "homepage": ";https://www.koniusz.com", "dblp": ";25/8616", "google_scholar": ";https://scholar.google.co.uk/citations?user=wZ7-1tUAAAAJ", "orcid": ";0000-0002-6340-5289", "linkedin": ";", "or_profile": "~Hao_Zhu2;~Piotr_Koniusz1", "aff": ";Data61, CSIRO", "aff_domain": ";data61.csiro.au", "position": ";senior research scientist", "bibtex": "@inproceedings{\nzhu2021simple,\ntitle={Simple Spectral Graph Convolution},\nauthor={Hao Zhu and Piotr Koniusz},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=CYO5T-YjWZV}\n}", "github": "[![github](/images/github_icon.svg) allenhaozhu/SSGC](https://github.com/allenhaozhu/SSGC) + [![Papers with Code](/images/pwc_icon.svg) 1 community implementation](https://paperswithcode.com/paper/?openreview=CYO5T-YjWZV)", "project": "", "reviewers": "AnonReviewer4;AnonReviewer3;AnonReviewer1;AnonReviewer2", "pdf_size": 0, "rating": "5;6;6;7", "confidence": "4;4;4;4", "wc_review": "660;349;344;474", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "0;0;0;0", "reply_reviewers": "0;0;0;0", "reply_authors": "0;0;0;0", "rating_avg": [ 6.0, 0.7071067811865476 ], "confidence_avg": [ 4.0, 0.0 ], "wc_review_avg": [ 456.75, 128.38491928571673 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 0, 0 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 0, 0 ], "replies_avg": [ 5, 0 ], "authors#_avg": [ 2, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 411, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=3312425761995361615&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 3, "pdf": "https://openreview.net/pdf?id=CYO5T-YjWZV", "email": ";data61.csiro.au", "author_num": 2, "aff_unique_index": "0", "aff_unique_norm": "CSIRO", "aff_unique_dep": "Data61", "aff_unique_url": "https://www.csiro.au", "aff_unique_abbr": "CSIRO", "aff_country_unique_index": "0", "aff_country_unique": "Australia" }, { "title": "What Should Not Be Contrastive in Contrastive Learning", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/2809", "id": "CZ8Y3NzuVzO", "poster": "", "openreview": "https://openreview.net/forum?id=CZ8Y3NzuVzO", "slides": "https://iclr.cc/virtual/2021/poster/2809", "video": "https://iclr.cc/virtual/2021/poster/2809", "author_site": "Tete Xiao, Xiaolong Wang, Alexei Efros, trevor darrell", "tldr": "", "abstract": "Recent self-supervised contrastive methods have been able to produce impressive transferable visual representations by learning to be invariant to different data augmentations. However, these methods implicitly assume a particular set of representational invariances (e.g., invariance to color), and can perform poorly when a downstream task violates this assumption (e.g., distinguishing red vs. yellow cars). We introduce a contrastive learning framework which does not require prior knowledge of specific, task-dependent invariances. Our model learns to capture varying and invariant factors for visual representations by constructing separate embedding spaces, each of which is invariant to all but one augmentation. 
We use a multi-head network with a shared backbone which captures information across each augmentation and alone outperforms all baselines on downstream tasks. We further find that the concatenation of the invariant and varying spaces performs best across all tasks we investigate, including coarse-grained, fine-grained, and few-shot downstream classification tasks, and various data corruptions.", "keywords": "Self-supervised learning;Contrastive learning;Representation learning", "primary_area": "", "supplementary_material": "", "author": "Tete Xiao;Xiaolong Wang;Alexei A Efros;Trevor Darrell", "authorids": "~Tete_Xiao1;~Xiaolong_Wang3;~Alexei_A_Efros1;~Trevor_Darrell2", "gender": "M;M;M;M", "homepage": "http://tetexiao.com;https://xiaolonw.github.io/;http://www.eecs.berkeley.edu/~efros/;https://people.eecs.berkeley.edu/~trevor/", "dblp": "200/8130;91/952-4;40/6158;d/TrevorDarrell", "google_scholar": "U4RqBdAAAAAJ;Y8O9N_0AAAAJ;https://scholar.google.com.tw/citations?user=d97bGd8AAAAJ;https://scholar.google.com.tw/citations?user=bh-uRFMAAAAJ", "orcid": ";;0000-0001-5720-8070;", "linkedin": ";;alexei-efros-890736a3/;", "or_profile": "~Tete_Xiao1;~Xiaolong_Wang3;~Alyosha_Efros1;~trevor_darrell1", "aff": "Facebook AI Research;University of California, San Diego;University of California, Berkeley;Electrical Engineering & Computer Science Department", "aff_domain": "facebook.com;ucsd.edu;berkeley.edu;eecs.berkeley.edu", "position": "Researcher;Assistant Professor;Professor;Professor", "bibtex": "@inproceedings{\nxiao2021what,\ntitle={What Should Not Be Contrastive in Contrastive Learning},\nauthor={Tete Xiao and Xiaolong Wang and Alexei A Efros and Trevor Darrell},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=CZ8Y3NzuVzO}\n}", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer3;AnonReviewer4;AnonReviewer2", "pdf_size": 0, "rating": "5;6;7;8", "confidence": "4;3;3;4", "wc_review": "878;227;288;349", "wc_reply_reviewers": "204;0;0;0", "wc_reply_authors": "1041;306;115;197", "reply_reviewers": "1;0;0;0", "reply_authors": "2;1;1;1", "rating_avg": [ 6.5, 1.118033988749895 ], "confidence_avg": [ 3.5, 0.5 ], "wc_review_avg": [ 435.5, 259.09312997453253 ], "wc_reply_reviewers_avg": [ 51.0, 88.33459118601274 ], "wc_reply_authors_avg": [ 414.75, 367.85892336601 ], "reply_reviewers_avg": [ 0.25, 0.4330127018922193 ], "reply_authors_avg": [ 1.25, 0.4330127018922193 ], "replies_avg": [ 12, 0 ], "authors#_avg": [ 4, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 361, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=2373021916505066512&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 4, "pdf": "https://openreview.net/pdf?id=CZ8Y3NzuVzO", "email": "facebook.com;ucsd.edu;berkeley.edu;eecs.berkeley.edu", "author_num": 4, "aff_unique_index": "0;1;2;3", "aff_unique_norm": "Meta;University of California, San Diego;University of California, Berkeley;Electrical Engineering & Computer Science Department", "aff_unique_dep": "Facebook AI Research;;;Electrical Engineering & Computer Science", "aff_unique_url": "https://research.facebook.com;https://www.ucsd.edu;https://www.berkeley.edu;", "aff_unique_abbr": "FAIR;UCSD;UC Berkeley;", "aff_campus_unique_index": "1;2", "aff_campus_unique": ";San Diego;Berkeley", "aff_country_unique_index": "0;0;0", "aff_country_unique": "United States;" }, { "id": "C_p3TDhOXW_", "title": "Prior Preference Learning From Experts: Designing A Reward with Active Inference", "track": 
"main", "status": "Reject", "tldr": "", "abstract": "Active inference may be defined as Bayesian modeling of a brain with a biologically plausible model of the agent. Its primary idea relies on the free energy principle and the prior preference of the agent. An agent will choose an action that leads to its prior preference for a future observation. In this paper, we claim that active inference can be interpreted using reinforcement learning (RL) algorithms and find a theoretical connection between them. We extend the concept of expected free energy (EFE), which is a core quantity in active inference, and claim that EFE can be treated as a negative value function. Motivated by the concept of prior preference and a theoretical connection, we propose a simple but novel method for learning a prior preference from experts. This illustrates that the problem with RL can be approached with a new perspective of active inference. Experimental results of prior preference learning show the possibility of active inference with EFE-based rewards and its application to an inverse RL problem.", "keywords": "Active Inference;Free Energy Principle;Reinforcement Learning;Reward Design", "primary_area": "", "supplementary_material": "/attachment/8354a1dc6b5aab4e93720b009059de682276b092.zip", "author": "Jin Young Shin;Cheolhyeong Kim;Hyung Ju Hwang", "authorids": "~Jin_Young_Shin1;~Cheolhyeong_Kim1;~Hyung_Ju_Hwang1", "gender": "M;M;", "homepage": ";;http://hjhwang.postech.ac.kr", "dblp": ";;", "google_scholar": ";;", "orcid": "0000-0003-2249-928X;;", "linkedin": ";;", "or_profile": "~Jin_Young_Shin1;~Cheolhyeong_Kim1;~Hyung_Ju_Hwang1", "aff": "POSTECH;POSTECH;POSTECH", "aff_domain": "postech.ac.kr;postech.ac.kr;postech.ac.kr", "position": "PhD student;PhD student;Full Professor", "bibtex": "@misc{\nshin2021prior,\ntitle={Prior Preference Learning From Experts: Designing A Reward with Active Inference},\nauthor={Jin Young Shin and Cheolhyeong Kim and Hyung Ju Hwang},\nyear={2021},\nurl={https://openreview.net/forum?id=C_p3TDhOXW_}\n}", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer2;AnonReviewer3", "site": "https://openreview.net/forum?id=C_p3TDhOXW_", "pdf_size": 0, "rating": "5;5;6", "confidence": "4;3;2", "wc_review": "260;421;224", "wc_reply_reviewers": "191;0;0", "wc_reply_authors": "834;227;155", "reply_reviewers": "2;0;0", "reply_authors": "2;1;1", "rating_avg": [ 5.333333333333333, 0.4714045207910317 ], "confidence_avg": [ 3.0, 0.816496580927726 ], "wc_review_avg": [ 301.6666666666667, 85.65174967402723 ], "wc_reply_reviewers_avg": [ 63.666666666666664, 90.03826347108706 ], "wc_reply_authors_avg": [ 405.3333333333333, 304.5349824823998 ], "reply_reviewers_avg": [ 0.6666666666666666, 0.9428090415820634 ], "reply_authors_avg": [ 1.3333333333333333, 0.4714045207910317 ], "replies_avg": [ 11, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": -0.8660254037844385, "gs_citation": 19, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=627135094414611558&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 5, "aff_unique_index": "0;0;0", "aff_unique_norm": "Pohang University of Science and Technology", "aff_unique_dep": "", "aff_unique_url": "https://www.postech.ac.kr", "aff_unique_abbr": "POSTECH", "aff_campus_unique_index": "0;0;0", "aff_campus_unique": "Pohang", "aff_country_unique_index": "0;0;0", "aff_country_unique": "South Korea" }, { "id": "CaCHjsqCBJV", "title": "Differentiable Optimization of Generalized Nondecomposable Functions using Linear Programs", "track": 
"main", "status": "Reject", "tldr": "", "abstract": "We propose a framework which makes it feasible to directly train deep neural networks with respect to popular families of task-specific non-decomposable per- formance measures such as AUC, multi-class AUC, F -measure and others, as well as models such as non-negative matrix factorization. A common feature of the optimization model that emerges from these tasks is that it involves solving a Linear Programs (LP) during training where representations learned by upstream layers influence the constraints. The constraint matrix is not only large but the constraints are also modified at each iteration. We show how adopting a set of influential ideas proposed by Mangasarian for 1-norm SVMs \u2013 which advocates for solving LPs with a generalized Newton method \u2013 provides a simple and effective solution. In particular, this strategy needs little unrolling, which makes it more efficient during backward pass. While a number of specialized algorithms have been proposed for the models that we de- scribe here, our module turns out to be applicable without any specific adjustments or relaxations. We describe each use case, study its properties and demonstrate the efficacy of the approach over alternatives which use surrogate lower bounds and often, specialized optimization schemes. Frequently, we achieve superior computational behavior and performance improvements on common datasets used in the literature.\n", "keywords": "linear programming;nondecomposable functions;differentiable;AUC;Fscore", "primary_area": "", "supplementary_material": "", "author": "Zihang Meng;Lopamudra Mukherjee;Vikas Singh;Sathya N. Ravi", "authorids": "~Zihang_Meng1;~Lopamudra_Mukherjee1;~Vikas_Singh1;~Sathya_N._Ravi1", "gender": "M;F;M;M", "homepage": "https://pages.cs.wisc.edu/~zihangm/;;http://vsingh-www.cs.wisc.edu/;http://sathyaravi.com", "dblp": "193/5746;;;159/2123", "google_scholar": "z7EMulUAAAAJ;https://scholar.google.com/scholar?hl=en;d32BmwcAAAAJ;FW-0thoAAAAJ", "orcid": ";;;0000-0003-3881-6323", "linkedin": ";;;sathya-narayanan-ravi-74a5a128/", "or_profile": "~Zihang_Meng1;~Lopamudra_Mukherjee1;~Vikas_Singh1;~Sathya_N._Ravi1", "aff": "University of Wisconsin, Madison;University of Wisconsin-Whitewater;University of Wisconsin, Madison;University of Illinois, Chicago", "aff_domain": "wisc.edu;uww.edu;wisc.edu;uic.edu", "position": "PhD student;Associate Professor;Professor;Assistant Professor", "bibtex": "@misc{\nmeng2021differentiable,\ntitle={Differentiable Optimization of Generalized Nondecomposable Functions using Linear Programs},\nauthor={Zihang Meng and Lopamudra Mukherjee and Vikas Singh and Sathya N. 
Ravi},\nyear={2021},\nurl={https://openreview.net/forum?id=CaCHjsqCBJV}\n}", "github": "", "project": "", "reviewers": "AnonReviewer2;AnonReviewer4;AnonReviewer3;AnonReviewer1", "site": "https://openreview.net/forum?id=CaCHjsqCBJV", "pdf_size": 0, "rating": "3;5;5;6", "confidence": "5;5;4;2", "wc_review": "596;1179;512;350", "wc_reply_reviewers": "0;0;0;36", "wc_reply_authors": "804;1715;508;570", "reply_reviewers": "0;0;0;1", "reply_authors": "1;3;2;1", "rating_avg": [ 4.75, 1.0897247358851685 ], "confidence_avg": [ 4.0, 1.224744871391589 ], "wc_review_avg": [ 659.25, 312.83332223406126 ], "wc_reply_reviewers_avg": [ 9.0, 15.588457268119896 ], "wc_reply_authors_avg": [ 899.25, 483.73617551305796 ], "reply_reviewers_avg": [ 0.25, 0.4330127018922193 ], "reply_authors_avg": [ 1.75, 0.82915619758885 ], "replies_avg": [ 13, 0 ], "authors#_avg": [ 4, 0 ], "corr_rating_confidence": -0.7492686492653552, "gs_citation": 1, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=14457188995827731351&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 12, "aff_unique_index": "0;1;0;2", "aff_unique_norm": "University of Wisconsin;University of Wisconsin-Whitewater;University of Illinois at Chicago", "aff_unique_dep": ";;", "aff_unique_url": "https://www.wisc.edu;https://www.uww.edu;https://www.uic.edu", "aff_unique_abbr": "UW;UW-Whitewater;UIC", "aff_campus_unique_index": "0;1;0;2", "aff_campus_unique": "Madison;Whitewater;Chicago", "aff_country_unique_index": "0;0;0;0", "aff_country_unique": "United States" }, { "title": "Network Pruning That Matters: A Case Study on Retraining Variants", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/2991", "id": "Cb54AMqHQFP", "poster": "", "openreview": "https://openreview.net/forum?id=Cb54AMqHQFP", "slides": "https://iclr.cc/virtual/2021/poster/2991", "video": "https://iclr.cc/virtual/2021/poster/2991", "author_site": "Duong Le, Binh-Son Hua", "tldr": "", "abstract": "Network pruning is an effective method to reduce the computational expense of over-parameterized neural networks for deployment on low-resource systems. Recent state-of-the-art techniques for retraining pruned networks such as weight rewinding and learning rate rewinding have been shown to outperform the traditional fine-tuning technique in recovering the lost accuracy (Renda et al., 2020), but so far it is unclear what accounts for such performance. In this work, we conduct extensive experiments to verify and analyze the uncanny effectiveness of learning rate rewinding. We find that the reason behind the success of learning rate rewinding is the usage of a large learning rate. Similar phenomenon can be observed in other learning rate schedules that involve large learning rates, e.g., the 1-cycle learning rate schedule (Smith et al., 2019). By leveraging the right learning rate schedule in retraining, we demonstrate a counter-intuitive phenomenon in that randomly pruned networks could even achieve better performance than methodically pruned networks (fine-tuned with the conventional approach). Our results emphasize the cruciality of the learning rate schedule in pruned network retraining - a detail often overlooked by practitioners during the implementation of network pruning. 
", "keywords": "Network Pruning", "primary_area": "", "supplementary_material": "", "author": "Duong Hoang Le;Binh-Son Hua", "authorids": "~Duong_Hoang_Le2;~Binh-Son_Hua1", "gender": "M;M", "homepage": "https://lehduong.github.io;https://sonhua.github.io", "dblp": "250/0433;44/8499", "google_scholar": "X3-PqocAAAAJ;sV_VjsAAAAAJ", "orcid": ";0000-0002-5706-8634", "linkedin": "lehduong;binh-son-hua-40895b14/", "or_profile": "~Duong_Hoang_Le2;~Binh-Son_Hua1", "aff": "VinAI Research;VinAI Research", "aff_domain": "vinai.io;vinai.io", "position": "AI Resident;Research Scientist", "bibtex": "@inproceedings{\nle2021network,\ntitle={Network Pruning That Matters: A Case Study on Retraining Variants},\nauthor={Duong Hoang Le and Binh-Son Hua},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=Cb54AMqHQFP}\n}", "github": "[![github](/images/github_icon.svg) lehduong/NPTM](https://github.com/lehduong/NPTM)", "project": "", "reviewers": "AnonReviewer3;AnonReviewer1;AnonReviewer2;AnonReviewer4", "pdf_size": 0, "rating": "5;6;6;8", "confidence": "4;3;5;5", "wc_review": "884;459;796;327", "wc_reply_reviewers": "0;79;0;0", "wc_reply_authors": "223;394;179;159", "reply_reviewers": "0;1;0;0", "reply_authors": "1;2;1;1", "rating_avg": [ 6.25, 1.0897247358851685 ], "confidence_avg": [ 4.25, 0.82915619758885 ], "wc_review_avg": [ 616.5, 230.43057522820186 ], "wc_reply_reviewers_avg": [ 19.75, 34.208003449485325 ], "wc_reply_authors_avg": [ 238.75, 92.5753071828552 ], "reply_reviewers_avg": [ 0.25, 0.4330127018922193 ], "reply_authors_avg": [ 1.25, 0.4330127018922193 ], "replies_avg": [ 12, 0 ], "authors#_avg": [ 2, 0 ], "corr_rating_confidence": 0.48420012470625223, "gs_citation": 58, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=11116406662697084057&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 8, "pdf": "https://openreview.net/pdf?id=Cb54AMqHQFP", "email": "vinai.io;vinai.io", "author_num": 2, "aff_unique_index": "0;0", "aff_unique_norm": "VinAI Research", "aff_unique_dep": "", "aff_unique_url": "https://www.vinai.io/", "aff_unique_abbr": "VinAI", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0", "aff_country_unique": "Vietnam" }, { "id": "ClZ4IcqnFXB", "title": "Active Feature Acquisition with Generative Surrogate Models", "track": "main", "status": "Reject", "tldr": "", "abstract": "Many real-world situations allow for the acquisition of additional relevant information when making an assessment with limited or uncertain data. However, traditional ML approaches either require all features to be acquired beforehand or regard part of them as missing data that cannot be acquired. In this work, we propose models that perform active feature acquisition (AFA) to improve the prediction assessments at evaluation time. We formulate the AFA problem as a Markov decision process (MDP) and resolve it using reinforcement learning (RL). The AFA problem yields sparse rewards and contains a high-dimensional complicated action space. Thus, we propose learning a generative surrogate model that captures the complicated dependencies among input features to assess potential information gain from acquisitions. We also leverage the generative surrogate model to provide intermediate rewards and auxiliary information to the agent. 
Furthermore, we extend AFA in a task we coin active instance recognition (AIR) for the unsupervised case where the target variables are the unobserved features themselves and the goal is to collect information for a particular instance in a cost-efficient way. Empirical results demonstrate that our approach achieves considerably better performance than previous state of the art methods on both supervised and unsupervised tasks.", "keywords": "Reinforcement Learning;Active Feature Acquisition;Feature Selection", "primary_area": "", "supplementary_material": "/attachment/022f2346364c369a253fcde208f92a4d525ddc93.zip", "author": "Yang Li;Junier Oliva", "authorids": "~Yang_Li19;~Junier_Oliva1", "gender": ";M", "homepage": ";http://lupalab.com", "dblp": ";137/8390", "google_scholar": ";", "orcid": ";", "linkedin": ";", "or_profile": "~Yang_Li19;~Junier_Oliva1", "aff": ";", "aff_domain": ";", "position": ";", "bibtex": "@misc{\nli2021active,\ntitle={Active Feature Acquisition with Generative Surrogate Models},\nauthor={Yang Li and Junier Oliva},\nyear={2021},\nurl={https://openreview.net/forum?id=ClZ4IcqnFXB}\n}", "github": "", "project": "", "reviewers": "AnonReviewer4;AnonReviewer2;AnonReviewer1;AnonReviewer3", "site": "https://openreview.net/forum?id=ClZ4IcqnFXB", "pdf_size": 0, "rating": "4;5;6;7", "confidence": "4;4;4;4", "wc_review": "414;858;313;358", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "582;782;604;35", "reply_reviewers": "0;0;0;0", "reply_authors": "1;1;1;1", "rating_avg": [ 5.5, 1.118033988749895 ], "confidence_avg": [ 4.0, 0.0 ], "wc_review_avg": [ 485.75, 217.87654187635712 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 500.75, 279.86012131062904 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 10, 0 ], "authors#_avg": [ 2, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 53, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=15824781851708101221&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 6 }, { "id": "Cn706AbJaKW", "title": "An Open Review of OpenReview: A Critical Analysis of the Machine Learning Conference Review Process", "track": "main", "status": "Reject", "tldr": "", "abstract": "Mainstream machine learning conferences have seen a dramatic increase in the number of participants, along with a growing range of perspectives, in recent years. Members of the machine learning community are likely to overhear allegations ranging from randomness of acceptance decisions to institutional bias. In this work, we critically analyze the review process through a comprehensive study of papers submitted to ICLR between 2017 and 2020. We quantify reproducibility/randomness in review scores and acceptance decisions, and examine whether scores correlate with paper impact. Our findings suggest strong institutional bias in accept/reject decisions, even after controlling for paper quality. Furthermore, we find evidence for a gender gap, with female authors receiving lower scores, lower acceptance rates, and fewer citations per paper than their male counterparts. We conclude our work with recommendations for future conference organizers. 
", "keywords": "Conference Review;OpenReview;Gender;Bias;Reproducibility;Fairness", "primary_area": "", "supplementary_material": "/attachment/90f2576cbcb62a594eb2f7c5bc47c1c312ea3222.zip", "author": "David Tran;Alexander V Valtchanov;Keshav R Ganapathy;Raymond Feng;Eric Victor Slud;Micah Goldblum;Tom Goldstein", "authorids": "~David_Tran1;~Alexander_V_Valtchanov2;~Keshav_R_Ganapathy1;~Raymond_Feng1;slud@umd.edu;~Micah_Goldblum1;~Tom_Goldstein1", "gender": "M;M;M;M;;;M", "homepage": ";;https://keshavganapathy.github.io/;;;;https://www.cs.umd.edu/~tomg/", "dblp": ";;;276/5253;;241/7231;25/8184", "google_scholar": ";;;;;pGDKzuUAAAAJ;KmSuVtgAAAAJ", "orcid": ";;;;;;", "linkedin": "david-tran-a4130a1aa/;alexvaltchanov/;;raymond-feng-4a3473195/;;;", "or_profile": "~David_Tran1;~Alexander_V_Valtchanov2;~Keshav_R_Ganapathy1;~Raymond_Feng1;slud@umd.edu;~Micah_Goldblum1;~Tom_Goldstein1", "aff": "University of California, Berkeley;Princeton University;University of Maryland, College Park;Harvard University;;University of Maryland, College Park;University of Maryland, College Park", "aff_domain": "berkeley.edu;princeton.edu;umd.edu;harvard.edu;;umd.edu;umd.edu", "position": "Undergrad student;Undergrad student;Undergrad student;Undergrad student;;Postdoc;Associate Professor", "bibtex": "@misc{\ntran2021an,\ntitle={An Open Review of OpenReview: A Critical Analysis of the Machine Learning Conference Review Process},\nauthor={David Tran and Alexander V Valtchanov and Keshav R Ganapathy and Raymond Feng and Eric Victor Slud and Micah Goldblum and Tom Goldstein},\nyear={2021},\nurl={https://openreview.net/forum?id=Cn706AbJaKW}\n}", "github": "", "project": "", "reviewers": "AnonReviewer4;AnonReviewer3;AnonReviewer2;AnonReviewer1", "site": "https://openreview.net/forum?id=Cn706AbJaKW", "pdf_size": 0, "rating": "3;5;6;6", "confidence": "3;3;3;3", "wc_review": "230;544;235;325", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "203;1206;590;294", "reply_reviewers": "0;0;0;0", "reply_authors": "1;2;1;1", "rating_avg": [ 5.0, 1.224744871391589 ], "confidence_avg": [ 3.0, 0.0 ], "wc_review_avg": [ 333.5, 127.27627430122237 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 573.25, 392.338740758544 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.25, 0.4330127018922193 ], "replies_avg": [ 10, 0 ], "authors#_avg": [ 7, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 41, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=17176781955348319670&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 4, "aff_unique_index": "0;1;2;3;2;2", "aff_unique_norm": "University of California, Berkeley;Princeton University;University of Maryland;Harvard University", "aff_unique_dep": ";;;", "aff_unique_url": "https://www.berkeley.edu;https://www.princeton.edu;https://www/umd.edu;https://www.harvard.edu", "aff_unique_abbr": "UC Berkeley;Princeton;UMD;Harvard", "aff_campus_unique_index": "0;2;2;2", "aff_campus_unique": "Berkeley;;College Park", "aff_country_unique_index": "0;0;0;0;0;0", "aff_country_unique": "United States" }, { "title": "Neural Architecture Search on ImageNet in Four GPU Hours: A Theoretically Inspired Perspective", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/2816", "id": "Cnon5ezMHtu", "poster": "", "openreview": "https://openreview.net/forum?id=Cnon5ezMHtu", "slides": "https://iclr.cc/virtual/2021/poster/2816", "video": "https://iclr.cc/virtual/2021/poster/2816", "author_site": "Wuyang Chen, Xinyu Gong, Zhangyang Wang", "tldr": "", 
"abstract": "Neural Architecture Search (NAS) has been explosively studied to automate the discovery of top-performer neural networks. Current works require heavy training of supernet or intensive architecture evaluations, thus suffering from heavy resource consumption and often incurring search bias due to truncated training or approximations. Can we select the best neural architectures without involving any training and eliminate a drastic portion of the search cost? \n\nWe provide an affirmative answer, by proposing a novel framework called \\textit{training-free neural architecture search} ($\\textbf{TE-NAS}$). TE-NAS ranks architectures by analyzing the spectrum of the neural tangent kernel (NTK), and the number of linear regions in the input space. Both are motivated by recent theory advances in deep networks, and can be computed without any training. We show that: (1) these two measurements imply the $\\textit{trainability}$ and $\\textit{expressivity}$ of a neural network; and (2) they strongly correlate with the network's actual test accuracy. Further on, we design a pruning-based NAS mechanism to achieve a more flexible and superior trade-off between the trainability and expressivity during the search. In NAS-Bench-201 and DARTS search spaces, TE-NAS completes high-quality search but only costs $\\textbf{0.5}$ and $\\textbf{4}$ GPU hours with one 1080Ti on CIFAR-10 and ImageNet, respectively. We hope our work to inspire more attempts in bridging between the theoretic findings of deep networks and practical impacts in real NAS applications.", "keywords": "Neural Architecture Search;neural tangent kernel;number of linear regions", "primary_area": "", "supplementary_material": "", "author": "Wuyang Chen;Xinyu Gong;Zhangyang Wang", "authorids": "~Wuyang_Chen1;~Xinyu_Gong1;~Zhangyang_Wang1", "gender": ";M;M", "homepage": ";https://gongxinyuu.github.io;https://vita-group.github.io", "dblp": ";215/5405;119/4026", "google_scholar": ";A8e8UNAAAAAJ;pxFyKAIAAAAJ", "orcid": ";0000-0002-6993-136X;", "linkedin": ";xinyu-gong-b4ab73191/;", "or_profile": "~Wuyang_Chen1;~Xinyu_Gong1;~Zhangyang_Wang1", "aff": ";University of Texas, Austin;University of Texas, Austin", "aff_domain": ";utexas.edu;utexas.edu", "position": ";PhD student;Assistant Professor", "bibtex": "@inproceedings{\nchen2021neural,\ntitle={Neural Architecture Search on ImageNet in Four {\\{}GPU{\\}} Hours: A Theoretically Inspired Perspective},\nauthor={Wuyang Chen and Xinyu Gong and Zhangyang Wang},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=Cnon5ezMHtu}\n}", "github": "[![github](/images/github_icon.svg) VITA-Group/TENAS](https://github.com/VITA-Group/TENAS) + [![Papers with Code](/images/pwc_icon.svg) 3 community implementations](https://paperswithcode.com/paper/?openreview=Cnon5ezMHtu)", "project": "", "reviewers": "AnonReviewer3;AnonReviewer1;AnonReviewer2;AnonReviewer4", "pdf_size": 0, "rating": "4;6;6;8", "confidence": "5;4;4;4", "wc_review": "328;682;465;465", "wc_reply_reviewers": "0;0;0;149", "wc_reply_authors": "1275;1474;506;599", "reply_reviewers": "0;0;0;1", "reply_authors": "3;3;2;1", "rating_avg": [ 6.0, 1.4142135623730951 ], "confidence_avg": [ 4.25, 0.4330127018922193 ], "wc_review_avg": [ 485.0, 126.74580860920017 ], "wc_reply_reviewers_avg": [ 37.25, 64.51889258194068 ], "wc_reply_authors_avg": [ 963.5, 418.2729372072738 ], "reply_reviewers_avg": [ 0.25, 0.4330127018922193 ], "reply_authors_avg": [ 2.25, 0.82915619758885 ], 
"replies_avg": [ 17, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": -0.816496580927726, "gs_citation": 331, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=8900374722066786979&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 4, "pdf": "https://openreview.net/pdf?id=Cnon5ezMHtu", "email": ";utexas.edu;utexas.edu", "author_num": 3, "aff_unique_index": "0;0", "aff_unique_norm": "University of Texas at Austin", "aff_unique_dep": "", "aff_unique_url": "https://www.utexas.edu", "aff_unique_abbr": "UT Austin", "aff_campus_unique_index": "0;0", "aff_campus_unique": "Austin", "aff_country_unique_index": "0;0", "aff_country_unique": "United States" }, { "id": "CrWzAigsUEu", "title": "FSPN: A New Class of Probabilistic Graphical Model", "track": "main", "status": "Withdraw", "tldr": "", "abstract": "We introduce factorize-sum-split-product networks (FSPNs), a new class of probabilistic graphical models (PGMs). FSPNs are designed to overcome the drawbacks of existing PGMs in terms of estimation accuracy and inference efficiency. Specifically, Bayesian networks (BNs) have low inference speed and performance of tree-structured sum-product networks(SPNs) significantly degrades in presence of highly correlated variables. FSPNs absorb their advantages by adaptively modeling the joint distribution of variables according to their dependence degree, so that one can simultaneously attain the two desirable goals\u2014high estimation accuracy and fast inference speed. We present efficient probability inference and structure learning algorithms for FSPNs, along with a theoretical analysis and extensive evaluation evidence. Our experimental results on synthetic and benchmark datasets indicate the superiority of FSPN over other PGMs.", "keywords": "FSPN;Probabilistic Graphical Model;Bayesian Network;Sum-Product Network", "primary_area": "", "supplementary_material": "", "author": "Ziniu Wu;Rong Zhu;Andreas Pfadler;Yuxing Han;Jiangneng Li;Zhengping Qian;Kai Zeng;Jingren Zhou", "authorids": "~Ziniu_Wu1;~Rong_Zhu2;~Andreas_Pfadler1;~Yuxing_Han3;~Jiangneng_Li1;zhengping.qzp@alibaba-inc.com;zengkai.zk@alibaba-inc.com;~Jingren_Zhou1", "gender": "M;M;;M;M;;;M", "homepage": "https://www.ziniuwu.com/;;;;https://www.jiangnengli.com/;;;", "dblp": ";;;91/7908-2;257/7440;;;84/2644", "google_scholar": ";i0cC60cAAAAJ;;;b6qDJ7UAAAAJ;;;", "orcid": ";;;;0000-0002-4387-5320;;;", "linkedin": ";;;yuxing-han-43638057/;;;;", "or_profile": "~Ziniu_Wu1;~Rong_Zhu2;~Andreas_Pfadler1;~Yuxing_Han3;~Jiangneng_Li1;zhengping.qzp@alibaba-inc.com;zengkai.zk@alibaba-inc.com;~Jingren_Zhou1", "aff": ";;Alibaba Group;Alibaba Group;Alibaba Group;;;Alibaba Group", "aff_domain": ";;alibaba-inc.com;alibaba-inc.com;alibaba-inc.com;;;alibaba-inc.com", "position": ";;Senior Algorithm Engineer;Researcher;Intern;;;Researcher", "bibtex": "", "github": "", "project": "", "reviewers": "AnonReviewer3;AnonReviewer2;AnonReviewer1;AnonReviewer4", "site": "https://openreview.net/forum?id=CrWzAigsUEu", "pdf_size": 0, "rating": "4;4;5;7", "confidence": "5;5;4;5", "wc_review": "2004;331;273;227", "wc_reply_reviewers": "0;28;0;0", "wc_reply_authors": "5339;1592;508;264", "reply_reviewers": "0;1;0;0", "reply_authors": "10;3;1;1", "rating_avg": [ 5.0, 1.224744871391589 ], "confidence_avg": [ 4.75, 0.4330127018922193 ], "wc_review_avg": [ 708.75, 748.7203666923987 ], "wc_reply_reviewers_avg": [ 7.0, 12.12435565298214 ], "wc_reply_authors_avg": [ 1925.75, 2033.0452989296623 ], "reply_reviewers_avg": [ 0.25, 0.4330127018922193 ], "reply_authors_avg": [ 
3.75, 3.6996621467371855 ], "replies_avg": [ 22, 0 ], "authors#_avg": [ 8, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 11, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=10746637589214015996&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 3, "aff_unique_index": "0;0;0;0", "aff_unique_norm": "Alibaba Group", "aff_unique_dep": "", "aff_unique_url": "https://www.alibaba.com", "aff_unique_abbr": "Alibaba", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0;0;0", "aff_country_unique": "China" }, { "id": "CrY1vHr_wHC", "title": "SmoothLRP: Smoothing Explanations of Neural Network Decisions by Averaging over Stochastic Input Variations", "track": "main", "status": "Withdraw", "tldr": "", "abstract": "With the excessive use of neural networks in safety critical domains the need for understandable explanations of their predictions is rising. Several methods were developed which identify the most relevant inputs, such as sensitivity analysis and most prominently layerwise relevance propagation (LRP). \nIt has been shown that the noise in the explanations from the sensitivity analysis can be heavily reduced by averaging over noisy versions of the input image, a method referred to as SmoothGrad. \nWe investigate the application of the same principle to LRP and find that it smooths the resulting relevance function leading to improved explanations for state-of-the-art LRP rules. The method, that we refer to as SmoothLRP, even produces good explanations on poorly trained neural networks, where former methods show unsatisfactory results. Interestingly, we observed, that SmoothLRP can also be applied to the identification of adversarial examples. ", "keywords": "Explainability;uncertainty;adversarial example detection", "primary_area": "", "supplementary_material": "", "author": "Arne Peter Raulf;Ben Luis Hack;Sina D\u00e4ubener;Axel Mosig;Asja Fischer", "authorids": "arne.raulf@rub.de;ben.hack@rub.de;~Sina_D\u00e4ubener1;~Axel_Mosig1;~Asja_Fischer1", "gender": ";;;;F", "homepage": ";;;;", "dblp": ";;;;76/8485", "google_scholar": ";;;;FyZbyIUAAAAJ", "orcid": ";;;;0000-0002-1916-7033", "linkedin": ";;;;", "or_profile": "arne.raulf@rub.de;ben.hack@rub.de;~Sina_D\u00e4ubener1;~Axel_Mosig1;~Asja_Fischer1", "aff": ";;;;Ruhr-Universit\u00e4t Bochum", "aff_domain": ";;;;ruhr-uni-bochum.de", "position": ";;;;Full Professor", "bibtex": "", "github": "", "project": "", "reviewers": "", "site": "https://openreview.net/forum?id=CrY1vHr_wHC", "pdf_size": 0, "rating": "", "confidence": "", "wc_review": "", "wc_reply_reviewers": "", "wc_reply_authors": "", "reply_reviewers": "", "reply_authors": "", "rating_avg": [ 0, 0 ], "confidence_avg": [ 0, 0 ], "wc_review_avg": [ 0, 0 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 0, 0 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 0, 0 ], "replies_avg": [ 1, 0 ], "authors#_avg": [ 5, 0 ], "corr_rating_confidence": 0, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:ihPX3btp-HMJ:scholar.google.com/&scioq=SmoothLRP:+Smoothing+Explanations+of+Neural+Network+Decisions+by+Averaging+over+Stochastic+Input+Variations&hl=en&as_sdt=0,5", "gs_version_total": 0, "aff_unique_index": "0", "aff_unique_norm": "Ruhr-Universit\u00e4t Bochum", "aff_unique_dep": "", "aff_unique_url": "https://www.ruhr-uni-bochum.de", "aff_unique_abbr": "RUB", "aff_country_unique_index": "0", "aff_country_unique": "Germany" }, { "title": "Deciphering and Optimizing Multi-Task Learning: a 
Random Matrix Approach", "status": "Spotlight", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/2634", "id": "Cri3xz59ga", "poster": "", "openreview": "https://openreview.net/forum?id=Cri3xz59ga", "slides": "https://iclr.cc/virtual/2021/poster/2634", "video": "https://iclr.cc/virtual/2021/poster/2634", "author_site": "Malik Tiomoko, Hafiz Tiomoko Ali, Romain Couillet", "tldr": "", "abstract": "This article provides theoretical insights into the inner workings of multi-task and transfer learning methods, by studying the tractable least-square support vector machine multi-task learning (LS-SVM MTL) method, in the limit of large ($p$) and numerous ($n$) data. By a random matrix analysis applied to a Gaussian mixture data model, the performance of MTL LS-SVM is shown to converge, as $n,p\\to\\infty$, to a deterministic limit involving simple (small-dimensional) statistics of the data.\n\nWe prove (i) that the standard MTL LS-SVM algorithm is in general strongly biased and may dramatically fail (to the point that individual single-task LS-SVMs may outperform the MTL approach, even for quite resembling tasks): our analysis provides a simple method to correct these biases, and that we reveal (ii) the sufficient statistics at play in the method, which can be efficiently estimated, even for quite small datasets. The latter result is exploited to automatically optimize the hyperparameters without resorting to any cross-validation procedure. \n\nExperiments on popular datasets demonstrate that our improved MTL LS-SVM method is computationally-efficient and outperforms sometimes much more elaborate state-of-the-art multi-task and transfer learning techniques.", "keywords": "Transfer Learning;Multi Task Learning;Random Matrix Theory", "primary_area": "", "supplementary_material": "/attachment/704558e8af4bbac57bca69a0da8ea838a4bcad15.zip", "author": "Malik Tiomoko;Hafiz Tiomoko Ali;Romain Couillet", "authorids": "~Malik_Tiomoko1;~Hafiz_Tiomoko_Ali1;~Romain_Couillet1", "gender": "M;M;", "homepage": ";;", "dblp": "228/9231;177/9093;00/2812", "google_scholar": ";;", "orcid": ";;", "linkedin": ";;", "or_profile": "~Malik_Tiomoko1;~Hafiz_Tiomoko_Ali1;~Romain_Couillet1", "aff": "UPSud/INRIA University Paris-Saclay;;University of Grenoble-Alpes", "aff_domain": "u-psud.fr;;univ-grenoble-alpes.fr", "position": "PhD student;;Full Professor", "bibtex": "@inproceedings{\ntiomoko2021deciphering,\ntitle={Deciphering and Optimizing Multi-Task Learning: a Random Matrix Approach},\nauthor={Malik Tiomoko and Hafiz Tiomoko Ali and Romain Couillet},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=Cri3xz59ga}\n}", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer4;AnonReviewer3;AnonReviewer2", "pdf_size": 0, "rating": "6;6;7;7", "confidence": "3;3;3;3", "wc_review": "303;279;232;467", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "342;431;401;496", "reply_reviewers": "0;0;0;0", "reply_authors": "1;1;1;1", "rating_avg": [ 6.5, 0.5 ], "confidence_avg": [ 3.0, 0.0 ], "wc_review_avg": [ 320.25, 88.49117187606909 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 417.5, 55.49099025968089 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 10, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 11, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=9351033558751663574&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 8, "pdf": 
"https://openreview.net/pdf?id=Cri3xz59ga", "email": "u-psud.fr;;univ-grenoble-alpes.fr", "author_num": 3, "aff_unique_index": "0;1", "aff_unique_norm": "University Paris-Saclay;University of Grenoble-Alpes", "aff_unique_dep": ";", "aff_unique_url": "https://www.universite-paris-saclay.fr;https://www.univ-grenoble-alpes.fr", "aff_unique_abbr": "UPSa;UGA", "aff_campus_unique_index": "0", "aff_campus_unique": "Paris-Saclay;", "aff_country_unique_index": "0;0", "aff_country_unique": "France" }, { "id": "Cue2ZEBf12", "title": "Towards Adversarial Robustness of Bayesian Neural Network through Hierarchical Variational Inference", "track": "main", "status": "Reject", "tldr": "", "abstract": "Recent works have applied Bayesian Neural Network (BNN) to adversarial training, and shown the improvement of adversarial robustness via the BNN's strength of stochastic gradient defense. However, we have found that in general, the BNN loses its stochasticity after its training with the BNN's posterior. As a result, the lack of the stochasticity leads to weak regularization effect to the BNN, which increases KL divergence in ELBO from variational inference. In this paper, we propose an enhanced Bayesian regularizer through hierarchical variational inference in order to boost adversarial robustness against gradient-based attack. Furthermore, we also prove that the proposed method allows the BNN's stochasticity to be elevated with the reduced KL divergence. Exhaustive experiment results demonstrate the effectiveness of the proposed method by showing the improvement of adversarial robustness, compared with adversarial training (Madry et al., 2018) and adversarial-BNN (Liu et al., 2019) under PGD attack and EOT-PGD attack to the $L_{\\infty}$ perturbation on CIFAR-10/100, STL-10, and Tiny-ImageNet.", "keywords": "", "primary_area": "", "supplementary_material": "/attachment/52a4231e8b47fa50d5d7f8003d07525a6cfdbb7c.zip", "author": "Byung-Kwan Lee;Youngjoon Yu;Yong Man Ro", "authorids": "~Byung-Kwan_Lee1;~Youngjoon_Yu1;~Yong_Man_Ro1", "gender": "M;M;M", "homepage": "https://sites.google.com/view/byungkwanlee;https://sites.google.com/business.kaist.edu/youngjoon-yu;https://www.ivllab.kaist.ac.kr/people/professor", "dblp": "68/55.html/;266/1289;02/1221", "google_scholar": "https://scholar.google.co.kr/citations?hl=en;;https://scholar.google.co.kr/citations?user=IPzfF7cAAAAJ", "orcid": ";;0000-0001-5306-6853", "linkedin": "byung-kwan-lee-82333716a/;;", "or_profile": "~Byung-Kwan_Lee1;~Youngjoon_Yu1;~Yong_Man_Ro1", "aff": "KAIST;Korea Advanced Institute of Science & Technology;Korea Advanced Institute of Science & Technology", "aff_domain": "kaist.ac.kr;kaist.ac.kr;kaist.ac.kr", "position": "PhD student;PhD student;Full Professor", "bibtex": "@misc{\nlee2021towards,\ntitle={Towards Adversarial Robustness of Bayesian Neural Network through Hierarchical Variational Inference},\nauthor={Byung-Kwan Lee and Youngjoon Yu and Yong Man Ro},\nyear={2021},\nurl={https://openreview.net/forum?id=Cue2ZEBf12}\n}", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer4;AnonReviewer2;AnonReviewer3", "site": "https://openreview.net/forum?id=Cue2ZEBf12", "pdf_size": 0, "rating": "5;5;6;6", "confidence": "4;4;3;4", "wc_review": "210;756;349;350", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "664;569;934;641", "reply_reviewers": "0;0;0;0", "reply_authors": "1;1;2;1", "rating_avg": [ 5.5, 0.5 ], "confidence_avg": [ 3.75, 0.4330127018922193 ], "wc_review_avg": [ 416.25, 204.25520189214276 ], "wc_reply_reviewers_avg": 
[ 0, 0 ], "wc_reply_authors_avg": [ 702.0, 138.4539634680062 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.25, 0.4330127018922193 ], "replies_avg": [ 11, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": -0.5773502691896257, "gs_citation": 3, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=9816148999594949268&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 0, "aff_unique_index": "0;0;0", "aff_unique_norm": "Korea Advanced Institute of Science and Technology", "aff_unique_dep": "", "aff_unique_url": "https://www.kaist.ac.kr", "aff_unique_abbr": "KAIST", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0;0", "aff_country_unique": "South Korea" }, { "id": "CxGPf2BPVA", "title": "Regularization Shortcomings for Continual Learning", "track": "main", "status": "Reject", "tldr": "", "abstract": "In most machine learning algorithms, training data is assumed to be independent and identically distributed (iid). \nWhen it is not the case, the performances of the algorithms are challenged, leading to the famous phenomenon of \\textit{catastrophic forgetting}. Algorithms dealing with it are gathered in the \\textit{Continual Learning} research field. In this paper, we study the \\textit{regularization} based approaches to continual learning and show that those approaches can not learn to discriminate classes from different tasks in an elemental continual benchmark, the class-incremental setting.\nWe make theoretical reasoning to prove this shortcoming and illustrate it with experiments.\nMoreover, we show that it can have some important consequences on multi-tasks reinforcement learning or in pre-trained models used for continual learning.\nWe believe this paper to be the first to propose a theoretical description of regularization shortcomings for continual learning. 
", "keywords": "Continual Learning;Regularization", "primary_area": "", "supplementary_material": "", "author": "Timothee LESORT;Andrei Stoian", "authorids": "~Timothee_LESORT1;andrei.stoian@thalesgroup.com", "gender": "M;", "homepage": ";", "dblp": ";", "google_scholar": "5NttkuoAAAAJ;", "orcid": ";", "linkedin": "https://fr.linkedin.com/in/timoth\u00e9e-lesort-128039aa;", "or_profile": "~Timothee_LESORT1;andrei.stoian@thalesgroup.com", "aff": "Montreal Institute for Learning Algorithms, University of Montreal, University of Montreal;", "aff_domain": "mila.umontreal.ca;", "position": "Postdoc;", "bibtex": "@misc{\nlesort2021regularization,\ntitle={Regularization Shortcomings for Continual Learning},\nauthor={Timothee LESORT and Andrei Stoian},\nyear={2021},\nurl={https://openreview.net/forum?id=CxGPf2BPVA}\n}", "github": "", "project": "", "reviewers": "AnonReviewer2;AnonReviewer1;AnonReviewer3;AnonReviewer4", "site": "https://openreview.net/forum?id=CxGPf2BPVA", "pdf_size": 0, "rating": "3;4;5;5", "confidence": "5;4;4;2", "wc_review": "893;1034;230;307", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "0;0;0;0", "reply_reviewers": "0;0;0;0", "reply_authors": "0;0;0;0", "rating_avg": [ 4.25, 0.82915619758885 ], "confidence_avg": [ 3.75, 1.0897247358851685 ], "wc_review_avg": [ 616.0, 352.1114880261648 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 0, 0 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 0, 0 ], "replies_avg": [ 5, 0 ], "authors#_avg": [ 2, 0 ], "corr_rating_confidence": -0.7608859102526822, "gs_citation": 54, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=5677266845562014662&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 5, "aff_unique_index": "0", "aff_unique_norm": "University of Montreal", "aff_unique_dep": "Montreal Institute for Learning Algorithms", "aff_unique_url": "https://www.umontreal.ca", "aff_unique_abbr": "UM", "aff_campus_unique_index": "0", "aff_campus_unique": "Montreal", "aff_country_unique_index": "0", "aff_country_unique": "Canada" }, { "title": "SAFENet: A Secure, Accurate and Fast Neural Network Inference", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/3278", "id": "Cz3dbFm5u-", "poster": "", "openreview": "https://openreview.net/forum?id=Cz3dbFm5u-", "slides": "https://iclr.cc/virtual/2021/poster/3278", "video": "https://iclr.cc/virtual/2021/poster/3278", "author_site": "Qian Lou, Yilin Shen, Hongxia Jin, Lei Jiang", "tldr": "", "abstract": "The advances in neural networks have driven many companies to provide prediction services to users in a wide range of applications. However, current prediction systems raise privacy concerns regarding the user's private data. A cryptographic neural network inference service is an efficient way to allow two parties to execute neural network inference without revealing either party\u2019s data or model. Nevertheless, existing cryptographic neural network inference services suffer from huge running latency; in particular, the latency of communication-expensive cryptographic activation function is 3 orders of magnitude higher than plaintext-domain activation function. And activations are the necessary components of the modern neural networks. Therefore, slow cryptographic activation has become the primary obstacle of efficient cryptographic inference. \n\nIn this paper, we propose a new technique, called SAFENet, to enable a Secure, Accurate and Fast nEural Network inference service. 
To speed up secure inference and guarantee inference accuracy, SAFENet includes channel-wise activation approximation with multiple-degree options. This is implemented by keeping the most useful activation channels and replacing the remaining, less useful, channels with various-degree polynomials. SAFENet also supports mixed-precision activation approximation by automatically assigning different replacement ratios to various layers, further increasing the approximation ratio and reducing inference latency. Our experimental results show SAFENet obtains the state-of-the-art inference latency and performance, reducing latency by $38\% \sim 61\%$ or improving accuracy by $1.8\% \sim 4\%$ over prior techniques on various encrypted datasets.", "keywords": "Cryptographic inference;Channel-Wise Approximated Activation;Hyper-Parameter Optimization;Garbled Circuits", "primary_area": "", "supplementary_material": "", "author": "Qian Lou;Yilin Shen;Hongxia Jin;Lei Jiang", "authorids": "~Qian_Lou1;~Yilin_Shen1;~Hongxia_Jin1;~Lei_Jiang1", "gender": "M;M;;M", "homepage": "https://qlou.org;;;https://www.jianglei.org", "dblp": "207/3962.html;30/383;;96/1994-1.html", "google_scholar": "SBYgXLoAAAAJ;9PSFMzAAAAAJ;;-1sXorAAAAAJ", "orcid": ";;;", "linkedin": ";;;", "or_profile": "~Qian_Lou1;~Yilin_Shen1;~Hongxia_Jin1;~Lei_Jiang1", "aff": "Indiana University, Bloomington;Samsung Research America;;Indiana University", "aff_domain": "iu.edu;gmail.com;;iu.edu", "position": "PhD student;Principal Researcher;;Assistant Professor", "bibtex": "@inproceedings{\nlou2021safenet,\ntitle={{\{}SAFEN{\}}et: A Secure, Accurate and Fast Neural Network Inference},\nauthor={Qian Lou and Yilin Shen and Hongxia Jin and Lei Jiang},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=Cz3dbFm5u-}\n}", "github": "", "project": "", "reviewers": "AnonReviewer2;AnonReviewer1;AnonReviewer3;AnonReviewer4", "pdf_size": 0, "rating": "5;6;7;7", "confidence": "3;3;3;4", "wc_review": "204;687;184;285", "wc_reply_reviewers": "0;87;0;0", "wc_reply_authors": "513;754;406;361", "reply_reviewers": "0;1;0;0", "reply_authors": "1;2;1;1", "rating_avg": [ 6.25, 0.82915619758885 ], "confidence_avg": [ 3.25, 0.4330127018922193 ], "wc_review_avg": [ 340.0, 203.87864037215866 ], "wc_reply_reviewers_avg": [ 21.75, 37.67210506462308 ], "wc_reply_authors_avg": [ 508.5, 152.1126227503819 ], "reply_reviewers_avg": [ 0.25, 0.4330127018922193 ], "reply_authors_avg": [ 1.25, 0.4330127018922193 ], "replies_avg": [ 12, 0 ], "authors#_avg": [ 4, 0 ], "corr_rating_confidence": 0.5222329678670935, "gs_citation": 73, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=9991212389106888644&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 0, "pdf": "https://openreview.net/pdf?id=Cz3dbFm5u-", "email": "iu.edu;gmail.com;;iu.edu", "author_num": 4, "aff_unique_index": "0;1;0", "aff_unique_norm": "Indiana University;Samsung", "aff_unique_dep": ";Samsung Research America", "aff_unique_url": "https://www.indiana.edu;https://www.samsung.com/us/careers/research/", "aff_unique_abbr": "IU;SRA", "aff_campus_unique_index": "0", "aff_campus_unique": "Bloomington;", "aff_country_unique_index": "0;0;0", "aff_country_unique": "United States" }, { "id": "CzRSsOG6JDw", "title": "The impacts of known and unknown demonstrator irrationality on reward inference", "track": "main", "status": "Reject", "tldr": "", "abstract": "Algorithms inferring rewards from human behavior typically assume that people are
(approximately) rational. In reality, people exhibit a wide array of irrationalities. Motivated by understanding the benefits of modeling these irrationalities, we analyze the effects that demonstrator irrationality has on reward inference. We propose operationalizing several forms of irrationality in the language of MDPs, by altering the Bellman optimality equation, and use this framework to study how these alterations affect inference. \n\nWe find that incorrectly assuming noisy-rationality for an irrational demonstrator can lead to remarkably poor reward inference accuracy, even in situations where inference with the correct model leads to good inference. This suggests a need to either model irrationalities or find reward inference algorithms that are more robust to misspecification of the demonstrator model. Surprisingly, we find that if we give the learner access to the correct model of the demonstrator's irrationality, these irrationalities can actually help reward inference. In other words, if we could choose between a world where humans were perfectly rational and the current world where humans have systematic biases, the current world might counter-intuitively be preferable for reward inference. We reproduce this effect in several domains. While this finding is mainly conceptual, it is perhaps actionable as well: we might ask human demonstrators for myopic demonstrations instead of optimal ones, as they are more informative for the learner and might be easier for a human to generate.", "keywords": "irrationality;reward learning;irl", "primary_area": "", "supplementary_material": "", "author": "Lawrence Chan;Andrew Critch;Anca Dragan", "authorids": "~Lawrence_Chan2;~Andrew_Critch1;~Anca_Dragan1", "gender": "M;M;F", "homepage": "https://chanlawrence.me/;http://acritch.com/;http://www.ancadragan.com/", "dblp": "28/2626;;", "google_scholar": "https://scholar.google.com/citations?view_op=list_works;F3_yOXUAAAAJ;", "orcid": ";;", "linkedin": ";acritch;", "or_profile": "~Lawrence_Chan2;~Andrew_Critch1;~Anca_Dragan1", "aff": "University of California, Berkeley;University of California, Berkeley;University of California, Berkeley", "aff_domain": "berkeley.edu;berkeley.edu;berkeley.edu", "position": "PhD student;Postdoc;Associate Professor", "bibtex": "@misc{\nchan2021the,\ntitle={The impacts of known and unknown demonstrator irrationality on reward inference},\nauthor={Lawrence Chan and Andrew Critch and Anca Dragan},\nyear={2021},\nurl={https://openreview.net/forum?id=CzRSsOG6JDw}\n}", "github": "", "project": "", "reviewers": "AnonReviewer4;AnonReviewer3;AnonReviewer2;AnonReviewer1", "site": "https://openreview.net/forum?id=CzRSsOG6JDw", "pdf_size": 0, "rating": "4;4;5;5", "confidence": "3;5;3;3", "wc_review": "646;597;609;433", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "897;626;644;795", "reply_reviewers": "0;0;0;0", "reply_authors": "2;1;1;1", "rating_avg": [ 4.5, 0.5 ], "confidence_avg": [ 3.5, 0.8660254037844386 ], "wc_review_avg": [ 571.25, 81.83634583728674 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 740.5, 111.67475095114384 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.25, 0.4330127018922193 ], "replies_avg": [ 11, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": -0.5773502691896257, "gs_citation": 1, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=9980503548847719359&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 0, "aff_unique_index": "0;0;0", "aff_unique_norm": "University of California, Berkeley", 
"aff_unique_dep": "", "aff_unique_url": "https://www.berkeley.edu", "aff_unique_abbr": "UC Berkeley", "aff_campus_unique_index": "0;0;0", "aff_campus_unique": "Berkeley", "aff_country_unique_index": "0;0;0", "aff_country_unique": "United States" }, { "id": "D04TGKz5rfF", "title": "A frequency domain analysis of gradient-based adversarial examples", "track": "main", "status": "Reject", "tldr": "", "abstract": "It is well known that deep neural networks are vulnerable to adversarial examples. We attempt to understand adversarial examples from the perspective of frequency analysis. Several works have empirically shown that the gradient-based adversarial attacks perform differently in the low-frequency and high-frequency part of the input data. But there is still a lack of theoretical justification of these phenomena. In this work, we both theoretically and empirically show that the adversarial perturbations gradually increase the concentration in the low-frequency domain of the spectrum during the training process of the model parameters. And the log-spectrum difference of the adversarial examples and clean image is more concentrated in the high-frequency part than the low-frequency part. We also find out that the ratio of the high-frequency and the low-frequency part in the adversarial perturbation is much larger than that in the corresponding natural image. Inspired by these important theoretical findings, we apply low-pass filter to potential adversarial examples before feeding them to the model. The results show that this preprocessing can significantly improve the robustness of the model.", "keywords": "", "primary_area": "", "supplementary_material": "/attachment/6dfdb600bc911718d0e700b2f86dff9a090c7236.zip", "author": "Bochen Lv;Pu Yang;Zehao Wang;Zhanxing Zhu", "authorids": "bochen.lv@gmail.com;1700010695@pku.edu.cn;~Zehao_Wang2;~Zhanxing_Zhu1", "gender": ";;M;M", "homepage": ";;https://zehao.mathemusics.com/;https://zhanxingzhu.github.io/", "dblp": ";;;87/7756.html", "google_scholar": ";;;a2sHceIAAAAJ", "orcid": ";;;", "linkedin": ";;;", "or_profile": "bochen.lv@gmail.com;1700010695@pku.edu.cn;~Zehao_Wang2;~Zhanxing_Zhu1", "aff": ";;Peking University;Peking University", "aff_domain": ";;pku.edu.cn;pku.edu.cn", "position": ";;Undergrad student;Assistant Professor", "bibtex": "@misc{\nlv2021a,\ntitle={A frequency domain analysis of gradient-based adversarial examples},\nauthor={Bochen Lv and Pu Yang and Zehao Wang and Zhanxing Zhu},\nyear={2021},\nurl={https://openreview.net/forum?id=D04TGKz5rfF}\n}", "github": "", "project": "", "reviewers": "AnonReviewer4;AnonReviewer1;AnonReviewer3;AnonReviewer2", "site": "https://openreview.net/forum?id=D04TGKz5rfF", "pdf_size": 0, "rating": "3;4;5;7", "confidence": "5;4;4;4", "wc_review": "205;558;251;176", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "317;306;306;211", "reply_reviewers": "0;0;0;0", "reply_authors": "1;1;1;1", "rating_avg": [ 4.75, 1.479019945774904 ], "confidence_avg": [ 4.25, 0.4330127018922193 ], "wc_review_avg": [ 297.5, 152.758796800708 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 285.0, 42.95928304802118 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 12, 0 ], "authors#_avg": [ 4, 0 ], "corr_rating_confidence": -0.6831300510639732, "gs_citation": 1, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=3047561606290441793&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 0, "aff_unique_index": "0;0", "aff_unique_norm": "Peking University", 
"aff_unique_dep": "", "aff_unique_url": "http://www.pku.edu.cn", "aff_unique_abbr": "Peking U", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0", "aff_country_unique": "China" }, { "id": "D1E1h-K3jso", "title": "Learning from Noisy Data with Robust Representation Learning", "track": "main", "status": "Reject", "tldr": "", "abstract": "Learning from noisy data has attracted much attention, where most methods focus on label noise. In this work, we propose a new framework which simultaneously addresses three types of noise commonly seen in real-world data: label noise, out-of-distribution input, and input corruption. In contrast to most existing methods, we combat noise by learning robust representation. Specifically, we embed images into a low-dimensional subspace by training an autoencoder on the deep features. We regularize the geometric structure of the subspace with robust contrastive learning, which includes an unsupervised consistency loss and a supervised mixup prototypical loss. Furthermore, we leverage the structure of the learned subspace for noise cleaning, by aggregating information from neighboring samples. Experiments on multiple benchmarks demonstrate state-of-the-art performance of our method and robustness of the learned representation. Our code will be released.", "keywords": "label noise;out-of-distribution noise;contrastive learning", "primary_area": "", "supplementary_material": "/attachment/f0c8585bb9a1acd4e25213ead95c2fa5f1115a45.zip", "author": "Junnan Li;Caiming Xiong;Steven Hoi", "authorids": "~Junnan_Li2;~Caiming_Xiong1;~Steven_Hoi2", "gender": "M;M;M", "homepage": "http://cmxiong.com/;http://stevenhoi.com;https://sites.google.com/site/junnanlics/", "dblp": "80/7282;;193/6773-1.html", "google_scholar": "vaSdahkAAAAJ;JoLjflYAAAAJ;MuUhwi0AAAAJ", "orcid": ";;", "linkedin": "caiming-xiong-150a1417;;", "or_profile": "~Caiming_Xiong1;~Steven_Hoi2;~Junnan_li1", "aff": "Salesforce Research;Singapore Management University;Salesforce Research", "aff_domain": "salesforce.com;smu.edu.sg;salesforce.com", "position": "Research Scientist;Associate Professor;Research Scientist", "bibtex": "@misc{\nli2021learning,\ntitle={Learning from Noisy Data with Robust Representation Learning},\nauthor={Junnan Li and Caiming Xiong and Steven Hoi},\nyear={2021},\nurl={https://openreview.net/forum?id=D1E1h-K3jso}\n}", "github": "", "project": "", "reviewers": "AnonReviewer2;AnonReviewer3;AnonReviewer1;AnonReviewer4", "site": "https://openreview.net/forum?id=D1E1h-K3jso", "pdf_size": 0, "rating": "6;6;6;7", "confidence": "4;4;3;4", "wc_review": "161;344;603;339", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "266;487;457;276", "reply_reviewers": "0;0;0;0", "reply_authors": "1;1;1;1", "rating_avg": [ 6.25, 0.4330127018922193 ], "confidence_avg": [ 3.75, 0.4330127018922193 ], "wc_review_avg": [ 361.75, 157.58707910231726 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 371.5, 101.11997824366854 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 9, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": 0.3333333333333333, "gs_citation": 138, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=16046430110914608194&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 4, "aff_unique_index": "0;1;0", "aff_unique_norm": "Salesforce;Singapore Management University", "aff_unique_dep": "Salesforce Research;", "aff_unique_url": "https://research.salesforce.com;https://www.smu.edu.sg", 
"aff_unique_abbr": "Salesforce;SMU", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;1;0", "aff_country_unique": "United States;Singapore" }, { "id": "D2Fp_qheYu", "title": "Max-sliced Bures Distance for Interpreting Discrepancies", "track": "main", "status": "Reject", "tldr": "", "abstract": "We propose the max-sliced Bures distance, a lower bound on the max-sliced Wasserstein-2 distance, to identify the instances associated with the maximum discrepancy between two samples. The max-slicing can be decomposed into two asymmetric divergences each expressed in terms of an optimal slice or equivalently a witness function that has large magnitude evaluations on a localized subset of instances in one distribution versus the other. We show how witness functions can be used to detect and correct for covariate shift through reweighting and to evaluate generative adversarial networks. Unlike heuristic algorithms for the max-sliced Wasserstein-2 distance that may fail to find the optimal slice, we detail a tractable algorithm that finds the global optimal slice and scales to large sample sizes. As the Bures distance quantifies differences in covariance, we generalize the max-sliced Bures distance by using non-linear mappings, enabling it to capture changes in higher-order statistics. We explore two types of non-linear mappings: positive semidefinite kernels where the witness functions belong to a reproducing kernel Hilbert space, and task-relevant mappings corresponding to a neural network. In the context of samples of natural images, our approach provides an interpretation of the Fr\u00e9chet Inception distance by identifying the synthetic and natural instances that are either over-represented or under-represented with respect to the other sample. We apply the proposed measure to detect imbalances in class distributions in various data sets and to critique generative models.", "keywords": "covariance;covariate shift;distance metrics;divergence;generative adversarial networks;interpretable approaches;kernel methods;probability metric;RKHS", "primary_area": "", "supplementary_material": "/attachment/091f198bce8e4a8031f39afffe1b83e3627087db.zip", "author": "Austin J. Brockmeier;Claudio Cesar Claros;Carlos H. Mendoza-Cardenas;Y\u00fcksel Karahan;Matthew S. Emigh;Luis Gonzalo Sanchez Giraldo", "authorids": "~Austin_J._Brockmeier1;cesar@udel.edu;cmendoza@udel.edu;ykarahan@udel.edu;matthew.emigh@navy.mil;~Luis_Gonzalo_Sanchez_Giraldo2", "gender": "M;;;;;", "homepage": "https://www.eecis.udel.edu/~ajbrock/;;;;;", "dblp": "24/9878;;;;;", "google_scholar": "g_QoCQQAAAAJ;;;;;", "orcid": ";;;;;", "linkedin": ";;;;;", "or_profile": "~Austin_J._Brockmeier1;cesar@udel.edu;cmendoza@udel.edu;ykarahan@udel.edu;matthew.emigh@navy.mil;~Luis_Gonzalo_Sanchez_Giraldo2", "aff": "University of Delaware;;;;;", "aff_domain": "udel.edu;;;;;", "position": "Assistant Professor;;;;;", "bibtex": "@misc{\nbrockmeier2021maxsliced,\ntitle={Max-sliced Bures Distance for Interpreting Discrepancies},\nauthor={Austin J. Brockmeier and Claudio Cesar Claros and Carlos H. Mendoza-Cardenas and Y{\\\"u}ksel Karahan and Matthew S. 
Emigh and Luis Gonzalo Sanchez Giraldo},\nyear={2021},\nurl={https://openreview.net/forum?id=D2Fp_qheYu}\n}", "github": "", "project": "", "reviewers": "AnonReviewer4;AnonReviewer2;AnonReviewer5", "site": "https://openreview.net/forum?id=D2Fp_qheYu", "pdf_size": 0, "rating": "5;6;7", "confidence": "3;2;3", "wc_review": "597;404;503", "wc_reply_reviewers": "0;0;0", "wc_reply_authors": "762;231;354", "reply_reviewers": "0;0;0", "reply_authors": "1;1;1", "rating_avg": [ 6.0, 0.816496580927726 ], "confidence_avg": [ 2.6666666666666665, 0.4714045207910317 ], "wc_review_avg": [ 501.3333333333333, 78.80073321711896 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 449.0, 226.94933355266767 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 7, 0 ], "authors#_avg": [ 6, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 1, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=12729531924916115393&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 0, "aff_unique_index": "0", "aff_unique_norm": "University of Delaware", "aff_unique_dep": "", "aff_unique_url": "https://www.udel.edu", "aff_unique_abbr": "UD", "aff_country_unique_index": "0", "aff_country_unique": "United States" }, { "id": "D2TE6VTJG9", "title": "Predicting What You Already Know Helps: Provable Self-Supervised Learning", "track": "main", "status": "Reject", "tldr": "", "abstract": "Self-supervised representation learning solves auxiliary prediction tasks (known as pretext tasks), that do not require labeled data, to learn semantic representations. These pretext tasks are created solely using the input features, such as predicting a missing image patch, recovering the color channels of an image from context, or predicting missing words, yet predicting this \\textit{known} information helps in learning representations effective for downstream prediction tasks. This paper posits a mechanism based on approximate conditional independence to formalize how solving certain pretext tasks can learn representations that provably decrease the sample complexity of downstream supervised tasks. Formally, we quantify how the approximate independence between the components of the pretext task (conditional on the label and latent variables) allows us to learn representations that can solve the downstream task with drastically reduced sample complexity by just training a linear layer on top of the learned representation. ", "keywords": "theory;self-supervised learning;representation learning;unsupervised learning;conditional independence", "primary_area": "", "supplementary_material": "", "author": "Jason D. Lee;Qi Lei;Nikunj Saunshi;Jiacheng Zhuo", "authorids": "~Jason_D._Lee1;~Qi_Lei1;~Nikunj_Saunshi1;~Jiacheng_Zhuo1", "gender": "M;F;;", "homepage": "https://jasondlee88.github.io/;https://cecilialeiqi.github.io/;https://www.nikunjsaunshi.com/;http://www.cs.utexas.edu/~jzhuo/", "dblp": "88/3262;;199/2236;198/0672", "google_scholar": "GR_DsT0AAAAJ;kGOgaowAAAAJ;F24vXggAAAAJ;GlArL6AAAAAJ", "orcid": ";;;", "linkedin": ";;;", "or_profile": "~Jason_D._Lee1;~Qi_Lei1;~Nikunj_Saunshi1;~Jiacheng_Zhuo1", "aff": "Princeton University;Princeton University;Princeton University;University of Texas, Austin", "aff_domain": "princeton.edu;princeton.edu;princeton.edu;utexas.edu", "position": "Assistant Professor;Postdoc;PhD student;PhD student", "bibtex": "@misc{\nlee2021predicting,\ntitle={Predicting What You Already Know Helps: Provable Self-Supervised Learning},\nauthor={Jason D. 
Lee and Qi Lei and Nikunj Saunshi and Jiacheng Zhuo},\nyear={2021},\nurl={https://openreview.net/forum?id=D2TE6VTJG9}\n}", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer3;AnonReviewer4;AnonReviewer2;AnonReviewer5", "site": "https://openreview.net/forum?id=D2TE6VTJG9", "pdf_size": 0, "rating": "6;6;6;6;6", "confidence": "3;3;5;3;3", "wc_review": "853;682;356;103;637", "wc_reply_reviewers": "0;0;0;0;113", "wc_reply_authors": "893;290;125;98;1048", "reply_reviewers": "0;0;0;0;1", "reply_authors": "2;1;1;1;2", "rating_avg": [ 6.0, 0.0 ], "confidence_avg": [ 3.4, 0.8 ], "wc_review_avg": [ 526.2, 265.10933593519485 ], "wc_reply_reviewers_avg": [ 22.6, 45.2 ], "wc_reply_authors_avg": [ 490.8, 400.16466610634177 ], "reply_reviewers_avg": [ 0.2, 0.4 ], "reply_authors_avg": [ 1.4, 0.4898979485566356 ], "replies_avg": [ 14, 0 ], "authors#_avg": [ 4, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 220, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=8893024002209834362&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 10, "aff_unique_index": "0;0;0;1", "aff_unique_norm": "Princeton University;University of Texas at Austin", "aff_unique_dep": ";", "aff_unique_url": "https://www.princeton.edu;https://www.utexas.edu", "aff_unique_abbr": "Princeton;UT Austin", "aff_campus_unique_index": "1", "aff_campus_unique": ";Austin", "aff_country_unique_index": "0;0;0;0", "aff_country_unique": "United States" }, { "title": "MELR: Meta-Learning via Modeling Episode-Level Relationships for Few-Shot Learning", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/3256", "id": "D3PcGLdMx0", "poster": "", "openreview": "https://openreview.net/forum?id=D3PcGLdMx0", "slides": "https://iclr.cc/virtual/2021/poster/3256", "video": "https://iclr.cc/virtual/2021/poster/3256", "author_site": "Nanyi Fei, Zhiwu Lu, Tao Xiang, Songfang Huang", "tldr": "", "abstract": "Most recent few-shot learning (FSL) approaches are based on episodic training whereby each episode samples few training instances (shots) per class to imitate the test condition. However, this strict adhering to test condition has a negative side effect, that is, the trained model is susceptible to the poor sampling of few shots. In this work, for the first time, this problem is addressed by exploiting inter-episode relationships. Specifically, a novel meta-learning via modeling episode-level relationships (MELR) framework is proposed. By sampling two episodes containing the same set of classes for meta-training, MELR is designed to ensure that the meta-learned model is robust against the presence of poorly-sampled shots in the meta-test stage. This is achieved through two key components: (1) a Cross-Episode Attention Module (CEAM) to improve the ability of alleviating the effects of poorly-sampled shots, and (2) a Cross-Episode Consistency Regularization (CECR) to enforce that the two classifiers learned from the two episodes are consistent even when there are unrepresentative instances. 
Extensive experiments for non-transductive standard FSL on two benchmarks show that our MELR achieves 1.0%-5.0% improvements over the baseline (i.e., ProtoNet) used for FSL in our model and outperforms the latest competitors under the same settings.", "keywords": "few-shot learning;episodic training;cross-episode attention", "primary_area": "", "supplementary_material": "", "author": "Nanyi Fei;Zhiwu Lu;Tao Xiang;Songfang Huang", "authorids": "~Nanyi_Fei1;~Zhiwu_Lu1;~Tao_Xiang1;~Songfang_Huang1", "gender": "M;M;M;", "homepage": ";https://gsai.ruc.edu.cn/luzhiwu;https://www.surrey.ac.uk/people/tao-xiang;https://www.coe.pku.edu.cn/teaching/all_time/13007.html", "dblp": "232/2227;53/5234;22/4460-2.html;05/4919", "google_scholar": "Oz6VqeQAAAAJ;OUXS8doAAAAJ;MeS5d4gAAAAJ;3So9lV8AAAAJ", "orcid": ";;0000-0002-2530-1059;", "linkedin": ";;;", "or_profile": "~Nanyi_Fei1;~Zhiwu_Lu1;~Tao_Xiang1;~Songfang_Huang1", "aff": "Renmin University of China;Renmin University of China;University of Surrey;Alibaba Group", "aff_domain": "ruc.edu.cn;ruc.edu.cn;surrey.ac.uk;alibaba-inc.com", "position": "PhD student;Full Professor;Full Professor;Senior Staff Engineer", "bibtex": "@inproceedings{\nfei2021melr,\ntitle={{\\{}MELR{\\}}: Meta-Learning via Modeling Episode-Level Relationships for Few-Shot Learning},\nauthor={Nanyi Fei and Zhiwu Lu and Tao Xiang and Songfang Huang},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=D3PcGLdMx0}\n}", "github": "", "project": "", "reviewers": "AnonReviewer3;AnonReviewer1;AnonReviewer2;AnonReviewer4", "pdf_size": 0, "rating": "6;6;7;7", "confidence": "3;5;4;5", "wc_review": "267;462;843;209", "wc_reply_reviewers": "0;244;290;0", "wc_reply_authors": "584;1060;1651;11", "reply_reviewers": "0;2;1;0", "reply_authors": "2;3;4;1", "rating_avg": [ 6.5, 0.5 ], "confidence_avg": [ 4.25, 0.82915619758885 ], "wc_review_avg": [ 445.25, 248.0286021812807 ], "wc_reply_reviewers_avg": [ 133.5, 134.48698821819158 ], "wc_reply_authors_avg": [ 826.5, 603.7733432340318 ], "reply_reviewers_avg": [ 0.75, 0.82915619758885 ], "reply_authors_avg": [ 2.5, 1.118033988749895 ], "replies_avg": [ 19, 0 ], "authors#_avg": [ 4, 0 ], "corr_rating_confidence": 0.30151134457776363, "gs_citation": 133, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=17204907890595635748&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 3, "pdf": "https://openreview.net/pdf?id=D3PcGLdMx0", "email": "ruc.edu.cn;ruc.edu.cn;surrey.ac.uk;alibaba-inc.com", "author_num": 4, "aff_unique_index": "0;0;1;2", "aff_unique_norm": "Renmin University of China;University of Surrey;Alibaba Group", "aff_unique_dep": ";;", "aff_unique_url": "http://www.ruc.edu.cn;https://www.surrey.ac.uk;https://www.alibaba.com", "aff_unique_abbr": "RUC;Surrey;Alibaba", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0;1;0", "aff_country_unique": "China;United Kingdom" }, { "id": "D3TNqCspFpM", "title": "Identifying Treatment Effects under Unobserved Confounding by Causal Representation Learning", "track": "main", "status": "Reject", "tldr": "", "abstract": "As an important problem of causal inference, we discuss the estimation of treatment effects under the existence of unobserved confounding. By representing the confounder as a latent variable, we propose Counterfactual VAE, a new variant of variational autoencoder, based on recent advances in identifiability of representation learning. 
Combining the identifiability and classical identification results of causal inference, under mild assumptions on the generative model and with small noise on the outcome, we theoretically show that the confounder is identifiable up to an affine transformation and then the treatment effects can be identified. Experiments on synthetic and semi-synthetic datasets demonstrate that our method matches the state-of-the-art, even under settings violating our formal assumptions.", "keywords": "VAE;variational autoencoder;Representation Learning;treatment effects;causal inference;Unobserved Confounding;identifiability;CATE;ATE", "primary_area": "", "supplementary_material": "", "author": "Pengzhou Abel Wu;Kenji Fukumizu", "authorids": "~Pengzhou_Abel_Wu1;~Kenji_Fukumizu1", "gender": "M;M", "homepage": ";http://www.ism.ac.jp/~fukumizu/", "dblp": "256/1725;96/464", "google_scholar": "4IuyryIAAAAJ;", "orcid": ";0000-0002-3488-2625", "linkedin": ";", "or_profile": "~Pengzhou_Abel_Wu1;~Kenji_Fukumizu1", "aff": "The Institute of Statistical Mathematics;The Institute of Statistical Mathematics, Japan, Tokyo Institute of Technology", "aff_domain": "ism.ac.jp;ism.ac.jp", "position": "PhD student;Full Professor", "bibtex": "@misc{\nwu2021identifying,\ntitle={Identifying Treatment Effects under Unobserved Confounding by Causal Representation Learning},\nauthor={Pengzhou Abel Wu and Kenji Fukumizu},\nyear={2021},\nurl={https://openreview.net/forum?id=D3TNqCspFpM}\n}", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer4;AnonReviewer2;AnonReviewer3", "site": "https://openreview.net/forum?id=D3TNqCspFpM", "pdf_size": 0, "rating": "3;4;4;6", "confidence": "4;4;4;2", "wc_review": "1234;860;257;482", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "466;455;229;261", "reply_reviewers": "0;0;0;0", "reply_authors": "1;1;1;1", "rating_avg": [ 4.25, 1.0897247358851685 ], "confidence_avg": [ 3.5, 0.8660254037844386 ], "wc_review_avg": [ 708.25, 372.2421087142077 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 352.75, 108.41211878752301 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 13, 0 ], "authors#_avg": [ 2, 0 ], "corr_rating_confidence": -0.9271726499455306, "gs_citation": 6, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=8295608101917036378&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 0, "aff_unique_index": "0;0", "aff_unique_norm": "Institute of Statistical Mathematics", "aff_unique_dep": "", "aff_unique_url": "https://www.ism.ac.jp", "aff_unique_abbr": "ISM", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0", "aff_country_unique": "Japan" }, { "id": "D4A-v0kltaX", "title": "Neural Partial Differential Equations with Functional Convolution", "track": "main", "status": "Reject", "tldr": "", "abstract": "We present a lightweighted neural PDE representation to discover the hidden structure and predict the solution of different nonlinear PDEs. Our key idea is to leverage the prior of ``\"translational similarity\" of numerical PDE differential operators to drastically reduce the scale of learning model and training data. We implemented three central network components, including a neural functional convolution operator, a Picard forward iterative procedure, and an adjoint backward gradient calculator. 
Our novel paradigm fully leverages the multifaceted priors that stem from the sparse and smooth nature of the physical PDE solution manifold and the various mature numerical techniques such as adjoint solver, linearization, and iterative procedure to accelerate the computation. We demonstrate the efficacy of our method by robustly discovering the model and accurately predicting the solutions of various types of PDEs with small-scale networks and training sets. We highlight that all the PDE examples we showed were trained with up to 8 data samples and within 325 network parameters. ", "keywords": "neural PDE;functional convolution;adjoint method", "primary_area": "", "supplementary_material": "/attachment/b29388297304b42d5f92b231bdf14c0016e73ad6.zip", "author": "Ziqian Wu;Xingzhe He;Michael Zhang;Yijun Li;Cheng Yang;Rui Liu;Shiying Xiong;Bo Zhu", "authorids": "~Ziqian_Wu1;~Xingzhe_He1;~Michael_Zhang6;~Yijun_Li4;~Cheng_Yang4;~Rui_Liu7;~Shiying_Xiong1;~Bo_Zhu2", "gender": "F;M;M;F;M;M;M;M", "homepage": ";https://xingzhehe.github.io/;;https://yijun-li-20.github.io;;;;https://faculty.cc.gatech.edu/~bozhu/", "dblp": ";258/0493;;;;;;", "google_scholar": "https://scholar.google.com/citations?view_op=list_works;25tDZpwAAAAJ;;;https://scholar.google.com.hk/citations?user=wU90N1YAAAAJ;UT_1yjgAAAAJ;https://scholar.google.com.hk/citations?user=eq5bc5oAAAAJ;atNjbs0AAAAJ", "orcid": ";;;;;;0000-0002-0468-4249;", "linkedin": ";;michaelzhang21;;cheng-yang-b610b992/;;;", "or_profile": "~Ziqian_Wu1;~Xingzhe_He1;~Michael_Zhang6;~Yijun_Li4;~Cheng_Yang4;~Rui_Liu7;~Shiying_Xiong1;~Bo_Zhu2", "aff": "Dartmouth College;University of British Columbia;The Lawrenceville School;University of California, Los Angeles;ByteDance Inc.;Dartmouth College;Dartmouth College;Dartmouth College", "aff_domain": "dartmouth.edu;cs.ubc.ca;lawrenceville.org;cs.ucla.edu;bytedance.com;dartmouth.edu;dartmouth.edu;dartmouth.edu", "position": "PhD student;PhD student;High Schooler;MS student;Principal Researcher;PhD student;Postdoc;Assistant Professor", "bibtex": "@misc{\nwu2021neural,\ntitle={Neural Partial Differential Equations with Functional Convolution},\nauthor={Ziqian Wu and Xingzhe He and Michael Zhang and Yijun Li and Cheng Yang and Rui Liu and Shiying Xiong and Bo Zhu},\nyear={2021},\nurl={https://openreview.net/forum?id=D4A-v0kltaX}\n}", "github": "", "project": "", "reviewers": "AnonReviewer3;AnonReviewer1;AnonReviewer2;AnonReviewer4", "site": "https://openreview.net/forum?id=D4A-v0kltaX", "pdf_size": 0, "rating": "4;4;4;5", "confidence": "4;3;2;4", "wc_review": "737;990;540;494", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "0;0;0;0", "reply_reviewers": "0;0;0;0", "reply_authors": "0;0;0;0", "rating_avg": [ 4.25, 0.4330127018922193 ], "confidence_avg": [ 3.25, 0.82915619758885 ], "wc_review_avg": [ 690.25, 195.65578831202515 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 0, 0 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 0, 0 ], "replies_avg": [ 7, 0 ], "authors#_avg": [ 8, 0 ], "corr_rating_confidence": 0.5222329678670935, "gs_citation": 1, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=622464377435564636&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 3, "aff_unique_index": "0;1;2;3;4;0;0;0", "aff_unique_norm": "Dartmouth College;University of British Columbia;Lawrenceville School;University of California, Los Angeles;ByteDance", "aff_unique_dep": ";;;;", "aff_unique_url": 
"https://www.dartmouth.edu;https://www.ubc.ca;https://www.thelawrenceschool.org;https://www.ucla.edu;https://www.bytedance.com", "aff_unique_abbr": "Dartmouth;UBC;;UCLA;ByteDance", "aff_campus_unique_index": "1", "aff_campus_unique": ";Los Angeles", "aff_country_unique_index": "0;1;0;0;2;0;0;0", "aff_country_unique": "United States;Canada;China" }, { "id": "D4QFCXGe_z2", "title": "R-LAtte: Attention Module for Visual Control via Reinforcement Learning", "track": "main", "status": "Reject", "tldr": "", "abstract": "Attention mechanisms are generic inductive biases that have played a critical role in improving the state-of-the-art in supervised learning, unsupervised pre-training and generative modeling for multiple domains including vision, language and speech. However, they remain relatively under-explored for neural network architectures typically used in reinforcement learning (RL) from high dimensional inputs such as pixels. In this paper, we propose and study the effectiveness of augmenting a simple attention module in the convolutional encoder of an RL agent. Through experiments on the widely benchmarked DeepMind Control Suite environments, we demonstrate that our proposed module can (i) extract interpretable task-relevant information such as agent locations and movements without the need for data augmentations or contrastive losses; (ii) significantly improve the sample-efficiency and final performance of the agents. We hope our simple and effective approach will serve as a strong baseline for future research incorporating attention mechanisms in reinforcement learning and control.", "keywords": "", "primary_area": "", "supplementary_material": "", "author": "Mandi Zhao;Qiyang Li;Aravind Srinivas;Ignasi Clavera;Kimin Lee;Pieter Abbeel", "authorids": "~Mandi_Zhao1;~Qiyang_Li1;~Aravind_Srinivas1;~Ignasi_Clavera1;~Kimin_Lee1;~Pieter_Abbeel2", "gender": "F;M;;;M;M", "homepage": "https://mandizhao.github.io;https://colinqiyangli.github.io/;https://people.eecs.berkeley.edu/~aravind/;;https://sites.google.com/view/kiminlee;https://people.eecs.berkeley.edu/~pabbeel/", "dblp": "336/3180;;218/5157;;183/6849;", "google_scholar": "zBw2w_wAAAAJ;qlwwdfEAAAAJ;GhrKC1gAAAAJ;;92M8xv4AAAAJ;https://scholar.google.com.tw/citations?user=vtwH6GkAAAAJ", "orcid": ";;;;;", "linkedin": ";;;;;", "or_profile": "~Mandi_Zhao1;~Qiyang_Li1;~Aravind_Srinivas1;~Ignasi_Clavera1;~Kimin_Lee1;~Pieter_Abbeel2", "aff": "University of California, Berkeley;University of California, Berkeley;University of California, Berkeley;University of California, Berkeley;University of California, Berkeley;Covariant", "aff_domain": "berkeley.edu;berkeley.edu;berkeley.edu;berkeley.edu;berkeley.edu;covariant.ai", "position": "MS student;PhD student;PhD student;PhD student;Postdoc;Founder", "bibtex": "@misc{\nzhao2021rlatte,\ntitle={R-{\\{}LA{\\}}tte: Attention Module for Visual Control via Reinforcement Learning},\nauthor={Mandi Zhao and Qiyang Li and Aravind Srinivas and Ignasi Clavera and Kimin Lee and Pieter Abbeel},\nyear={2021},\nurl={https://openreview.net/forum?id=D4QFCXGe_z2}\n}", "github": "", "project": "", "reviewers": "AnonReviewer3;AnonReviewer2;AnonReviewer1", "site": "https://openreview.net/forum?id=D4QFCXGe_z2", "pdf_size": 0, "rating": "4;4;5", "confidence": "4;5;3", "wc_review": "832;475;757", "wc_reply_reviewers": "0;0;0", "wc_reply_authors": "631;346;398", "reply_reviewers": "0;0;0", "reply_authors": "1;1;1", "rating_avg": [ 4.333333333333333, 0.4714045207910317 ], "confidence_avg": [ 4.0, 0.816496580927726 ], 
"wc_review_avg": [ 688.0, 153.69450217883528 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 458.3333333333333, 123.92560492309714 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 8, 0 ], "authors#_avg": [ 6, 0 ], "corr_rating_confidence": -0.8660254037844385, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:bcvLtWrMl8wJ:scholar.google.com/&scioq=R-LAtte:+Attention+Module+for+Visual+Control+via+Reinforcement+Learning&hl=en&as_sdt=0,5", "gs_version_total": 0, "aff_unique_index": "0;0;0;0;0;1", "aff_unique_norm": "University of California, Berkeley;Covariant", "aff_unique_dep": ";", "aff_unique_url": "https://www.berkeley.edu;", "aff_unique_abbr": "UC Berkeley;", "aff_campus_unique_index": "0;0;0;0;0", "aff_campus_unique": "Berkeley;", "aff_country_unique_index": "0;0;0;0;0", "aff_country_unique": "United States;" }, { "id": "D51irFX8UOG", "title": "HALMA: Humanlike Abstraction Learning Meets Affordance in Rapid Problem Solving", "track": "main", "status": "Reject", "tldr": "", "abstract": "Humans learn compositional and causal abstraction, \\ie, knowledge, in response to the structure of naturalistic tasks. When presented with a problem-solving task involving some objects, toddlers would first interact with these objects to reckon what they are and what can be done with them. Leveraging these concepts, they could understand the internal structure of this task, without seeing all of the problem instances. Remarkably, they further build cognitively executable strategies to \\emph{rapidly} solve novel problems. To empower a learning agent with similar capability, we argue there shall be three levels of generalization in how an agent represents its knowledge: perceptual, conceptual, and algorithmic. In this paper, we devise the very first systematic benchmark that offers joint evaluation covering all three levels. This benchmark is centered around a novel task domain, HALMA, for visual concept development and rapid problem solving. Uniquely, HALMA has a minimum yet complete concept space, upon which we introduce a novel paradigm to rigorously diagnose and dissect learning agents' capability in understanding and generalizing complex and structural concepts. 
We conduct extensive experiments on reinforcement learning agents with various inductive biases and carefully report their proficiency and weakness.", "keywords": "Visual Concept Development;Rapid Problem Solving;Abstract Reasoning", "primary_area": "", "supplementary_material": "", "author": "Sirui Xie;Xiaojian Ma;Peiyu Yu;Yixin Zhu;Ying Nian Wu;Song-Chun Zhu", "authorids": "~Sirui_Xie1;~Xiaojian_Ma1;~Peiyu_Yu1;~Yixin_Zhu1;~Ying_Nian_Wu1;~Song-Chun_Zhu1", "gender": "M;;;M;;M", "homepage": "https://www.siruixie.com;;;https://yzhu.io/;;https://zhusongchun.net/", "dblp": "232/3072;;;91/1103-1.html;;10/10313", "google_scholar": "9GJn5FIAAAAJ;;;qG9l6JEAAAAJ;;https://scholar.google.com.tw/citations?user=Al8dyb4AAAAJ", "orcid": ";;;0000-0001-7024-1545;;", "linkedin": ";;;;;", "or_profile": "~Sirui_Xie1;~Xiaojian_Ma1;~Peiyu_Yu1;~Yixin_Zhu1;~Ying_Nian_Wu1;~Song-Chun_Zhu1", "aff": "University of California, Los Angeles;;;University of California, Los Angeles;;Peking University", "aff_domain": "ucla.edu;;;ucla.edu;;pku.edu.cn", "position": "PhD student;;;Postdoc;;Full Professor", "bibtex": "@misc{\nxie2021halma,\ntitle={{\\{}HALMA{\\}}: Humanlike Abstraction Learning Meets Affordance in Rapid Problem Solving},\nauthor={Sirui Xie and Xiaojian Ma and Peiyu Yu and Yixin Zhu and Ying Nian Wu and Song-Chun Zhu},\nyear={2021},\nurl={https://openreview.net/forum?id=D51irFX8UOG}\n}", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer2;AnonReviewer4;AnonReviewer3", "site": "https://openreview.net/forum?id=D51irFX8UOG", "pdf_size": 0, "rating": "5;6;7;7", "confidence": "2;3;3;2", "wc_review": "311;232;257;625", "wc_reply_reviewers": "0;0;107;0", "wc_reply_authors": "1333;726;1070;1434", "reply_reviewers": "0;0;1;0", "reply_authors": "2;1;2;3", "rating_avg": [ 6.25, 0.82915619758885 ], "confidence_avg": [ 2.5, 0.5 ], "wc_review_avg": [ 356.25, 157.7678278357156 ], "wc_reply_reviewers_avg": [ 26.75, 46.332359102467464 ], "wc_reply_authors_avg": [ 1140.75, 273.8515793271969 ], "reply_reviewers_avg": [ 0.25, 0.4330127018922193 ], "reply_authors_avg": [ 2.0, 0.7071067811865476 ], "replies_avg": [ 18, 0 ], "authors#_avg": [ 6, 0 ], "corr_rating_confidence": 0.30151134457776363, "gs_citation": 23, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=2071129516729431887&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 4, "aff_unique_index": "0;0;1", "aff_unique_norm": "University of California, Los Angeles;Peking University", "aff_unique_dep": ";", "aff_unique_url": "https://www.ucla.edu;http://www.pku.edu.cn", "aff_unique_abbr": "UCLA;Peking U", "aff_campus_unique_index": "0;0", "aff_campus_unique": "Los Angeles;", "aff_country_unique_index": "0;0;1", "aff_country_unique": "United States;China" }, { "id": "D5Wt3FtvCF", "title": "PURE: An Uncertainty-aware Recommendation Framework for Maximizing Expected Posterior Utility of Platform", "track": "main", "status": "Reject", "tldr": "", "abstract": "Commercial recommendation can be regarded as an interactive process between the recommendation platform and its target users. One crucial problem for the platform is how to make full use of its advantages so as to maximize its utility, i.e., the commercial benefits from recommendation. In this paper, we propose a novel recommendation framework which effectively utilizes the information of user uncertainty over different item dimensions and explicitly takes into consideration the impact of display policy on user in order to achieve maximal expected posterior utility for the platform. 
We formulate the problem of deriving optimal policy to achieve maximal expected posterior utility as a constrained non-convex optimization problem and further propose an ADMM-based solution to derive an approximately optimal policy. Extensive experiments are conducted over data collected from a real-world recommendation platform and demonstrate the effectiveness of the proposed framework. Besides, we also adopt the proposed framework to conduct experiments with an intent to reveal how the platform achieves its commercial benefits. The results suggest that the platform should cater to the user's preference for item dimensions that the user prefers, while for item dimensions where the user is with high uncertainty, the platform can achieve more commercial benefits by recommending items with high utilities.", "keywords": "commercial recommendation;maximizing platform benefits;uncertainty-aware;influence of display policy;non-convex optimization", "primary_area": "", "supplementary_material": "", "author": "Haokun Chen;Zhaoyang Liu;Chen Xu;Ziqian Chen;Jinyang Gao;Bolin Ding", "authorids": "~Haokun_Chen1;jingmu.lzy@alibaba-inc.com;~Chen_Xu2;~Ziqian_Chen1;jinyang.gjy@alibaba-inc.com;~Bolin_Ding3", "gender": "M;;M;M;;M", "homepage": ";;;;;https://bolinding.github.io/", "dblp": "218/6928;;;168/3805;;46/3522.html", "google_scholar": ";;;;;AjYkTi8AAAAJ", "orcid": ";;;;;", "linkedin": ";;;;;bolin-ding-50a0119/", "or_profile": "~Haokun_Chen1;jingmu.lzy@alibaba-inc.com;~Chen_Xu2;~Ziqian_Chen1;jinyang.gjy@alibaba-inc.com;~Bolin_Ding3", "aff": ";;;Alibaba Group;;Alibaba Group", "aff_domain": ";;;alibaba-inc.com;;alibaba-inc.com", "position": ";;;Staff Engineer;;Senior Director", "bibtex": "@misc{\nchen2021pure,\ntitle={{\\{}PURE{\\}}: An Uncertainty-aware Recommendation Framework for Maximizing Expected Posterior Utility of Platform},\nauthor={Haokun Chen and Zhaoyang Liu and Chen Xu and Ziqian Chen and Jinyang Gao and Bolin Ding},\nyear={2021},\nurl={https://openreview.net/forum?id=D5Wt3FtvCF}\n}", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer2;AnonReviewer3;AnonReviewer4", "site": "https://openreview.net/forum?id=D5Wt3FtvCF", "pdf_size": 0, "rating": "4;4;5;6", "confidence": "4;1;5;3", "wc_review": "613;268;474;339", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "622;320;590;223", "reply_reviewers": "0;0;0;0", "reply_authors": "1;1;1;1", "rating_avg": [ 4.75, 0.82915619758885 ], "confidence_avg": [ 3.25, 1.479019945774904 ], "wc_review_avg": [ 423.5, 132.0804678974147 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 438.75, 171.10431759602093 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 11, 0 ], "authors#_avg": [ 6, 0 ], "corr_rating_confidence": 0.25482359571881275, "gs_citation": 1, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=13864695255152606074&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 0, "aff_unique_index": "0;0", "aff_unique_norm": "Alibaba Group", "aff_unique_dep": "", "aff_unique_url": "https://www.alibaba.com", "aff_unique_abbr": "Alibaba", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0", "aff_country_unique": "China" }, { "id": "D62nJAdpijt", "title": "Trojans and Adversarial Examples: A Lethal Combination", "track": "main", "status": "Reject", "tldr": "", "abstract": "In this work, we naturally unify adversarial examples and Trojan backdoors into a new stealthy attack, that is activated only when 1) adversarial perturbation is injected 
into the input examples and 2) a Trojan backdoor is used to poison the training process simultaneously. Different from traditional attacks, we leverage adversarial noise in the input space to move Trojan-infected examples across the model decision boundary, thus making it difficult to be detected. Our attack can fool the user into accidentally trusting the infected model as a robust classifier against adversarial examples. We perform a thorough analysis and conduct an extensive set of experiments on several benchmark datasets to show that our attack can bypass existing defenses with a success rate close to 100%.", "keywords": "", "primary_area": "", "supplementary_material": "", "author": "Guanxiong Liu;Issa Khalil;Abdallah Khreishah;Hai Phan", "authorids": "~Guanxiong_Liu1;ikhalil@hbku.edu.qa;~Abdallah_Khreishah1;~Hai_Phan1", "gender": "M;;M;Not Specified", "homepage": ";;;https://sites.google.com/site/ihaiphan/", "dblp": ";;;153/5204", "google_scholar": "CwglCoUAAAAJ;;https://scholar.google.com/scholar?hl=en;nsEbWjAAAAAJ", "orcid": ";;;", "linkedin": "sylorbeijing/;;;", "or_profile": "~Guanxiong_Liu1;ikhalil@hbku.edu.qa;~Abdallah_Khreishah1;~Hai_Phan1", "aff": "New Jersey Institute of Technology;;New Jersey Institute of Technology;New Jersey Institute of Technology", "aff_domain": "njit.edu;;njit.edu;njit.edu", "position": "PhD student;;Full Professor;Assistant Professor", "bibtex": "@misc{\nliu2021trojans,\ntitle={Trojans and Adversarial Examples: A Lethal Combination},\nauthor={Guanxiong Liu and Issa Khalil and Abdallah Khreishah and Hai Phan},\nyear={2021},\nurl={https://openreview.net/forum?id=D62nJAdpijt}\n}", "github": "", "project": "", "reviewers": "AnonReviewer2;AnonReviewer3;AnonReviewer1;AnonReviewer4", "site": "https://openreview.net/forum?id=D62nJAdpijt", "pdf_size": 0, "rating": "4;5;6;7", "confidence": "5;4;5;4", "wc_review": "108;197;619;251", "wc_reply_reviewers": "0;0;96;95", "wc_reply_authors": "418;600;762;374", "reply_reviewers": "0;0;1;1", "reply_authors": "1;1;1;2", "rating_avg": [ 5.5, 1.118033988749895 ], "confidence_avg": [ 4.5, 0.5 ], "wc_review_avg": [ 293.75, 194.6013553395762 ], "wc_reply_reviewers_avg": [ 47.75, 47.75130888258457 ], "wc_reply_authors_avg": [ 538.5, 154.36563736790646 ], "reply_reviewers_avg": [ 0.5, 0.5 ], "reply_authors_avg": [ 1.25, 0.4330127018922193 ], "replies_avg": [ 12, 0 ], "authors#_avg": [ 4, 0 ], "corr_rating_confidence": -0.4472135954999579, "gs_citation": 1, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=1935082299040041051&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 0, "aff_unique_index": "0;0;0", "aff_unique_norm": "New Jersey Institute of Technology", "aff_unique_dep": "", "aff_unique_url": "https://www.njit.edu", "aff_unique_abbr": "NJIT", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0;0", "aff_country_unique": "United States" }, { "title": "Long-tailed Recognition by Routing Diverse Distribution-Aware Experts", "status": "Spotlight", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/3018", "id": "D9I3drBz4UC", "poster": "", "openreview": "https://openreview.net/forum?id=D9I3drBz4UC", "slides": "https://iclr.cc/virtual/2021/poster/3018", "video": "https://iclr.cc/virtual/2021/poster/3018", "author_site": "Xudong Wang, Long Lian, Zhongqi Miao, Ziwei Liu, Stella Yu", "tldr": "", "abstract": "Natural data are often long-tail distributed over semantic classes. 
Existing recognition methods tackle this imbalanced classification by placing more emphasis on the tail data, through class re-balancing/re-weighting or ensembling over different data groups, resulting in increased tail accuracies but reduced head accuracies.\nWe take a dynamic view of the training data and provide a principled model bias and variance analysis as the training data fluctuates: Existing long-tail classifiers invariably increase the model variance and the head-tail model bias gap remains large, due to more and larger confusion with hard negatives for the tail.\nWe propose a new long-tailed classifier called RoutIng Diverse Experts (RIDE). It reduces the model variance with multiple experts, reduces the model bias with a distribution-aware diversity loss, reduces the computational cost with a dynamic expert routing module. RIDE outperforms the state-of-the-art by 5% to 7% on CIFAR100-LT, ImageNet-LT and iNaturalist 2018 benchmarks. It is also a universal framework that is applicable to various backbone networks, long-tailed algorithms and training mechanisms for consistent performance gains. Our code is available at: https://github.com/frank-xwang/RIDE-LongTailRecognition.", "keywords": "Long-tailed Recognition;Bias-variance Decomposition", "primary_area": "", "supplementary_material": "", "author": "Xudong Wang;Long Lian;Zhongqi Miao;Ziwei Liu;Stella Yu", "authorids": "~Xudong_Wang4;~Long_Lian1;~Zhongqi_Miao1;~Ziwei_Liu1;~Stella_Yu2", "gender": "M;M;;M;F", "homepage": "http://people.eecs.berkeley.edu/~xdwang/;https://github.com/TonyLianLong;;https://liuziwei7.github.io/;http://www.eecs.umich.edu/~stellayu", "dblp": ";276/0012;239/5123;05/6300-2;58/5089", "google_scholar": "Azf07WcAAAAJ;eOLxyqUAAAAJ;;https://scholar.google.com.hk/citations?user=lc45xlcAAAAJ;https://scholar.google.com/citations?hl=en", "orcid": ";0000-0001-6098-189X;0000-0002-0439-8592;;", "linkedin": ";longlian/;;;", "or_profile": "~Xudong_Wang4;~Long_Lian1;~Zhongqi_Miao1;~Ziwei_Liu1;~Stella_Yu2", "aff": "University of California, Berkeley;University of California, Berkeley;University of California, Berkeley;Nanyang Technological University;University of California, Berkeley", "aff_domain": "eecs.berkeley.edu;berkeley.edu;berkeley.edu;ntu.edu.sg;berkeley.edu", "position": "PhD student;Undergrad student;PhD student;Assistant Professor;Director, ICSI Vision Group", "bibtex": "@inproceedings{\nwang2021longtailed,\ntitle={Long-tailed Recognition by Routing Diverse Distribution-Aware Experts},\nauthor={Xudong Wang and Long Lian and Zhongqi Miao and Ziwei Liu and Stella Yu},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=D9I3drBz4UC}\n}", "github": "[![github](/images/github_icon.svg) frank-xwang/RIDE-LongTailRecognition](https://github.com/frank-xwang/RIDE-LongTailRecognition) + [![Papers with Code](/images/pwc_icon.svg) 1 community implementation](https://paperswithcode.com/paper/?openreview=D9I3drBz4UC)", "project": "", "reviewers": "AnonReviewer2;AnonReviewer4;AnonReviewer3;AnonReviewer1", "pdf_size": 0, "rating": "7;7;7;8", "confidence": "4;4;4;5", "wc_review": "451;266;121;301", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "1257;1457;532;57", "reply_reviewers": "0;0;0;0", "reply_authors": "3;2;1;1", "rating_avg": [ 7.25, 0.4330127018922193 ], "confidence_avg": [ 4.25, 0.4330127018922193 ], "wc_review_avg": [ 284.75, 117.33365885371512 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 825.75, 561.6312736128572 ], 
"reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.75, 0.82915619758885 ], "replies_avg": [ 13, 0 ], "authors#_avg": [ 5, 0 ], "corr_rating_confidence": 1.0, "gs_citation": 489, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=13544394725234163867&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 7, "pdf": "https://openreview.net/pdf?id=D9I3drBz4UC", "email": "eecs.berkeley.edu;berkeley.edu;berkeley.edu;ntu.edu.sg;berkeley.edu", "author_num": 5, "aff_unique_index": "0;0;0;1;0", "aff_unique_norm": "University of California, Berkeley;Nanyang Technological University", "aff_unique_dep": ";", "aff_unique_url": "https://www.berkeley.edu;https://www.ntu.edu.sg", "aff_unique_abbr": "UC Berkeley;NTU", "aff_campus_unique_index": "0;0;0;0", "aff_campus_unique": "Berkeley;", "aff_country_unique_index": "0;0;0;1;0", "aff_country_unique": "United States;Singapore" }, { "id": "D9pSaTGUemb", "title": "Implicit Acceleration of Gradient Flow in Overparameterized Linear Models", "track": "main", "status": "Reject", "tldr": "", "abstract": "We study the implicit acceleration of gradient flow in over-parameterized two-layer linear models. We show that implicit acceleration emerges from a conservation law that constrains the dynamics to follow certain trajectories. More precisely, gradient flow preserves the difference of the Gramian~matrices of the input and output weights and we show that the amount of acceleration depends on both the magnitude of that difference (which is fixed at initialization) and the spectrum of the data. In addition, and generalizing prior work, we prove our results without assuming small, balanced or spectral initialization for the weights, and establish interesting connections between the matrix factorization problem and Riccati type differential equations.", "keywords": "", "primary_area": "", "supplementary_material": "", "author": "Salma Tarmoun;Guilherme Fran\u00e7a;Benjamin David Haeffele;Rene Vidal", "authorids": "~Salma_Tarmoun1;~Guilherme_Fran\u00e7a1;~Benjamin_David_Haeffele1;~Rene_Vidal1", "gender": "F;;;", "homepage": ";;;http://www.vision.jhu.edu", "dblp": ";184/3866;;v/ReneVidal", "google_scholar": ";;;https://scholar.google.com/citations?hl=en", "orcid": ";;;", "linkedin": "salma-tarmoun-94aa5158/;;;rene-vidal-74844928/", "or_profile": "~Salma_Tarmoun1;~Guilherme_Fran\u00e7a1;~Benjamin_David_Haeffele1;~Rene_Vidal1", "aff": "University of Pennsylvania;;;Johns Hopkins University", "aff_domain": "upenn.edu;;;jhu.edu", "position": "PhD student;;;Professor", "bibtex": "@misc{\ntarmoun2021implicit,\ntitle={Implicit Acceleration of Gradient Flow in Overparameterized Linear Models},\nauthor={Salma Tarmoun and Guilherme Fran{\\c{c}}a and Benjamin David Haeffele and Rene Vidal},\nyear={2021},\nurl={https://openreview.net/forum?id=D9pSaTGUemb}\n}", "github": "", "project": "", "reviewers": "AnonReviewer4;AnonReviewer3;AnonReviewer2;AnonReviewer1", "site": "https://openreview.net/forum?id=D9pSaTGUemb", "pdf_size": 0, "rating": "5;6;6;7", "confidence": "4;3;4;5", "wc_review": "597;299;565;664", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "653;405;457;472", "reply_reviewers": "0;0;0;0", "reply_authors": "1;1;1;1", "rating_avg": [ 6.0, 0.7071067811865476 ], "confidence_avg": [ 4.0, 0.7071067811865476 ], "wc_review_avg": [ 531.25, 138.7666656657859 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 496.75, 93.57450240316537 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 9, 0 ], "authors#_avg": [ 4, 0 
], "corr_rating_confidence": 0.5, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:VpE9McdJndAJ:scholar.google.com/&scioq=Implicit+Acceleration+of+Gradient+Flow+in+Overparameterized+Linear+Models&hl=en&as_sdt=0,5", "gs_version_total": 0, "aff_unique_index": "0;1", "aff_unique_norm": "University of Pennsylvania;Johns Hopkins University", "aff_unique_dep": ";", "aff_unique_url": "https://www.upenn.edu;https://www.jhu.edu", "aff_unique_abbr": "UPenn;JHU", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0", "aff_country_unique": "United States" }, { "id": "DAaaaqPv9-q", "title": "Self-supervised Graph-level Representation Learning with Local and Global Structure", "track": "main", "status": "Reject", "tldr": "", "abstract": "This paper focuses on unsupervised/self-supervised whole-graph representation learning, which is critical in many tasks including drug and material discovery. Current methods can effectively model the local structure between different graph instances, but they fail to discover the global semantic structure of the entire dataset. In this work, we propose a unified framework called Local-instance and Global-semantic Learning (GraphLoG) for self-supervised whole-graph representation learning. Specifically, besides preserving the local instance-level structure, GraphLoG leverages a nonparametric strategy to learn hierarchical prototypes of the data. These prototypes capture the semantic clusters in the latent space, and the number of prototypes can automatically adapt to different feature distributions. We evaluate GraphLoG by pre-training it on massive unlabeled graphs followed by fine-tuning on downstream tasks. Extensive experiments on both chemical and biological benchmark datasets demonstrate the effectiveness of our approach. 
", "keywords": "Self-supervised Representation Learning;Graph Representation Learning;Hierarchical Semantic Learning", "primary_area": "", "supplementary_material": "/attachment/5d9e09603b2a9b000859921b7fe4e1ddc1ccef83.zip", "author": "Minghao Xu;Hang Wang;Bingbing Ni;Hongyu Guo;Jian Tang", "authorids": "~Minghao_Xu1;~Hang_Wang1;~Bingbing_Ni3;~Hongyu_Guo1;~Jian_Tang1", "gender": "M;M;M;M;", "homepage": "https://chrisallenming.github.io/;https://github.com/Francis0625;;https://hongyuharryguo.github.io/;http://www.jian-tang.com", "dblp": ";;64/831.html;;181/2667-5", "google_scholar": "Oh5S2skAAAAJ;r8UKYQYAAAAJ;V9W87PYAAAAJ;https://scholar.google.ca/citations?user=bZUqlakAAAAJ;https://scholar.google.ca/citations?user=1ir6WUEAAAAJ", "orcid": ";0000-0003-0417-9258;;;", "linkedin": "xuminghao118/;;;harry-h-y-guo-a582087/;", "or_profile": "~Minghao_Xu1;~Hang_Wang1;~Bingbing_Ni3;~Hongyu_Guo1;~Jian_Tang1", "aff": "ByteDance Ltd.;Shanghai Jiaotong University;Shanghai Jiaotong University;National Research Council Canada;Mila, HEC Montreal", "aff_domain": "bytedance.com;sjtu.edu;sjtu.edu.cn;nrc-cnrc.gc.ca;hec.ca", "position": "Researcher;MS student;Full Professor;Senior Research Officer;Assistant Professor", "bibtex": "@misc{\nxu2021selfsupervised,\ntitle={Self-supervised Graph-level Representation Learning with Local and Global Structure},\nauthor={Minghao Xu and Hang Wang and Bingbing Ni and Hongyu Guo and Jian Tang},\nyear={2021},\nurl={https://openreview.net/forum?id=DAaaaqPv9-q}\n}", "github": "", "project": "", "reviewers": "AnonReviewer3;AnonReviewer2;AnonReviewer1;AnonReviewer4", "site": "https://openreview.net/forum?id=DAaaaqPv9-q", "pdf_size": 0, "rating": "5;5;6;8", "confidence": "4;5;3;4", "wc_review": "414;371;138;128", "wc_reply_reviewers": "155;345;0;46", "wc_reply_authors": "763;737;168;328", "reply_reviewers": "2;1;0;1", "reply_authors": "3;2;1;2", "rating_avg": [ 6.0, 1.224744871391589 ], "confidence_avg": [ 4.0, 0.7071067811865476 ], "wc_review_avg": [ 262.75, 130.68545251863347 ], "wc_reply_reviewers_avg": [ 136.5, 132.88811083012655 ], "wc_reply_authors_avg": [ 499.0, 257.4597055851653 ], "reply_reviewers_avg": [ 1.0, 0.7071067811865476 ], "reply_authors_avg": [ 2.0, 0.7071067811865476 ], "replies_avg": [ 22, 0 ], "authors#_avg": [ 5, 0 ], "corr_rating_confidence": -0.28867513459481287, "gs_citation": 266, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=15360735332012817623&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 7, "aff_unique_index": "0;1;1;2;3", "aff_unique_norm": "ByteDance;Shanghai Jiao Tong University;National Research Council Canada;HEC Montreal", "aff_unique_dep": ";;;HEC Business School", "aff_unique_url": "https://www.bytedance.com;https://www.sjtu.edu.cn;https://www.nrc-cnrc.gc.ca;https://www.hec.ca", "aff_unique_abbr": "ByteDance;SJTU;NRC-CNRC;HEC", "aff_campus_unique_index": "1", "aff_campus_unique": ";Montreal", "aff_country_unique_index": "0;0;0;1;1", "aff_country_unique": "China;Canada" }, { "id": "DC1Im3MkGG", "title": "Exchanging Lessons Between Algorithmic Fairness and Domain Generalization", "track": "main", "status": "Reject", "tldr": "", "abstract": "Standard learning approaches are designed to perform well on average for the data distribution available at training time. Developing learning approaches that are not overly sensitive to the training distribution is central to research on domain- or out-of-distribution generalization, robust optimization and fairness. 
In this work we focus on links between research on domain generalization and algorithmic fairness---where performance under a distinct but related test distributions is studied---and show how the two fields can be mutually beneficial. While domain generalization methods typically rely on knowledge of disjoint \"domains\" or \"environments\", \"sensitive\" label information indicating which demographic groups are at risk of discrimination is often used in the fairness literature. Drawing inspiration from recent fairness approaches that improve worst-case performance without knowledge of sensitive groups, we propose a novel domain generalization method that handles the more realistic scenario where environment partitions are not provided. We then show theoretically and empirically how different partitioning schemes can lead to increased or decreased generalization performance, enabling us to outperform Invariant Risk Minimization with handcrafted environments in multiple cases. We also show how a re-interpretation of IRMv1 allows us for the first time to directly optimize a common fairness criterion, group-sufficiency, and thereby improve performance on a fair prediction task.\n", "keywords": "algorithmic fairness;domain generalization;representation learning;invariance", "primary_area": "", "supplementary_material": "/attachment/79e932bd94d5335bdfe409e756ad0a4b91b3a4d8.zip", "author": "Elliot Creager;Joern-Henrik Jacobsen;Richard Zemel", "authorids": "~Elliot_Creager1;~Joern-Henrik_Jacobsen1;~Richard_Zemel1", "gender": "M;M;M", "homepage": "https://ecreager.github.io/;https://jhjacobsen.github.io/;http://www.cs.columbia.edu/~zemel", "dblp": "182/2055;180/5526.html;16/6366", "google_scholar": "boebIUcAAAAJ;https://scholar.google.de/citations?user=c1FYGAQAAAAJ;https://scholar.google.ca/citations?user=iBeDoRAAAAAJ", "orcid": "0009-0004-7122-3866;;", "linkedin": ";;", "or_profile": "~Elliot_Creager1;~Joern-Henrik_Jacobsen1;~Richard_Zemel1", "aff": "University of Toronto;Apple;Department of Computer Science, University of Toronto", "aff_domain": "toronto.edu;apple.com;cs.toronto.edu", "position": "PhD student;Researcher;Full Professor", "bibtex": "@misc{\ncreager2021exchanging,\ntitle={Exchanging Lessons Between Algorithmic Fairness and Domain Generalization},\nauthor={Elliot Creager and Joern-Henrik Jacobsen and Richard Zemel},\nyear={2021},\nurl={https://openreview.net/forum?id=DC1Im3MkGG}\n}", "github": "", "project": "", "reviewers": "AnonReviewer2;AnonReviewer5;AnonReviewer4;AnonReviewer3", "site": "https://openreview.net/forum?id=DC1Im3MkGG", "pdf_size": 0, "rating": "4;4;5;6", "confidence": "3;4;3;2", "wc_review": "306;833;1038;357", "wc_reply_reviewers": "0;108;0;0", "wc_reply_authors": "31;566;154;23", "reply_reviewers": "0;1;0;0", "reply_authors": "1;2;1;1", "rating_avg": [ 4.75, 0.82915619758885 ], "confidence_avg": [ 3.0, 0.7071067811865476 ], "wc_review_avg": [ 633.5, 311.09845708392703 ], "wc_reply_reviewers_avg": [ 27.0, 46.76537180435969 ], "wc_reply_authors_avg": [ 193.5, 221.24251399764918 ], "reply_reviewers_avg": [ 0.25, 0.4330127018922193 ], "reply_authors_avg": [ 1.25, 0.4330127018922193 ], "replies_avg": [ 12, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": -0.8528028654224417, "gs_citation": 16, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=16724804438214703399&as_sdt=400005&sciodt=0,14&hl=en", "gs_version_total": 0, "aff_unique_index": "0;1;0", "aff_unique_norm": "University of Toronto;Apple", "aff_unique_dep": ";Apple Inc.", "aff_unique_url": 
"https://www.utoronto.ca;https://www.apple.com", "aff_unique_abbr": "U of T;Apple", "aff_campus_unique_index": "1", "aff_campus_unique": ";Toronto", "aff_country_unique_index": "0;1;0", "aff_country_unique": "Canada;United States" }, { "id": "DE0MSwKv32y", "title": "Trust, but verify: model-based exploration in sparse reward environments", "track": "main", "status": "Reject", "tldr": "", "abstract": "We propose $\\textit{trust-but-verify}$ (TBV) mechanism, a new method which uses model uncertainty estimates to guide exploration. The mechanism augments graph search planning algorithms by the capacity to deal with learned model's imperfections. We identify certain type of frequent model errors, which we dub $\\textit{false loops}$, and which are particularly dangerous for graph search algorithms in discrete environments. These errors impose falsely pessimistic expectations and thus hinder exploration. We confirm this experimentally and show that TBV can effectively alleviate them. TBV combined with MCTS or Best First Search forms an effective model-based reinforcement learning solution, which is able to robustly solve sparse reward problems. ", "keywords": "reinforcement learning;model-based;exploration;on-line planning;imperfect environment model", "primary_area": "", "supplementary_material": "", "author": "Konrad Czechowski;Tomasz Odrzyg\u00f3\u017ad\u017a;Micha\u0142 Izworski;Marek Zbysi\u0144ski;\u0141ukasz Kuci\u0144ski;Piotr Mi\u0142o\u015b", "authorids": "~Konrad_Czechowski1;tomaszo@impan.pl;m.izworski@student.uw.edu.pl;marek.zbysinski@gmail.com;~\u0141ukasz_Kuci\u0144ski1;~Piotr_Mi\u0142o\u015b1", "gender": ";;;;M;", "homepage": "https://www.linkedin.com/in/konrad-czechowski-723bb6150/;;;;https://sites.google.com/view/lukaszkucinski;", "dblp": "237/9612;;;;250/9699;208/0989.html", "google_scholar": "ni7tRv4AAAAJ;;;;l6dK-VUAAAAJ;Se68XecAAAAJ", "orcid": ";;;;0000-0002-5617-8129;", "linkedin": ";;;;https://linkedin.com/in/lukasz-kucinski;piotr-milos-4b02151/", "or_profile": "~Konrad_Czechowski1;tomaszo@impan.pl;m.izworski@student.uw.edu.pl;marek.zbysinski@gmail.com;~\u0141ukasz_Kuci\u0144ski1;~Piotr_Mi\u0142o\u015b1", "aff": "University of Warsaw;;;;Institute of Mathematics Polish Academy of Sciences;Polish Academy of Science", "aff_domain": "mimuw.edu.pl;;;;impan.pl;impan.gov.pl", "position": "PhD student;;;;Assistant Professor;Associate Professor", "bibtex": "@misc{\nczechowski2021trust,\ntitle={Trust, but verify: model-based exploration in sparse reward environments},\nauthor={Konrad Czechowski and Tomasz Odrzyg{\\'o}{\\'z}d{\\'z} and Micha{\\l} Izworski and Marek Zbysi{\\'n}ski and {\\L}ukasz Kuci{\\'n}ski and Piotr Mi{\\l}o{\\'s}},\nyear={2021},\nurl={https://openreview.net/forum?id=DE0MSwKv32y}\n}", "github": "", "project": "", "reviewers": "AnonReviewer3;AnonReviewer4;AnonReviewer1;AnonReviewer2", "site": "https://openreview.net/forum?id=DE0MSwKv32y", "pdf_size": 0, "rating": "2;4;4;6", "confidence": "4;3;4;3", "wc_review": "853;191;285;309", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "594;113;345;199", "reply_reviewers": "0;0;0;0", "reply_authors": "1;1;1;1", "rating_avg": [ 4.0, 1.4142135623730951 ], "confidence_avg": [ 3.5, 0.5 ], "wc_review_avg": [ 409.5, 259.8244599724976 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 312.75, 182.33262873111877 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 9, 0 ], "authors#_avg": [ 6, 0 ], "corr_rating_confidence": -0.7071067811865476, "gs_citation": 0, 
"gs_cited_by_link": "https://scholar.google.com/scholar?q=related:CfXcllPxhX0J:scholar.google.com/&scioq=Trust,+but+verify:+model-based+exploration+in+sparse+reward+environments&hl=en&as_sdt=0,33", "gs_version_total": 3, "aff_unique_index": "0;1;1", "aff_unique_norm": "University of Warsaw;Polish Academy of Sciences", "aff_unique_dep": ";Institute of Mathematics", "aff_unique_url": "https://www.uw.edu.pl;https://www.impan.pl/", "aff_unique_abbr": "UW;PAS", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0;0", "aff_country_unique": "Poland" }, { "title": "Interpretable Models for Granger Causality Using Self-explaining Neural Networks", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/2605", "id": "DEa4JdMWRHp", "poster": "", "openreview": "https://openreview.net/forum?id=DEa4JdMWRHp", "slides": "https://iclr.cc/virtual/2021/poster/2605", "video": "https://iclr.cc/virtual/2021/poster/2605", "author_site": "Ri\u010dards Marcinkevi\u010ds, Julia E Vogt", "tldr": "", "abstract": "Exploratory analysis of time series data can yield a better understanding of complex dynamical systems. Granger causality is a practical framework for analysing interactions in sequential data, applied in a wide range of domains. In this paper, we propose a novel framework for inferring multivariate Granger causality under nonlinear dynamics based on an extension of self-explaining neural networks. This framework is more interpretable than other neural-network-based techniques for inferring Granger causality, since in addition to relational inference, it also allows detecting signs of Granger-causal effects and inspecting their variability over time. In comprehensive experiments on simulated data, we show that our framework performs on par with several powerful baseline methods at inferring Granger causality and that it achieves better performance at inferring interaction signs. 
The results suggest that our framework is a viable and more interpretable alternative to sparse-input neural networks for inferring Granger causality.", "keywords": "time series;Granger causality;interpretability;inference;neural networks", "primary_area": "", "supplementary_material": "/attachment/bc40911f1003beb08f8fcf94aafbe00af1ef6655.zip", "author": "Ri\u010dards Marcinkevi\u010ds;Julia E Vogt", "authorids": "~Ri\u010dards_Marcinkevi\u010ds1;~Julia_E_Vogt1", "gender": "F;M", "homepage": "http://mds.inf.ethz.ch;https://rmarcinkevics.github.io/", "dblp": "13/8412;234/8553", "google_scholar": "UoeV-8kAAAAJ;https://scholar.google.ch/citations?user=XcxXOJsAAAAJ", "orcid": ";0000-0001-8901-5062", "linkedin": "julia-vogt-50b53895;ri%C4%8Dards-m-668568106?lipi=urn%3Ali%3Apage%3Ad_flagship3_profile_view_base_contact_details%3Byeq5%2FsReRoWG3HN7r6A5Lw%3D%3D", "or_profile": "~Julia_E_Vogt1;~Ricards_Marcinkevics1", "aff": "Swiss Federal Institute of Technology;Swiss Federal Institute of Technology", "aff_domain": "ethz.ch;inf.ethz.ch", "position": "Assistant Professor;PhD student", "bibtex": "@inproceedings{\nmarcinkevi{\\v{c}}s2021interpretable,\ntitle={Interpretable Models for Granger Causality Using Self-explaining Neural Networks},\nauthor={Ri{\\v{c}}ards Marcinkevi{\\v{c}}s and Julia E Vogt},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=DEa4JdMWRHp}\n}", "github": "[![github](/images/github_icon.svg) i6092467/GVAR](https://github.com/i6092467/GVAR)", "project": "", "reviewers": "AnonReviewer1;AnonReviewer3;AnonReviewer2;AnonReviewer4", "pdf_size": 0, "rating": "4;6;6;8", "confidence": "4;4;4;4", "wc_review": "217;733;440;178", "wc_reply_reviewers": "0;0;81;0", "wc_reply_authors": "559;1972;1343;64", "reply_reviewers": "0;0;1;0", "reply_authors": "1;3;3;1", "rating_avg": [ 6.0, 1.4142135623730951 ], "confidence_avg": [ 4.0, 0.0 ], "wc_review_avg": [ 392.0, 220.79741846316955 ], "wc_reply_reviewers_avg": [ 20.25, 35.074028853269766 ], "wc_reply_authors_avg": [ 984.5, 730.0768795133838 ], "reply_reviewers_avg": [ 0.25, 0.4330127018922193 ], "reply_authors_avg": [ 2.0, 1.0 ], "replies_avg": [ 15, 0 ], "authors#_avg": [ 2, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 87, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=4744598609789823872&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 15, "pdf": "https://openreview.net/pdf?id=DEa4JdMWRHp", "email": "ethz.ch;inf.ethz.ch", "author_num": 2, "aff_unique_index": "0;0", "aff_unique_norm": "Swiss Federal Institute of Technology", "aff_unique_dep": "", "aff_unique_url": "https://www.ethz.ch", "aff_unique_abbr": "ETH Zurich", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0", "aff_country_unique": "Switzerland" }, { "id": "DFIoGDZejIB", "title": "Benefits of Assistance over Reward Learning", "track": "main", "status": "Reject", "tldr": "", "abstract": "Much recent work has focused on how an agent can learn what to do from human feedback, leading to two major paradigms. The first paradigm is reward learning, in which the agent learns a reward model through human feedback that is provided externally from the environment. The second is assistance, in which the human is modeled as a part of the environment, and the true reward function is modeled as a latent variable in the environment that the agent may make inferences about. 
The key difference between the two paradigms is that in the reward learning paradigm, by construction there is a separation between reward learning and control using the learned reward. In contrast, in assistance these functions are performed as needed by a single policy. By merging reward learning and control, assistive agents can reason about the impact of control actions on reward learning, leading to several advantages over agents based on reward learning. We illustrate these advantages in simple environments by showing desirable qualitative behaviors of assistive agents that cannot be found by agents based on reward learning.", "keywords": "assistance;reward learning;preference learning;active learning", "primary_area": "", "supplementary_material": "", "author": "Rohin Shah;Pedro Freire;Neel Alex;Rachel Freedman;Dmitrii Krasheninnikov;Lawrence Chan;Michael D Dennis;Pieter Abbeel;Anca Dragan;Stuart Russell", "authorids": "~Rohin_Shah1;~Pedro_Freire1;~Neel_Alex1;~Rachel_Freedman1;~Dmitrii_Krasheninnikov1;~Lawrence_Chan2;~Michael_D_Dennis1;~Pieter_Abbeel2;~Anca_Dragan1;~Stuart_Russell1", "gender": "M;M;M;F;M;M;M;M;F;M", "homepage": "http://rohinshah.com/;;https://neel-alex.github.io/;https://rachelfreedman.github.io/;https://krasheninnikov.github.io/about/;https://chanlawrence.me/;;https://people.eecs.berkeley.edu/~pabbeel/;http://www.ancadragan.com/;https://people.eecs.berkeley.edu/~russell/", "dblp": "145/1009;;;218/7198;;28/2626;;;;", "google_scholar": "odFQXSYAAAAJ;;;Mj1fmhsAAAAJ;BIQflKQAAAAJ;https://scholar.google.com/citations?view_op=list_works;WXXu26AAAAAJ;https://scholar.google.com.tw/citations?user=vtwH6GkAAAAJ;;https://scholar.google.com.tw/citations?user=KJGrjCAAAAAJ", "orcid": ";;;0000-0003-3299-4313;;;;;;", "linkedin": "rohin-shah-76405832/;pedrofreirex/;;rachelalexfreedman/;;;;;;", "or_profile": "~Rohin_Shah1;~Pedro_Freire1;~Neel_Alex1;~Rachel_Freedman1;~Dmitrii_Krasheninnikov1;~Lawrence_Chan2;~Michael_D_Dennis1;~Pieter_Abbeel2;~Anca_Dragan1;~Stuart_Russell1", "aff": "Google DeepMind;;University of Cambridge;University of California, Berkeley;Sony Europe Ltd.;University of California, Berkeley;University of California, Berkeley;Covariant;University of California, Berkeley;University of California, Berkeley", "aff_domain": "deepmind.com;;cam.ac.uk;berkeley.edu;sony.com;berkeley.edu;berkeley.edu;covariant.ai;berkeley.edu;berkeley.edu", "position": "Researcher;;PhD student;PhD student;Researcher;PhD student;PhD student;Founder;Associate Professor;Full Professor", "bibtex": "@misc{\nshah2021benefits,\ntitle={Benefits of Assistance over Reward Learning},\nauthor={Rohin Shah and Pedro Freire and Neel Alex and Rachel Freedman and Dmitrii Krasheninnikov and Lawrence Chan and Michael D Dennis and Pieter Abbeel and Anca Dragan and Stuart Russell},\nyear={2021},\nurl={https://openreview.net/forum?id=DFIoGDZejIB}\n}", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer3;AnonReviewer5;AnonReviewer4;AnonReviewer2", "site": "https://openreview.net/forum?id=DFIoGDZejIB", "pdf_size": 0, "rating": "4;5;5;6;7", "confidence": "3;2;3;5;4", "wc_review": "659;645;247;1140;1350", "wc_reply_reviewers": "0;157;0;154;1035", "wc_reply_authors": "804;797;404;1201;3786", "reply_reviewers": "0;1;0;1;6", "reply_authors": "1;2;1;2;8", "rating_avg": [ 5.4, 1.0198039027185568 ], "confidence_avg": [ 3.4, 1.019803902718557 ], "wc_review_avg": [ 808.2, 391.7853493942825 ], "wc_reply_reviewers_avg": [ 269.2, 389.1649521732398 ], "wc_reply_authors_avg": [ 1398.4, 1220.116814079701 ], 
"reply_reviewers_avg": [ 1.6, 2.244994432064365 ], "reply_authors_avg": [ 2.8, 2.6381811916545836 ], "replies_avg": [ 28, 0 ], "authors#_avg": [ 10, 0 ], "corr_rating_confidence": 0.6153846153846154, "gs_citation": 38, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=9578905425350384788&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 9, "aff_unique_index": "0;1;2;3;2;2;4;2;2", "aff_unique_norm": "Google;University of Cambridge;University of California, Berkeley;Sony Europe;Covariant", "aff_unique_dep": "Google DeepMind;;;;", "aff_unique_url": "https://deepmind.com;https://www.cam.ac.uk;https://www.berkeley.edu;https://www.sony.eu;", "aff_unique_abbr": "DeepMind;Cambridge;UC Berkeley;Sony Europe;", "aff_campus_unique_index": "1;2;2;2;2;2", "aff_campus_unique": ";Cambridge;Berkeley", "aff_country_unique_index": "0;0;1;0;1;1;1;1", "aff_country_unique": "United Kingdom;United States;" }, { "id": "DGIXvEAJVd", "title": "Learning Chess Blindfolded", "track": "main", "status": "Reject", "tldr": "", "abstract": "Transformer language models have made tremendous strides in natural language understanding. However, the complexity of natural language makes it challenging to ascertain how accurately these models are tracking the world state underlying the text. Motivated by this issue, we consider the task of language modeling for the game of chess. Unlike natural language, chess notations describe a simple, constrained, and deterministic domain. Moreover, we observe that chess notation itself allows for directly probing the world state, without requiring any additional probing-related machinery. Additionally, we have access to a vast number of chess games coupled with the exact state at every move, allowing us to measure the impact of various ways of including grounding during language model training. Overall, we find that with enough training data, transformer language models can learn to track pieces and predict legal moves when trained solely from move sequences. 
However, in adverse circumstances (small training sets or prediction following long move histories), providing access to board state information during training can yield consistent improvements.", "keywords": "Chess;Transformers;Language Modeling;World State", "primary_area": "", "supplementary_material": "", "author": "Shubham Toshniwal;Sam Wiseman;Karen Livescu;Kevin Gimpel", "authorids": "~Shubham_Toshniwal1;~Sam_Wiseman1;~Karen_Livescu1;~Kevin_Gimpel1", "gender": ";M;;M", "homepage": ";https://swiseman.github.io;;http://ttic.uchicago.edu/~kgimpel/index.html", "dblp": ";149/1260;;47/1252", "google_scholar": ";SDavuPAAAAAJ;;http://scholar.google.com/citations?user=kDHs7DYAAAAJ", "orcid": ";;;", "linkedin": ";;;", "or_profile": "~Shubham_Toshniwal1;~Sam_Wiseman1;~Karen_Livescu1;~Kevin_Gimpel1", "aff": ";Toyota Technological Institute at Chicago;;Toyota Technological Institute at Chicago", "aff_domain": ";ttic.edu;;ttic.edu", "position": ";Research Assistant Professor;;Assistant Professor", "bibtex": "@misc{\ntoshniwal2021learning,\ntitle={Learning Chess Blindfolded},\nauthor={Shubham Toshniwal and Sam Wiseman and Karen Livescu and Kevin Gimpel},\nyear={2021},\nurl={https://openreview.net/forum?id=DGIXvEAJVd}\n}", "github": "", "project": "", "reviewers": "AnonReviewer4;AnonReviewer1;AnonReviewer2;AnonReviewer3", "site": "https://openreview.net/forum?id=DGIXvEAJVd", "pdf_size": 0, "rating": "5;5;7;7", "confidence": "3;4;4;4", "wc_review": "1061;449;510;575", "wc_reply_reviewers": "121;201;87;0", "wc_reply_authors": "1556;830;843;753", "reply_reviewers": "1;1;1;0", "reply_authors": "3;2;2;2", "rating_avg": [ 6.0, 1.0 ], "confidence_avg": [ 3.75, 0.4330127018922193 ], "wc_review_avg": [ 648.75, 242.14703694243298 ], "wc_reply_reviewers_avg": [ 102.25, 72.09498942367631 ], "wc_reply_authors_avg": [ 995.5, 325.42779537095475 ], "reply_reviewers_avg": [ 0.75, 0.4330127018922193 ], "reply_authors_avg": [ 2.25, 0.4330127018922193 ], "replies_avg": [ 18, 0 ], "authors#_avg": [ 4, 0 ], "corr_rating_confidence": 0.5773502691896257, "gs_citation": 3, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=5067877830076467319&as_sdt=400005&sciodt=0,14&hl=en", "gs_version_total": 0, "aff_unique_index": "0;0", "aff_unique_norm": "Toyota Technological Institute at Chicago", "aff_unique_dep": "", "aff_unique_url": "https://www.tti-chicago.org", "aff_unique_abbr": "TTI Chicago", "aff_campus_unique_index": "0;0", "aff_campus_unique": "Chicago", "aff_country_unique_index": "0;0", "aff_country_unique": "United States" }, { "id": "DGttsPh502x", "title": "Unsupervised Discovery of Interpretable Latent Manipulations in Language VAEs", "track": "main", "status": "Reject", "tldr": "", "abstract": "Language generation models are attracting more and more attention due to their constantly increasing quality and remarkable generation results. State-of-the-art NLG models like BART/T5/GPT-3 do not have latent spaces, therefore there is no natural way to perform controlled generation. In contrast, less popular models with explicit latent spaces have the innate ability to manipulate text attributes by moving along latent directions. For images, properties of latent spaces are well-studied: there exist interpretable directions (e.g. zooming, aging, background removal) and they can even be found without supervision. This success is expected: latent space image models, especially GANs, achieve state-of-the-art generation results and hence have been the focus of the research community. 
For language, this is not the case: text GANs are hard to train because of non-differentiable discrete data generation, and language VAEs suffer from posterior collapse and fill the latent space poorly. This makes finding interpretable text controls challenging. In this work, we make the first step towards unsupervised discovery of interpretable directions in language latent spaces. For this, we turn to methods shown to work in the image domain. Surprisingly, we find that running PCA on VAE representations of training data consistently outperforms shifts along the coordinate and random directions. This approach is simple, data-adaptive, does not require training and discovers meaningful directions, e.g. sentence length, subject age, and verb tense. Our work lays foundations for two important areas: first, it allows comparing models in terms of latent space interpretability, and second, it provides a baseline for unsupervised latent controls discovery.", "keywords": "interpretability;unsupervised interpretable directions;controllable text generation", "primary_area": "", "supplementary_material": "/attachment/63059e11947f578028fe72efe15fea699e4a08a3.zip", "author": "Max Ryabinin;Artem Babenko;Elena Voita", "authorids": "~Max_Ryabinin1;~Artem_Babenko1;~Elena_Voita1", "gender": "Not Specified;M;F", "homepage": "https://mryab.github.io/;;https://lena-voita.github.io", "dblp": "276/0192;117/4834;220/4162", "google_scholar": "930PERsAAAAJ;q885d1wAAAAJ;EcN9o7kAAAAJ", "orcid": ";0000-0002-1830-8252;", "linkedin": ";;elena-voita/", "or_profile": "~Max_Ryabinin1;~Artem_Babenko1;~Elena_Voita1", "aff": "HSE University;Yandex;University of Edinburgh", "aff_domain": "hse.ru;yandex-team.ru;ed.ac.uk", "position": "MS student;Researcher;PhD student", "bibtex": "@misc{\nryabinin2021unsupervised,\ntitle={Unsupervised Discovery of Interpretable Latent Manipulations in Language {\\{}VAE{\\}}s},\nauthor={Max Ryabinin and Artem Babenko and Elena Voita},\nyear={2021},\nurl={https://openreview.net/forum?id=DGttsPh502x}\n}", "github": "", "project": "", "reviewers": "AnonReviewer4;AnonReviewer2;AnonReviewer3;AnonReviewer1", "site": "https://openreview.net/forum?id=DGttsPh502x", "pdf_size": 0, "rating": "3;3;4;5", "confidence": "4;4;4;3", "wc_review": "394;150;229;120", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "345;324;231;215", "reply_reviewers": "0;0;0;0", "reply_authors": "1;1;1;1", "rating_avg": [ 3.75, 0.82915619758885 ], "confidence_avg": [ 3.75, 0.4330127018922193 ], "wc_review_avg": [ 223.25, 106.31880125358826 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 278.75, 56.52598959770629 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 9, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": -0.8703882797784891, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:OpZpDnib0roJ:scholar.google.com/&scioq=Unsupervised+Discovery+of+Interpretable+Latent+Manipulations+in+Language+VAEs&hl=en&as_sdt=0,33", "gs_version_total": 0, "aff_unique_index": "0;1;2", "aff_unique_norm": "Higher School of Economics;Yandex;University of Edinburgh", "aff_unique_dep": ";;", "aff_unique_url": "https://hse.ru;https://yandex.com;https://www.ed.ac.uk", "aff_unique_abbr": "HSE;Yandex;Edinburgh", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0;1", "aff_country_unique": "Russian Federation;United Kingdom" }, { "id": "DHSNrGhAY7W", "title": "The Lipschitz Constant of Self-Attention", "track": "main",
"status": "Reject", "tldr": "", "abstract": "Lipschitz constants of neural networks have been explored in various contexts in deep learning, such as provable adversarial robustness, estimating Wasserstein distance, stabilising training of GANs, and formulating invertible neural networks. Such works have focused on bounding the Lipschitz constant of fully connected or convolutional networks, composed of linear maps and pointwise non-linearities. In this paper, we investigate the Lipschitz constant of self-attention, a non-linear neural network module widely used in sequence modelling. We prove that the standard dot-product self-attention is *not* Lipschitz, and propose an alternative L2 self-attention that *is* Lipschitz. We derive an upper bound on the Lipschitz constant of L2 self-attention and provide empirical evidence for its asymptotic tightness. To demonstrate the practical relevance of our theoretical work, we formulate invertible self-attention and use it in a Transformer-based architecture for a character-level language modelling task.", "keywords": "Lipschitz constant;self-attention;theory", "primary_area": "", "supplementary_material": "", "author": "Hyunjik Kim;George Papamakarios;Andriy Mnih", "authorids": "~Hyunjik_Kim1;~George_Papamakarios1;~Andriy_Mnih1", "gender": "M;M;", "homepage": "https://hyunjik11.github.io/;https://gpapamak.github.io;http://www.cs.toronto.edu/~amnih/", "dblp": "180/5389;169/9771;https://dblp.uni-trier.de/pers/m/Mnih:Andriy.html", "google_scholar": "https://scholar.google.co.uk/citations?user=vxU3Zk4AAAAJ;https://scholar.google.co.uk/citations?user=wHcpf58AAAAJ;mxiO4IkAAAAJ", "orcid": ";0000-0002-2551-6543;", "linkedin": ";http://uk.linkedin.com/in/gpapamakarios;", "or_profile": "~Hyunjik_Kim1;~George_Papamakarios1;~Andriy_Mnih1", "aff": "Google DeepMind;Google DeepMind;Google DeepMind", "aff_domain": "google.com;google.com;google.com", "position": "Research Scientist;Research scientist;Research Scientist", "bibtex": "@misc{\nkim2021the,\ntitle={The Lipschitz Constant of Self-Attention},\nauthor={Hyunjik Kim and George Papamakarios and Andriy Mnih},\nyear={2021},\nurl={https://openreview.net/forum?id=DHSNrGhAY7W}\n}", "github": "", "project": "", "reviewers": "AnonReviewer4;AnonReviewer5;AnonReviewer2;AnonReviewer3", "site": "https://openreview.net/forum?id=DHSNrGhAY7W", "pdf_size": 0, "rating": "5;5;7;7", "confidence": "3;4;4;2", "wc_review": "551;298;406;285", "wc_reply_reviewers": "0;0;118;0", "wc_reply_authors": "775;767;1076;561", "reply_reviewers": "0;0;1;0", "reply_authors": "1;1;2;1", "rating_avg": [ 6.0, 1.0 ], "confidence_avg": [ 3.25, 0.82915619758885 ], "wc_review_avg": [ 385.0, 106.73097020078099 ], "wc_reply_reviewers_avg": [ 29.5, 51.09549882328188 ], "wc_reply_authors_avg": [ 794.75, 183.644187220832 ], "reply_reviewers_avg": [ 0.25, 0.4330127018922193 ], "reply_authors_avg": [ 1.25, 0.4330127018922193 ], "replies_avg": [ 11, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": -0.30151134457776363, "gs_citation": 194, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=12356022541341785997&as_sdt=400005&sciodt=0,14&hl=en", "gs_version_total": 6, "aff_unique_index": "0;0;0", "aff_unique_norm": "Google", "aff_unique_dep": "Google DeepMind", "aff_unique_url": "https://deepmind.com", "aff_unique_abbr": "DeepMind", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0;0", "aff_country_unique": "United Kingdom" }, { "id": "DHkGKg2fJay", "title": "Leveraged Weighted Loss For Partial Label 
Learning", "track": "main", "status": "Withdraw", "tldr": "", "abstract": "As an important branch of weakly supervised learning, partial label learning deals with data where each instance is assigned with a set of candidate labels, whereas only one of them is true. In this paper, we propose a family of loss functions named Leveraged Weighted (LW) loss function, which for the first time introduces the leverage parameter $\\beta$ to partial loss functions to leverage between losses on partial labels and residual labels (non-partial labels). Under mild assumptions, we achieve the relationship between the partial loss function and its corresponding ordinary loss that leads to the consistency in risk. Compared to the existing literatures, our result applies to both deterministic and stochastic scenarios, considers the loss functions of a more general form, and takes milder assumptions on the distribution of the partial label set. As special cases, with $\\beta = 1$ and $\\beta = 2$, the corresponding ordinary losses of our LW loss respectively match the binary classification loss and the \\textit{one-versus-all} (OVA) loss function. In this way, our theorems successfully explain the experimental results on parameter analysis, where $\\beta = 1$ and especially $\\beta = 2$ are considered as preferred choices for the leverage parameter $\\beta$. Last but not least, real data comparisons show the high effectiveness of our LW loss over other state-of-the-art partial label learning algorithms.", "keywords": "weakly supervised learning;loss function;risk consistency", "primary_area": "", "supplementary_material": "/attachment/97fceebb52ac1d8da71fc676a22000f6f87a8500.zip", "author": "Hongwei Wen;Hanyuan Hang;Jiabin Liu;Zhouchen Lin", "authorids": "~Hongwei_Wen1;~Hanyuan_Hang1;~Jiabin_Liu1;~Zhouchen_Lin1", "gender": ";M;M;M", "homepage": "https://www.researchgate.net/profile/Hongwei_Wen2;;;https://zhouchenlin.github.io", "dblp": "41/1357;180/5385;https://dblp.org/pers/hd/l/Liu:Jiabin.html;l/ZhouchenLin", "google_scholar": ";;;https://scholar.google.com.tw/citations?user=TanjFwoAAAAJ", "orcid": ";;;0000-0003-1493-7569", "linkedin": ";;;", "or_profile": "~Hongwei_Wen1;~Hanyuan_Hang1;~Jiabin_Liu1;~Zhouchen_Lin1", "aff": "University of Twente;Samsung;Samsung;Peking University", "aff_domain": "utwente.nl;samsung.com;samsung.com;pku.edu.cn", "position": "PhD student;Staff Engineer;Postdoc;Professor", "bibtex": "", "github": "", "project": "", "reviewers": "AnonReviewer2;AnonReviewer4;AnonReviewer1;AnonReviewer3", "site": "https://openreview.net/forum?id=DHkGKg2fJay", "pdf_size": 0, "rating": "3;4;6;7", "confidence": "4;2;3;4", "wc_review": "471;485;118;482", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "0;0;0;0", "reply_reviewers": "0;0;0;0", "reply_authors": "0;0;0;0", "rating_avg": [ 5.0, 1.5811388300841898 ], "confidence_avg": [ 3.25, 0.82915619758885 ], "wc_review_avg": [ 389.0, 156.54871446294283 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 0, 0 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 0, 0 ], "replies_avg": [ 5, 0 ], "authors#_avg": [ 4, 0 ], "corr_rating_confidence": 0.19069251784911848, "gs_citation": 128, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=5461608260366903450&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 13, "aff_unique_index": "0;1;1;2", "aff_unique_norm": "University of Twente;Samsung;Peking University", "aff_unique_dep": ";Samsung;", "aff_unique_url": 
"https://www.utwente.nl;https://www.samsung.com;http://www.pku.edu.cn", "aff_unique_abbr": "UT;Samsung;Peking U", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;1;1;2", "aff_country_unique": "Netherlands;South Korea;China" }, { "title": "VTNet: Visual Transformer Network for Object Goal Navigation", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/3287", "id": "DILxQP08O3B", "poster": "", "openreview": "https://openreview.net/forum?id=DILxQP08O3B", "slides": "https://iclr.cc/virtual/2021/poster/3287", "video": "https://iclr.cc/virtual/2021/poster/3287", "author_site": "Heming Du, Xin Yu, Liang Zheng", "tldr": "", "abstract": "Object goal navigation aims to steer an agent towards a target object based on observations of the agent. It is of pivotal importance to design effective visual representations of the observed scene in determining navigation actions. In this paper, we introduce a Visual Transformer Network (VTNet) for learning informative visual representation in navigation. VTNet is a highly effective structure that embodies two key properties for visual representations: First, the relationships among all the object instances in a scene are exploited; Second, the spatial locations of objects and image regions are emphasized so that directional navigation signals can be learned. Furthermore, we also develop a pre-training scheme to associate the visual representations with navigation signals, and thus facilitate navigation policy learning. In a nutshell, VTNet embeds object and region features with their location cues as spatial-aware descriptors and then incorporates all the encoded descriptors through attention operations to achieve informative representation for navigation. Given such visual representations, agents are able to explore the correlations between visual observations and navigation actions. For example, an agent would prioritize ``turning right'' over ``turning left'' when the visual representation emphasizes on the right side of activation map. 
Experiments in the artificial environment AI2-Thor demonstrate that VTNet significantly outperforms state-of-the-art methods in unseen testing environments.", "keywords": "", "primary_area": "", "supplementary_material": "", "author": "Heming Du;Xin Yu;Liang Zheng", "authorids": "~Heming_Du2;~Xin_Yu1;~Liang_Zheng4", "gender": "M;M;M", "homepage": ";https://sites.google.com/view/xinyus-homepage/Home;http://zheng-lab.cecs.anu.edu.au/", "dblp": "244/8133;54/1184-2;61/7360-1", "google_scholar": "Ha3UZTwAAAAJ;oxdtuSEAAAAJ;https://scholar.google.com.au/citations?user=vNHqr3oAAAAJ", "orcid": "0000-0002-7391-0449;0000-0002-0269-5649;", "linkedin": ";;liang-zheng-76341311a/", "or_profile": "~Heming_Du2;~Xin_Yu1;~Liang_Zheng4", "aff": "Australian National University;University of Technology Sydney;Australian National University", "aff_domain": "anu.edu.au;uts.edu.au;anu.edu.au", "position": "PhD student;Lecturer;Senior Lecturer", "bibtex": "@inproceedings{\ndu2021vtnet,\ntitle={{\\{}VTN{\\}}et: Visual Transformer Network for Object Goal Navigation},\nauthor={Heming Du and Xin Yu and Liang Zheng},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=DILxQP08O3B}\n}", "github": "", "project": "", "reviewers": "AnonReviewer4;AnonReviewer2;AnonReviewer1;AnonReviewer3", "pdf_size": 0, "rating": "6;6;6;6", "confidence": "4;3;4;4", "wc_review": "336;762;229;586", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "520;1200;388;1447", "reply_reviewers": "0;0;0;0", "reply_authors": "1;2;1;2", "rating_avg": [ 6.0, 0.0 ], "confidence_avg": [ 3.75, 0.4330127018922193 ], "wc_review_avg": [ 478.25, 208.85685887707876 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 888.75, 445.88304240013434 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.5, 0.5 ], "replies_avg": [ 12, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 114, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=6925488043260471961&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 4, "pdf": "https://openreview.net/pdf?id=DILxQP08O3B", "email": "anu.edu.au;uts.edu.au;anu.edu.au", "author_num": 3, "aff_unique_index": "0;1;0", "aff_unique_norm": "Australian National University;University of Technology Sydney", "aff_unique_dep": ";", "aff_unique_url": "https://www.anu.edu.au;https://www.uts.edu.au", "aff_unique_abbr": "ANU;UTS", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0;0", "aff_country_unique": "Australia" }, { "id": "DM6KlL7GeB", "title": "Semi-Relaxed Quantization with DropBits: Training Low-Bit Neural Networks via Bitwise Regularization", "track": "main", "status": "Reject", "tldr": "", "abstract": "Network quantization, which aims to reduce the bit-lengths of the network weights and activations, has emerged as one of the key ingredients to reduce the size of neural networks for their deployment to resource-limited devices. In order to overcome the nature of transforming continuous activations and weights to discrete ones, a recent study called Relaxed Quantization (RQ) [Louizos et al. 2019] successfully employs the popular Gumbel-Softmax that allows this transformation with efficient gradient-based optimization. However, RQ with this Gumbel-Softmax relaxation still suffers from a bias-variance trade-off depending on the temperature parameter of the Gumbel-Softmax.
To resolve the issue, we propose a novel method, Semi-Relaxed Quantization (SRQ), that uses a multi-class straight-through estimator to effectively reduce the bias and variance, along with a new regularization technique, DropBits, which replaces dropout regularization by randomly dropping bits instead of neurons to further reduce the bias of the multi-class straight-through estimator in SRQ. As a natural extension of DropBits, we further introduce a way of learning heterogeneous quantization levels to find the proper bit-length for each layer using DropBits. We experimentally validate our method on various benchmark datasets and network architectures, and also support the quantized lottery ticket hypothesis: learning heterogeneous quantization levels outperforms the case using the same but fixed quantization levels from scratch.", "keywords": "Quantization;Compression;Efficient Inference;Deep Learning", "primary_area": "", "supplementary_material": "", "author": "Jung Hyun Lee;Jihun Yun;Sung Ju Hwang;Eunho Yang", "authorids": "~Jung_Hyun_Lee1;~Jihun_Yun2;~Sung_Ju_Hwang1;~Eunho_Yang1", "gender": "M;M;;M", "homepage": ";https://github.com/abcdxyzpqrst;;https://sites.google.com/site/hleehome2/", "dblp": "132/2899;241/9676;;96/2621", "google_scholar": ";ELv5qfEAAAAJ;;", "orcid": ";;;", "linkedin": ";;;", "or_profile": "~Jung_Hyun_Lee1;~Jihun_Yun2;~Sung_Ju_Hwang1;~Eunho_Yang1", "aff": "KAIST;Korea Advanced Institute of Science & Technology;;Korea Advanced Institute of Science & Technology", "aff_domain": "kaist.ac.kr;kaist.ac.kr;;kaist.ac.kr", "position": "MS student;PhD student;;Associate Professor", "bibtex": "@misc{\nlee2021semirelaxed,\ntitle={Semi-Relaxed Quantization with DropBits: Training Low-Bit Neural Networks via Bitwise Regularization},\nauthor={Jung Hyun Lee and Jihun Yun and Sung Ju Hwang and Eunho Yang},\nyear={2021},\nurl={https://openreview.net/forum?id=DM6KlL7GeB}\n}", "github": "", "project": "", "reviewers": "AnonReviewer3;AnonReviewer1;AnonReviewer4", "site": "https://openreview.net/forum?id=DM6KlL7GeB", "pdf_size": 0, "rating": "5;6;7", "confidence": "4;4;2", "wc_review": "299;1315;135", "wc_reply_reviewers": "0;292;0", "wc_reply_authors": "432;1686;135", "reply_reviewers": "0;1;0", "reply_authors": "1;3;1", "rating_avg": [ 6.0, 0.816496580927726 ], "confidence_avg": [ 3.3333333333333335, 0.9428090415820634 ], "wc_review_avg": [ 583.0, 521.9144246585514 ], "wc_reply_reviewers_avg": [ 97.33333333333333, 137.65012007098127 ], "wc_reply_authors_avg": [ 751.0, 672.1711091678964 ], "reply_reviewers_avg": [ 0.3333333333333333, 0.4714045207910317 ], "reply_authors_avg": [ 1.6666666666666667, 0.9428090415820634 ], "replies_avg": [ 19, 0 ], "authors#_avg": [ 4, 0 ], "corr_rating_confidence": -0.8660254037844387, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:u8W5-_suA78J:scholar.google.com/&scioq=Semi-Relaxed+Quantization+with+DropBits:+Training+Low-Bit+Neural+Networks+via+Bitwise+Regularization&hl=en&as_sdt=0,33", "gs_version_total": 3, "aff_unique_index": "0;0;0", "aff_unique_norm": "Korea Advanced Institute of Science and Technology", "aff_unique_dep": "", "aff_unique_url": "https://www.kaist.ac.kr", "aff_unique_abbr": "KAIST", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0;0", "aff_country_unique": "South Korea" }, { "id": "DMxOBm06HUx", "title": "AMBERT: A Pre-trained Language Model with Multi-Grained Tokenization", "track": "main", "status": "Reject", "tldr": "", "abstract": "Pre-trained language models
such as BERT have exhibited remarkable performances in many tasks in natural language understanding (NLU). The tokens in the models are usually fine-grained in the sense that for languages like English they are words or sub-words and for languages like Chinese they are characters. In English, for example, there are multi-word expressions which form natural lexical units and thus the use of coarse-grained tokenization also appears to be reasonable. In fact, both fine-grained and coarse-grained tokenizations have advantages and disadvantages for learning of pre-trained language models. In this paper, we propose a novel pre-trained language model, referred to as AMBERT (A Multi-grained BERT), on the basis of both fine-grained and coarse-grained tokenizations. For English, AMBERT takes both the sequence of words (fine-grained tokens) and the sequence of phrases (coarse-grained tokens) as input after tokenization, employs one encoder for processing the sequence of words and the other encoder for processing the sequence of the phrases, utilizes shared parameters between the two encoders, and finally creates a sequence of contextualized representations of the words and a sequence of contextualized representations of the phrases. Experiments have been conducted on benchmark datasets for Chinese and English, including CLUE, GLUE, SQuAD and RACE. The results show that AMBERT outperforms the existing best performing models in almost all cases, particularly the improvements are significant for Chinese. We also develop a version of AMBERT which performs equally well as AMBERT but uses about half of its inference time.", "keywords": "Pre-trained Language Model;Multi-Grained Tokenization", "primary_area": "", "supplementary_material": "", "author": "Xinsong Zhang;Hang Li", "authorids": "~Xinsong_Zhang1;~Hang_Li4", "gender": "M;M", "homepage": ";https://hangli-hl.github.io/", "dblp": "04/2640;https://dblp.org/pers/hd/l/Li_0001:Hang", "google_scholar": "BnSQUocAAAAJ;nTl5mSwAAAAJ", "orcid": ";0000-0001-9628-3487", "linkedin": ";hang-li-84aa6314/", "or_profile": "~Xinsong_Zhang1;~Hang_Li4", "aff": "Bytedance AI Lab;ByteDance Technology", "aff_domain": "bytedance.com;bytedance.com", "position": "research fellow;Head of Research", "bibtex": "@misc{\nzhang2021ambert,\ntitle={{\\{}AMBERT{\\}}: A Pre-trained Language Model with Multi-Grained Tokenization},\nauthor={Xinsong Zhang and Hang Li},\nyear={2021},\nurl={https://openreview.net/forum?id=DMxOBm06HUx}\n}", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer4;AnonReviewer3;AnonReviewer5;AnonReviewer2", "site": "https://openreview.net/forum?id=DMxOBm06HUx", "pdf_size": 0, "rating": "3;4;5;5;7", "confidence": "4;4;4;5;3", "wc_review": "242;1361;565;200;355", "wc_reply_reviewers": "0;0;0;0;0", "wc_reply_authors": "178;1488;652;767;442", "reply_reviewers": "0;0;0;0;0", "reply_authors": "1;2;1;1;1", "rating_avg": [ 4.8, 1.32664991614216 ], "confidence_avg": [ 4.0, 0.6324555320336759 ], "wc_review_avg": [ 544.6, 427.36148633212144 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 705.4, 439.69061850351096 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.2, 0.4000000000000001 ], "replies_avg": [ 13, 0 ], "authors#_avg": [ 2, 0 ], "corr_rating_confidence": -0.47673129462279606, "gs_citation": 54, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=1158140124780847101&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 5, "aff_unique_index": "0;0", "aff_unique_norm": "ByteDance", "aff_unique_dep": "AI Lab", 
"aff_unique_url": "https://www.bytedance.com", "aff_unique_abbr": "Bytedance AI Lab", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0", "aff_country_unique": "China" }, { "title": "Fair Mixup: Fairness via Interpolation", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/2612", "id": "DNl5s5BXeBn", "poster": "", "openreview": "https://openreview.net/forum?id=DNl5s5BXeBn", "slides": "https://iclr.cc/virtual/2021/poster/2612", "video": "https://iclr.cc/virtual/2021/poster/2612", "author_site": "Ching-Yao Chuang, Youssef Mroueh", "tldr": "", "abstract": "Training classifiers under fairness constraints such as group fairness, regularizes the disparities of predictions between the groups. Nevertheless, even though the constraints are satisfied during training, they might not generalize at evaluation time. To improve the generalizability of fair classifiers, we propose fair mixup, a new data augmentation strategy for imposing the fairness constraint. In particular, we show that fairness can be achieved by regularizing the models on paths of interpolated samples between the groups. We use mixup, a powerful data augmentation strategy to generate these interpolates. We analyze fair mixup and empirically show that it ensures a better generalization for both accuracy and fairness measurement in tabular, vision, and language benchmarks.", "keywords": "fairness;data augmentation", "primary_area": "", "supplementary_material": "", "author": "Ching-Yao Chuang;Youssef Mroueh", "authorids": "~Ching-Yao_Chuang1;~Youssef_Mroueh1", "gender": "M;", "homepage": "https://chingyaoc.github.io/;", "dblp": "190/7522;http://dblp.uni-trier.de/pers/hd/m/Mroueh:Youssef", "google_scholar": "fpUICd0AAAAJ;https://scholar.google.com/citations?hl=en", "orcid": ";", "linkedin": ";", "or_profile": "~Ching-Yao_Chuang1;~Youssef_Mroueh1", "aff": "Massachusetts Institute of Technology;IBM", "aff_domain": "mit.edu;us.ibm.com", "position": "PhD student;Research Staff member", "bibtex": "@inproceedings{\nchuang2021fair,\ntitle={Fair Mixup: Fairness via Interpolation},\nauthor={Ching-Yao Chuang and Youssef Mroueh},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=DNl5s5BXeBn}\n}", "github": "[![github](/images/github_icon.svg) chingyaoc/fair-mixup](https://github.com/chingyaoc/fair-mixup)", "project": "", "reviewers": "AnonReviewer1;AnonReviewer4;AnonReviewer2;AnonReviewer3", "pdf_size": 0, "rating": "5;6;7;7", "confidence": "4;4;4;4", "wc_review": "490;250;910;342", "wc_reply_reviewers": "219;0;52;72", "wc_reply_authors": "784;199;573;255", "reply_reviewers": "1;0;1;1", "reply_authors": "2;1;1;2", "rating_avg": [ 6.25, 0.82915619758885 ], "confidence_avg": [ 4.0, 0.0 ], "wc_review_avg": [ 498.0, 252.8082277142103 ], "wc_reply_reviewers_avg": [ 85.75, 81.29690953535687 ], "wc_reply_authors_avg": [ 452.75, 238.579520286214 ], "reply_reviewers_avg": [ 0.75, 0.4330127018922193 ], "reply_authors_avg": [ 1.5, 0.5 ], "replies_avg": [ 15, 0 ], "authors#_avg": [ 2, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 179, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=15581530866838341454&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 3, "pdf": "https://openreview.net/pdf?id=DNl5s5BXeBn", "email": "mit.edu;us.ibm.com", "author_num": 2, "aff_unique_index": "0;1", "aff_unique_norm": "Massachusetts Institute of Technology;International Business Machines Corporation", "aff_unique_dep": 
";", "aff_unique_url": "https://web.mit.edu;https://www.ibm.com", "aff_unique_abbr": "MIT;IBM", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0", "aff_country_unique": "United States" }, { "id": "DQpwoZgqyZ", "title": "Model information as an analysis tool in deep learning", "track": "main", "status": "Reject", "tldr": "", "abstract": "Information-theoretic perspectives can provide an alternative dimension of analyzing the learning process and complements usual performance metrics. Recently several works proposed methods for quantifying information content in a model (which we refer to as \"model information\"). We demonstrate using model information as a general analysis tool to gain insight into problems that arise in deep learning. By utilizing model information in different scenarios with different control variables, we are able to adapt model information to analyze fundamental elements of learning, i.e., task, data, model, and algorithm. We provide an example in each domain that model information is used as a tool to provide new solutions to problems or to gain insight into the nature of the particular learning setting. These examples help to illustrate the versatility and potential utility of model information as an analysis tool in deep learning.", "keywords": "", "primary_area": "", "supplementary_material": "", "author": "Xiao Zhang;Di Hu;Xingjian Li;Dejing Dou;Ji Wu", "authorids": "~Xiao_Zhang9;~Di_Hu1;lixingjian@baidu.com;~Dejing_Dou1;wuji_ee@mail.tsinghua.edu.cn", "gender": ";M;;;", "homepage": ";https://dtaoo.github.io/;;;", "dblp": ";49/8496-1;;;", "google_scholar": "https://scholar.google.com/citations?hl=en;https://scholar.google.com.hk/citations?user=F7bvTOEAAAAJ;;;", "orcid": ";;;;", "linkedin": ";;;;", "or_profile": "~Xiao_Zhang9;~Di_Hu1;lixingjian@baidu.com;~Dejing_Dou1;wuji_ee@mail.tsinghua.edu.cn", "aff": "Tsinghua University;Renmin University of China;;;", "aff_domain": "tsinghua.edu.cn;ruc.edu.cn;;;", "position": "PhD student;Assistant Professor;;;", "bibtex": "@misc{\nzhang2021model,\ntitle={Model information as an analysis tool in deep learning},\nauthor={Xiao Zhang and Di Hu and Xingjian Li and Dejing Dou and Ji Wu},\nyear={2021},\nurl={https://openreview.net/forum?id=DQpwoZgqyZ}\n}", "github": "", "project": "", "reviewers": "AnonReviewer4;AnonReviewer2;AnonReviewer1;AnonReviewer3", "site": "https://openreview.net/forum?id=DQpwoZgqyZ", "pdf_size": 0, "rating": "4;4;4;6", "confidence": "3;2;3;3", "wc_review": "449;337;322;693", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "496;539;495;1140", "reply_reviewers": "0;0;0;0", "reply_authors": "1;1;1;2", "rating_avg": [ 4.5, 0.8660254037844386 ], "confidence_avg": [ 2.75, 0.4330127018922193 ], "wc_review_avg": [ 450.25, 148.49473896404547 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 667.5, 273.3756572923054 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.25, 0.4330127018922193 ], "replies_avg": [ 10, 0 ], "authors#_avg": [ 5, 0 ], "corr_rating_confidence": 0.3333333333333333, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:0nN2S_CRKs4J:scholar.google.com/&scioq=Model+information+as+an+analysis+tool+in+deep+learning&hl=en&as_sdt=0,33", "gs_version_total": 0, "aff_unique_index": "0;1", "aff_unique_norm": "Tsinghua University;Renmin University of China", "aff_unique_dep": ";", "aff_unique_url": "https://www.tsinghua.edu.cn;http://www.ruc.edu.cn", "aff_unique_abbr": "THU;RUC", "aff_campus_unique_index": "", 
"aff_campus_unique": "", "aff_country_unique_index": "0;0", "aff_country_unique": "China" }, { "id": "DRc-o6DUGVf", "title": "cross-modal knowledge enhancement mechanism for few-shot learning", "track": "main", "status": "Withdraw", "tldr": "", "abstract": "Few-shot learning problems require models to recognize novel classes with only a few supported samples. However, it remains challenging for the model to generalize novel classes with such limited samples. Driven by human behavior, researchers introduced semantic information (e.g. novel categories descriptions, label names, etc.) onto existing methods as prior knowledge to generalize more precise class representations. Despite the promising performance, these methods are under the assumption that users are able to provide precise semantic information for all target categories and this is hard to be satisfied in a real scenario. To address this problem, we proposed a novel Cross-modality Knowledge Enhancement Mechanism(CKEM) to discover task-relevant information in external semantic knowledge automatically. CKEM first utilizes Cross-modality Graph Builder(CGB) to align two unitary modality information (support labeled images and external semantic knowledge) into a cross-modality knowledge graph. After that, with the message-passing mechanism, CKEM selects and transfers relevant knowledge from external semantic knowledge bank to original visual-based class representations in Knowledge Fusion Model(KFM). Through a series of experiments, we show that our method improves the existing metric-based meta-learning methods with 1\\% - 5\\% for 1-shot and 5-shot settings on both mini-ImageNet and tiered-ImageNet datasets.", "keywords": "few-shot learning;cross-modal;image classification", "primary_area": "", "supplementary_material": "", "author": "Haiyang Zhang;Jiaming Duan;liang liu", "authorids": "zhhy@bupt.edu.cn;~Jiaming_Duan1;liangliu@bupt.edu.cn", "gender": ";;", "homepage": ";;", "dblp": ";;", "google_scholar": ";;", "orcid": ";;", "linkedin": ";;", "or_profile": "zhhy@bupt.edu.cn;~Jiaming_Duan1;liangliu@bupt.edu.cn", "aff": ";;", "aff_domain": ";;", "position": ";;", "bibtex": "", "github": "", "project": "", "reviewers": "AnonReviewer5;AnonReviewer4;AnonReviewer3;AnonReviewer1", "site": "https://openreview.net/forum?id=DRc-o6DUGVf", "pdf_size": 0, "rating": "3;4;4;5", "confidence": "5;5;5;4", "wc_review": "331;445;418;249", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "0;0;0;0", "reply_reviewers": "0;0;0;0", "reply_authors": "0;0;0;0", "rating_avg": [ 4.0, 0.7071067811865476 ], "confidence_avg": [ 4.75, 0.4330127018922193 ], "wc_review_avg": [ 360.75, 77.05314724266621 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 0, 0 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 0, 0 ], "replies_avg": [ 5, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": -0.816496580927726, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:u7ZiR-TudEUJ:scholar.google.com/&scioq=cross-modal+knowledge+enhancement+mechanism+for+few-shot+learning&hl=en&as_sdt=0,33", "gs_version_total": 0 }, { "id": "DUbd4PNhlg", "title": "Understanding How Over-Parametrization Leads to Acceleration: A case of learning a single teacher neuron", "track": "main", "status": "Withdraw", "tldr": "", "abstract": "Over-parametrization has become a popular technique in deep learning. 
It is observed that by over-parametrization, a larger neural network needs fewer training iterations than a smaller one to achieve a certain level of performance --- namely, over-parametrization leads to acceleration in optimization. However, although over-parametrization is widely used nowadays, little theory is available to explain the acceleration due to over-parametrization. In this paper, we propose understanding it by studying a simple problem first. Specifically, we consider the setting in which there is a single teacher neuron with quadratic activation, where over-parametrization is realized by having multiple student neurons learn the data generated from the teacher neuron. We provably show that over-parametrization helps the iterate generated by gradient descent to enter the neighborhood of a global optimal solution that achieves zero testing error faster. On the other hand, we also point out an issue regarding the necessity of over-parametrization and study how the scaling of the output neurons affects the convergence time.\n\n", "keywords": "", "primary_area": "", "supplementary_material": "", "author": "Jun-Kun Wang;Jacob Abernethy", "authorids": "~Jun-Kun_Wang1;~Jacob_Abernethy1", "gender": "M;M", "homepage": "https://jimwang123.github.io/;https://www.cc.gatech.edu/~jabernethy9/", "dblp": "153/5463;91/2520", "google_scholar": ";FDu4ciwAAAAJ", "orcid": ";", "linkedin": ";", "or_profile": "~Jun-Kun_Wang1;~Jacob_Abernethy1", "aff": "Georgia Institute of Technology;Georgia Institute of Technology", "aff_domain": "gatech.edu;cc.gatech.edu", "position": "PhD student;Associate Professor", "bibtex": "", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer4;AnonReviewer3;AnonReviewer2;AnonReviewer5", "site": "https://openreview.net/forum?id=DUbd4PNhlg", "pdf_size": 0, "rating": "3;4;4;5;5", "confidence": "5;3;5;4;4", "wc_review": "399;447;318;259;328", "wc_reply_reviewers": "0;0;0;0;0", "wc_reply_authors": "0;0;0;0;0", "reply_reviewers": "0;0;0;0;0", "reply_authors": "0;0;0;0;0", "rating_avg": [ 4.2, 0.7483314773547882 ], "confidence_avg": [ 4.2, 0.7483314773547882 ], "wc_review_avg": [ 350.2, 65.72488113340336 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 0, 0 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 0, 0 ], "replies_avg": [ 6, 0 ], "authors#_avg": [ 2, 0 ], "corr_rating_confidence": -0.4285714285714286, "gs_citation": 1, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=10178392630048432320&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 5, "aff_unique_index": "0;0", "aff_unique_norm": "Georgia Institute of Technology", "aff_unique_dep": "", "aff_unique_url": "https://www.gatech.edu", "aff_unique_abbr": "Georgia Tech", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0", "aff_country_unique": "United States" }, { "id": "D_I6trPKwlt", "title": "Spectrally Similar Graph Pooling", "track": "main", "status": "Withdraw", "tldr": "", "abstract": "We consider the problem of learning compositional hierarchies of graphs. Even though structural characteristics of graphs can be learned by Graph Neural Networks (GNNs), it is difficult to find an overall compositional hierarchy using such flat operators.\nIn this paper, we propose a new graph pooling algorithm, Spectrally Similar Graph Pooling (SSGPool), to learn hierarchical representations of graphs.
The main idea of the proposed SSGPool algorithm is to learn a coarsening matrix which maps nodes from an original graph to a smaller number of nodes in a coarsened graph. The coarsening matrix is trained to coarsen the nodes based on their feature vectors while keeping the spectral characteristics of the original graph in the coarsened one. Although existing graph pooling methods take either feature-based pooling or structure-preserving pooling, SSGPool considers two properties simultaneously in an end-to-end manner. Experiments on various graph benchmarks show the advantage of our method compared to strong baselines. To further investigate the effectiveness of our proposed method, we evaluate our approach on a real-world problem, image retrieval with visual scene graphs. Quantitative and qualitative analyses on the retrieval problem confirm that the proposed method efficiently captures the hierarchical semantic structure of scene graphs.", "keywords": "Graph Neural Networks;Graph Pooling;Spectral Similarity on Graph", "primary_area": "", "supplementary_material": "/attachment/2c3c9c5a8e823c6d87fdc4a3faf307ca77beb1d4.zip", "author": "Kyoung-Woon On;Eun-Sol Kim;Il-Jae Kwon;Sangwoong Yoon;Byoung-Tak Zhang", "authorids": "~Kyoung-Woon_On1;~Eun-Sol_Kim1;~Il-Jae_Kwon1;~Sangwoong_Yoon1;~Byoung-Tak_Zhang1", "gender": "M;F;M;M;", "homepage": ";;https://swyoon.github.io/;https://bi.snu.ac.kr/~btzhang/;", "dblp": "175/0873;52/10086;237/1318;09/5682;", "google_scholar": ";JhZBnfYAAAAJ;https://scholar.google.co.kr/citations?user=cH2rjfIAAAAJ;sYTUOu8AAAAJ;dUL2BPUAAAAJ", "orcid": ";;0000-0002-7251-3230;;", "linkedin": ";;;;", "or_profile": "~Kyoung-Woon_On1;~Eun-Sol_Kim1;~Sangwoong_Yoon1;~Byoung-Tak_Zhang1;~IL_JAE_KWON1", "aff": "Kakaobrain;Kakao Brain;Seoul National University;Seoul National University;Seoul National University", "aff_domain": "kakaobrain.com;kakaobrain.com;snu.ac.kr;snu.ac.kr;snu.ac.kr", "position": "Researcher;Postdoc;PhD student;Full Professor;MS student", "bibtex": "", "github": "", "project": "", "reviewers": "AnonReviewer2;AnonReviewer3;AnonReviewer4;AnonReviewer1", "site": "https://openreview.net/forum?id=D_I6trPKwlt", "pdf_size": 0, "rating": "4;5;7;7", "confidence": "4;3;3;3", "wc_review": "407;308;187;239", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "0;0;0;0", "reply_reviewers": "0;0;0;0", "reply_authors": "0;0;0;0", "rating_avg": [ 5.75, 1.299038105676658 ], "confidence_avg": [ 3.25, 0.4330127018922193 ], "wc_review_avg": [ 285.25, 82.36010866918522 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 0, 0 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 0, 0 ], "replies_avg": [ 5, 0 ], "authors#_avg": [ 5, 0 ], "corr_rating_confidence": -0.7777777777777777, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:4rPkS2Ejt4EJ:scholar.google.com/&scioq=Spectrally+Similar+Graph+Pooling&hl=en&as_sdt=0,5", "gs_version_total": 0, "aff_unique_index": "0;0;1;1;1", "aff_unique_norm": "Kakao Brain;Seoul National University", "aff_unique_dep": ";", "aff_unique_url": "https://brain.kakao.com;https://www.snu.ac.kr", "aff_unique_abbr": "Kakao Brain;SNU", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0;0;0;0", "aff_country_unique": "South Korea" }, { "title": "Sparse encoding for more-interpretable feature-selecting representations in probabilistic matrix factorization", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/3262", "id": "D_KeYoqCYC", "poster": 
"", "openreview": "https://openreview.net/forum?id=D_KeYoqCYC", "slides": "https://iclr.cc/virtual/2021/poster/3262", "video": "https://iclr.cc/virtual/2021/poster/3262", "author_site": "Joshua Chang, Patrick A Fletcher, Jungmin Han, Ted Chang, Shashaank Vattikuti, Bart Desmet, Ayah Zirikly, Carson Chow", "tldr": "", "abstract": "Dimensionality reduction methods for count data are critical to a wide range of applications in medical informatics and other fields where model interpretability is paramount. For such data, hierarchical Poisson matrix factorization (HPF) and other sparse probabilistic non-negative matrix factorization (NMF) methods are considered to be interpretable generative models. They consist of sparse transformations for decoding their learned representations into predictions. However, sparsity in representation decoding does not necessarily imply sparsity in the encoding of representations from the original data features. HPF is often incorrectly interpreted in the literature as if it possesses encoder sparsity. The distinction between decoder sparsity and encoder sparsity is subtle but important. Due to the lack of encoder sparsity, HPF does not possess the column-clustering property of classical NMF -- the factor loading matrix does not sufficiently define how each factor is formed from the original features. We address this deficiency by self-consistently enforcing encoder sparsity, using a generalized additive model (GAM), thereby allowing one to relate each representation coordinate to a subset of the original data features. In doing so, the method also gains the ability to perform feature selection. We demonstrate our method on simulated data and give an example of how encoder sparsity is of practical use in a concrete application of representing inpatient comorbidities in Medicare patients.", "keywords": "poisson matrix factorization;generalized additive model;probabilistic matrix factorization;bayesian;sparse coding;interpretability;factor analysis", "primary_area": "", "supplementary_material": "/attachment/6512356ac7b04ade9bee61f550e22c8ee267119d.zip", "author": "Joshua C Chang;Patrick Fletcher;Jungmin Han;Ted L Chang;Shashaank Vattikuti;Bart Desmet;Ayah Zirikly;Carson C Chow", "authorids": "~Joshua_C_Chang1;patrick@mederrata.com;jungmin@mederrata.com;ted@mederrata.com;shashaank@mederrata.com;bart.desmet@gmail.com;ayah.zirikly@gmail.com;carsonc@niddk.nih.gov", "gender": ";;;;;;;", "homepage": ";;;;;;;", "dblp": ";;;;;;;", "google_scholar": ";;;;;;;", "orcid": ";;;;;;;", "linkedin": ";;;;;;;", "or_profile": "~Joshua_C_Chang1;patrick@mederrata.com;jungmin@mederrata.com;ted@mederrata.com;shashaank@mederrata.com;bart.desmet@gmail.com;ayah.zirikly@gmail.com;carsonc@niddk.nih.gov", "aff": ";;;;;;;", "aff_domain": ";;;;;;;", "position": ";;;;;;;", "bibtex": "@inproceedings{\nchang2021sparse,\ntitle={Sparse encoding for more-interpretable feature-selecting representations in probabilistic matrix factorization},\nauthor={Joshua C Chang and Patrick Fletcher and Jungmin Han and Ted L Chang and Shashaank Vattikuti and Bart Desmet and Ayah Zirikly and Carson C Chow},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=D_KeYoqCYC}\n}", "github": "[![github](/images/github_icon.svg) mederrata/spmf](https://github.com/mederrata/spmf)", "project": "", "reviewers": "AnonReviewer3;AnonReviewer2;AnonReviewer5", "pdf_size": 0, "rating": "6;6;7", "confidence": "4;4;4", "wc_review": "138;308;512", "wc_reply_reviewers": 
"0;115;266", "wc_reply_authors": "400;1428;1627", "reply_reviewers": "0;1;1", "reply_authors": "1;3;3", "rating_avg": [ 6.333333333333333, 0.4714045207910317 ], "confidence_avg": [ 4.0, 0.0 ], "wc_review_avg": [ 319.3333333333333, 152.89502571662982 ], "wc_reply_reviewers_avg": [ 127.0, 108.92505068471011 ], "wc_reply_authors_avg": [ 1151.6666666666667, 537.6816488923121 ], "reply_reviewers_avg": [ 0.6666666666666666, 0.4714045207910317 ], "reply_authors_avg": [ 2.3333333333333335, 0.9428090415820634 ], "replies_avg": [ 18, 0 ], "authors#_avg": [ 8, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 4, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=17630346324232626458&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 7, "pdf": "https://openreview.net/pdf?id=D_KeYoqCYC", "email": ";;;;;;;", "author_num": 8 }, { "title": "Shape-Texture Debiased Neural Network Training", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/2870", "id": "Db4yerZTYkz", "poster": "", "openreview": "https://openreview.net/forum?id=Db4yerZTYkz", "slides": "https://iclr.cc/virtual/2021/poster/2870", "video": "https://iclr.cc/virtual/2021/poster/2870", "author_site": "Yinigwei Li, Qihang Yu, Mingxing Tan, Jieru Mei, Peng Tang, Wei Shen, Alan Yuille, Cihang Xie", "tldr": "", "abstract": "Shape and texture are two prominent and complementary cues for recognizing objects. Nonetheless, Convolutional Neural Networks are often biased towards either texture or shape, depending on the training dataset. Our ablation shows that such bias degenerates model performance. Motivated by this observation, we develop a simple algorithm for shape-texture debiased learning. To prevent models from exclusively attending on a single cue in representation learning, we augment training data with images with conflicting shape and texture information (eg, an image of chimpanzee shape but with lemon texture) and, most importantly, provide the corresponding supervisions from shape and texture simultaneously. \n\nExperiments show that our method successfully improves model performance on several image recognition benchmarks and adversarial robustness. For example, by training on ImageNet, it helps ResNet-152 achieve substantial improvements on ImageNet (+1.2%), ImageNet-A (+5.2%), ImageNet-C (+8.3%) and Stylized-ImageNet (+11.1%), and on defending against FGSM adversarial attacker on ImageNet (+14.4%). Our method also claims to be compatible with other advanced data augmentation strategies, eg, Mixup, and CutMix. 
The code is available here: https://github.com/LiYingwei/ShapeTextureDebiasedTraining.", "keywords": "data augmentation;representation learning;debiased training", "primary_area": "", "supplementary_material": "", "author": "Yingwei Li;Qihang Yu;Mingxing Tan;Jieru Mei;Peng Tang;Wei Shen;Alan Yuille;cihang xie", "authorids": "~Yingwei_Li4;~Qihang_Yu1;~Mingxing_Tan3;~Jieru_Mei2;~Peng_Tang1;~Wei_Shen2;~Alan_Yuille1;~cihang_xie1", "gender": "M;;M;M;M;M;M;M", "homepage": "http://yingwei.li/;;;https://meijieru.com/;https://shenwei1231.github.io/;;https://cihangxie.github.io/;https://ppengttang.github.io/", "dblp": ";;11/7863;198/9332.html;71/3692-2;y/AlanLYuille;175/3366;", "google_scholar": "phWmJeIAAAAJ;7zZdZxsAAAAJ;6POeyBoAAAAJ;nHKExN0AAAAJ;Ae2kRCEAAAAJ;;X3vVZPcAAAAJ;h_oYR-IAAAAJ", "orcid": ";;;;;;;", "linkedin": ";;mingxing-tan-2724551b/;meijieru/;;;;ppengtang/", "or_profile": "~Yingwei_Li4;~Qihang_Yu1;~Mingxing_Tan3;~Jieru_Mei2;~Wei_Shen2;~Alan_Yuille1;~cihang_xie1;~Peng_Tang3", "aff": "Johns Hopkins University;Google;Google/Waymo;Johns Hopkins University;Shanghai Jiaotong University;Johns Hopkins University;University of California, Santa Cruz;Amazon", "aff_domain": "jhu.edu;google.com;google.com;jhu.edu;sjtu.edu.cn;johnshopkins.edu;ucsc.edu;amazon.com", "position": "PhD student;Intern;Researcher;PhD student;Associate Professor;Full Professor;Assistant Professor;Applied Scientist", "bibtex": "@inproceedings{\nli2021shapetexture,\ntitle={Shape-Texture Debiased Neural Network Training},\nauthor={Yingwei Li and Qihang Yu and Mingxing Tan and Jieru Mei and Peng Tang and Wei Shen and Alan Yuille and cihang xie},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=Db4yerZTYkz}\n}", "github": "[![github](/images/github_icon.svg) LiYingwei/ShapeTextureDebiasedTraining](https://github.com/LiYingwei/ShapeTextureDebiasedTraining)", "project": "", "reviewers": "AnonReviewer4;AnonReviewer1;AnonReviewer2;AnonReviewer5", "pdf_size": 0, "rating": "4;6;7;7", "confidence": "4;3;3;4", "wc_review": "750;382;545;195", "wc_reply_reviewers": "229;0;0;0", "wc_reply_authors": "1524;453;548;199", "reply_reviewers": "1;0;0;0", "reply_authors": "3;1;1;1", "rating_avg": [ 6.0, 1.224744871391589 ], "confidence_avg": [ 3.5, 0.5 ], "wc_review_avg": [ 468.0, 204.5592823608843 ], "wc_reply_reviewers_avg": [ 57.25, 99.15990873331822 ], "wc_reply_authors_avg": [ 681.0, 503.15156762152697 ], "reply_reviewers_avg": [ 0.25, 0.4330127018922193 ], "reply_authors_avg": [ 1.5, 0.8660254037844386 ], "replies_avg": [ 12, 0 ], "authors#_avg": [ 8, 0 ], "corr_rating_confidence": -0.40824829046386296, "gs_citation": 142, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=13815083807768708857&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 9, "pdf": "https://openreview.net/pdf?id=Db4yerZTYkz", "email": "jhu.edu;google.com;google.com;jhu.edu;sjtu.edu.cn;johnshopkins.edu;ucsc.edu;amazon.com", "author_num": 8, "aff_unique_index": "0;1;1;0;2;0;3;4", "aff_unique_norm": "Johns Hopkins University;Google;Shanghai Jiao Tong University;University of California, Santa Cruz;Amazon", "aff_unique_dep": ";Google;;;Amazon.com, Inc.", "aff_unique_url": "https://www.jhu.edu;https://www.google.com;https://www.sjtu.edu.cn;https://www.ucsc.edu;https://www.amazon.com", "aff_unique_abbr": "JHU;Google;SJTU;UCSC;Amazon", "aff_campus_unique_index": "1;2", "aff_campus_unique": ";Mountain View;Santa Cruz", "aff_country_unique_index": "0;0;0;0;1;0;0;0", "aff_country_unique": "United 
States;China" }, { "id": "DdGCxq9C_Gr", "title": "Dropout's Dream Land: Generalization from Learned Simulators to Reality", "track": "main", "status": "Reject", "tldr": "", "abstract": "A World Model is a generative model used to simulate an environment. World Models have proven capable of learning spatial and temporal representations of Reinforcement Learning environments. In some cases, a World Model offers an agent the opportunity to learn entirely inside of its own dream environment. In this work we explore improving the generalization capabilities from dream environments to reality (Dream2Real). We present a general approach to improve a controller's ability to transfer from a neural network dream environment to reality at little additional cost. These improvements are gained by drawing on inspiration from domain randomization, where the basic idea is to randomize as much of a simulator as possible without fundamentally changing the task at hand. Generally, domain randomization assumes access to a pre-built simulator with configurable parameters but oftentimes this is not available. By training the World Model using dropout, the dream environment is capable of creating a nearly infinite number of \\textit{different} dream environments. Our experimental results show that Dropout's Dream Land is an effective technique to bridge the reality gap between dream environments and reality. Furthermore, we additionally perform an extensive set of ablation studies. ", "keywords": "Reinforcement Learning", "primary_area": "", "supplementary_material": "", "author": "Zac Wellmer;James Kwok", "authorids": "~Zac_Wellmer1;~James_Kwok1", "gender": ";", "homepage": ";", "dblp": ";", "google_scholar": ";", "orcid": ";", "linkedin": ";", "or_profile": "~Zac_Wellmer1;~James_Kwok1", "aff": ";", "aff_domain": ";", "position": ";", "bibtex": "@misc{\nwellmer2021dropouts,\ntitle={Dropout's Dream Land: Generalization from Learned Simulators to Reality},\nauthor={Zac Wellmer and James Kwok},\nyear={2021},\nurl={https://openreview.net/forum?id=DdGCxq9C_Gr}\n}", "github": "", "project": "", "reviewers": "AnonReviewer2;AnonReviewer1;AnonReviewer3;AnonReviewer4", "site": "https://openreview.net/forum?id=DdGCxq9C_Gr", "pdf_size": 0, "rating": "3;4;6;6", "confidence": "4;3;4;3", "wc_review": "507;394;690;205", "wc_reply_reviewers": "0;0;33;0", "wc_reply_authors": "1127;373;516;276", "reply_reviewers": "0;0;1;0", "reply_authors": "2;1;2;1", "rating_avg": [ 4.75, 1.299038105676658 ], "confidence_avg": [ 3.5, 0.5 ], "wc_review_avg": [ 449.0, 176.0724282788194 ], "wc_reply_reviewers_avg": [ 8.25, 14.289419162443238 ], "wc_reply_authors_avg": [ 573.0, 331.0490900153631 ], "reply_reviewers_avg": [ 0.25, 0.4330127018922193 ], "reply_authors_avg": [ 1.5, 0.5 ], "replies_avg": [ 13, 0 ], "authors#_avg": [ 2, 0 ], "corr_rating_confidence": -0.19245008972987526, "gs_citation": 12, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=7295178715882008134&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 8 }, { "id": "DdhfDplcxs1", "title": "OT-LLP: Optimal Transport for Learning from Label Proportions", "track": "main", "status": "Withdraw", "tldr": "", "abstract": "Learning from label proportions (LLP), where the training data are arranged in form of groups with only label proportions provided instead of the exact labels, is an important weakly supervised learning paradigm in machine learning. 
Existing deep learning based LLP methods pursue an end-to-end learning fashion and construct the loss using Kullback-Leibler divergence, which measures the difference between the prior and posterior class distributions in each bag. However, unconstrained optimization on this objective can hardly reach a solution in accordance with the given proportions at the bag level. In addition, concerning the probabilistic classifier, it probably results in high-entropy conditional class distributions at the instance level. These issues will further degrade the performance of instance-level classification. To address these problems, we propose to impose the exact proportions on the classifier with a constrained optimization, and are the first to apply the optimal transport algorithm to solve LLP. With the entropic regularization, our formulation allows us to solve a convex program efficiently and further arrive at an integer solution that meets the proportion constraint strictly. More importantly, our framework is model-agnostic, and demonstrates compelling performance improvement in extensive experiments when it is incorporated into other deep LLP models as a post-hoc stage.", "keywords": "Learning from label proportions;Optimal transport;Weakly supervised learning;Classification", "primary_area": "", "supplementary_material": "/attachment/da33c51feba34e0b4b7304b448b405a41b47e4cd.zip", "author": "Jiabin Liu;Hanyuan Hang;Bo Wang;Xin Shen;Zhouchen Lin", "authorids": "~Jiabin_Liu1;~Hanyuan_Hang1;~Bo_Wang14;~Xin_Shen2;~Zhouchen_Lin1", "gender": "M;M;M;;M", "homepage": ";;http://it.uibe.edu.cn/szdw/dsjkxyjzx/50452.htm;;https://zhouchenlin.github.io", "dblp": "https://dblp.org/pers/hd/l/Liu:Jiabin.html;180/5385;72/6811-49;;l/ZhouchenLin", "google_scholar": ";;pNrI3CEAAAAJ;vDpk7pkAAAAJ;https://scholar.google.com.tw/citations?user=TanjFwoAAAAJ", "orcid": ";;0000-0002-8054-8185;;0000-0003-1493-7569", "linkedin": ";;;;", "or_profile": "~Jiabin_Liu1;~Hanyuan_Hang1;~Bo_Wang14;~Xin_Shen2;~Zhouchen_Lin1", "aff": "Samsung;Samsung;University of International Business and Economics;The Chinese University of Hong Kong;Peking University", "aff_domain": "samsung.com;samsung.com;uibe.edu.cn;cuhk.edu.hk;pku.edu.cn", "position": "Postdoc;Staff Engineer;Associate Professor;PhD student;Professor", "bibtex": "", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer3;AnonReviewer4;AnonReviewer2", "site": "https://openreview.net/forum?id=DdhfDplcxs1", "pdf_size": 0, "rating": "4;5;5;5", "confidence": "3;4;3;3", "wc_review": "233;694;164;205", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "0;0;0;0", "reply_reviewers": "0;0;0;0", "reply_authors": "0;0;0;0", "rating_avg": [ 4.75, 0.4330127018922193 ], "confidence_avg": [ 3.25, 0.4330127018922193 ], "wc_review_avg": [ 324.0, 215.02441721813827 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 0, 0 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 0, 0 ], "replies_avg": [ 7, 0 ], "authors#_avg": [ 5, 0 ], "corr_rating_confidence": 0.3333333333333333, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:trFH32D5_4gJ:scholar.google.com/&scioq=OT-LLP:+Optimal+Transport+for+Learning+from+Label+Proportions&hl=en&as_sdt=0,33", "gs_version_total": 0, "aff_unique_index": "0;0;1;2;3", "aff_unique_norm": "Samsung;University of International Business and Economics;Chinese University of Hong Kong;Peking University", "aff_unique_dep": "Samsung;;;", "aff_unique_url": 
"https://www.samsung.com;http://www.uibe.edu.cn;https://www.cuhk.edu.hk;http://www.pku.edu.cn", "aff_unique_abbr": "Samsung;UIBE;CUHK;Peking U", "aff_campus_unique_index": "1", "aff_campus_unique": ";Hong Kong SAR", "aff_country_unique_index": "0;0;1;1;1", "aff_country_unique": "South Korea;China" }, { "id": "DegtqJSbxo", "title": "Adversarial and Natural Perturbations for General Robustness", "track": "main", "status": "Reject", "tldr": "", "abstract": "In this paper we aim to explore the general robustness of neural network classifiers by utilizing adversarial as well as natural perturbations. Different from previous works which mainly focus on studying the robustness of neural networks against adversarial perturbations, we also evaluate their robustness on natural perturbations before and after robustification. After standardizing the comparison between adversarial and natural perturbations, we demonstrate that although adversarial training improves the performance of the networks against adversarial perturbations, it leads to drop in the performance for naturally perturbed samples besides clean samples. In contrast, natural perturbations like elastic deformations, occlusions and wave does not only improve the performance against natural perturbations, but also lead to improvement in the performance for the adversarial perturbations. Additionally they do not drop the accuracy on the clean images.", "keywords": "Robustness;Adversarial Examples;Natural Perturbations;General Robustness", "primary_area": "", "supplementary_material": "", "author": "Sadaf Gulshad;Jan Hendrik Metzen;Arnold W.M. Smeulders", "authorids": "~Sadaf_Gulshad1;~Jan_Hendrik_Metzen1;~Arnold_W.M._Smeulders1", "gender": "F;M;M", "homepage": ";http://jmetzen.github.io/;https://staff.fnwi.uva.nl/a.w.m.smeulders/", "dblp": ";93/1712;", "google_scholar": "2GCPK34AAAAJ;https://scholar.google.de/citations?user=w047VfEAAAAJ;https://scholar.google.com/citations?hl=en", "orcid": ";;", "linkedin": ";jan-hendrik-metzen-211543135/;", "or_profile": "~Sadaf_Gulshad1;~Jan_Hendrik_Metzen1;~Arnold_W.M._Smeulders1", "aff": "University of Amsterdam;Bosch Center Artificial Intelligence;University of Amsterdam", "aff_domain": "uva.nl;bosch.com;uva.nl", "position": "PhD student;Senior Expert;Full Professor", "bibtex": "@misc{\ngulshad2021adversarial,\ntitle={Adversarial and Natural Perturbations for General Robustness},\nauthor={Sadaf Gulshad and Jan Hendrik Metzen and Arnold W.M. 
Smeulders},\nyear={2021},\nurl={https://openreview.net/forum?id=DegtqJSbxo}\n}", "github": "", "project": "", "reviewers": "AnonReviewer4;AnonReviewer3;AnonReviewer1", "site": "https://openreview.net/forum?id=DegtqJSbxo", "pdf_size": 0, "rating": "4;4;4", "confidence": "5;4;4", "wc_review": "891;537;544", "wc_reply_reviewers": "0;0;0", "wc_reply_authors": "0;0;0", "reply_reviewers": "0;0;0", "reply_authors": "0;0;0", "rating_avg": [ 4.0, 0.0 ], "confidence_avg": [ 4.333333333333333, 0.4714045207910317 ], "wc_review_avg": [ 657.3333333333334, 165.25199612174802 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 0, 0 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 0, 0 ], "replies_avg": [ 4, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 3, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=9866570130661892913&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 5, "aff_unique_index": "0;1;0", "aff_unique_norm": "University of Amsterdam;Bosch Center for Artificial Intelligence", "aff_unique_dep": ";Artificial Intelligence", "aff_unique_url": "https://www.uva.nl;https://www.bosch-ai.com", "aff_unique_abbr": "UvA;BCAI", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;1;0", "aff_country_unique": "Netherlands;Germany" }, { "id": "Dh29CAlnMW", "title": "Parsed Categoric Encodings with Automunge", "track": "main", "status": "Reject", "tldr": "", "abstract": "The Automunge open source python library platform for tabular data pre-processing automates feature engineering data transformations of numerical encoding and missing data infill to received tidy data on bases fit to properties of columns in a designated train set for consistent and efficient application to subsequent data pipelines such as for inference, where transformations may be applied to distinct columns in \u201cfamily tree\u201d sets with generations and branches of derivations. Included in the library of transformations are methods to extract structure from bounded categorical string sets by way of automated string parsing, in which comparisons between entries in the set of unique values are parsed to identify character subset overlaps which may be encoded by appended columns of boolean overlap detection activations or by replacing string entries with identified overlap partitions. Further string parsing options, which may also be applied to unbounded categoric sets, include extraction of numeric substring partitions from entries or search functions to identify presence of specified substring partitions. 
The aggregation of these methods into \u201cfamily tree\u201d sets of transformations are demonstrated for use to automatically extract structure from categoric string compositions in relation to the set of entries in a column, such as may be applied to prepare categoric string set encodings for machine learning without human intervention.", "keywords": "tabular;feature engineering", "primary_area": "", "supplementary_material": "/attachment/a459fd3a1ace5744f62f4a35fecaef08bb6b11e2.zip", "author": "Nicholas Teague", "authorids": "~Nicholas_Teague1", "gender": "M", "homepage": "https://www.automunge.com", "dblp": "314/5998", "google_scholar": "ioqgQwQAAAAJ", "orcid": "0000-0001-6071-5065", "linkedin": "nicholaste/", "or_profile": "~Nicholas_Teague1", "aff": "Automunge", "aff_domain": "automunge.com", "position": "Founder", "bibtex": "@misc{\nteague2021parsed,\ntitle={Parsed Categoric Encodings with Automunge},\nauthor={Nicholas Teague},\nyear={2021},\nurl={https://openreview.net/forum?id=Dh29CAlnMW}\n}", "github": "", "project": "", "reviewers": "AnonReviewer4;AnonReviewer1;AnonReviewer2", "site": "https://openreview.net/forum?id=Dh29CAlnMW", "pdf_size": 0, "rating": "4;4;6", "confidence": "4;5;2", "wc_review": "94;145;118", "wc_reply_reviewers": "80;0;0", "wc_reply_authors": "838;1082;833", "reply_reviewers": "1;0;0", "reply_authors": "4;3;3", "rating_avg": [ 4.666666666666667, 0.9428090415820634 ], "confidence_avg": [ 3.6666666666666665, 1.247219128924647 ], "wc_review_avg": [ 119.0, 20.83266665599966 ], "wc_reply_reviewers_avg": [ 26.666666666666668, 37.71236166328253 ], "wc_reply_authors_avg": [ 917.6666666666666, 116.21914166301904 ], "reply_reviewers_avg": [ 0.3333333333333333, 0.4714045207910317 ], "reply_authors_avg": [ 3.3333333333333335, 0.4714045207910317 ], "replies_avg": [ 18, 0 ], "authors#_avg": [ 1, 0 ], "corr_rating_confidence": -0.9449111825230679, "gs_citation": 3, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=13718400609651132480&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 3, "aff_unique_index": "0", "aff_unique_norm": "Automunge", "aff_unique_dep": "", "aff_unique_url": "https://www.automunge.com", "aff_unique_abbr": "", "aff_country_unique_index": "0", "aff_country_unique": "United States" }, { "title": "Improving Relational Regularized Autoencoders with Spherical Sliced Fused Gromov Wasserstein", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/2980", "id": "DiQD7FWL233", "poster": "", "openreview": "https://openreview.net/forum?id=DiQD7FWL233", "slides": "https://iclr.cc/virtual/2021/poster/2980", "video": "https://iclr.cc/virtual/2021/poster/2980", "author_site": "Khai Nguyen, Son Nguyen, Nhat Ho, Tung Pham, Hung Bui", "tldr": "", "abstract": "Relational regularized autoencoder (RAE) is a framework to learn the distribution of data by minimizing a reconstruction loss together with a relational regularization on the prior of latent space. A recent attempt to reduce the inner discrepancy between the prior and aggregated posterior distributions is to incorporate sliced fused Gromov-Wasserstein (SFG) between these distributions. That approach has a weakness since it treats every slicing direction similarly, meanwhile several directions are not useful for the discriminative task. 
To improve the discrepancy and consequently the relational regularization, we propose a new relational discrepancy, named spherical sliced fused Gromov Wasserstein (SSFG), that can find an important area of projections characterized by a von Mises-Fisher distribution. Then, we introduce two variants of SSFG to improve its performance. The first variant, named mixture spherical sliced fused Gromov Wasserstein (MSSFG), replaces the vMF distribution by a mixture of von Mises-Fisher distributions to capture multiple important areas of directions that are far from each other. The second variant, named power spherical sliced fused Gromov Wasserstein (PSSFG), replaces the vMF distribution by a power spherical distribution to improve the sampling time of the vMF distribution in high dimension settings. We then apply the new discrepancies to the RAE framework to achieve its new variants. Finally, we conduct extensive experiments to show that the new autoencoders have favorable performance in learning latent manifold structure, image generation, and reconstruction.", "keywords": "Relational regularized autoencoder;deep generative model;sliced fused Gromov Wasserstein;spherical distributions", "primary_area": "", "supplementary_material": "/attachment/55f100a840f34d52df3fcddff1c09061b2c7a5e8.zip", "author": "Khai Nguyen;Son Nguyen;Nhat Ho;Tung Pham;Hung Bui", "authorids": "~Khai_Nguyen1;v.sonnv27@vinai.io;~Nhat_Ho1;v.tungph4@vinai.io;~Hung_Bui1", "gender": "M;;M;;M", "homepage": "https://khainb.com;;https://nhatptnk8912.github.io/;;https://sites.google.com/site/buihhung/home", "dblp": "120/4308;;203/4479;;", "google_scholar": "im5fNaQAAAAJ;;https://scholar.google.ca/citations?user=Xs7cKMwAAAAJ;;mDLwSZAAAAAJ", "orcid": ";;;;", "linkedin": ";;nhat-pham-minh-ho-267b8164/;;", "or_profile": "~Khai_Nguyen1;v.sonnv27@vinai.io;~Nhat_Ho1;v.tungph4@vinai.io;~Hung_Bui1", "aff": "VinAI Research, Vietnam;;University of Texas, Austin;;VinAI Research", "aff_domain": "vinai.io;;utexas.edu;;vinai.io", "position": "AI Research Resident;;Assistant Professor;;Principal Researcher", "bibtex": "@inproceedings{\nnguyen2021improving,\ntitle={Improving Relational Regularized Autoencoders with Spherical Sliced Fused Gromov Wasserstein},\nauthor={Khai Nguyen and Son Nguyen and Nhat Ho and Tung Pham and Hung Bui},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=DiQD7FWL233}\n}", "github": "[![Papers with Code](/images/pwc_icon.svg) 2 community implementations](https://paperswithcode.com/paper/?openreview=DiQD7FWL233)", "project": "", "reviewers": "AnonReviewer2;AnonReviewer4;AnonReviewer1", "pdf_size": 0, "rating": "6;6;7", "confidence": "3;4;5", "wc_review": "502;655;729", "wc_reply_reviewers": "0;193;0", "wc_reply_authors": "517;2044;1286", "reply_reviewers": "0;1;0", "reply_authors": "1;3;3", "rating_avg": [ 6.333333333333333, 0.4714045207910317 ], "confidence_avg": [ 4.0, 0.816496580927726 ], "wc_review_avg": [ 628.6666666666666, 94.52454119903936 ], "wc_reply_reviewers_avg": [ 64.33333333333333, 90.98107251266912 ], "wc_reply_authors_avg": [ 1282.3333333333333, 623.4005311372636 ], "reply_reviewers_avg": [ 0.3333333333333333, 0.4714045207910317 ], "reply_authors_avg": [ 2.3333333333333335, 0.9428090415820634 ], "replies_avg": [ 13, 0 ], "authors#_avg": [ 5, 0 ], "corr_rating_confidence": 0.8660254037844385, "gs_citation": 31, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=14127083722926189516&as_sdt=2005&sciodt=0,5&hl=en", 
"gs_version_total": 10, "pdf": "https://openreview.net/pdf?id=DiQD7FWL233", "email": "vinai.io;;utexas.edu;;vinai.io", "author_num": 5, "aff_unique_index": "0;1;0", "aff_unique_norm": "VinAI Research;University of Texas at Austin", "aff_unique_dep": ";", "aff_unique_url": "https://www.vin.ai;https://www.utexas.edu", "aff_unique_abbr": "VinAI;UT Austin", "aff_campus_unique_index": "1", "aff_campus_unique": ";Austin", "aff_country_unique_index": "0;1;0", "aff_country_unique": "Vietnam;United States" }, { "id": "DigrnXQNMTe", "title": "A generalized probability kernel on discrete distributions and its application in two-sample test", "track": "main", "status": "Reject", "tldr": "", "abstract": "We propose a generalized probability kernel(GPK) on discrete distributions with finite support. This probability kernel, defined as kernel between distributions instead of samples, generalizes the existing discrepancy statistics such as maximum mean discrepancy(MMD) as well as probability product kernels, and extends to more general cases. For both existing and newly proposed statistics, we estimate them through empirical frequency and illustrate the strategy to analyze the resulting bias and convergence bounds. We further propose power-MMD, a natural extension of MMD in the framework of GPK, illustrating its usage for the task of two-sample test. Our work connects the fields of discrete distribution-property estimation and kernel-based hypothesis test, which might shed light on more new possibilities.", "keywords": "maximum mean discrepancy;RKHS;two-sample test;empirical estimator;discrete distributions", "primary_area": "", "supplementary_material": "", "author": "Le Niu", "authorids": "~Le_Niu1", "gender": "M", "homepage": "https://overshiki.github.io/", "dblp": "", "google_scholar": "", "orcid": "", "linkedin": "", "or_profile": "~Le_Niu1", "aff": "", "aff_domain": "", "position": "", "bibtex": "@misc{\nniu2021a,\ntitle={A generalized probability kernel on discrete distributions and its application in two-sample test},\nauthor={Le Niu},\nyear={2021},\nurl={https://openreview.net/forum?id=DigrnXQNMTe}\n}", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer2;AnonReviewer4;AnonReviewer3", "site": "https://openreview.net/forum?id=DigrnXQNMTe", "pdf_size": 0, "rating": "1;2;2;3", "confidence": "5;4;5;4", "wc_review": "203;442;288;732", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "144;377;111;213", "reply_reviewers": "0;0;0;0", "reply_authors": "1;1;1;1", "rating_avg": [ 2.0, 0.7071067811865476 ], "confidence_avg": [ 4.5, 0.5 ], "wc_review_avg": [ 416.25, 201.42290708854344 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 211.25, 102.52895932369547 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 10, 0 ], "authors#_avg": [ 1, 0 ], "corr_rating_confidence": -0.7071067811865476, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:ZSy3scLKX8EJ:scholar.google.com/&scioq=A+generalized+probability+kernel+on+discrete+distributions+and+its+application+in+two-sample+test&hl=en&as_sdt=0,33", "gs_version_total": 0 }, { "title": "SenSeI: Sensitive Set Invariance for Enforcing Individual Fairness", "status": "Oral", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/2777", "id": "DktZb97_Fx", "poster": "", "openreview": "https://openreview.net/forum?id=DktZb97_Fx", "slides": "https://iclr.cc/virtual/2021/poster/2777", "video": "https://iclr.cc/virtual/2021/poster/2777", "author_site": "Mikhail 
Yurochkin, Yuekai Sun", "tldr": "", "abstract": "In this paper, we cast fair machine learning as invariant machine learning. We first formulate a version of individual fairness that enforces invariance on certain sensitive sets. We then design a transport-based regularizer that enforces this version of individual fairness and develop an algorithm to minimize the regularizer efficiently. Our theoretical results guarantee the proposed approach trains certifiably fair ML models. Finally, in the experimental studies we demonstrate improved fairness metrics in comparison to several recent fair training procedures on three ML tasks that are susceptible to algorithmic bias.", "keywords": "Algorithmic fairness;invariance", "primary_area": "", "supplementary_material": "/attachment/1c6f84f5d5740ce96ad87cd792c6c18b4dccb329.zip", "author": "Mikhail Yurochkin;Yuekai Sun", "authorids": "~Mikhail_Yurochkin1;~Yuekai_Sun1", "gender": "M;", "homepage": "https://moonfolk.github.io/;https://yuekai.github.io/", "dblp": "191/6719;", "google_scholar": "QjBF9sUAAAAJ;6T1XtW8AAAAJ", "orcid": ";", "linkedin": "mikhail-yurochkin-a45659114/;", "or_profile": "~Mikhail_Yurochkin1;~Yuekai_Sun1", "aff": "IBM Research;University of Michigan - Ann Arbor", "aff_domain": "ibm.com;umich.edu", "position": "Researcher;Assistant \u2192 Associate Professor of Statistics", "bibtex": "@inproceedings{\nyurochkin2021sensei,\ntitle={SenSeI: Sensitive Set Invariance for Enforcing Individual Fairness},\nauthor={Mikhail Yurochkin and Yuekai Sun},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=DktZb97_Fx}\n}", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer2;AnonReviewer3;AnonReviewer4", "pdf_size": 0, "rating": "7;7;7;7", "confidence": "3;4;2;3", "wc_review": "199;752;186;293", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "402;242;387;627", "reply_reviewers": "0;0;0;0", "reply_authors": "1;1;1;1", "rating_avg": [ 7.0, 0.0 ], "confidence_avg": [ 3.0, 0.7071067811865476 ], "wc_review_avg": [ 357.5, 231.47624068141423 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 414.5, 137.68169813014364 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 10, 0 ], "authors#_avg": [ 2, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 70, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=9070407415187430480&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 5, "pdf": "https://openreview.net/pdf?id=DktZb97_Fx", "email": "ibm.com;umich.edu", "author_num": 2, "aff_unique_index": "0;1", "aff_unique_norm": "IBM;University of Michigan", "aff_unique_dep": "IBM Research;", "aff_unique_url": "https://www.ibm.com/research;https://www.umich.edu", "aff_unique_abbr": "IBM;UM", "aff_campus_unique_index": "1", "aff_campus_unique": ";Ann Arbor", "aff_country_unique_index": "0;0", "aff_country_unique": "United States" }, { "id": "DlPnp5_1JMI", "title": "PDE-regularized Neural Networks for Image Classification", "track": "main", "status": "Reject", "tldr": "", "abstract": "Neural ordinary differential equations (neural ODEs) introduced an approach to approximate a neural network as a system of ODEs after considering its layer as a continuous variable and discretizing its hidden dimension. While having several good characteristics, neural ODEs are known to be numerically unstable and slow in solving their integral problems, resulting in errors and/or much computation of the forward-pass inference. 
In this work, we present a novel partial differential equation (PDE)-based approach that removes the necessity of solving integral problems and considers both the layer and the hidden dimension as continuous variables. Owing to the recent advancement of learning PDEs, the presented novel concept, called PR-Net, can be implemented. Our method shows comparable (or better) accuracy and robustness in much shorter forward-pass inference time for various datasets and tasks in comparison with neural ODEs and Isometric MobileNet V3. For the efficient nature of PR-Net, it is suitable to be deployed in resource-scarce environments, e.g., deploying instead of MobileNet.", "keywords": "Neural ODE;Partial Differential Equations;Image Classification", "primary_area": "", "supplementary_material": "/attachment/13fc393bf901de11f8af19abe70e079d1b4d2f08.zip", "author": "Jungeun Kim;Seunghyun Hwang;Jihyun Hwang;Kookjin Lee;Dongeun Lee;Noseong Park", "authorids": "jekim5418@yonsei.ac.kr;hwangsh7415@gmail.com;hwanggh96@gmail.com;koolee@sandia.gov;dongeun.lee@tamuc.edu;~Noseong_Park1", "gender": ";;;;;", "homepage": ";;;;;", "dblp": ";;;;;", "google_scholar": ";;;;;", "orcid": ";;;;;", "linkedin": ";;;;;", "or_profile": ";;;;;", "aff": ";;;;;", "aff_domain": ";;;;;", "position": ";;;;;", "bibtex": "@misc{\nkim2021pderegularized,\ntitle={{\\{}PDE{\\}}-regularized Neural Networks for Image Classification},\nauthor={Jungeun Kim and Seunghyun Hwang and Jihyun Hwang and Kookjin Lee and Dongeun Lee and Noseong Park},\nyear={2021},\nurl={https://openreview.net/forum?id=DlPnp5_1JMI}\n}", "github": "", "project": "", "reviewers": "AnonReviewer2;AnonReviewer4;AnonReviewer1;AnonReviewer3", "site": "https://openreview.net/forum?id=DlPnp5_1JMI", "pdf_size": 0, "rating": "5;6;6;7", "confidence": "4;4;3;4", "wc_review": "904;728;308;212", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "1234;1035;476;260", "reply_reviewers": "0;0;0;0", "reply_authors": "3;3;2;2", "rating_avg": [ 6.0, 0.7071067811865476 ], "confidence_avg": [ 3.75, 0.4330127018922193 ], "wc_review_avg": [ 538.0, 286.8937085402885 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 751.25, 397.0676107415461 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 2.5, 0.5 ], "replies_avg": [ 15, 0 ], "authors#_avg": [ 6, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:oAOD5Zy6N1EJ:scholar.google.com/&scioq=PDE-regularized+Neural+Networks+for+Image+Classification&hl=en&as_sdt=0,33", "gs_version_total": 0 }, { "id": "Dmpi13JiqcX", "title": "Disentangling Representations of Text by Masking Transformers", "track": "main", "status": "Reject", "tldr": "", "abstract": "Representations in large language models such as BERT encode a range of features into a single vector, which are predictive in the context of a multitude of downstream tasks. In this paper, we explore whether it is possible to learn disentangled representations by identifying subnetworks in pre-trained models that encode distinct, complementary aspects of the representation. Concretely, we learn binary masks over transformer weights or hidden units to uncover the subset of features that correlate with a specific factor of variation. This sidesteps the need to train a disentangled model from scratch within a particular domain. We evaluate the ability of this method to disentangle representations of syntax and semantics, and sentiment from genre in the context of movie reviews. 
By combining this method with magnitude pruning we find that we can identify quite sparse subnetworks. Moreover, we find that this disentanglement-via-masking approach performs as well as or better than previously proposed methods based on variational autoencoders and adversarial training. ", "keywords": "disentanglement;model pruning;representation learning;transformers", "primary_area": "", "supplementary_material": "", "author": "Xiongyi Zhang;Jan-Willem van de Meent;Byron C Wallace", "authorids": "~Xiongyi_Zhang2;~Jan-Willem_van_de_Meent1;~Byron_C_Wallace1", "gender": "M;M;M", "homepage": "https://www.ccs.neu.edu/home/jwvdm/;https://jwvdm.github.io/;http://www.byronwallace.com/", "dblp": ";137/3263;00/8247", "google_scholar": ";CX9Lu38AAAAJ;KTzRHmwAAAAJ", "orcid": ";0000-0001-9465-5398;", "linkedin": ";;", "or_profile": "~Xiongyi_Zhang2;~Jan-Willem_van_de_Meent1;~Byron_C_Wallace1", "aff": "Northeastern University;Northeastern University;Northeastern University", "aff_domain": "northeastern.edu;northeastern.edu;northeastern.edu", "position": "PhD student;Assistant Professor;Associate Professor", "bibtex": "@misc{\nzhang2021disentangling,\ntitle={Disentangling Representations of Text by Masking Transformers},\nauthor={Xiongyi Zhang and Jan-Willem van de Meent and Byron C Wallace},\nyear={2021},\nurl={https://openreview.net/forum?id=Dmpi13JiqcX}\n}", "github": "", "project": "", "reviewers": "AnonReviewer3;AnonReviewer4;AnonReviewer1;AnonReviewer2", "site": "https://openreview.net/forum?id=Dmpi13JiqcX", "pdf_size": 0, "rating": "5;5;6;6", "confidence": "4;3;4;4", "wc_review": "293;328;344;524", "wc_reply_reviewers": "52;0;0;0", "wc_reply_authors": "485;591;455;530", "reply_reviewers": "1;0;0;0", "reply_authors": "2;2;2;1", "rating_avg": [ 5.5, 0.5 ], "confidence_avg": [ 3.75, 0.4330127018922193 ], "wc_review_avg": [ 372.25, 89.53316424655168 ], "wc_reply_reviewers_avg": [ 13.0, 22.516660498395403 ], "wc_reply_authors_avg": [ 515.25, 51.236583609760714 ], "reply_reviewers_avg": [ 0.25, 0.4330127018922193 ], "reply_authors_avg": [ 1.75, 0.4330127018922193 ], "replies_avg": [ 13, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": 0.5773502691896257, "gs_citation": 24, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=3335489497083000957&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 5, "aff_unique_index": "0;0;0", "aff_unique_norm": "Northeastern University", "aff_unique_dep": "", "aff_unique_url": "https://www.northeastern.edu", "aff_unique_abbr": "NEU", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0;0", "aff_country_unique": "United States" }, { "title": "AdaSpeech: Adaptive Text to Speech for Custom Voice", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/3164", "id": "Drynvt7gg4L", "poster": "", "openreview": "https://openreview.net/forum?id=Drynvt7gg4L", "slides": "https://iclr.cc/virtual/2021/poster/3164", "video": "https://iclr.cc/virtual/2021/poster/3164", "author_site": "Mingjian Chen, Xu Tan, Bohan Li, Eric Liu, Tao Qin, sheng zhao, Tie-Yan Liu", "tldr": "", "abstract": "Custom voice, a specific text to speech (TTS) service in commercial speech platforms, aims to adapt a source TTS model to synthesize personal voice for a target speaker using few speech from her/him. 
Custom voice presents two unique challenges for TTS adaptation: 1) to support diverse customers, the adaptation model needs to handle diverse acoustic conditions which could be very different from source speech data, and 2) to support a large number of customers, the adaptation parameters need to be small enough for each target speaker to reduce memory usage while maintaining high voice quality. In this work, we propose AdaSpeech, an adaptive TTS system for high-quality and efficient customization of new voices. We design several techniques in AdaSpeech to address the two challenges in custom voice: 1) To handle different acoustic conditions, we model the acoustic information in both utterance and phoneme level. Specifically, we use one acoustic encoder to extract an utterance-level vector and another one to extract a sequence of phoneme-level vectors from the target speech during pre-training and fine-tuning; in inference, we extract the utterance-level vector from a reference speech and use an acoustic predictor to predict the phoneme-level vectors. 2) To better trade off the adaptation parameters and voice quality, we introduce conditional layer normalization in the mel-spectrogram decoder of AdaSpeech, and fine-tune this part in addition to speaker embedding for adaptation. We pre-train the source TTS model on LibriTTS datasets and fine-tune it on VCTK and LJSpeech datasets (with different acoustic conditions from LibriTTS) with few adaptation data, e.g., 20 sentences, about 1 minute speech. Experiment results show that AdaSpeech achieves much better adaptation quality than baseline methods, with only about 5K specific parameters for each speaker, which demonstrates its effectiveness for custom voice. The audio samples are available at https://speechresearch.github.io/adaspeech/.", "keywords": "Text to speech;adaptation;fine-tuning;custom voice;acoustic condition modeling;conditional layer normalization", "primary_area": "", "supplementary_material": "", "author": "Mingjian Chen;Xu Tan;Bohan Li;Yanqing Liu;Tao Qin;sheng zhao;Tie-Yan Liu", "authorids": "t-miche@microsoft.com;~Xu_Tan1;bohan.li@microsoft.com;yanqliu@microsoft.com;~Tao_Qin1;~sheng_zhao1;~Tie-Yan_Liu1", "gender": ";M;;;M;M;M", "homepage": ";https://tan-xu.github.io/;;;https://www.microsoft.com/en-us/research/people/taoqin/;https://www.aaai.org/ojs/index.php/AAAI/article/view/4642;http://member.acm.org/~tieyanliu", "dblp": ";96/10484-3;;;14/6841;;l/TieYanLiu", "google_scholar": ";tob-U1oAAAAJ;;;Bl4SRU0AAAAJ;689bIIwAAAAJ;Nh832fgAAAAJ", "orcid": ";0000-0001-5631-0639;;;;;0000-0002-0476-8020", "linkedin": ";;;;;;", "or_profile": "t-miche@microsoft.com;~Xu_Tan1;bohan.li@microsoft.com;yanqliu@microsoft.com;~Tao_Qin1;~sheng_zhao1;~Tie-Yan_Liu1", "aff": ";Microsoft;;;Microsoft Research Asia;Microsoft;Microsoft", "aff_domain": ";microsoft.com;;;microsoft.com;microsoft.com;microsoft.com", "position": ";Principal Researcher;;;Principal Researcher;Researcher;Distinguished Scientist", "bibtex": "@inproceedings{\nchen2021adaspeech,\ntitle={AdaSpeech: Adaptive Text to Speech for Custom Voice},\nauthor={Mingjian Chen and Xu Tan and Bohan Li and Yanqing Liu and Tao Qin and sheng zhao and Tie-Yan Liu},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=Drynvt7gg4L}\n}", "github": "[![Papers with Code](/images/pwc_icon.svg) 2 community implementations](https://paperswithcode.com/paper/?openreview=Drynvt7gg4L)", "project": "", "reviewers": 
"AnonReviewer5;AnonReviewer3;AnonReviewer4;AnonReviewer2", "pdf_size": 0, "rating": "4;6;7;8", "confidence": "5;2;5;5", "wc_review": "493;102;407;375", "wc_reply_reviewers": "637;0;0;0", "wc_reply_authors": "2898;209;453;302", "reply_reviewers": "1;0;0;0", "reply_authors": "6;1;1;1", "rating_avg": [ 6.25, 1.479019945774904 ], "confidence_avg": [ 4.25, 1.299038105676658 ], "wc_review_avg": [ 344.25, 146.36832819978508 ], "wc_reply_reviewers_avg": [ 159.25, 275.8290911053437 ], "wc_reply_authors_avg": [ 965.5, 1119.1220889608069 ], "reply_reviewers_avg": [ 0.25, 0.4330127018922193 ], "reply_authors_avg": [ 2.25, 2.165063509461097 ], "replies_avg": [ 15, 0 ], "authors#_avg": [ 7, 0 ], "corr_rating_confidence": 0.09759000729485333, "gs_citation": 242, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=5695102233701284044&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 3, "pdf": "https://openreview.net/pdf?id=Drynvt7gg4L", "email": ";microsoft.com;;;microsoft.com;microsoft.com;microsoft.com", "author_num": 7, "aff_unique_index": "0;0;0;0", "aff_unique_norm": "Microsoft", "aff_unique_dep": "Microsoft Corporation", "aff_unique_url": "https://www.microsoft.com", "aff_unique_abbr": "Microsoft", "aff_campus_unique_index": "1", "aff_campus_unique": ";Asia", "aff_country_unique_index": "0;1;0;0", "aff_country_unique": "United States;China" }, { "id": "DsbhGImWjF", "title": "GOLD-NAS: Gradual, One-Level, Differentiable", "track": "main", "status": "Withdraw", "tldr": "", "abstract": "There has been a large literature on neural architecture search, but most existing work made use of heuristic rules that largely constrained the search flexibility. In this paper, we first relax these manually designed constraints and enlarge the search space to contain more than $10^{117}$ candidates. In the new space, most existing differentiable search methods can fail dramatically. We then propose a novel algorithm named Gradual One-Level Differentiable Neural Architecture Search (GOLD-NAS) which introduces a variable resource constraint to one-level optimization so that the weak operators are gradually pruned out from the super-network. In standard image classification benchmarks, GOLD-NAS can find a series of Pareto-optimal architectures within a single search procedure. Most of the discovered architectures were never studied before, yet they achieve a nice tradeoff between recognition accuracy and model complexity. 
GOLD-NAS also shows generalization ability in extended search spaces with different candidate operators.\n", "keywords": "Neural Architecture Search;GOLD-NAS", "primary_area": "", "supplementary_material": "", "author": "Kaifeng Bi;Lingxi Xie;Xin Chen;Longhui Wei;Qi Tian", "authorids": "bikaifeng1@huawei.com;~Lingxi_Xie1;chenxin180@huawei.com;~Longhui_Wei1;~Qi_Tian3", "gender": ";M;;M;M", "homepage": ";http://lingxixie.com/;;https://joinwei-pku.github.io/longhuiwei.github.io/;https://www.qitian1987.com/index.html", "dblp": ";123/2869;;206/6179;78/1467-1.html", "google_scholar": ";EEMm7hwAAAAJ;;thhnAhIAAAAJ;https://scholar.google.com/citations?hl=en", "orcid": ";;;;0000-0002-7252-5047", "linkedin": ";;;;", "or_profile": "bikaifeng1@huawei.com;~Lingxi_Xie1;chenxin180@huawei.com;~Longhui_Wei1;~Qi_Tian3", "aff": ";Huawei Technologies Ltd.;;University of Science and Technology of China;Huawei Technologies Ltd.", "aff_domain": ";huawei.com;;ustc.edu.cn;huawei.com", "position": ";Researcher;;PhD student;Principal Researcher", "bibtex": "", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer3;AnonReviewer2;AnonReviewer4", "site": "https://openreview.net/forum?id=DsbhGImWjF", "pdf_size": 0, "rating": "4;5;5;6", "confidence": "4;5;5;4", "wc_review": "541;629;591;501", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "0;0;0;0", "reply_reviewers": "0;0;0;0", "reply_authors": "0;0;0;0", "rating_avg": [ 5.0, 0.7071067811865476 ], "confidence_avg": [ 4.5, 0.5 ], "wc_review_avg": [ 565.5, 48.587549845613744 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 0, 0 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 0, 0 ], "replies_avg": [ 5, 0 ], "authors#_avg": [ 5, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 52, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=190356642262593793&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 3, "aff_unique_index": "0;1;0", "aff_unique_norm": "Huawei;University of Science and Technology of China", "aff_unique_dep": "Huawei Technologies;", "aff_unique_url": "https://www.huawei.com;http://www.ustc.edu.cn", "aff_unique_abbr": "Huawei;USTC", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0;0", "aff_country_unique": "China" }, { "id": "Dtahsj2FkrK", "title": "A REINFORCEMENT LEARNING FRAMEWORK FOR TIME DEPENDENT CAUSAL EFFECTS EVALUATION IN A/B TESTING", "track": "main", "status": "Reject", "tldr": "", "abstract": "A/B testing, or online experiment, is a standard business strategy to compare a new product with an old one in pharmaceutical, technological, and traditional industries. The aim of this paper is to introduce a reinforcement learning framework for carrying out A/B testing in two-sided marketplace platforms, while characterizing the long-term treatment effects. Our proposed testing procedure allows for sequential monitoring and online updating. It is generally applicable to a variety of treatment designs in different industries. In addition, we systematically investigate the theoretical properties (e.g., size and power) of our testing procedure. Finally, we apply our framework to both synthetic data and a real-world data example obtained from a technological company to illustrate its advantage over the current practice. 
\n", "keywords": "reinforcement learning;A/B testing;causal inference;sequential testing", "primary_area": "", "supplementary_material": "", "author": "Chengchun Shi;Xiaoyu Wang;Shikai Luo;Rui Song;Hongtu Zhu;Jieping Ye", "authorids": "~Chengchun_Shi1;wxyinucas@gmail.com;luoshikai@didiglobal.com;~Rui_Song2;~Hongtu_Zhu2;~Jieping_Ye3", "gender": "M;;;;M;", "homepage": "https://callmespring.github.io/;;;https://song-ray.github.io/;https://bigkp.org;", "dblp": ";;;01/2743-6.html;03/5683;03/5454", "google_scholar": "dDGy3N0AAAAJ;;;;https://scholar.google.com/citations?hl=en;", "orcid": ";;;0000-0003-1875-2115;0000-0002-6781-2690;", "linkedin": ";;;;;", "or_profile": "~Chengchun_Shi1;wxyinucas@gmail.com;luoshikai@didiglobal.com;~Rui_Song2;~Hongtu_Zhu2;~Jieping_Ye3", "aff": "London School of Economics and Political Science, University of London;;;North Carolina State University;University of North Carolina at Chapel Hill;", "aff_domain": "lse.ac.uk;;;ncsu.edu;unc.edu;", "position": "Assistant Professor;;;Full Professor;Full Professor;", "bibtex": "@misc{\nshi2021a,\ntitle={A {\\{}REINFORCEMENT{\\}} {\\{}LEARNING{\\}} {\\{}FRAMEWORK{\\}} {\\{}FOR{\\}} {\\{}TIME{\\}} {\\{}DEPENDENT{\\}} {\\{}CAUSAL{\\}} {\\{}EFFECTS{\\}} {\\{}EVALUATION{\\}} {\\{}IN{\\}} A/B {\\{}TESTING{\\}}},\nauthor={Chengchun Shi and Xiaoyu Wang and Shikai Luo and Rui Song and Hongtu Zhu and Jieping Ye},\nyear={2021},\nurl={https://openreview.net/forum?id=Dtahsj2FkrK}\n}", "github": "", "project": "", "reviewers": "AnonReviewer2;AnonReviewer3;AnonReviewer4", "site": "https://openreview.net/forum?id=Dtahsj2FkrK", "pdf_size": 0, "rating": "5;5;6", "confidence": "2;4;2", "wc_review": "426;644;302", "wc_reply_reviewers": "0;0;0", "wc_reply_authors": "769;1322;390", "reply_reviewers": "0;0;0", "reply_authors": "1;2;1", "rating_avg": [ 5.333333333333333, 0.4714045207910317 ], "confidence_avg": [ 2.6666666666666665, 0.9428090415820634 ], "wc_review_avg": [ 457.3333333333333, 141.36792029625704 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 827.0, 382.69134647476244 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.3333333333333333, 0.4714045207910317 ], "replies_avg": [ 8, 0 ], "authors#_avg": [ 6, 0 ], "corr_rating_confidence": -0.4999999999999999, "gs_citation": 3, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=14174398135683970293&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 0, "aff_unique_index": "0;1;2", "aff_unique_norm": "London School of Economics and Political Science;North Carolina State University;University of North Carolina", "aff_unique_dep": ";;", "aff_unique_url": "https://www.lse.ac.uk;https://www.ncsu.edu;https://www.unc.edu", "aff_unique_abbr": "LSE;NCSU;UNC", "aff_campus_unique_index": "0;2", "aff_campus_unique": "London;;Chapel Hill", "aff_country_unique_index": "0;1;1", "aff_country_unique": "United Kingdom;United States" }, { "id": "Du7s5ukNKz", "title": "Policy Learning Using Weak Supervision", "track": "main", "status": "Reject", "tldr": "", "abstract": "Most existing policy learning solutions require the learning agents to receive high-quality supervision signals, e.g., rewards in reinforcement learning (RL) or high-quality expert demonstrations in behavior cloning (BC). These quality supervisions are either infeasible or prohibitively expensive to obtain in practice. We aim for a unified framework that leverages the weak supervisions to perform policy learning efficiently. 
To handle this problem, we treat the ``weak supervisions'' as imperfect information coming from a \\emph{peer agent}, and evaluate the learning agent's policy based on a ``correlated agreement'' with the peer agent's policy (instead of simple agreements). Our way of leveraging peer agent's information offers us a family of solutions that learn effectively from weak supervisions with theoretical guarantees. Extensive evaluations on tasks including RL with noisy reward, BC with weak demonstrations, and standard policy co-training (RL + BC) show that the proposed approach leads to substantial improvements, especially when the complexity or the noise of the learning environments grows. ", "keywords": "Weak Supervision;Policy Learning;Correlated Agreement", "primary_area": "", "supplementary_material": "/attachment/938f8268fab1cdd4c6790c22f0e3f210e5f77e49.zip", "author": "Jingkang Wang;Hongyi Guo;Zhaowei Zhu;Yang Liu", "authorids": "~Jingkang_Wang1;~Hongyi_Guo1;~Zhaowei_Zhu1;~Yang_Liu3", "gender": "M;M;M;M", "homepage": "http://www.cs.toronto.edu/~wangjk/;https://gohsyi.github.io/;https://www.zzw.ai;http://www.yliuu.com", "dblp": "223/9910;;202/1712;51/3710-18", "google_scholar": "c0BTYC4AAAAJ;https://scholar.google.com/citations?hl=en;YS8pSQoAAAAJ;jKrIVCIAAAAJ", "orcid": ";;0000-0003-3894-5862;0000-0001-8420-6011", "linkedin": ";;;", "or_profile": "~Jingkang_Wang1;~Hongyi_Guo1;~Zhaowei_Zhu1;~Yang_Liu3", "aff": "University of Toronto;Northwestern University, Northwestern University;University of California, Santa Cruz;University of California, Santa Cruz", "aff_domain": "toronto.edu;u.northwestern.edu;ucsc.edu;ucsc.edu", "position": "PhD student;PhD student;PhD student;Assistant Professor", "bibtex": "@misc{\nwang2021policy,\ntitle={Policy Learning Using Weak Supervision},\nauthor={Jingkang Wang and Hongyi Guo and Zhaowei Zhu and Yang Liu},\nyear={2021},\nurl={https://openreview.net/forum?id=Du7s5ukNKz}\n}", "github": "", "project": "", "reviewers": "AnonReviewer4;AnonReviewer3;AnonReviewer1;AnonReviewer2", "site": "https://openreview.net/forum?id=Du7s5ukNKz", "pdf_size": 0, "rating": "6;6;6;6", "confidence": "3;3;4;3", "wc_review": "377;617;1123;477", "wc_reply_reviewers": "285;168;517;0", "wc_reply_authors": "1265;714;1531;342", "reply_reviewers": "2;1;1;0", "reply_authors": "2;2;3;1", "rating_avg": [ 6.0, 0.0 ], "confidence_avg": [ 3.25, 0.4330127018922193 ], "wc_review_avg": [ 648.5, 286.90895768518624 ], "wc_reply_reviewers_avg": [ 242.5, 188.09106836849006 ], "wc_reply_authors_avg": [ 963.0, 464.07704101797583 ], "reply_reviewers_avg": [ 1.0, 0.7071067811865476 ], "reply_authors_avg": [ 2.0, 0.7071067811865476 ], "replies_avg": [ 18, 0 ], "authors#_avg": [ 4, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 14, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=16464632898961524841&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 11, "aff_unique_index": "0;1;2;2", "aff_unique_norm": "University of Toronto;Northwestern University;University of California, Santa Cruz", "aff_unique_dep": ";;", "aff_unique_url": "https://www.utoronto.ca;https://www.northwestern.edu;https://www.ucsc.edu", "aff_unique_abbr": "U of T;NU;UCSC", "aff_campus_unique_index": "1;1", "aff_campus_unique": ";Santa Cruz", "aff_country_unique_index": "0;1;1;1", "aff_country_unique": "Canada;United States" }, { "id": "Dw8vAUKYq8C", "title": "Near-Optimal Glimpse Sequences for Training Hard Attention Neural Networks", "track": "main", "status": "Reject", "tldr": "", "abstract": "Hard visual attention is a 
promising approach to reduce the computational burden of modern computer vision methodologies. Hard attention mechanisms are typically non-differentiable. They can be trained with reinforcement learning but the high-variance training this entails hinders more widespread application. We show how hard attention for image classification can be framed as a Bayesian optimal experimental design (BOED) problem. From this perspective, the optimal locations to attend to are those which provide the greatest expected reduction in the entropy of the classification distribution. We introduce methodology from the BOED literature to approximate this optimal behaviour, and use it to generate `near-optimal' sequences of attention locations. We then show how to use such sequences to partially supervise, and therefore speed up, the training of a hard attention mechanism. Although generating these sequences is computationally expensive, they can be reused by any other networks later trained on the same task.", "keywords": "attention;hard attention;variational inference;bayesian optimal experimental design", "primary_area": "", "supplementary_material": "/attachment/80198e4bc4d79a1e270fe83eafe1ded415bd3441.zip", "author": "William Harvey;Michael Teng;Frank Wood", "authorids": "~William_Harvey1;~Michael_Teng1;~Frank_Wood2", "gender": "M;M;M", "homepage": "https://www.cs.ubc.ca/~wsgh/;;http://www.robots.ox.ac.uk/~fwood/", "dblp": "26/8210-2;217/1536;44/4750", "google_scholar": "https://scholar.google.co.uk/citations?user=kDd7nBkAAAAJ;;d4yNzXIAAAAJ", "orcid": ";;", "linkedin": ";;frank-wood-43529114?trk=hp-identity-name", "or_profile": "~William_Harvey1;~Michael_Teng1;~Frank_Wood2", "aff": "University of British Columbia;University of Oxford;MILA", "aff_domain": "cs.ubc.ca;ox.ac.uk;mila.quebec", "position": "PhD student;PhD student;Associate Professor", "bibtex": "@misc{\nharvey2021nearoptimal,\ntitle={Near-Optimal Glimpse Sequences for Training Hard Attention Neural Networks},\nauthor={William Harvey and Michael Teng and Frank Wood},\nyear={2021},\nurl={https://openreview.net/forum?id=Dw8vAUKYq8C}\n}", "github": "", "project": "", "reviewers": "AnonReviewer4;AnonReviewer1;AnonReviewer2;AnonReviewer3", "site": "https://openreview.net/forum?id=Dw8vAUKYq8C", "pdf_size": 0, "rating": "4;5;6;7", "confidence": "4;4;2;2", "wc_review": "193;248;217;285", "wc_reply_reviewers": "29;133;0;0", "wc_reply_authors": "309;491;158;306", "reply_reviewers": "2;1;0;0", "reply_authors": "2;2;1;1", "rating_avg": [ 5.5, 1.118033988749895 ], "confidence_avg": [ 3.0, 1.0 ], "wc_review_avg": [ 235.75, 34.47734763580284 ], "wc_reply_reviewers_avg": [ 40.5, 54.70146250330058 ], "wc_reply_authors_avg": [ 316.0, 118.04448314089058 ], "reply_reviewers_avg": [ 0.75, 0.82915619758885 ], "reply_authors_avg": [ 1.5, 0.5 ], "replies_avg": [ 15, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": -0.8944271909999159, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:2apKfO9vQqMJ:scholar.google.com/&scioq=Near-Optimal+Glimpse+Sequences+for+Training+Hard+Attention+Neural+Networks&hl=en&as_sdt=0,33", "gs_version_total": 2, "aff_unique_index": "0;1;2", "aff_unique_norm": "University of British Columbia;University of Oxford;Mila", "aff_unique_dep": ";;", "aff_unique_url": "https://www.ubc.ca;https://www.ox.ac.uk;https://mila.quebec", "aff_unique_abbr": "UBC;Oxford;MILA", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;1;0", "aff_country_unique": "Canada;United Kingdom" }, { 
"id": "E3SWxn0cDBG", "title": "Provable Acceleration of Wide Neural Net Training via Polyak's Momentum", "track": "main", "status": "Withdraw", "tldr": "", "abstract": "Incorporating a so-called momentum dynamic in gradient descent methods is widely used in neural net training as it has been broadly observed that, at least empirically, it often leads to significantly faster convergence. At the same time, there are very few theoretical guarantees in the literature to explain this apparent acceleration effect. In this paper we show that Polyak's momentum, in combination with over-parameterization of the model, helps achieve faster convergence in training a one-layer ReLU network on $n$ examples. We show specifically that gradient descent with Polyak's momentum decreases the initial training error at a rate much faster than that of vanilla gradient descent. We provide a bound for a fixed sample size $n$, and we show that gradient descent with Polyak's momentum converges at an accelerated rate to a small error that is controllable by the number of neurons $m$. Prior work (Du et al. 2019) showed that using vanilla gradient descent, and with a similar method of over-parameterization, the error decays as $(1-\\kappa_n)^t$ after $t$ iterations, where $\\kappa_n$ is a problem-specific parameter. Our result shows that with the appropriate choice of momentum parameter one has a rate of $(1-\\sqrt{\\kappa_n})^t$. This work establishes that momentum does indeed speed up neural net training.\n", "keywords": "", "primary_area": "", "supplementary_material": "", "author": "Jun-Kun Wang;Jacob Abernethy", "authorids": "~Jun-Kun_Wang1;~Jacob_Abernethy1", "gender": "M;M", "homepage": "https://jimwang123.github.io/;https://www.cc.gatech.edu/~jabernethy9/", "dblp": "153/5463;91/2520", "google_scholar": ";FDu4ciwAAAAJ", "orcid": ";", "linkedin": ";", "or_profile": "~Jun-Kun_Wang1;~Jacob_Abernethy1", "aff": "Georgia Institute of Technology;Georgia Institute of Technology", "aff_domain": "gatech.edu;cc.gatech.edu", "position": "PhD student;Associate Professor", "bibtex": "", "github": "", "project": "", "reviewers": "AnonReviewer4;AnonReviewer1;AnonReviewer2;AnonReviewer3", "site": "https://openreview.net/forum?id=E3SWxn0cDBG", "pdf_size": 0, "rating": "4;5;6;7", "confidence": "4;3;2;2", "wc_review": "377;352;303;509", "wc_reply_reviewers": "0;291;0;63", "wc_reply_authors": "2613;1490;40;1198", "reply_reviewers": "0;1;0;1", "reply_authors": "4;2;1;3", "rating_avg": [ 5.5, 1.118033988749895 ], "confidence_avg": [ 2.75, 0.82915619758885 ], "wc_review_avg": [ 385.25, 76.24426207918862 ], "wc_reply_reviewers_avg": [ 88.5, 119.70902221637265 ], "wc_reply_authors_avg": [ 1335.25, 915.5739661545647 ], "reply_reviewers_avg": [ 0.5, 0.5 ], "reply_authors_avg": [ 2.5, 1.118033988749895 ], "replies_avg": [ 18, 0 ], "authors#_avg": [ 2, 0 ], "corr_rating_confidence": -0.9438798074485388, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:BxfchWsbflsJ:scholar.google.com/&scioq=Provable+Acceleration+of+Wide+Neural+Net+Training+via+Polyak%27s+Momentum&hl=en&as_sdt=0,5", "gs_version_total": 0, "aff_unique_index": "0;0", "aff_unique_norm": "Georgia Institute of Technology", "aff_unique_dep": "", "aff_unique_url": "https://www.gatech.edu", "aff_unique_abbr": "Georgia Tech", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0", "aff_country_unique": "United States" }, { "id": "E3UZoJKHxuk", "title": "Latent Causal Invariant Model", "track": "main", "status": 
"Reject", "tldr": "", "abstract": "Current supervised learning can learn spurious correlation during the data-fitting process, imposing issues regarding interpretability, out-of-distribution (OOD) generalization, and robustness. To avoid spurious correlation, we propose a \\textbf{La}tent \\textbf{C}ausal \\textbf{I}nvariance \\textbf{M}odel (LaCIM) which pursues \\emph{causal prediction}. Specifically, we introduce latent variables that are separated into (a) output-causative factors and (b) others that are spuriously correlated to the output via confounders, to model the underlying causal factors. We further assume the generating mechanisms from latent space to observed data to be \\emph{causally invariant}. We give the identifiable claim of such invariance, particularly the disentanglement of output-causative factors from others, as a theoretical guarantee for precise inference and avoiding spurious correlation. We propose a Variational-Bayesian-based method for estimation and to optimize over the latent space for prediction. The utility of our approach is verified by improved interpretability, prediction power on various OOD scenarios (including healthcare) and robustness on security. ", "keywords": "invariance;causality;spurious correlation;out-of-distribution generalization;interpretability;variational auto-encoder", "primary_area": "", "supplementary_material": "/attachment/a2a686a9a219692ef4137788c359bb92fc3b31a0.zip", "author": "Xinwei Sun;Botong Wu;Chang Liu;Xiangyu Zheng;Wei Chen;Tao Qin;Tie-Yan Liu", "authorids": "~Xinwei_Sun1;~Botong_Wu1;~Chang_Liu10;~Xiangyu_Zheng1;~Wei_Chen1;~Tao_Qin1;~Tie-Yan_Liu1", "gender": "M;M;M;F;F;M;M", "homepage": "https://sunxinwei0625.github.io/sunxw.github.io/;;https://changliu00.github.io/;https://songxichen.com/index.php/people/zhengxiangyu;https://weichen-cas.github.io/;https://www.microsoft.com/en-us/research/people/taoqin/;http://member.acm.org/~tieyanliu", "dblp": "145/6592-1;;52/5716-30;;;14/6841;l/TieYanLiu", "google_scholar": ";8q0Gzr4AAAAJ;rYd0GEsAAAAJ;;https://scholar.google.com/citations?hl=en;Bl4SRU0AAAAJ;Nh832fgAAAAJ", "orcid": ";;0000-0001-5207-5440;;;;0000-0002-0476-8020", "linkedin": ";;chang-liu-9ab479168/;;;;", "or_profile": "~Xinwei_Sun1;~Botong_Wu1;~Chang_Liu10;~Xiangyu_Zheng1;~Wei_Chen1;~Tao_Qin1;~Tie-Yan_Liu1", "aff": ";Peking University, Tsinghua University;Microsoft;Peking University;;Microsoft Research Asia;Microsoft", "aff_domain": ";pku.edu.cn;microsoft.com;pku.edu.cn;;microsoft.com;microsoft.com", "position": ";PhD student;Researcher;PhD student;;Principal Researcher;Distinguished Scientist", "bibtex": "@misc{\nsun2021latent,\ntitle={Latent Causal Invariant Model},\nauthor={Xinwei Sun and Botong Wu and Chang Liu and Xiangyu Zheng and Wei Chen and Tao Qin and Tie-Yan Liu},\nyear={2021},\nurl={https://openreview.net/forum?id=E3UZoJKHxuk}\n}", "github": "", "project": "", "reviewers": "AnonReviewer2;AnonReviewer1;AnonReviewer4;AnonReviewer3", "site": "https://openreview.net/forum?id=E3UZoJKHxuk", "pdf_size": 0, "rating": "4;5;6;6", "confidence": "4;4;3;3", "wc_review": "2175;261;1068;863", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "1981;578;412;976", "reply_reviewers": "0;0;0;0", "reply_authors": "3;1;1;2", "rating_avg": [ 5.25, 0.82915619758885 ], "confidence_avg": [ 3.5, 0.5 ], "wc_review_avg": [ 1091.75, 692.1825535940645 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 986.75, 609.5208671571467 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.75, 0.82915619758885 ], 
"replies_avg": [ 13, 0 ], "authors#_avg": [ 7, 0 ], "corr_rating_confidence": -0.9045340337332909, "gs_citation": 23, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=9112087959525004117&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 4, "aff_unique_index": "0;1;0;1;1", "aff_unique_norm": "Peking University;Microsoft", "aff_unique_dep": ";Microsoft Corporation", "aff_unique_url": "http://www.pku.edu.cn;https://www.microsoft.com", "aff_unique_abbr": "Peking U;Microsoft", "aff_campus_unique_index": "1", "aff_campus_unique": ";Asia", "aff_country_unique_index": "0;1;0;0;1", "aff_country_unique": "China;United States" }, { "title": "The Importance of Pessimism in Fixed-Dataset Policy Optimization", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/2655", "id": "E3Ys6a1NTGT", "poster": "", "openreview": "https://openreview.net/forum?id=E3Ys6a1NTGT", "slides": "https://iclr.cc/virtual/2021/poster/2655", "video": "https://iclr.cc/virtual/2021/poster/2655", "author_site": "Jacob Buckman, Carles Gelada, Marc G Bellemare", "tldr": "", "abstract": "We study worst-case guarantees on the expected return of fixed-dataset policy optimization algorithms. Our core contribution is a unified conceptual and mathematical framework for the study of algorithms in this regime. This analysis reveals that for naive approaches, the possibility of erroneous value overestimation leads to a difficult-to-satisfy requirement: in order to guarantee that we select a policy which is near-optimal, we may need the dataset to be informative of the value of every policy. To avoid this, algorithms can follow the pessimism principle, which states that we should choose the policy which acts optimally in the worst possible world. We show why pessimistic algorithms can achieve good performance even when the dataset is not informative of every policy, and derive families of algorithms which follow this principle. 
These theoretical findings are validated by experiments on a tabular gridworld, and deep learning experiments on four MinAtar environments.", "keywords": "deep learning;reinforcement learning;offline reinforcement learning", "primary_area": "", "supplementary_material": "", "author": "Jacob Buckman;Carles Gelada;Marc G Bellemare", "authorids": "~Jacob_Buckman2;cgel@openai.com;~Marc_G_Bellemare1", "gender": ";;M", "homepage": ";;http://www.marcgbellemare.info", "dblp": ";;38/4525", "google_scholar": ";;https://scholar.google.co.uk/citations?user=uyYPun0AAAAJ", "orcid": ";;", "linkedin": ";;", "or_profile": "~Jacob_Buckman2;cgel@openai.com;~Marc_G_Bellemare1", "aff": ";;Google", "aff_domain": ";;google.com", "position": ";;Research Scientist", "bibtex": "@inproceedings{\nbuckman2021the,\ntitle={The Importance of Pessimism in Fixed-Dataset Policy Optimization},\nauthor={Jacob Buckman and Carles Gelada and Marc G Bellemare},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=E3Ys6a1NTGT}\n}", "github": "[![github](/images/github_icon.svg) jbuckman/tiopifdpo](https://github.com/jbuckman/tiopifdpo)", "project": "", "reviewers": "AnonReviewer2;AnonReviewer3;AnonReviewer1", "pdf_size": 0, "rating": "6;6;7", "confidence": "4;4;4", "wc_review": "562;526;850", "wc_reply_reviewers": "133;0;0", "wc_reply_authors": "1055;476;537", "reply_reviewers": "1;0;0", "reply_authors": "2;1;1", "rating_avg": [ 6.333333333333333, 0.4714045207910317 ], "confidence_avg": [ 4.0, 0.0 ], "wc_review_avg": [ 646.0, 144.99655168313487 ], "wc_reply_reviewers_avg": [ 44.333333333333336, 62.69680126520721 ], "wc_reply_authors_avg": [ 689.3333333333334, 259.7618567500283 ], "reply_reviewers_avg": [ 0.3333333333333333, 0.4714045207910317 ], "reply_authors_avg": [ 1.3333333333333333, 0.4714045207910317 ], "replies_avg": [ 9, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 167, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=7642597601487950859&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 5, "pdf": "https://openreview.net/pdf?id=E3Ys6a1NTGT", "email": ";;google.com", "author_num": 3, "aff_unique_index": "0", "aff_unique_norm": "Google", "aff_unique_dep": "Google", "aff_unique_url": "https://www.google.com", "aff_unique_abbr": "Google", "aff_campus_unique_index": "0", "aff_campus_unique": "Mountain View", "aff_country_unique_index": "0", "aff_country_unique": "United States" }, { "id": "E4PK0rg2eP", "title": "Parameter-Efficient Transfer Learning with Diff Pruning", "track": "main", "status": "Reject", "tldr": "", "abstract": "While task-specific finetuning of deep networks pretrained with self-supervision has led to significant empirical advances in NLP, their large size makes the standard finetuning approach difficult to apply to multi-task, memory-constrained settings, as storing the full model parameters for each task become prohibitively expensive. We propose $\\textit{diff pruning}$ as a simple approach to enable parameter-efficient transfer learning within the pretrain-finetune framework. This approach views finetuning as learning a task-specific diff vector that is applied on top of the pretrained parameter vector, which remains fixed and is shared across different tasks. The diff vector is adaptively pruned during training with a differentiable approximation to the $L_0$-norm penalty to encourage sparsity. 
Diff pruning becomes parameter-efficient as the number of tasks increases, as it requires storing only the nonzero positions and weights of the diff vector for each task, while the cost of storing the shared pretrained model remains constant. We find that models finetuned with diff pruning can match the performance of fully finetuned baselines on the GLUE benchmark while only modifying 0.5$\\%$ of the pretrained model's parameters per task.", "keywords": "transfer learning;parameter efficiency", "primary_area": "", "supplementary_material": "", "author": "Demi Guo;Alexander M Rush;Yoon Kim", "authorids": "~Demi_Guo1;~Alexander_M_Rush1;~Yoon_Kim1", "gender": "F;M;", "homepage": ";http://rush.seas.harvard.edu/;https://people.csail.mit.edu/yoonkim/", "dblp": "215/3969;http://dblp.uni-trier.de/pers/hd/r/Rush:Alexander_M=;", "google_scholar": ";LIjnUGgAAAAJ;n_ts4eYAAAAJ", "orcid": ";0000-0002-9900-1606;", "linkedin": ";sasha-rush-a69b6917/;", "or_profile": "~Demi_Guo1;~Alexander_M_Rush1;~Yoon_Kim1", "aff": "Harvard;School of Engineering and Applied Sciences, Harvard University;International Business Machines", "aff_domain": "college.harvard.edu;seas.harvard.edu;ibm.com", "position": "Undergrad student;Assistant Professor;Research scientist", "bibtex": "@misc{\nguo2021parameterefficient,\ntitle={Parameter-Efficient Transfer Learning with Diff Pruning},\nauthor={Demi Guo and Alexander M Rush and Yoon Kim},\nyear={2021},\nurl={https://openreview.net/forum?id=E4PK0rg2eP}\n}", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer2;AnonReviewer3;AnonReviewer4", "site": "https://openreview.net/forum?id=E4PK0rg2eP", "pdf_size": 0, "rating": "4;5;6;8", "confidence": "4;4;4;4", "wc_review": "216;416;244;439", "wc_reply_reviewers": "202;0;0;0", "wc_reply_authors": "750;778;496;464", "reply_reviewers": "1;0;0;0", "reply_authors": "2;1;1;1", "rating_avg": [ 5.75, 1.479019945774904 ], "confidence_avg": [ 4.0, 0.0 ], "wc_review_avg": [ 328.75, 99.57754515953886 ], "wc_reply_reviewers_avg": [ 50.5, 87.4685657822283 ], "wc_reply_authors_avg": [ 622.0, 142.7935572776307 ], "reply_reviewers_avg": [ 0.25, 0.4330127018922193 ], "reply_authors_avg": [ 1.25, 0.4330127018922193 ], "replies_avg": [ 12, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 459, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=2500880548083395687&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 7, "aff_unique_index": "0;0;1", "aff_unique_norm": "Harvard University;International Business Machines Corporation", "aff_unique_dep": ";", "aff_unique_url": "https://www.harvard.edu;https://www.ibm.com", "aff_unique_abbr": "Harvard;IBM", "aff_campus_unique_index": "1", "aff_campus_unique": ";Cambridge", "aff_country_unique_index": "0;0;0", "aff_country_unique": "United States" }, { "id": "E6fb6ehhLh8", "title": "Unified Principles For Multi-Source Transfer Learning Under Label Shifts", "track": "main", "status": "Reject", "tldr": "", "abstract": "We study the label shift problem in multi-source transfer learning and derive new generic principles. Our proposed framework unifies the principles of conditional feature alignment, label distribution ratio estimation, and domain relation weights estimation. Based on inspired practical principles, we provide a unified practical framework for three multi-source label shift transfer scenarios: learning with limited target data, unsupervised domain adaptation, and label partial unsupervised domain adaptation. 
We evaluate the proposed method on these scenarios by extensive experiments and show that our proposed algorithm can significantly outperform the baselines. ", "keywords": "", "primary_area": "", "supplementary_material": "", "author": "changjian shui;Zijian Li;jiaqi li;Christian Gagn\u00e9;Charles Ling;Boyu Wang", "authorids": "~changjian_shui1;~Zijian_Li1;lijiaqi.victor@gmail.com;~Christian_Gagn\u00e91;~Charles_Ling1;~Boyu_Wang3", "gender": "Not Specified;M;;M;M;M", "homepage": "https://cjshui.github.io;;;http://vision.gel.ulaval.ca/~cgagne/english.html;http://cling.csd.uwo.ca/;https://sites.google.com/site/borriewang/", "dblp": "215/5461;27/10487;;80/5084-1;;41/6565-4.html", "google_scholar": "r91NXUgAAAAJ;j3ilESoAAAAJ;;https://scholar.google.ca/citations?user=egixsbEAAAAJ;https://scholar.google.co.uk/citations?hl=en;qAZM5KcAAAAJ", "orcid": ";;;0000-0003-3697-4184;;0000-0002-7413-4162", "linkedin": ";;;;;", "or_profile": "~changjian_shui1;~Zijian_Li1;lijiaqi.victor@gmail.com;~Christian_Gagn\u00e91;~Charles_Ling1;~Boyu_Wang3", "aff": "Laval university;;;Universit\u00e9 Laval;Western University;University of Western Ontario", "aff_domain": "ulaval.ca;;;ulaval.ca;uwo.ca;uwo.ca", "position": "PhD student;;;Full Professor;Professor;Assistant Professor", "bibtex": "@misc{\nshui2021unified,\ntitle={Unified Principles For Multi-Source Transfer Learning Under Label Shifts},\nauthor={changjian shui and Zijian Li and jiaqi li and Christian Gagn{\\'e} and Charles Ling and Boyu Wang},\nyear={2021},\nurl={https://openreview.net/forum?id=E6fb6ehhLh8}\n}", "github": "", "project": "", "reviewers": "AnonReviewer4;AnonReviewer3;AnonReviewer1;AnonReviewer2", "site": "https://openreview.net/forum?id=E6fb6ehhLh8", "pdf_size": 0, "rating": "4;6;7;7", "confidence": "4;5;4;3", "wc_review": "480;250;469;367", "wc_reply_reviewers": "123;0;0;0", "wc_reply_authors": "2214;460;1624;874", "reply_reviewers": "1;0;0;0", "reply_authors": "4;1;4;2", "rating_avg": [ 6.0, 1.224744871391589 ], "confidence_avg": [ 4.0, 0.7071067811865476 ], "wc_review_avg": [ 391.5, 92.81837102642989 ], "wc_reply_reviewers_avg": [ 30.75, 53.26056233274298 ], "wc_reply_authors_avg": [ 1293.0, 675.8794271169969 ], "reply_reviewers_avg": [ 0.25, 0.4330127018922193 ], "reply_authors_avg": [ 2.75, 1.299038105676658 ], "replies_avg": [ 17, 0 ], "authors#_avg": [ 6, 0 ], "corr_rating_confidence": -0.28867513459481287, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:4eXPclIZEIgJ:scholar.google.com/&scioq=Unified+Principles+For+Multi-Source+Transfer+Learning+Under+Label+Shifts&hl=en&as_sdt=0,5", "gs_version_total": 0, "aff_unique_index": "0;1;2;3", "aff_unique_norm": "Laval University;Universit\u00e9 Laval;Western University;University of Western Ontario", "aff_unique_dep": ";;;", "aff_unique_url": "https://www.laval.ca;https://www.ulaval.ca;https://www.uwo.ca;https://www.uwo.ca", "aff_unique_abbr": "Laval;ULaval;Western;UWO", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0;0;0", "aff_country_unique": "Canada" }, { "id": "E8fmaZwzEj", "title": "Defective Convolutional Networks", "track": "main", "status": "Reject", "tldr": "", "abstract": "Robustness of convolutional neural networks (CNNs) has gained in importance on account of adversarial examples, i.e., inputs added as well-designed perturbations that are imperceptible to humans but can cause the model to predict incorrectly. 
Recent research suggests that the noise in adversarial examples breaks the textural structure, which eventually leads to wrong predictions. To mitigate the threat of such adversarial attacks, we propose defective convolutional networks that make predictions relying less on textural information but more on shape information by properly integrating defective convolutional layers into standard CNNs. The defective convolutional layers contain defective neurons whose activations are set to be a constant function. As defective neurons contain no information and are far different from standard neurons in its spatial neighborhood, the textural features cannot be accurately extracted, and so the model has to seek other features for classification, such as the shape. We show extensive evidence to justify our proposal and demonstrate that defective CNNs can defend against black-box attacks better than standard CNNs. In particular, they achieve state-of-the-art performance against transfer-based attacks without any adversarial training being applied.\n\n", "keywords": "Representation Learning;Robustness", "primary_area": "", "supplementary_material": "/attachment/214172c1baa022e3c465220424d943eb4d3671f8.zip", "author": "Tiange Luo;Tianle Cai;Mengxiao Zhang;Siyu Chen;Di He;Liwei Wang", "authorids": "~Tiange_Luo1;~Tianle_Cai1;~Mengxiao_Zhang2;~Siyu_Chen1;~Di_He1;~Liwei_Wang1", "gender": "M;M;;M;M;M", "homepage": "https://tiangeluo.github.io/;https://tianle.website;;;https://dihe-pku.github.io/;http://www.liweiwang-pku.com/", "dblp": "227/2386.html;241/9458;;23/7930;74/184;", "google_scholar": "https://scholar.google.com/citations?hl=en;CvwLRSMAAAAJ;;;https://scholar.google.co.jp/citations?user=orVoz4IAAAAJ;VZHxoh8AAAAJ", "orcid": ";;;;;", "linkedin": ";;;;;", "or_profile": "~Tiange_Luo1;~Tianle_Cai1;~Mengxiao_Zhang2;~Siyu_Chen1;~Di_He1;~Liwei_Wang1", "aff": "Peking University;Princeton University;;;Microsoft;Peking University", "aff_domain": "pku.edu;princeton.edu;;;microsoft.com;pku.edu.cn", "position": "MS student;PhD student;;;Senior Researcher;Full Professor", "bibtex": "@misc{\nluo2021defective,\ntitle={Defective Convolutional Networks},\nauthor={Tiange Luo and Tianle Cai and Mengxiao Zhang and Siyu Chen and Di He and Liwei Wang},\nyear={2021},\nurl={https://openreview.net/forum?id=E8fmaZwzEj}\n}", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer4;AnonReviewer3", "site": "https://openreview.net/forum?id=E8fmaZwzEj", "pdf_size": 0, "rating": "6;6;6", "confidence": "4;3;4", "wc_review": "400;504;377", "wc_reply_reviewers": "0;0;143", "wc_reply_authors": "1058;763;896", "reply_reviewers": "0;0;1", "reply_authors": "2;1;2", "rating_avg": [ 6.0, 0.0 ], "confidence_avg": [ 3.6666666666666665, 0.4714045207910317 ], "wc_review_avg": [ 427.0, 55.25094267672423 ], "wc_reply_reviewers_avg": [ 47.666666666666664, 67.41084647311753 ], "wc_reply_authors_avg": [ 905.6666666666666, 120.62706532486351 ], "reply_reviewers_avg": [ 0.3333333333333333, 0.4714045207910317 ], "reply_authors_avg": [ 1.6666666666666667, 0.4714045207910317 ], "replies_avg": [ 10, 0 ], "authors#_avg": [ 6, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 3, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=14498861838279531495&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 5, "aff_unique_index": "0;1;2;0", "aff_unique_norm": "Peking University;Princeton University;Microsoft", "aff_unique_dep": ";;Microsoft Corporation", "aff_unique_url": 
"http://www.pku.edu.cn;https://www.princeton.edu;https://www.microsoft.com", "aff_unique_abbr": "Peking U;Princeton;Microsoft", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;1;1;0", "aff_country_unique": "China;United States" }, { "id": "E9W0QPxtZ_u", "title": "not-so-big-GAN: Generating High-Fidelity Images on Small Compute with Wavelet-based Super-Resolution", "track": "main", "status": "Reject", "tldr": "", "abstract": "State-of-the-art models for high-resolution image generation, such as BigGAN and VQVAE-2, require an incredible amount of compute resources and/or time (512 TPU-v3 cores) to train, putting them out of reach for the larger research community. On the other hand, GAN-based image super-resolution models, such as ESRGAN, can not only upscale images to high dimensions, but also are efficient to train. In this paper, we present not-so-big-GAN (nsb-GAN), a simple yet cost-effective two-step training framework for deep generative models (DGMs) of high-dimensional natural images. First, we generate images in low-frequency bands by training a sampler in the wavelet domain. Then, we super-resolve these images from the wavelet domain back to the pixel-space with our novel wavelet super-resolution decoder network. Wavelet-based down-sampling method preserves more structural information than pixel-based methods, leading to significantly better generative quality of the low-resolution sampler (e.g., 64\u00d764). Since the sampler and decoder can be trained in parallel and operate on much lower dimensional spaces than end-to-end models, the training cost is substantially reduced. On ImageNet 512\u00d7512, our model achieves a Fr\u00e9chet Inception Distance (FID) of 10.59 \u2013 beating the baseline BigGAN model \u2013 at half the compute (256 TPU-v3 cores).", "keywords": "deep generative modeling;GAN;super-resolution;wavelet transformation;energy efficient", "primary_area": "", "supplementary_material": "", "author": "Seungwook Han;Akash Srivastava;Cole Lincoln Hurwitz;Prasanna Sattigeri;David Daniel Cox", "authorids": "~Seungwook_Han1;~Akash_Srivastava1;~Cole_Lincoln_Hurwitz1;~Prasanna_Sattigeri1;~David_Daniel_Cox1", "gender": ";M;;;", "homepage": ";http://akashgit.github.io;https://colehurwitz.github.io/;;", "dblp": "119/3428;24/9528;;00/7428;48/7659", "google_scholar": "B6tpjKkAAAAJ;https://scholar.google.co.uk/citations?user=2h6SZeEAAAAJ;https://scholar.google.co.uk/citations?hl=en;m-s38ikAAAAJ;", "orcid": ";;;0000-0003-4435-0486;", "linkedin": ";https://uk.linkedin.com/in/akash-srivastava-aa97361b;;prasannasattigeri/;", "or_profile": "~Seungwook_Han1;~Akash_Srivastava1;~Cole_Lincoln_Hurwitz1;~Prasanna_Sattigeri1;~David_Daniel_Cox1", "aff": "MIT-IBM Watson AI Lab;MIT-IBM Watson AI Research Lab;University of Edinburgh;IBM Research;International Business Machines", "aff_domain": "ibm.com;ibm.com;ed.ac.uk;ibm.com;ibm.com", "position": "Researcher;Research Scientist;PhD student;Researcher;IBM Director, MIT-IBM Watson AI Lab", "bibtex": "@misc{\nhan2021notsobiggan,\ntitle={not-so-big-{\\{}GAN{\\}}: Generating High-Fidelity Images on Small Compute with Wavelet-based Super-Resolution},\nauthor={Seungwook Han and Akash Srivastava and Cole Lincoln Hurwitz and Prasanna Sattigeri and David Daniel Cox},\nyear={2021},\nurl={https://openreview.net/forum?id=E9W0QPxtZ_u}\n}", "github": "", "project": "", "reviewers": "AnonReviewer2;AnonReviewer3;AnonReviewer4", "site": "https://openreview.net/forum?id=E9W0QPxtZ_u", "pdf_size": 0, "rating": "2;5;6", 
"confidence": "3;3;4", "wc_review": "725;444;303", "wc_reply_reviewers": "0;0;0", "wc_reply_authors": "815;800;278", "reply_reviewers": "0;0;0", "reply_authors": "1;1;1", "rating_avg": [ 4.333333333333333, 1.699673171197595 ], "confidence_avg": [ 3.3333333333333335, 0.4714045207910317 ], "wc_review_avg": [ 490.6666666666667, 175.41252964242764 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 631.0, 249.68380003516447 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 8, 0 ], "authors#_avg": [ 5, 0 ], "corr_rating_confidence": 0.6933752452815364, "gs_citation": 9, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=3846623338663719183&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 4, "aff_unique_index": "0;0;1;2;3", "aff_unique_norm": "Massachusetts Institute of Technology;University of Edinburgh;IBM;International Business Machines Corporation", "aff_unique_dep": "IBM Watson AI Lab;;IBM Research;", "aff_unique_url": "https://www.mitibmwatsonailab.org;https://www.ed.ac.uk;https://www.ibm.com/research;https://www.ibm.com", "aff_unique_abbr": "MIT-IBM AI Lab;Edinburgh;IBM;IBM", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0;1;0;0", "aff_country_unique": "United States;United Kingdom" }, { "id": "EAZHurUYz8U", "title": "Orthogonal Over-Parameterized Training", "track": "main", "status": "Withdraw", "tldr": "", "abstract": "The inductive bias of a neural network is largely determined by the architecture and the training algorithm. To achieve good generalization, how to effectively train a neural network is of great importance. We propose a novel orthogonal over-parameterized training (OPT) framework that can provably minimize the hyperspherical energy which characterizes the diversity of neurons on a hypersphere. By maintaining the minimum hyperspherical energy during training, OPT can greatly improve the empirical generalization. Specifically, OPT fixes the randomly initialized weights of the neurons and learns an orthogonal transformation that applies to these neurons. We consider multiple ways to learn such an orthogonal transformation, including unrolling orthogonalization algorithms, applying orthogonal parameterization, and designing orthogonality-preserving gradient descent. For better scalability, we further propose the stochastic OPT which performs orthogonal transformation stochastically for partial dimensions of the neuron. Interestingly, OPT reveals that learning a proper coordinate system for neurons is crucial to generalization. We provide some insights on why OPT yields better generalization. 
Extensive experiments validate the superiority of OPT.", "keywords": "Neural Network;Hyperspherical Energy;Inductive Bias;Orthogonality", "primary_area": "", "supplementary_material": "", "author": "Weiyang Liu;Rongmei Lin;Zhen Liu;James Matthew Rehg;Li Xiong;Adrian Weller;Le Song", "authorids": "~Weiyang_Liu1;~Rongmei_Lin1;~Zhen_Liu6;~James_Matthew_Rehg1;~Li_Xiong1;~Adrian_Weller1;~Le_Song1", "gender": "M;;M;;M;M;M", "homepage": "http://wyliu.com/;https://rmlin.github.io/;;http://www.cs.emory.edu/~lxiong/;http://mlg.eng.cam.ac.uk/adrian/;http://www.cc.gatech.edu/~lsong;http://rehg.org/", "dblp": "137/1532;198/2482;77/35-19;39/3530-1.html;73/8324;94/3481;r/JMRehg", "google_scholar": "DMjROf0AAAAJ;ehsqBUkAAAAJ;I1IiJCAAAAAJ;jJ8BLgsAAAAJ;https://scholar.google.co.uk/citations?user=Ek4hM10AAAAJ;Xl4E0CsAAAAJ;https://scholar.google.com.tw/citations?user=8kA3eDwAAAAJ", "orcid": ";;;0000-0001-7354-0428;;;0000-0003-1793-5462", "linkedin": ";;;li-xiong-32472513/;;;", "or_profile": "~Weiyang_Liu1;~Rongmei_Lin1;~Zhen_Liu6;~Li_Xiong1;~Adrian_Weller1;~Le_Song1;~James_Rehg1", "aff": "University of Cambridge;Emory University;University of Montreal;Emory University;University of Cambridge;College of Computing, Georgia Institute of Technology;Georgia Institute of Technology", "aff_domain": "cam.ac.uk;emory.edu;umontreal.ca;emory.edu;cam.ac.uk;cc.gatech.edu;gatech.edu", "position": "Researcher;PhD student;PhD student;Professor;Principal Researcher;Associate Professor;Full Professor", "bibtex": "", "github": "", "project": "", "reviewers": "AnonReviewer2;AnonReviewer3;AnonReviewer1", "site": "https://openreview.net/forum?id=EAZHurUYz8U", "pdf_size": 0, "rating": "3;5;6", "confidence": "4;2;3", "wc_review": "623;1072;333", "wc_reply_reviewers": "0;0;0", "wc_reply_authors": "565;864;239", "reply_reviewers": "0;0;0", "reply_authors": "1;2;1", "rating_avg": [ 4.666666666666667, 1.247219128924647 ], "confidence_avg": [ 3.0, 0.816496580927726 ], "wc_review_avg": [ 676.0, 304.0142540517906 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 556.0, 255.23453266881162 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.3333333333333333, 0.4714045207910317 ], "replies_avg": [ 9, 0 ], "authors#_avg": [ 7, 0 ], "corr_rating_confidence": -0.6546536707079772, "gs_citation": 35, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=6254842155086919082&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 11, "aff_unique_index": "0;1;2;1;0;3;3", "aff_unique_norm": "University of Cambridge;Emory University;University of Montreal;Georgia Institute of Technology", "aff_unique_dep": ";;;College of Computing", "aff_unique_url": "https://www.cam.ac.uk;https://www.emory.edu;https://wwwumontreal.ca;https://www.gatech.edu", "aff_unique_abbr": "Cambridge;Emory;UM;Georgia Tech", "aff_campus_unique_index": "0;0;2", "aff_campus_unique": "Cambridge;;Atlanta", "aff_country_unique_index": "0;1;2;1;0;1;1", "aff_country_unique": "United Kingdom;United States;Canada" }, { "id": "EArH-0iHhIq", "title": "ON NEURAL NETWORK GENERALIZATION VIA PROMOTING WITHIN-LAYER ACTIVATION DIVERSITY", "track": "main", "status": "Reject", "tldr": "", "abstract": "During the last decade, neural networks have been intensively used to tackle various problems and they have often led to state-of-the-art results. These networks are composed of multiple jointly optimized layers arranged in a hierarchical structure. 
At each layer, the aim is to learn to extract hidden patterns needed to solve the problem at hand and forward it to the next layers. In the standard form, a neural network is trained with gradient-based optimization, where the errors are back-propagated from the last layer back to the first one. Thus at each optimization step, neurons at a given layer receive feedback from neurons belonging to higher layers of the hierarchy. In this paper, we propose to complement this traditional 'between-layer' feedback with additional 'within-layer' feedback to encourage diversity of the activations within the same layer. To this end, we measure the pairwise similarity between the outputs of the neurons and use it to model the layer's overall diversity. By penalizing similarities and promoting diversity, we encourage each neuron to learn a distinctive representation and, thus, to enrich the data representation learned within the layer and to increase the total capacity of the model. We theoretically study how the within-layer activation diversity affects the generalization performance of a neural network in a supervised context and we prove that increasing the diversity of hidden activations reduces the estimation error. In addition to the theoretical guarantees, we present an empirical study confirming that the proposed approach enhances the performance of neural networks.", "keywords": "Deep learning", "primary_area": "", "supplementary_material": "", "author": "Firas Laakom;Jenni Raitoharju;Alexandros Iosifidis;Moncef Gabbouj", "authorids": "~Firas_Laakom1;~Jenni_Raitoharju1;~Alexandros_Iosifidis2;~Moncef_Gabbouj1", "gender": "M;;M;M", "homepage": ";;https://www.tuni.fi/en/people/alexandros-iosifidis;https://www.tuni.fi/en/moncef-gabbouj", "dblp": "242/8179;;01/9539;08/6597", "google_scholar": "VPWIyx8AAAAJ;;KjsL0KEAAAAJ;cHukfSUAAAAJ", "orcid": "0000-0001-7436-5692;;0000-0003-4807-1345;0000-0002-9788-2323", "linkedin": ";;;moncef-gabbouj-2186282/?originalSubdomain=fi", "or_profile": "~Firas_Laakom1;~Jenni_Raitoharju1;~Alexandros_Iosifidis2;~Moncef_Gabbouj1", "aff": "Tampere University;;Aarhus University;Tampere University", "aff_domain": "tuni.fi;;au.dk;tuni.fi", "position": "PhD student;;Associate Professor;Full Professor", "bibtex": "@misc{\nlaakom2021on,\ntitle={{\\{}ON{\\}} {\\{}NEURAL{\\}} {\\{}NETWORK{\\}} {\\{}GENERALIZATION{\\}} {\\{}VIA{\\}} {\\{}PROMOTING{\\}} {\\{}WITHIN{\\}}-{\\{}LAYER{\\}} {\\{}ACTIVATION{\\}} {\\{}DIVERSITY{\\}}},\nauthor={Firas Laakom and Jenni Raitoharju and Alexandros Iosifidis and Moncef Gabbouj},\nyear={2021},\nurl={https://openreview.net/forum?id=EArH-0iHhIq}\n}", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer2;AnonReviewer3;AnonReviewer4", "site": "https://openreview.net/forum?id=EArH-0iHhIq", "pdf_size": 0, "rating": "3;5;5;6", "confidence": "3;4;4;3", "wc_review": "187;284;307;178", "wc_reply_reviewers": "202;0;0;0", "wc_reply_authors": "1412;435;692;388", "reply_reviewers": "1;0;0;0", "reply_authors": "3;1;1;1", "rating_avg": [ 4.75, 1.0897247358851685 ], "confidence_avg": [ 3.5, 0.5 ], "wc_review_avg": [ 239.0, 57.17079674099356 ], "wc_reply_reviewers_avg": [ 50.5, 87.4685657822283 ], "wc_reply_authors_avg": [ 731.75, 409.43398430027764 ], "reply_reviewers_avg": [ 0.25, 0.4330127018922193 ], "reply_authors_avg": [ 1.5, 0.8660254037844386 ], "replies_avg": [ 12, 0 ], "authors#_avg": [ 4, 0 ], "corr_rating_confidence": 0.2294157338705618, "gs_citation": 0, "gs_cited_by_link": 
"https://scholar.google.com/scholar?q=related:g-CXLX_9uVEJ:scholar.google.com/&scioq=ON+NEURAL+NETWORK+GENERALIZATION+VIA+PROMOTING+WITHIN-LAYER+ACTIVATION+DIVERSITY&hl=en&as_sdt=0,33", "gs_version_total": 0, "aff_unique_index": "0;1;0", "aff_unique_norm": "Tampere University;Aarhus University", "aff_unique_dep": ";", "aff_unique_url": "https://www.tuni.fi;https://au.dk", "aff_unique_abbr": "Tuni;AU", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;1;0", "aff_country_unique": "Finland;Denmark" }, { "id": "EBRTjOm_sl1", "title": "Learning Active Learning in the Batch-Mode Setup with Ensembles of Active Learning Agents", "track": "main", "status": "Reject", "tldr": "", "abstract": "Supervised learning models perform best when trained on a lot of data, but annotating training data is very costly in some domains. Active learning aims to chose only the most informative subset of unlabelled samples for annotation, thus saving annotation cost. Several heuristics for choosing this subset have been developed, which use fix policies for this choice. They are easily understandable and applied. However, there is no heuristic performing optimal in all settings. This lead to the development of agents learning the best selection policy from data. They formulate active learning as a Markov decision process and applying reinforcement learning (RL) methods to it. Their advantage is that they are able to use many features and to adapt to the specific task.\n\nOur paper proposes a new approach combining these advantages of learning active learning and heuristics: We propose to learn active learning using a parametrised ensemble of agents, where the parameters are learned using Monte Carlo policy search. As this approach can incorporate any active learning agent into its ensemble, it allows to increase the performance of every active learning agent by learning how to combine it with others.", "keywords": "active learning;ensembles", "primary_area": "", "supplementary_material": "/attachment/2e5044015f9c2ad92549254e488de697181aa6af.zip", "author": "Malte Ebner;Bernhard Kratzwald;Stefan Feuerriegel", "authorids": "~Malte_Ebner1;bkratzwald@ethz.ch;sfeuerriegel@ethz.ch", "gender": "M;;", "homepage": "https://malteebner.github.io;;", "dblp": ";;", "google_scholar": ";;", "orcid": ";;", "linkedin": ";;", "or_profile": "~Malte_Ebner1;bkratzwald@ethz.ch;sfeuerriegel@ethz.ch", "aff": ";;", "aff_domain": ";;", "position": ";;", "bibtex": "@misc{\nebner2021learning,\ntitle={Learning Active Learning in the Batch-Mode Setup with Ensembles of Active Learning Agents},\nauthor={Malte Ebner and Bernhard Kratzwald and Stefan Feuerriegel},\nyear={2021},\nurl={https://openreview.net/forum?id=EBRTjOm_sl1}\n}", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer3;AnonReviewer2;AnonReviewer5", "site": "https://openreview.net/forum?id=EBRTjOm_sl1", "pdf_size": 0, "rating": "3;4;4;7", "confidence": "4;4;4;4", "wc_review": "196;294;1216;230", "wc_reply_reviewers": "0;72;0;0", "wc_reply_authors": "244;172;416;230", "reply_reviewers": "0;1;0;0", "reply_authors": "1;1;1;1", "rating_avg": [ 4.5, 1.5 ], "confidence_avg": [ 4.0, 0.0 ], "wc_review_avg": [ 484.0, 424.08253913595644 ], "wc_reply_reviewers_avg": [ 18.0, 31.176914536239792 ], "wc_reply_authors_avg": [ 265.5, 90.9876365227716 ], "reply_reviewers_avg": [ 0.25, 0.4330127018922193 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 11, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 0, 
"gs_cited_by_link": "https://scholar.google.com/scholar?q=related:-CFOzJS7AsEJ:scholar.google.com/&scioq=Learning+Active+Learning+in+the+Batch-Mode+Setup+with+Ensembles+of+Active+Learning+Agents&hl=en&as_sdt=0,33", "gs_version_total": 0 }, { "title": "A teacher-student framework to distill future trajectories", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/3008", "id": "ECuvULjFQia", "poster": "", "openreview": "https://openreview.net/forum?id=ECuvULjFQia", "slides": "https://iclr.cc/virtual/2021/poster/3008", "video": "https://iclr.cc/virtual/2021/poster/3008", "author_site": "Alexander Neitz, Giambattista Parascandolo, Bernhard Schoelkopf", "tldr": "", "abstract": "By learning to predict trajectories of dynamical systems, model-based methods can make extensive use of all observations from past experience. However, due to partial observability, stochasticity, compounding errors, and irrelevant dynamics, training to predict observations explicitly often results in poor models. Model-free techniques try to side-step the problem by learning to predict values directly. While breaking the explicit dependency on future observations can result in strong performance, this usually comes at the cost of low sample efficiency, as the abundant information about the dynamics contained in future observations goes unused. Here we take a step back from both approaches: Instead of hand-designing how trajectories should be incorporated, a teacher network learns to interpret the trajectories and to provide target activations which guide a student model that can only observe the present. The teacher is trained with meta-gradients to maximize the student's performance on a validation set. We show that our approach performs well on tasks that are difficult for model-free and model-based methods, and we study the role of every component through ablation studies.", "keywords": "meta-learning;privileged information", "primary_area": "", "supplementary_material": "", "author": "Alexander Neitz;Giambattista Parascandolo;Bernhard Sch\u00f6lkopf", "authorids": "~Alexander_Neitz1;~Giambattista_Parascandolo1;~Bernhard_Sch\u00f6lkopf1", "gender": ";;", "homepage": ";;", "dblp": "180/8340;;", "google_scholar": ";;", "orcid": ";;", "linkedin": ";;", "or_profile": "~Alexander_Neitz1;~Giambattista_Parascandolo1;~Bernhard_Sch\u00f6lkopf1", "aff": "Max Planck Institute for Intelligent Systems, Max-Planck Institute;;", "aff_domain": "tuebingen.mpg.de;;", "position": "PhD student;;", "bibtex": "@inproceedings{\nneitz2021a,\ntitle={A teacher-student framework to distill future trajectories},\nauthor={Alexander Neitz and Giambattista Parascandolo and Bernhard Sch{\\\"o}lkopf},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=ECuvULjFQia}\n}", "github": "", "project": "", "reviewers": "AnonReviewer4;AnonReviewer2;AnonReviewer3;AnonReviewer1", "pdf_size": 0, "rating": "6;6;6;6", "confidence": "4;2;3;3", "wc_review": "359;165;435;259", "wc_reply_reviewers": "0;0;269;0", "wc_reply_authors": "490;383;643;412", "reply_reviewers": "0;0;1;0", "reply_authors": "1;1;2;1", "rating_avg": [ 6.0, 0.0 ], "confidence_avg": [ 3.0, 0.7071067811865476 ], "wc_review_avg": [ 304.5, 101.89578008926571 ], "wc_reply_reviewers_avg": [ 67.25, 116.480416809007 ], "wc_reply_authors_avg": [ 482.0, 100.85385466108869 ], "reply_reviewers_avg": [ 0.25, 0.4330127018922193 ], "reply_authors_avg": [ 1.25, 0.4330127018922193 ], "replies_avg": [ 12, 0 ], 
"authors#_avg": [ 3, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 5, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=15323250630888731789&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 3, "pdf": "https://openreview.net/pdf?id=ECuvULjFQia", "email": "tuebingen.mpg.de;;", "author_num": 3, "aff_unique_index": "0", "aff_unique_norm": "Max Planck Institute for Intelligent Systems", "aff_unique_dep": "Intelligent Systems", "aff_unique_url": "https://www.mpi-is.mpg.de", "aff_unique_abbr": "MPI-IS", "aff_country_unique_index": "0", "aff_country_unique": "Germany" }, { "id": "EEBhskS0Gzt", "title": "Navigating the Trade-Off between Learning Efficacy and Processing Efficiency in Deep Neural Networks", "track": "main", "status": "Desk Reject", "tldr": "", "abstract": "A number of training protocols in machine learning seek to enhance learning efficacy by training a single agent on multiple tasks in sequence. Sequential acquisition exploits the discovery of common structure between tasks in the form of shared representations to improve learning speed and generalization. The learning of shared representations, however, is known to impair the execution of multiple tasks in parallel. The parallel execution of tasks results in higher efficiency of processing and is promoted by separating representations between tasks to avoid processing interference. Here, we build on previous work involving shallow networks and simple task settings suggesting that there is a trade-off between learning efficacy and processing efficiency, mediated by the use of shared versus separated representations. We show that the same tension arises in deep networks and discuss a meta-learning algorithm for an agent to manage this trade-off in an unfamiliar environment. 
We display through different experiments that the agent is able to successfully optimize its training strategy as a function of the environment.", "keywords": "multitask-learning;multitasking;parallel processing", "primary_area": "", "supplementary_material": "/attachment/f5367c45bf20f53b2b5ac5d9a0448b726fd3768c.zip", "author": "Anonymous", "authorids": "ICLR.cc/2021/Conference/Paper1911/Authors", "gender": "", "homepage": "", "dblp": "", "google_scholar": "", "orcid": "", "linkedin": "", "or_profile": "", "aff": "", "aff_domain": "", "position": "", "bibtex": "@inproceedings{\nanonymous2021navigating,\ntitle={Navigating the Trade-Off between Learning Efficacy and Processing Efficiency in Deep Neural Networks},\nauthor={Anonymous},\nbooktitle={Submitted to International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=EEBhskS0Gzt},\nnote={under review}\n}", "github": "", "project": "", "reviewers": "", "site": "https://openreview.net/forum?id=EEBhskS0Gzt", "pdf_size": 0, "rating": "", "confidence": "", "wc_review": "", "wc_reply_reviewers": "", "wc_reply_authors": "", "reply_reviewers": "", "reply_authors": "", "rating_avg": [ 0, 0 ], "confidence_avg": [ 0, 0 ], "wc_review_avg": [ 0, 0 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 0, 0 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 0, 0 ], "replies_avg": [ 1, 0 ], "authors#_avg": [ 1, 0 ], "corr_rating_confidence": 0, "gs_citation": -1, "gs_cited_by_link": "", "gs_version_total": -1 }, { "id": "EGVxmJKLC2L", "title": "Learning not to learn: Nature versus nurture in silico", "track": "main", "status": "Reject", "tldr": "", "abstract": "Animals are equipped with a rich innate repertoire of sensory, behavioral and motor skills, which allows them to interact with the world immediately after birth. At the same time, many behaviors are highly adaptive and can be tailored to specific environments by means of learning. In this work, we use mathematical analysis and the framework of meta-learning (or 'learning to learn') to answer when it is beneficial to learn such an adaptive strategy and when to hard-code a heuristic behavior. We find that the interplay of ecological uncertainty, task complexity and the agents' lifetime has crucial effects on the meta-learned amortized Bayesian inference performed by an agent. There exist two regimes: One in which meta-learning yields a learning algorithm that implements task-dependent information-integration and a second regime in which meta-learning imprints a heuristic or 'hard-coded' behavior. Further analysis reveals that non-adaptive behaviors are not only optimal for aspects of the environment that are stable across individuals, but also in situations where an adaptation to the environment would in fact be highly beneficial, but could not be done quickly enough to be exploited within the remaining lifetime. 
Hard-coded behaviors should hence not only be those that always work, but also those that are too complex to be learned within a reasonable time frame.", "keywords": "Meta-Learning;Reinforcement Learning", "primary_area": "", "supplementary_material": "", "author": "Robert Tjarko Lange;Henning Sprekeler", "authorids": "~Robert_Tjarko_Lange1;h.sprekeler@tu-berlin.de", "gender": ";", "homepage": "https://roberttlange.github.io/;", "dblp": "245/9152;", "google_scholar": "https://scholar.google.es/citations?user=cTrc3x4AAAAJ;", "orcid": ";", "linkedin": ";", "or_profile": "~Robert_Tjarko_Lange1;h.sprekeler@tu-berlin.de", "aff": "TU Berlin;", "aff_domain": "tu-berlin.de;", "position": "PhD student;", "bibtex": "@misc{\nlange2021learning,\ntitle={Learning not to learn: Nature versus nurture in silico},\nauthor={Robert Tjarko Lange and Henning Sprekeler},\nyear={2021},\nurl={https://openreview.net/forum?id=EGVxmJKLC2L}\n}", "github": "", "project": "", "reviewers": "AnonReviewer4;AnonReviewer1;AnonReviewer3;AnonReviewer2", "site": "https://openreview.net/forum?id=EGVxmJKLC2L", "pdf_size": 0, "rating": "5;5;6;7", "confidence": "4;4;4;4", "wc_review": "784;636;862;263", "wc_reply_reviewers": "457;330;308;167", "wc_reply_authors": "2020;1512;776;543", "reply_reviewers": "2;1;1;1", "reply_authors": "4;2;1;1", "rating_avg": [ 5.75, 0.82915619758885 ], "confidence_avg": [ 4.0, 0.0 ], "wc_review_avg": [ 636.25, 230.27632857069787 ], "wc_reply_reviewers_avg": [ 315.5, 102.8846441409018 ], "wc_reply_authors_avg": [ 1212.75, 587.4773931820696 ], "reply_reviewers_avg": [ 1.25, 0.4330127018922193 ], "reply_authors_avg": [ 2.0, 1.224744871391589 ], "replies_avg": [ 19, 0 ], "authors#_avg": [ 2, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 15, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=1457469824167837306&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 8, "aff_unique_index": "0", "aff_unique_norm": "Technische Universit\u00e4t Berlin", "aff_unique_dep": "", "aff_unique_url": "https://www.tu-berlin.de", "aff_unique_abbr": "TU Berlin", "aff_campus_unique_index": "0", "aff_campus_unique": "Berlin", "aff_country_unique_index": "0", "aff_country_unique": "Germany" }, { "title": "Generalization bounds via distillation", "status": "Spotlight", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/3261", "id": "EGdFhBzmAwB", "poster": "", "openreview": "https://openreview.net/forum?id=EGdFhBzmAwB", "slides": "https://iclr.cc/virtual/2021/poster/3261", "video": "https://iclr.cc/virtual/2021/poster/3261", "author_site": "Daniel Hsu, Ziwei Ji, Matus Telgarsky, Lan Wang", "tldr": "", "abstract": "This paper theoretically investigates the following empirical phenomenon: given a high-complexity network with poor generalization bounds, one can distill it into a network with nearly identical predictions but low complexity and vastly smaller generalization bounds. The main contribution is an analysis showing that the original network inherits this good generalization bound from its distillation, assuming the use of well-behaved data augmentation. This bound is presented both in an abstract and in a concrete form, the latter complemented by a reduction technique to handle modern computation graphs featuring convolutional layers, fully-connected layers, and skip connections, to name a few. 
To round out the story, a (looser) classical uniform convergence analysis of compression is also presented, as well as a variety of experiments on cifar and mnist demonstrating similar generalization performance between the original network and its distillation. \n", "keywords": "Generalization;statistical learning theory;theory;distillation", "primary_area": "", "supplementary_material": "", "author": "Daniel Hsu;Ziwei Ji;Matus Telgarsky;Lan Wang", "authorids": "~Daniel_Hsu1;~Ziwei_Ji1;~Matus_Telgarsky1;~Lan_Wang4", "gender": "M;M;M;", "homepage": "https://www.cs.columbia.edu/~djhsu/;https://jiziwei.github.io/;https://cims.nyu.edu/~matus/;", "dblp": "h/DanielHsu.html;176/4574.html=;05/9061;", "google_scholar": "Bp6tvy0AAAAJ;3l_6H5sAAAAJ;https://scholar.google.com/citations?hl=en;", "orcid": "0000-0002-3495-7113;;;", "linkedin": ";ziwei-ji-b1274899/;;lan-wang-57a550b1/", "or_profile": "~Daniel_Hsu1;~Ziwei_Ji1;~Matus_Telgarsky1;~Lan_Wang4", "aff": "Columbia University;University of Illinois Urbana Champaign;Department of Computer Science, University of Illinois, Urbana Champaign;", "aff_domain": "columbia.edu;illinois.edu;cs.illinois.edu;", "position": "Associate Professor;PhD student;Assistant Professor;", "bibtex": "@inproceedings{\nhsu2021generalization,\ntitle={Generalization bounds via distillation},\nauthor={Daniel Hsu and Ziwei Ji and Matus Telgarsky and Lan Wang},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=EGdFhBzmAwB}\n}", "github": "", "project": "", "reviewers": "AnonReviewer4;AnonReviewer1;AnonReviewer2;AnonReviewer3", "pdf_size": 0, "rating": "6;6;7;8", "confidence": "2;2;2;5", "wc_review": "196;128;283;372", "wc_reply_reviewers": "0;0;0;108", "wc_reply_authors": "378;217;450;696", "reply_reviewers": "0;0;0;1", "reply_authors": "1;1;1;2", "rating_avg": [ 6.75, 0.82915619758885 ], "confidence_avg": [ 2.75, 1.299038105676658 ], "wc_review_avg": [ 244.75, 91.73705630768845 ], "wc_reply_reviewers_avg": [ 27.0, 46.76537180435969 ], "wc_reply_authors_avg": [ 435.25, 172.5679214106724 ], "reply_reviewers_avg": [ 0.25, 0.4330127018922193 ], "reply_authors_avg": [ 1.25, 0.4330127018922193 ], "replies_avg": [ 14, 0 ], "authors#_avg": [ 4, 0 ], "corr_rating_confidence": 0.8703882797784891, "gs_citation": 46, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=12235745090871943563&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 7, "pdf": "https://openreview.net/pdf?id=EGdFhBzmAwB", "email": "columbia.edu;illinois.edu;cs.illinois.edu;", "author_num": 4, "aff_unique_index": "0;1;1", "aff_unique_norm": "Columbia University;University of Illinois Urbana-Champaign", "aff_unique_dep": ";", "aff_unique_url": "https://www.columbia.edu;https://illinois.edu", "aff_unique_abbr": "Columbia;UIUC", "aff_campus_unique_index": "1;1", "aff_campus_unique": ";Urbana-Champaign", "aff_country_unique_index": "0;0;0", "aff_country_unique": "United States" }, { "title": "Efficient Continual Learning with Modular Networks and Task-Driven Priors", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/3040", "id": "EKV158tSfwv", "poster": "", "openreview": "https://openreview.net/forum?id=EKV158tSfwv", "slides": "https://iclr.cc/virtual/2021/poster/3040", "video": "https://iclr.cc/virtual/2021/poster/3040", "author_site": "Tom Veniat, Ludovic Denoyer, Marc'Aurelio Ranzato", "tldr": "", "abstract": "Existing literature in Continual Learning (CL) has focused on overcoming catastrophic forgetting, the 
inability of the learner to recall how to perform tasks observed in the past. \nThere are however other desirable properties of a CL system, such as the ability to transfer knowledge from previous tasks and to scale memory and compute sub-linearly with the number of tasks. Since most current benchmarks focus only on forgetting using short streams of tasks, we first propose a new suite of benchmarks to probe CL algorithms across these new axes. \nFinally, we introduce a new modular architecture, whose modules represent atomic skills that can be composed to perform a certain task. Learning a task reduces to figuring out which past modules to re-use, and which new modules to instantiate to solve the current task. Our learning algorithm leverages a task-driven prior over the exponential search space of all possible ways to combine modules, enabling efficient learning on long streams of tasks. \nOur experiments show that this modular architecture and learning algorithm perform competitively on widely used CL benchmarks while yielding superior performance on the more challenging benchmarks we introduce in this work. The Benchmark is publicly available at https://github.com/facebookresearch/CTrLBenchmark.", "keywords": "Continual learning;Lifelong learning;Benchmark;Modular network;Neural Network", "primary_area": "", "supplementary_material": "", "author": "Tom Veniat;Ludovic Denoyer;MarcAurelio Ranzato", "authorids": "~Tom_Veniat1;~Ludovic_Denoyer1;~MarcAurelio_Ranzato1", "gender": ";M;M", "homepage": ";;https://ranzato.github.io/", "dblp": "202/2390;54/5551;28/1732", "google_scholar": ";9PLqulwAAAAJ;NbXF7T8AAAAJ", "orcid": ";;", "linkedin": ";;", "or_profile": "~Tom_Veniat1;~Ludovic_Denoyer1;~MarcAurelio_Ranzato1", "aff": ";Meta Facebook;Meta Facebook", "aff_domain": ";fb.com;fb.com", "position": ";Research Scientist;Researcher", "bibtex": "@inproceedings{\nveniat2021efficient,\ntitle={Efficient Continual Learning with Modular Networks and Task-Driven Priors},\nauthor={Tom Veniat and Ludovic Denoyer and MarcAurelio Ranzato},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=EKV158tSfwv}\n}", "github": "[![github](/images/github_icon.svg) facebookresearch/CTrLBenchmark](https://github.com/facebookresearch/CTrLBenchmark) + [![Papers with Code](/images/pwc_icon.svg) 1 community implementation](https://paperswithcode.com/paper/?openreview=EKV158tSfwv)", "project": "", "reviewers": "AnonReviewer2;AnonReviewer4;AnonReviewer3;AnonReviewer1", "pdf_size": 0, "rating": "6;6;7;7", "confidence": "4;4;3;3", "wc_review": "552;625;393;329", "wc_reply_reviewers": "209;80;55;0", "wc_reply_authors": "990;1326;555;841", "reply_reviewers": "1;1;1;0", "reply_authors": "2;3;1;2", "rating_avg": [ 6.5, 0.5 ], "confidence_avg": [ 3.5, 0.5 ], "wc_review_avg": [ 474.75, 118.81577125954281 ], "wc_reply_reviewers_avg": [ 86.0, 76.68441823473658 ], "wc_reply_authors_avg": [ 928.0, 277.914555214368 ], "reply_reviewers_avg": [ 0.75, 0.4330127018922193 ], "reply_authors_avg": [ 2.0, 0.7071067811865476 ], "replies_avg": [ 17, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": -1.0, "gs_citation": 104, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=12940206604439548007&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 9, "pdf": "https://openreview.net/pdf?id=EKV158tSfwv", "email": ";fb.com;fb.com", "author_num": 3, "aff_unique_index": "0;0", "aff_unique_norm": "Meta", "aff_unique_dep": "Meta Platforms, Inc.", "aff_unique_url": 
"https://meta.com", "aff_unique_abbr": "Meta", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0", "aff_country_unique": "United States" }, { "id": "EKb4Z0aSNf", "title": "CLOPS: Continual Learning of Physiological Signals", "track": "main", "status": "Reject", "tldr": "", "abstract": "Deep learning algorithms are known to experience destructive interference when instances violate the assumption of being independent and identically distributed (i.i.d). This violation, however, is ubiquitous in clinical settings where data are streamed temporally and from a multitude of physiological sensors. To overcome this obstacle, we propose CLOPS, a replay-based continual learning strategy. In three continual learning scenarios based on three publically-available datasets, we show that CLOPS can outperform the state-of-the-art methods, GEM and MIR. Moreover, we propose end-to-end trainable parameters, which we term task-instance parameters, that can be used to quantify task difficulty and similarity. This quantification yields insights into both network interpretability and clinical applications, where task difficulty is poorly quantified.", "keywords": "Continual learning;physiological signals;healthcare", "primary_area": "", "supplementary_material": "/attachment/49796ec1806fe872109be2b790f1265b2ddc2452.zip", "author": "Dani Kiyasseh;Tingting Zhu;David A. Clifton", "authorids": "~Dani_Kiyasseh1;tingting.zhu@eng.ox.ac.uk;~David_A._Clifton1", "gender": ";;M", "homepage": "https://danikiyasseh.github.io/;;http://www.eng.ox.ac.uk/chi", "dblp": ";;89/6424", "google_scholar": "UD1oO4MAAAAJ;;", "orcid": ";;", "linkedin": ";;", "or_profile": "~Dani_Kiyasseh1;tingting.zhu@eng.ox.ac.uk;~David_A._Clifton1", "aff": "University of Oxford;;University of Oxford", "aff_domain": "oxford.ac.uk;;ox.ac.uk", "position": "PhD student;;Full Professor", "bibtex": "@misc{\nkiyasseh2021clops,\ntitle={{\\{}CLOPS{\\}}: Continual Learning of Physiological Signals},\nauthor={Dani Kiyasseh and Tingting Zhu and David A. 
Clifton},\nyear={2021},\nurl={https://openreview.net/forum?id=EKb4Z0aSNf}\n}", "github": "", "project": "", "reviewers": "AnonReviewer5;AnonReviewer1;AnonReviewer4;AnonReviewer2", "site": "https://openreview.net/forum?id=EKb4Z0aSNf", "pdf_size": 0, "rating": "3;4;7;7", "confidence": "4;3;2;4", "wc_review": "487;449;762;253", "wc_reply_reviewers": "785;398;147;0", "wc_reply_authors": "1070;783;454;447", "reply_reviewers": "3;1;1;0", "reply_authors": "2;1;1;1", "rating_avg": [ 5.25, 1.7853571071357126 ], "confidence_avg": [ 3.25, 0.82915619758885 ], "wc_review_avg": [ 487.75, 181.53701413210476 ], "wc_reply_reviewers_avg": [ 332.5, 297.49495794046663 ], "wc_reply_authors_avg": [ 688.5, 258.73973409586705 ], "reply_reviewers_avg": [ 1.25, 1.0897247358851685 ], "reply_authors_avg": [ 1.25, 0.4330127018922193 ], "replies_avg": [ 17, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": -0.37998029782867415, "gs_citation": 9, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=14426328690642166598&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 4, "aff_unique_index": "0;0", "aff_unique_norm": "University of Oxford", "aff_unique_dep": "", "aff_unique_url": "https://www.ox.ac.uk", "aff_unique_abbr": "Oxford", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0", "aff_country_unique": "United Kingdom" }, { "id": "EKw6nZ4QkJl", "title": "EM-RBR: a reinforced framework for knowledge graph completion from reasoning perspective", "track": "main", "status": "Reject", "tldr": "", "abstract": "Knowledge graph completion aims to predict new links among given entities in the knowledge graph (KG). Most mainstream embedding methods focus on the fact triplets contained in the given KG, however, ignoring the rich background information implicitly provided by logic rules derived from the knowledge base. To solve this problem, in this paper, we propose a general framework, named EM-RBR (embedding and rule-based reasoning), capable of combining the advantages of rule-based reasoning and state-of-the-art embedding models. EM-RBR aims to utilize the relational background knowledge contained in rules to conduct multi-relation reasoning for link prediction, rather than the superficial vector triangle linkage used in embedding models. In this way, we can explore the relation between two entities in a deeper context to achieve higher accuracy. In experiments, we demonstrate that EM-RBR achieves better performance than previous models on FB15k, WN18 and our new dataset FB15k-R, especially on the new dataset, where our model outperforms the state-of-the-art models by an even larger margin. 
We make the implementation of EM-RBR available at https://github.com/1173710224/link-prediction-with-rule-based-reasoning.", "keywords": "knowledge graph completion;breadth first search", "primary_area": "", "supplementary_material": "", "author": "Bozhou Chen;Zhaochong An;Houde Quan;Qihui Lin;Hongzhi Wang", "authorids": "~Bozhou_Chen1;~Zhaochong_An1;~Houde_Quan1;~Qihui_Lin1;~Hongzhi_Wang2", "gender": "M;M;M;F;M", "homepage": ";https://zhaochongan.github.io/;https://github.com/1172100122;https://github.com/1171000607;http://homepage.hit.edu.cn/wang", "dblp": "259/9940;274/7063;;;81/940", "google_scholar": "avQkdTsAAAAJ;;;;", "orcid": ";;;;0000-0002-7521-2871", "linkedin": ";;;;", "or_profile": "~Bozhou_Chen1;~Zhaochong_An1;~Houde_Quan1;~Qihui_Lin1;~Hongzhi_Wang2", "aff": "Harbin Institute of Technology;ETHZ - ETH Zurich;Harbin Institute of Technology;HIT;Harbin Institute of Technology", "aff_domain": "hit.edu.cn;ethz.ch;hit.edu.cn;hit.edu.cn;hit.edu.cn", "position": "MS student;MS student;Undergrad student;Undergrad student;Full Professor", "bibtex": "@misc{\nchen2021emrbr,\ntitle={{\\{}EM{\\}}-{\\{}RBR{\\}}: a reinforced framework for knowledge graph completion from reasoning perspective},\nauthor={Bozhou Chen and Zhaochong An and Houde Quan and Qihui Lin and Hongzhi Wang},\nyear={2021},\nurl={https://openreview.net/forum?id=EKw6nZ4QkJl}\n}", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer4;AnonReviewer2;AnonReviewer3", "site": "https://openreview.net/forum?id=EKw6nZ4QkJl", "pdf_size": 0, "rating": "3;3;4;4", "confidence": "3;4;4;4", "wc_review": "499;305;560;612", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "0;0;0;0", "reply_reviewers": "0;0;0;0", "reply_authors": "0;0;0;0", "rating_avg": [ 3.5, 0.5 ], "confidence_avg": [ 3.75, 0.4330127018922193 ], "wc_review_avg": [ 494.0, 116.2174685664767 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 0, 0 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 0, 0 ], "replies_avg": [ 5, 0 ], "authors#_avg": [ 5, 0 ], "corr_rating_confidence": 0.5773502691896257, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:MHE9_auDUwMJ:scholar.google.com/&scioq=EM-RBR:+a+reinforced+framework+for+knowledge+graph+completion+from+reasoning+perspective&hl=en&as_sdt=0,33", "gs_version_total": 3, "aff_unique_index": "0;1;0;0;0", "aff_unique_norm": "Harbin Institute of Technology;ETH Zurich", "aff_unique_dep": ";", "aff_unique_url": "http://www.hit.edu.cn/;https://www.ethz.ch", "aff_unique_abbr": "HIT;ETHZ", "aff_campus_unique_index": "0;0;0;0", "aff_campus_unique": "Harbin;", "aff_country_unique_index": "0;1;0;0;0", "aff_country_unique": "China;Switzerland" }, { "id": "ELiYxj9JlyW", "title": "ME-MOMENTUM: EXTRACTING HARD CONFIDENT EXAMPLES FROM NOISILY LABELED DATA", "track": "main", "status": "Reject", "tldr": "", "abstract": "Examples that are close to the decision boundary, which we term hard examples, are essential to shaping accurate classifiers. Extracting confident examples has been widely studied in the community of learning with noisy labels. However, it remains elusive how to extract hard confident examples from the noisy training data. In this paper, we propose a deep learning paradigm to solve this problem, which is built on the memorization effect of deep neural networks that they would first learn simple patterns, i.e., patterns that are shared by multiple training examples. 
To extract hard confident examples that contain non-simple patterns and are entangled with the inaccurately labeled examples, we borrow the idea of momentum from physics. Specifically, we alternately update the confident examples and refine the classifier. Note that the extracted confident examples in the previous round can be exploited to learn a better classifier and that the better classifier will help identify better (and hard) confident examples. We call the approach the \u201cMomentum of Memorization\u201d (Me-Momentum). Empirical results on benchmark-simulated and real-world label-noise data illustrate the effectiveness of Me-Momentum for extracting hard confident examples, leading to better classification performance.", "keywords": "label noise;hard confident examples", "primary_area": "", "supplementary_material": "/attachment/6040a1439db683bdb7c84d38337c5f77ce9fc3ee.zip", "author": "Yingbin Bai;Tongliang Liu", "authorids": "~Yingbin_Bai1;~Tongliang_Liu1", "gender": "M;M", "homepage": "https://bybeye.github.io/;https://tongliang-liu.github.io/", "dblp": "296/1646;150/6667", "google_scholar": "EWMII50AAAAJ;https://scholar.google.com.au/citations?user=EiLdZ_YAAAAJ", "orcid": ";", "linkedin": ";", "or_profile": "~Yingbin_Bai1;~Tongliang_Liu1", "aff": "University of Sydney;University of Sydney", "aff_domain": "sydney.edu.au;sydney.edu.au", "position": "PhD student;Lecturer", "bibtex": "@misc{\nbai2021memomentum,\ntitle={{\\{}ME{\\}}-{\\{}MOMENTUM{\\}}: {\\{}EXTRACTING{\\}} {\\{}HARD{\\}} {\\{}CONFIDENT{\\}} {\\{}EXAMPLES{\\}} {\\{}FROM{\\}} {\\{}NOISILY{\\}} {\\{}LABELED{\\}} {\\{}DATA{\\}}},\nauthor={Yingbin Bai and Tongliang Liu},\nyear={2021},\nurl={https://openreview.net/forum?id=ELiYxj9JlyW}\n}", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer3;AnonReviewer2;AnonReviewer4", "site": "https://openreview.net/forum?id=ELiYxj9JlyW", "pdf_size": 0, "rating": "4;4;7;8", "confidence": "4;3;4;4", "wc_review": "408;754;395;262", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "735;1347;521;308", "reply_reviewers": "0;0;0;0", "reply_authors": "1;2;1;1", "rating_avg": [ 5.75, 1.7853571071357126 ], "confidence_avg": [ 3.75, 0.4330127018922193 ], "wc_review_avg": [ 454.75, 181.97441441037805 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 727.75, 388.0910814486723 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.25, 0.4330127018922193 ], "replies_avg": [ 10, 0 ], "authors#_avg": [ 2, 0 ], "corr_rating_confidence": 0.5659164584181102, "gs_citation": 42, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=16726599454428924035&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 5, "aff_unique_index": "0;0", "aff_unique_norm": "University of Sydney", "aff_unique_dep": "", "aff_unique_url": "https://www.sydney.edu.au", "aff_unique_abbr": "USYD", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0", "aff_country_unique": "Australia" }, { "title": "Answering Complex Open-Domain Questions with Multi-Hop Dense Retrieval", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/2806", "id": "EMHoBG0avc1", "poster": "", "openreview": "https://openreview.net/forum?id=EMHoBG0avc1", "slides": "https://iclr.cc/virtual/2021/poster/2806", "video": "https://iclr.cc/virtual/2021/poster/2806", "author_site": "Wenhan Xiong, Xiang Li, Srini Iyer, Jingfei Du, Patrick Lewis, William Yang Wang, Yashar Mehdad, Scott Yih, Sebastian Riedel, Douwe Kiela, Barlas Oguz", "tldr": "", "abstract": "We 
propose a simple and efficient multi-hop dense retrieval approach for answering complex open-domain questions, which achieves state-of-the-art performance on two multi-hop datasets, HotpotQA and multi-evidence FEVER. Contrary to previous work, our method does not require access to any corpus-specific information, such as inter-document hyperlinks or human-annotated entity markers, and can be applied to any unstructured text corpus. Our system also yields a much better efficiency-accuracy trade-off, matching the best published accuracy on HotpotQA while being 10 times faster at inference time.", "keywords": "multi-hop question answering;recursive dense retrieval;open domain complex question answering", "primary_area": "", "supplementary_material": "", "author": "Wenhan Xiong;Xiang Li;Srini Iyer;Jingfei Du;Patrick Lewis;William Yang Wang;Yashar Mehdad;Scott Yih;Sebastian Riedel;Douwe Kiela;Barlas Oguz", "authorids": "~Wenhan_Xiong1;~Xiang_Li2;~Srini_Iyer1;~Jingfei_Du1;~Patrick_Lewis2;~William_Yang_Wang2;mehdad@fb.com;~Scott_Yih1;~Sebastian_Riedel1;~Douwe_Kiela1;~Barlas_Oguz1", "gender": "M;F;M;M;M;;;;M;M;", "homepage": "https://xwhan.github.io;https://people.cs.pitt.edu/~xianglli/;http://sriniiyer.github.io;;https://patricklewis.io;;;;https://www.riedelcastro.org/;https://douwekiela.github.io;", "dblp": "203/8542;40/1491-69;78/4928.html;137/3917;227/3197;;;;18/3348-1.html;136/9140;https://dblp.org/pers/hd/o/Oguz:Barlas", "google_scholar": ";SRgRwSoAAAAJ;jNjde2wAAAAJ;;JN7Zg-kAAAAJ;;;;https://scholar.google.com.tw/citations?user=AcCtcrsAAAAJ;Q0piorUAAAAJ;iPmTQZMAAAAJ", "orcid": ";;;;0000-0002-2192-9543;;;;;;", "linkedin": ";;;;patrick-s-h-lewis/;;;;;;barlas-o%C4%9Fuz-25465050", "or_profile": "~Wenhan_Xiong1;~Xiang_Li2;~Srini_Iyer1;~Jingfei_Du1;~Patrick_Lewis2;~William_Yang_Wang2;mehdad@fb.com;~Scott_Yih1;~Sebastian_Riedel1;~Douwe_Kiela1;~Barlas_Oguz1", "aff": ", UC Santa Barbara;Department of Computer Science, University of Massachusetts, Amherst;Meta Facebook;;University College London;;;;Meta Facebook;Facebook AI Research;Meta", "aff_domain": "cs.ucsb.edu;cs.umass.edu;meta.com;;ucl.ac.uk;;;;fb.com;fb.com;meta.com", "position": "PhD student;PhD student;Principal Researcher;;PhD student;;;;Researcher;Research Scientist;Research Scientist", "bibtex": "@inproceedings{\nxiong2021answering,\ntitle={Answering Complex Open-Domain Questions with Multi-Hop Dense Retrieval},\nauthor={Wenhan Xiong and Xiang Li and Srini Iyer and Jingfei Du and Patrick Lewis and William Yang Wang and Yashar Mehdad and Scott Yih and Sebastian Riedel and Douwe Kiela and Barlas Oguz},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=EMHoBG0avc1}\n}", "github": "[![github](/images/github_icon.svg) facebookresearch/multihop_dense_retrieval](https://github.com/facebookresearch/multihop_dense_retrieval)", "project": "", "reviewers": "AnonReviewer2;AnonReviewer4;AnonReviewer3;AnonReviewer1", "pdf_size": 0, "rating": "5;6;7;9", "confidence": "3;5;4;4", "wc_review": "133;291;572;643", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "748;394;1030;309", "reply_reviewers": "0;0;0;0", "reply_authors": "1;1;2;1", "rating_avg": [ 6.75, 1.479019945774904 ], "confidence_avg": [ 4.0, 0.7071067811865476 ], "wc_review_avg": [ 409.75, 207.0161527514218 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 620.25, 288.21899226109304 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.25, 0.4330127018922193 ], "replies_avg": [ 11, 0 ], "authors#_avg": 
[ 11, 0 ], "corr_rating_confidence": 0.23904572186687872, "gs_citation": 73, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=950426100300537362&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 5, "pdf": "https://openreview.net/pdf?id=EMHoBG0avc1", "email": "cs.ucsb.edu;cs.umass.edu;meta.com;;ucl.ac.uk;;;;fb.com;fb.com;meta.com", "author_num": 11, "aff_unique_index": "0;1;2;3;2;2;2", "aff_unique_norm": "University of California, Santa Barbara;University of Massachusetts Amherst;Meta;University College London", "aff_unique_dep": ";Department of Computer Science;Meta Platforms, Inc.;", "aff_unique_url": "https://www.ucsb.edu;https://www.umass.edu;https://meta.com;https://www.ucl.ac.uk", "aff_unique_abbr": "UCSB;UMass Amherst;Meta;UCL", "aff_campus_unique_index": "0;1", "aff_campus_unique": "Santa Barbara;Amherst;", "aff_country_unique_index": "0;0;0;1;0;0;0", "aff_country_unique": "United States;United Kingdom" }, { "title": "Deep Learning meets Projective Clustering", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/3076", "id": "EQfpYwF3-b", "poster": "", "openreview": "https://openreview.net/forum?id=EQfpYwF3-b", "slides": "https://iclr.cc/virtual/2021/poster/3076", "video": "https://iclr.cc/virtual/2021/poster/3076", "author_site": "Alaa Maalouf, Harry Lang, Daniela Rus, Dan Feldman", "tldr": "", "abstract": "A common approach for compressing Natural Language Processing (NLP) networks is to encode the embedding layer as a matrix $A\\in\\mathbb{R}^{n\\times d}$, compute its rank-$j$ approximation $A_j$ via SVD (Singular Value Decomposition), and then factor $A_j$ into a pair of matrices that correspond to smaller fully-connected layers to replace the original embedding layer. Geometrically, the rows of $A$ represent points in $\\mathbb{R}^d$, and the rows of $A_j$ represent their projections onto the $j$-dimensional subspace that minimizes the sum of squared distances (``errors'') to the points. \nIn practice, these rows of $A$ may be spread around $k>1$ subspaces, so factoring $A$ based on a single subspace may lead to large errors that turn into large drops in accuracy.\n\nInspired by \\emph{projective clustering} from computational geometry, we suggest replacing this subspace by a set of $k$ subspaces, each of dimension $j$, that minimizes the sum of squared distances over every point (row in $A$) to its \\emph{closest} subspace. Based on this approach, we provide a novel architecture that replaces the original embedding layer by a set of $k$ small layers that operate in parallel and are then recombined with a single fully-connected layer. \n\nExtensive experimental results on the GLUE benchmark yield networks that are both more accurate and smaller compared to the standard matrix factorization (SVD). 
For example, we further compress DistilBERT by reducing the size of the embedding layer by $40\\%$ while incurring only a $0.5\\%$ average drop in accuracy over all nine GLUE tasks, compared to a $2.8\\%$ drop using the existing SVD approach.\nOn RoBERTa we achieve $43\\%$ compression of the embedding layer with less than a $0.8\\%$ average drop in accuracy as compared to a $3\\%$ drop previously.", "keywords": "Compressing Deep Networks;NLP;Matrix Factorization;SVD", "primary_area": "", "supplementary_material": "/attachment/4d8b38c8e1b86d0b37e1668c7ea9e21839424870.zip", "author": "Alaa Maalouf;Harry Lang;Daniela Rus;Dan Feldman", "authorids": "~Alaa_Maalouf1;~Harry_Lang1;~Daniela_Rus1;~Dan_Feldman1", "gender": "M;;F;M", "homepage": ";;https://www.csail.mit.edu/person/daniela-rus;http://people.csail.mit.edu/dannyf/", "dblp": "242/8928.html;80/4437;r/DanielaRus;84/6696.html", "google_scholar": "https://scholar.google.com/citations?hl=en;;https://scholar.google.com/citations?hl=en;67QZN0gAAAAJ", "orcid": ";;;", "linkedin": "alaa-maalouf/?originalSubdomain=il;;;", "or_profile": "~Alaa_Maalouf1;~Harry_Lang1;~Daniela_Rus1;~Dan_Feldman1", "aff": "The University of Haifa;Massachusetts Institute of Technology;Massachusetts Institute of Technology;University of Haifa", "aff_domain": "ac.il;mit.edu;mit.edu;haifa.ac.il", "position": "PhD student;Postdoc;Full Professor;Associate Professor", "bibtex": "@inproceedings{\nmaalouf2021deep,\ntitle={Deep Learning meets Projective Clustering},\nauthor={Alaa Maalouf and Harry Lang and Daniela Rus and Dan Feldman},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=EQfpYwF3-b}\n}", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer5;AnonReviewer4", "pdf_size": 0, "rating": "4;5;7", "confidence": "3;3;4", "wc_review": "377;473;269", "wc_reply_reviewers": "0;60;0", "wc_reply_authors": "388;808;522", "reply_reviewers": "0;1;0", "reply_authors": "3;3;3", "rating_avg": [ 5.333333333333333, 1.247219128924647 ], "confidence_avg": [ 3.3333333333333335, 0.4714045207910317 ], "wc_review_avg": [ 373.0, 83.33066662399864 ], "wc_reply_reviewers_avg": [ 20.0, 28.284271247461902 ], "wc_reply_authors_avg": [ 572.6666666666666, 175.1672216927458 ], "reply_reviewers_avg": [ 0.3333333333333333, 0.4714045207910317 ], "reply_authors_avg": [ 3.0, 0.0 ], "replies_avg": [ 14, 0 ], "authors#_avg": [ 4, 0 ], "corr_rating_confidence": 0.944911182523068, "gs_citation": 11, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=11053046679811086908&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 5, "pdf": "https://openreview.net/pdf?id=EQfpYwF3-b", "email": "ac.il;mit.edu;mit.edu;haifa.ac.il", "author_num": 4, "aff_unique_index": "0;1;1;0", "aff_unique_norm": "University of Haifa;Massachusetts Institute of Technology", "aff_unique_dep": ";", "aff_unique_url": "https://www.haifa.ac.il;https://web.mit.edu", "aff_unique_abbr": "UoH;MIT", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;1;1;0", "aff_country_unique": "Israel;United States" }, { "id": "EQtwFlmq7mx", "title": "Stochastic Proximal Point Algorithm for Large-scale Nonconvex Optimization: Convergence, Implementation, and Application to Neural Networks", "track": "main", "status": "Reject", "tldr": "", "abstract": "We revisit the stochastic proximal point algorithm (SPPA) for large-scale nonconvex optimization problems. 
SPPA has been shown to converge faster and more stably than the celebrated stochastic gradient descent (SGD) algorithm, and its many variations, for convex problems. However, the per-iteration update of SPPA is defined abstractly and has long been considered expensive. In this paper, we show that efficient implementation of SPPA can be achieved. If the problem is nonlinear least squares, each iteration of SPPA can be efficiently implemented by Gauss-Newton; with a linear algebra trick, the resulting complexity is of the same order as that of SGD. For more generic problems, SPPA can still be implemented with L-BFGS or accelerated gradient with high efficiency. Another contribution of this work is the convergence of SPPA to a stationary point in expectation for nonconvex problems. The result is encouraging in that it admits more flexible choices of the step sizes under similar assumptions. The proposed algorithm is elaborated for both regression and classification problems using different neural network structures. Real-data experiments showcase its effectiveness in terms of convergence and accuracy compared to SGD and its variants.", "keywords": "", "primary_area": "", "supplementary_material": "/attachment/6c5aaabeb6d8e0f18702c84edf2a9d799b5d56dd.zip", "author": "Aysegul Bumin;Kejun Huang", "authorids": "aysegul.bumin@ufl.edu;~Kejun_Huang1", "gender": ";M", "homepage": ";https://www.cise.ufl.edu/~kejun/", "dblp": ";140/8874", "google_scholar": ";-RIDViAAAAAJ", "orcid": ";", "linkedin": ";", "or_profile": "aysegul.bumin@ufl.edu;~Kejun_Huang1", "aff": ";University of Florida", "aff_domain": ";ufl.edu", "position": ";Assistant Professor", "bibtex": "@misc{\nbumin2021stochastic,\ntitle={Stochastic Proximal Point Algorithm for Large-scale Nonconvex Optimization: Convergence, Implementation, and Application to Neural Networks},\nauthor={Aysegul Bumin and Kejun Huang},\nyear={2021},\nurl={https://openreview.net/forum?id=EQtwFlmq7mx}\n}", "github": "", "project": "", "reviewers": "AnonReviewer4;AnonReviewer2;AnonReviewer1;AnonReviewer3", "site": "https://openreview.net/forum?id=EQtwFlmq7mx", "pdf_size": 0, "rating": "3;3;4;4", "confidence": "4;4;4;5", "wc_review": "130;498;299;246", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "0;0;0;0", "reply_reviewers": "0;0;0;0", "reply_authors": "0;0;0;0", "rating_avg": [ 3.5, 0.5 ], "confidence_avg": [ 4.25, 0.4330127018922193 ], "wc_review_avg": [ 293.25, 133.0777498306911 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 0, 0 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 0, 0 ], "replies_avg": [ 5, 0 ], "authors#_avg": [ 2, 0 ], "corr_rating_confidence": 0.5773502691896257, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:Qvb_--ayw84J:scholar.google.com/&scioq=Stochastic+Proximal+Point+Algorithm+for+Large-scale+Nonconvex+Optimization:+Convergence,+Implementation,+and+Application+to+Neural+Networks&hl=en&as_sdt=0,5", "gs_version_total": 0, "aff_unique_index": "0", "aff_unique_norm": "University of Florida", "aff_unique_dep": "", "aff_unique_url": "https://www.ufl.edu", "aff_unique_abbr": "UF", "aff_country_unique_index": "0", "aff_country_unique": "United States" }, { "id": "ERAQ5ZCP9t", "title": "Robust Multi-view Representation Learning", "track": "main", "status": "Withdraw", "tldr": "", "abstract": "Multi-view data has become ubiquitous, especially with multi-sensor systems like self-driving cars or medical patient-side monitors.\n\nWe look at modeling multi-view data through robust
representation learning, with the goal of leveraging relationships between views and building resilience to missing information.\nWe propose a new flavor of multi-view AutoEncoders, the Robust Multi-view AutoEncoder, which explicitly encourages robustness to missing views.\nThe principle we use is straightforward: we apply the idea of drop-out to the level of views.\nDuring training, we leave out views as input to our model while forcing it to reconstruct all of them.\nWe also consider a flow-based generative modeling extension of our approach in the case where all the views are available.\n\nWe conduct experiments for different scenarios: directly using the learned representations for reconstruction, as well as a two-step process where the learned representation is subsequently used as features for the data for a down-stream application.\nOur synthetic and real-world experiments show promising results for the application of these models to robust representation learning.", "keywords": "Multimodal Machine Learning;Representation Learning;AutoEncoders", "primary_area": "", "supplementary_material": "", "author": "Sibi Venkatesan;Kyle Miller;Artur Dubrawski", "authorids": "~Sibi_Venkatesan1;~Kyle_Miller1;~Artur_Dubrawski2", "gender": ";;M", "homepage": ";;https://www.autonlab.org", "dblp": ";92/11514;76/48", "google_scholar": ";;O3gezzcAAAAJ", "orcid": ";;0000-0002-2372-0831", "linkedin": ";;artur-dubrawski-33a2a87/", "or_profile": "~Sibi_Venkatesan1;~Kyle_Miller1;~Artur_Dubrawski2", "aff": ";Carnegie Mellon University;Carnegie Mellon University", "aff_domain": ";andrew;cmu.edu", "position": ";Project scientist;Research Professor", "bibtex": "", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer2;AnonReviewer4;AnonReviewer3", "site": "https://openreview.net/forum?id=ERAQ5ZCP9t", "pdf_size": 0, "rating": "3;3;3;3", "confidence": "5;5;4;5", "wc_review": "223;496;180;437", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "0;0;0;0", "reply_reviewers": "0;0;0;0", "reply_authors": "0;0;0;0", "rating_avg": [ 3.0, 0.0 ], "confidence_avg": [ 4.75, 0.4330127018922193 ], "wc_review_avg": [ 334.0, 134.99074042318605 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 0, 0 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 0, 0 ], "replies_avg": [ 5, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": 0.0, "gs_citation": -1, "gs_cited_by_link": "", "gs_version_total": -1, "aff_unique_index": "0;0", "aff_unique_norm": "Carnegie Mellon University", "aff_unique_dep": "", "aff_unique_url": "https://www.cmu.edu", "aff_unique_abbr": "CMU", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0", "aff_country_unique": "United States" }, { "id": "ES9cpVTyLL", "title": "Weak and Strong Gradient Directions: Explaining Memorization, Generalization, and Hardness of Examples at Scale", "track": "main", "status": "Reject", "tldr": "", "abstract": "Coherent Gradients (CGH) [Chatterjee, ICLR 20] is a recently proposed hypothesis to explain why over-parameterized neural networks trained with gradient descent generalize well even though they have sufficient capacity to memorize the training set. The key insight of CGH is that, since the overall gradient for a single step of SGD is the sum of the per-example gradients, it is strongest in directions that reduce the loss on multiple examples if such directions exist. In this paper, we validate CGH on ResNet, Inception, and VGG models on ImageNet. 
Since the techniques presented in the original paper do not scale beyond toy models and datasets, we propose new methods. By posing the problem of suppressing weak gradient directions as a problem of robust mean estimation, we develop a coordinate-based median of means approach. We present two versions of this algorithm, M3, which partitions a mini-batch into 3 groups and computes the median, and a more efficient version RM3, which reuses gradients from previous two time steps to compute the median. Since they suppress weak gradient directions without requiring per-example gradients, they can be used to train models at scale. Experimentally, we find that they indeed greatly reduce overfitting (and memorization) and thus provide the first convincing evidence that CGH holds at scale. We also propose a new test of CGH that does not depend on adding noise to training labels or on suppressing weak gradient directions. Using the intuition behind CGH, we posit that the examples learned early in the training process (i.e., \"easy\" examples) are precisely those that have more in common with other training examples. Therefore, as per CGH, the easy examples should generalize better amongst themselves than the hard examples amongst themselves. We validate this hypothesis with detailed experiments, and believe that it provides further orthogonal evidence for CGH.", "keywords": "generalization;deep learning;hardness of examples", "primary_area": "", "supplementary_material": "/attachment/cc7418ca58a75f4fa5861c6c639daeaf6ec31465.zip", "author": "Piotr Zielinski;Shankar Krishnan;Satrajit Chatterjee", "authorids": "zielinski@google.com;~Shankar_Krishnan1;~Satrajit_Chatterjee1", "gender": ";M;M", "homepage": ";;http://www.blif.org/~satrajit", "dblp": ";;https://dblp.org/pers/c/Chatterjee:Satrajit.html", "google_scholar": ";;Nh_5ogYAAAAJ", "orcid": ";;", "linkedin": ";;", "or_profile": "zielinski@google.com;~Shankar_Krishnan1;~Satrajit_Chatterjee1", "aff": ";;Google", "aff_domain": ";;google.com", "position": ";;Researcher", "bibtex": "@misc{\nzielinski2021weak,\ntitle={Weak and Strong Gradient Directions: Explaining Memorization, Generalization, and Hardness of Examples at Scale},\nauthor={Piotr Zielinski and Shankar Krishnan and Satrajit Chatterjee},\nyear={2021},\nurl={https://openreview.net/forum?id=ES9cpVTyLL}\n}", "github": "", "project": "", "reviewers": "AnonReviewer3;AnonReviewer4;AnonReviewer1;AnonReviewer2", "site": "https://openreview.net/forum?id=ES9cpVTyLL", "pdf_size": 0, "rating": "4;4;4;5", "confidence": "3;4;5;5", "wc_review": "443;838;781;1052", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "0;0;0;0", "reply_reviewers": "0;0;0;0", "reply_authors": "0;0;0;0", "rating_avg": [ 4.25, 0.4330127018922193 ], "confidence_avg": [ 4.25, 0.82915619758885 ], "wc_review_avg": [ 778.5, 218.46567236067088 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 0, 0 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 0, 0 ], "replies_avg": [ 5, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": 0.5222329678670935, "gs_citation": 9, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=7852306672427027144&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 3, "aff_unique_index": "0", "aff_unique_norm": "Google", "aff_unique_dep": "Google", "aff_unique_url": "https://www.google.com", "aff_unique_abbr": "Google", "aff_campus_unique_index": "0", "aff_campus_unique": "Mountain View", "aff_country_unique_index": "0", "aff_country_unique": "United States" }, { "title": 
"Bowtie Networks: Generative Modeling for Joint Few-Shot Recognition and Novel-View Synthesis", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/3357", "id": "ESG-DMKQKsD", "poster": "", "openreview": "https://openreview.net/forum?id=ESG-DMKQKsD", "slides": "https://iclr.cc/virtual/2021/poster/3357", "video": "https://iclr.cc/virtual/2021/poster/3357", "author_site": "Zhipeng Bao, Yu-Xiong Wang, Martial Hebert", "tldr": "", "abstract": "We propose a novel task of joint few-shot recognition and novel-view synthesis: given only one or few images of a novel object from arbitrary views with only category annotation, we aim to simultaneously learn an object classifier and generate images of that type of object from new viewpoints. While existing work copes with two or more tasks mainly by multi-task learning of shareable feature representations, we take a different perspective. We focus on the interaction and cooperation between a generative model and a discriminative model, in a way that facilitates knowledge to flow across tasks in complementary directions. To this end, we propose bowtie networks that jointly learn 3D geometric and semantic representations with a feedback loop. Experimental evaluation on challenging fine-grained recognition datasets demonstrates that our synthesized images are realistic from multiple viewpoints and significantly improve recognition performance as ways of data augmentation, especially in the low-data regime. ", "keywords": "computer vision;object recognition;few-shot learning;generative models;adversarial training", "primary_area": "", "supplementary_material": "", "author": "Zhipeng Bao;Yu-Xiong Wang;Martial Hebert", "authorids": "~Zhipeng_Bao1;~Yu-Xiong_Wang1;~Martial_Hebert1", "gender": "M;;M", "homepage": "https://zpbao.github.io/;https://yxw.cs.illinois.edu/;http://www.cs.cmu.edu/~hebert/", "dblp": "244/8798;35/10700;h/MartialHebert", "google_scholar": "TwYdLuYAAAAJ;T_Q-xDkAAAAJ;https://scholar.google.com.tw/citations?user=0ytii2EAAAAJ", "orcid": ";;", "linkedin": ";;", "or_profile": "~Zhipeng_Bao1;~Yu-Xiong_Wang1;~Martial_Hebert1", "aff": ";Department of Computer Science, University of Illinois Urbana-Champaign;Carnegie Mellon University", "aff_domain": ";cs.illinois.edu;cmu.edu", "position": ";Assistant Professor;Professor", "bibtex": "@inproceedings{\nbao2021bowtie,\ntitle={Bowtie Networks: Generative Modeling for Joint Few-Shot Recognition and Novel-View Synthesis},\nauthor={Zhipeng Bao and Yu-Xiong Wang and Martial Hebert},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=ESG-DMKQKsD}\n}", "github": "[![github](/images/github_icon.svg) zpbao/bowtie_networks](https://github.com/zpbao/bowtie_networks)", "project": "", "reviewers": "AnonReviewer2;AnonReviewer3;AnonReviewer1;AnonReviewer4", "pdf_size": 0, "rating": "5;6;6;7", "confidence": "3;3;5;4", "wc_review": "344;331;518;515", "wc_reply_reviewers": "0;82;0;0", "wc_reply_authors": "958;681;363;495", "reply_reviewers": "0;1;0;0", "reply_authors": "2;2;1;1", "rating_avg": [ 6.0, 0.7071067811865476 ], "confidence_avg": [ 3.75, 0.82915619758885 ], "wc_review_avg": [ 427.0, 89.62421547773793 ], "wc_reply_reviewers_avg": [ 20.5, 35.50704155516198 ], "wc_reply_authors_avg": [ 624.25, 223.36447233165796 ], "reply_reviewers_avg": [ 0.25, 0.4330127018922193 ], "reply_authors_avg": [ 1.5, 0.5 ], "replies_avg": [ 13, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": 0.4264014327112209, "gs_citation": 11, 
"gs_cited_by_link": "https://scholar.google.com/scholar?cites=4751463230610145393&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 4, "pdf": "https://openreview.net/pdf?id=ESG-DMKQKsD", "email": ";cs.illinois.edu;cmu.edu", "author_num": 3, "aff_unique_index": "0;1", "aff_unique_norm": "University of Illinois Urbana-Champaign;Carnegie Mellon University", "aff_unique_dep": "Department of Computer Science;", "aff_unique_url": "https://illinois.edu;https://www.cmu.edu", "aff_unique_abbr": "UIUC;CMU", "aff_campus_unique_index": "0", "aff_campus_unique": "Urbana-Champaign;", "aff_country_unique_index": "0;0", "aff_country_unique": "United States" }, { "id": "ESVGfJM9a7", "title": "Neural Point Process for Forecasting Spatiotemporal Events", "track": "main", "status": "Reject", "tldr": "", "abstract": "Forecasting events occurring in space and time is a fundamental problem. Existing neural point process models are only temporal and are limited in spatial inference. We propose a family of deep sequence models that integrate spatiotemporal point processes with deep neural networks. Our novel Neural Spatiotemporal Point Process model is flexible, efficient, and can accurately predict irregularly sampled events. The key construction of our approach is based on space-time separation of temporal intensity function and time-conditioned spatial density function, which is approximated by kernel density estimation. We validate our model on the synthetic spatiotemporal Hawkes process and self-correcting process. On many benchmark spatiotemporal event forecasting datasets, our model demonstrates superior performances. To the best of our knowledge, this is the first neural point process model that can jointly predict the continuous space and time of events. ", "keywords": "spatiotemporal point process;deep sequence models;time series", "primary_area": "", "supplementary_material": "", "author": "Zihao Zhou;Xingyi Yang;Xinyi He;Ryan Rossi;Handong Zhao;Rose Yu", "authorids": "~Zihao_Zhou1;~Xingyi_Yang1;~Xinyi_He1;~Ryan_Rossi1;~Handong_Zhao3;~Rose_Yu1", "gender": "M;M;F;M;F;", "homepage": "http://zzhou.info;https://adamdad.github.io/;https://www.linkedin.com/in/xinyi-he-91ab0b13b/;http://ryanrossi.com;http://roseyu.com;https://hdzhao.github.io/", "dblp": ";;;17/5085;164/7314;79/8522", "google_scholar": ";1n2OPtwAAAAJ;;_Dc6lbQAAAAJ;;0f-YOFgAAAAJ", "orcid": ";;;0000-0001-9758-0635;;", "linkedin": ";;;;;", "or_profile": "~Zihao_Zhou1;~Xingyi_Yang1;~Xinyi_He1;~Ryan_Rossi1;~Rose_Yu1;~Handong_Zhao1", "aff": "University of California, San Diego;University of California, San Diego;University of California, San Diego;Adobe Research;University of California, San Diego;Adobe Systems", "aff_domain": "ucsd.edu;ucsd.edu;ucsd.edu;adobe.com;ucsd.edu;adobe.com", "position": "MS student;MS student;MS student;Senior Research Scientist;Assistant Professor;Research Scientist", "bibtex": "@misc{\nzhou2021neural,\ntitle={Neural Point Process for Forecasting Spatiotemporal Events},\nauthor={Zihao Zhou and Xingyi Yang and Xinyi He and Ryan Rossi and Handong Zhao and Rose Yu},\nyear={2021},\nurl={https://openreview.net/forum?id=ESVGfJM9a7}\n}", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer3;AnonReviewer4;AnonReviewer2", "site": "https://openreview.net/forum?id=ESVGfJM9a7", "pdf_size": 0, "rating": "4;4;5;8", "confidence": "4;5;4;4", "wc_review": "471;507;147;208", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "525;383;128;113", "reply_reviewers": "0;0;0;0", "reply_authors": "1;1;1;1", "rating_avg": [ 
5.25, 1.6393596310755 ], "confidence_avg": [ 4.25, 0.4330127018922193 ], "wc_review_avg": [ 333.25, 157.75039619601594 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 287.25, 174.2245318547305 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 10, 0 ], "authors#_avg": [ 6, 0 ], "corr_rating_confidence": -0.44022545316281186, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:e8J0fuMLDAcJ:scholar.google.com/&scioq=Neural+Point+Process+for+Forecasting+Spatiotemporal+Events&hl=en&as_sdt=0,33", "gs_version_total": 0, "aff_unique_index": "0;0;0;1;0;1", "aff_unique_norm": "University of California, San Diego;Adobe", "aff_unique_dep": ";Adobe Research", "aff_unique_url": "https://www.ucsd.edu;https://research.adobe.com", "aff_unique_abbr": "UCSD;Adobe", "aff_campus_unique_index": "0;0;0;0", "aff_campus_unique": "San Diego;", "aff_country_unique_index": "0;0;0;0;0;0", "aff_country_unique": "United States" }, { "title": "Learning with AMIGo: Adversarially Motivated Intrinsic Goals", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/2781", "id": "ETBc_MIMgoX", "poster": "", "openreview": "https://openreview.net/forum?id=ETBc_MIMgoX", "slides": "https://iclr.cc/virtual/2021/poster/2781", "video": "https://iclr.cc/virtual/2021/poster/2781", "author_site": "Andres Campero, Roberta Raileanu, Heinrich Kuttler, Joshua B Tenenbaum, Tim Rocktaeschel, Edward Grefenstette", "tldr": "", "abstract": "A key challenge for reinforcement learning (RL) consists of learning in environments with sparse extrinsic rewards. In contrast to current RL methods, humans are able to learn new skills with little or no reward by using various forms of intrinsic motivation. We propose AMIGo, a novel agent incorporating -- as form of meta-learning -- a goal-generating teacher that proposes Adversarially Motivated Intrinsic Goals to train a goal-conditioned \"student\" policy in the absence of (or alongside) environment reward. Specifically, through a simple but effective \"constructively adversarial\" objective, the teacher learns to propose increasingly challenging -- yet achievable -- goals that allow the student to learn general skills for acting in a new environment, independent of the task to be solved. We show that our method generates a natural curriculum of self-proposed goals which ultimately allows the agent to solve challenging procedurally-generated tasks where other forms of intrinsic motivation and state-of-the-art RL methods fail.", "keywords": "reinforcement learning;exploration;meta-learning", "primary_area": "", "supplementary_material": "", "author": "Andres Campero;Roberta Raileanu;Heinrich Kuttler;Joshua B. 
Tenenbaum;Tim Rockt\u00e4schel;Edward Grefenstette", "authorids": "~Andres_Campero1;~Roberta_Raileanu2;~Heinrich_Kuttler1;~Joshua_B._Tenenbaum1;~Tim_Rockt\u00e4schel1;~Edward_Grefenstette1", "gender": "M;;;M;F;M", "homepage": "https://andrescampero.mit.edu/;;;http://egrefen.com/;https://rraileanu.github.io/;http://rockt.ai", "dblp": "https://dblp.uni-trier.de/pers/hd/c/Campero:Andres;;t/JoshuaBTenenbaum;http://dblp.uni-trier.de/pers/hd/g/Grefenstette:Edward;215/5579;43/11537", "google_scholar": ";;;https://scholar.google.co.uk/citations?user=ezllEwMAAAAJ;9hVXpJ0AAAAJ;https://scholar.google.co.uk/citations?user=mWBY8aIAAAAJ", "orcid": ";;;;;", "linkedin": ";;;;roberta-raileanu-44b25660/;rockt/", "or_profile": "~Andres_Campero1;~Heinrich_Kuttler1;~Joshua_B._Tenenbaum1;~Edward_Grefenstette1;~Roberta_Raileanu1;~Tim_Rocktaeschel1", "aff": "Massachusetts Institute of Technology;Meta Facebook;Massachusetts Institute of Technology;Meta Facebook;New York University;Department of Computer Science, University College London", "aff_domain": "mit.edu;fb.com;mit.edu;fb.com;nyu.edu;cs.ucl.ac.uk", "position": "PhD student;Research Engineer;Professor;Research Scientist;PhD student;Assistant Professor", "bibtex": "@inproceedings{\ncampero2021learning,\ntitle={Learning with {\\{}AMIG{\\}}o: Adversarially Motivated Intrinsic Goals},\nauthor={Andres Campero and Roberta Raileanu and Heinrich Kuttler and Joshua B. Tenenbaum and Tim Rockt{\\\"a}schel and Edward Grefenstette},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=ETBc_MIMgoX}\n}", "github": "[![github](/images/github_icon.svg) facebookresearch/adversarially-motivated-intrinsic-goals](https://github.com/facebookresearch/adversarially-motivated-intrinsic-goals) + [![Papers with Code](/images/pwc_icon.svg) 4 community implementations](https://paperswithcode.com/paper/?openreview=ETBc_MIMgoX)", "project": "", "reviewers": "AnonReviewer4;AnonReviewer1;AnonReviewer2;AnonReviewer5", "pdf_size": 0, "rating": "6;6;7;7", "confidence": "4;4;4;3", "wc_review": "455;623;1131;706", "wc_reply_reviewers": "0;0;387;1228", "wc_reply_authors": "767;1022;1896;4083", "reply_reviewers": "0;0;1;4", "reply_authors": "1;2;4;7", "rating_avg": [ 6.5, 0.5 ], "confidence_avg": [ 3.75, 0.4330127018922193 ], "wc_review_avg": [ 728.75, 249.22116182218556 ], "wc_reply_reviewers_avg": [ 403.75, 501.4221649468639 ], "wc_reply_authors_avg": [ 1942.0, 1305.088311188174 ], "reply_reviewers_avg": [ 1.25, 1.6393596310755 ], "reply_authors_avg": [ 3.5, 2.29128784747792 ], "replies_avg": [ 25, 0 ], "authors#_avg": [ 6, 0 ], "corr_rating_confidence": -0.5773502691896257, "gs_citation": 168, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=10840346887158319600&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 6, "pdf": "https://openreview.net/pdf?id=ETBc_MIMgoX", "email": "mit.edu;fb.com;mit.edu;fb.com;nyu.edu;cs.ucl.ac.uk", "author_num": 6, "aff_unique_index": "0;1;0;1;2;3", "aff_unique_norm": "Massachusetts Institute of Technology;Meta;New York University;University College London", "aff_unique_dep": ";Meta Platforms, Inc.;;Department of Computer Science", "aff_unique_url": "https://web.mit.edu;https://meta.com;https://www.nyu.edu;https://www.ucl.ac.uk", "aff_unique_abbr": "MIT;Meta;NYU;UCL", "aff_campus_unique_index": "1", "aff_campus_unique": ";London", "aff_country_unique_index": "0;0;0;0;0;1", "aff_country_unique": "United States;United Kingdom" }, { "id": "EUUp9nWXsop", "title": "IALE: Imitating Active 
Learner Ensembles", "track": "main", "status": "Reject", "tldr": "", "abstract": "Active learning (AL) prioritizes the labeling of the most informative data samples. However, the performance of AL heuristics depends on the structure of the underlying classifier model and the data. We propose an imitation learning scheme that imitates the selection of the best expert heuristic at each stage of the AL cycle in a batch-mode pool-based setting. We use DAGGER to train the policy on a dataset and later apply it to datasets from similar domains. With multiple AL heuristics as experts, the policy is able to reflect the choices of the best AL heuristics given the current state of the AL process. Our experiments on well-known datasets show that we outperform both state-of-the-art imitation learners and heuristics.", "keywords": "active learning;imitation learning;ensembles", "primary_area": "", "supplementary_material": "/attachment/b2d9a25a22298346beffbad7c79fee4c9e4bf98d.zip", "author": "Christoffer L\u00f6ffler;Christopher Mutschler", "authorids": "~Christoffer_L\u00f6ffler1;~Christopher_Mutschler1", "gender": "M;M", "homepage": "https://christofferloeffler.com;https://www.cmutschler.de", "dblp": "141/5637;118/7748", "google_scholar": "bIaHh6gAAAAJ;https://scholar.google.de/citations?user=gKDSp8YAAAAJ", "orcid": "0000-0003-1834-8323;0000-0001-8108-0230", "linkedin": ";christopher-mutschler-28431576/", "or_profile": "~Christoffer_L\u00f6ffler1;~Christopher_Mutschler1", "aff": "Fraunhofer IIS;Fraunhofer IIS", "aff_domain": "fraunhofer.de;fraunhofer.de", "position": "Scientific Associate;Principal Researcher", "bibtex": "@misc{\nl{\\\"o}ffler2021iale,\ntitle={{\\{}IALE{\\}}: Imitating Active Learner Ensembles},\nauthor={Christoffer L{\\\"o}ffler and Christopher Mutschler},\nyear={2021},\nurl={https://openreview.net/forum?id=EUUp9nWXsop}\n}", "github": "", "project": "", "reviewers": "AnonReviewer4;AnonReviewer2;AnonReviewer1", "site": "https://openreview.net/forum?id=EUUp9nWXsop", "pdf_size": 0, "rating": "4;5;6", "confidence": "4;4;3", "wc_review": "311;785;394", "wc_reply_reviewers": "0;0;0", "wc_reply_authors": "158;659;211", "reply_reviewers": "0;0;0", "reply_authors": "1;1;1", "rating_avg": [ 5.0, 0.816496580927726 ], "confidence_avg": [ 3.6666666666666665, 0.4714045207910317 ], "wc_review_avg": [ 496.6666666666667, 206.6790318881483 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 342.6666666666667, 224.7255115814748 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 8, 0 ], "authors#_avg": [ 2, 0 ], "corr_rating_confidence": -0.8660254037844385, "gs_citation": 7, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=6071906203847773282&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 7, "aff_unique_index": "0;0", "aff_unique_norm": "Fraunhofer Institute for Integrated Circuits", "aff_unique_dep": "", "aff_unique_url": "https://www.iis.fraunhofer.de/", "aff_unique_abbr": "Fraunhofer IIS", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0", "aff_country_unique": "Germany" }, { "id": "EVV259WQuFG", "title": "Machine Reading Comprehension with Enhanced Linguistic Verifiers", "track": "main", "status": "Reject", "tldr": "", "abstract": "We propose two linguistic verifiers for span-extraction style machine reading comprehension to respectively tackle two challenges: how to evaluate the syntactic completeness of predicted answers and how to utilize the rich context of long documents.
Our first verifier rewrites a question by replacing its interrogatives with the predicted answer phrases and then builds a cross-attention scorer between the rewritten question and the segment, so that the answer candidates are scored in a \\emph{position-sensitive} context. Our second verifier builds a hierarchical attention network to represent segments in a passage, where neighbouring segments in long passages are \\emph{recurrently connected} and can contribute to the current segment-question pair's inference for answerability classification and boundary determination. We then combine these two verifiers into a pipeline and apply it to the SQuAD2.0, NewsQA and TriviaQA benchmark sets. Our pipeline achieves significantly larger improvements in both exact match and F1 scores than state-of-the-art baselines.", "keywords": "machine reading comprehension;BERT;linguistic verifiers;hierarchical attention networks", "primary_area": "", "supplementary_material": "", "author": "Xianchao Wu", "authorids": "~Xianchao_Wu1", "gender": "M", "homepage": "https://sites.google.com/site/xianchaowu2012/home", "dblp": "https://dblp.org/pers/hd/w/Wu:Xianchao", "google_scholar": "0cP7RfUAAAAJ", "orcid": "", "linkedin": "xianchao-wu-6239101a/", "or_profile": "~Xianchao_Wu1", "aff": "NVIDIA", "aff_domain": "nvidia.com", "position": "Senior Data Scientist", "bibtex": "@misc{\nwu2021machine,\ntitle={Machine Reading Comprehension with Enhanced Linguistic Verifiers},\nauthor={Xianchao Wu},\nyear={2021},\nurl={https://openreview.net/forum?id=EVV259WQuFG}\n}", "github": "", "project": "", "reviewers": "AnonReviewer4;AnonReviewer2;AnonReviewer1;AnonReviewer3", "site": "https://openreview.net/forum?id=EVV259WQuFG", "pdf_size": 0, "rating": "5;5;6;7", "confidence": "3;4;5;5", "wc_review": "518;303;575;360", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "539;773;791;724", "reply_reviewers": "0;0;0;0", "reply_authors": "1;1;1;1", "rating_avg": [ 5.75, 0.82915619758885 ], "confidence_avg": [ 4.25, 0.82915619758885 ], "wc_review_avg": [ 439.0, 111.21375814169755 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 706.75, 99.90589321956938 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 9, 0 ], "authors#_avg": [ 1, 0 ], "corr_rating_confidence": 0.8181818181818182, "gs_citation": 1, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=31300972631355734&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 0, "aff_unique_index": "0", "aff_unique_norm": "NVIDIA", "aff_unique_dep": "NVIDIA Corporation", "aff_unique_url": "https://www.nvidia.com", "aff_unique_abbr": "NVIDIA", "aff_country_unique_index": "0", "aff_country_unique": "United States" }, { "id": "EXkD6ZjvJQQ", "title": "Provable More Data Hurt in High Dimensional Least Squares Estimator", "track": "main", "status": "Reject", "tldr": "", "abstract": "This paper investigates the finite-sample prediction risk of the high-dimensional least squares estimator. We derive the central limit theorem for the prediction risk when both the sample size and the number of features tend to infinity. Furthermore, the finite-sample distribution and the confidence interval of the prediction risk are provided.
Our theoretical results demonstrate the sample-wise non-monotonicity of the prediction risk and confirm the ''more data hurt'' phenomenon.", "keywords": "", "primary_area": "", "supplementary_material": "", "author": "Zeng Li;Chuanlong Xie;QINWEN WANG", "authorids": "liz9@sustech.edu.cn;~Chuanlong_Xie1;~QINWEN_WANG1", "gender": ";M;F", "homepage": ";;", "dblp": ";;", "google_scholar": ";_fgE3u8AAAAJ;", "orcid": ";;0000-0002-8732-1696", "linkedin": ";;", "or_profile": "liz9@sustech.edu.cn;~Chuanlong_Xie1;~QINWEN_WANG1", "aff": ";Huawei Technologies Ltd.;Fudan University", "aff_domain": ";huawei.com;fudan.edu.cn", "position": ";Researcher;Assistant Professor", "bibtex": "@misc{\nli2021provable,\ntitle={Provable More Data Hurt in High Dimensional Least Squares Estimator},\nauthor={Zeng Li and Chuanlong Xie and QINWEN WANG},\nyear={2021},\nurl={https://openreview.net/forum?id=EXkD6ZjvJQQ}\n}", "github": "", "project": "", "reviewers": "AnonReviewer2;AnonReviewer1;AnonReviewer4", "site": "https://openreview.net/forum?id=EXkD6ZjvJQQ", "pdf_size": 0, "rating": "6;6;7", "confidence": "4;3;4", "wc_review": "314;369;518", "wc_reply_reviewers": "192;0;0", "wc_reply_authors": "1389;465;791", "reply_reviewers": "1;0;0", "reply_authors": "2;1;1", "rating_avg": [ 6.333333333333333, 0.4714045207910317 ], "confidence_avg": [ 3.6666666666666665, 0.4714045207910317 ], "wc_review_avg": [ 400.3333333333333, 86.17939944609088 ], "wc_reply_reviewers_avg": [ 64.0, 90.50966799187809 ], "wc_reply_authors_avg": [ 881.6666666666666, 382.63066032692967 ], "reply_reviewers_avg": [ 0.3333333333333333, 0.4714045207910317 ], "reply_authors_avg": [ 1.3333333333333333, 0.4714045207910317 ], "replies_avg": [ 10, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": 0.4999999999999999, "gs_citation": 8, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=3144003008454854377&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 3, "aff_unique_index": "0;1", "aff_unique_norm": "Huawei;Fudan University", "aff_unique_dep": "Huawei Technologies;", "aff_unique_url": "https://www.huawei.com;https://www.fudan.edu.cn", "aff_unique_abbr": "Huawei;Fudan", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0", "aff_country_unique": "China" }, { "id": "EZ8aZaCt9k", "title": "No Spurious Local Minima: on the Optimization Landscapes of Wide and Deep Neural Networks", "track": "main", "status": "Reject", "tldr": "", "abstract": "Empirical studies suggest that wide neural networks are comparably easy to optimize, but mathematical support for this observation is scarce. In this paper, we analyze the optimization landscapes of deep learning with wide networks. In particular, we prove that constrained and unconstrained empirical-risk minimization over such networks has no spurious local minima.
Hence, our theories substantiate the common belief that increasing network widths not only improves the expressiveness of deep-learning pipelines but also facilitates their optimization.", "keywords": "", "primary_area": "", "supplementary_material": "", "author": "Johannes Lederer", "authorids": "~Johannes_Lederer1", "gender": "", "homepage": "", "dblp": "", "google_scholar": "", "orcid": "", "linkedin": "", "or_profile": "", "aff": "", "aff_domain": "", "position": "", "bibtex": "@misc{\nlederer2021no,\ntitle={No Spurious Local Minima: on the Optimization Landscapes of Wide and Deep Neural Networks},\nauthor={Johannes Lederer},\nyear={2021},\nurl={https://openreview.net/forum?id=EZ8aZaCt9k}\n}", "github": "", "project": "", "reviewers": "AnonReviewer3;AnonReviewer1;AnonReviewer4;AnonReviewer2;AnonReviewer5", "site": "https://openreview.net/forum?id=EZ8aZaCt9k", "pdf_size": 0, "rating": "4;4;4;5;6", "confidence": "4;3;4;4;3", "wc_review": "323;296;388;694;209", "wc_reply_reviewers": "0;0;0;0;0", "wc_reply_authors": "0;0;0;0;0", "reply_reviewers": "0;0;0;0;0", "reply_authors": "0;0;0;0;0", "rating_avg": [ 4.6, 0.8 ], "confidence_avg": [ 3.6, 0.4898979485566356 ], "wc_review_avg": [ 382.0, 166.24439840187097 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 0, 0 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 0, 0 ], "replies_avg": [ 6, 0 ], "authors#_avg": [ 1, 0 ], "corr_rating_confidence": -0.408248290463863, "gs_citation": 9, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=88758716311694755&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 0 }, { "id": "E_U8Zvx7zrf", "title": "Delay-Tolerant Local SGD for Efficient Distributed Training", "track": "main", "status": "Reject", "tldr": "", "abstract": "The heavy communication for model synchronization is a major bottleneck for scaling up distributed deep neural network training to many workers. Moreover, model synchronization can suffer from long delays in scenarios such as federated learning and geo-distributed training. Thus, it is crucial that distributed training methods are both \\textit{delay-tolerant} AND \\textit{communication-efficient}. However, existing works cannot simultaneously address the communication delay and bandwidth constraint. To address this important and challenging problem, we propose a novel training framework OLCO\\textsubscript{3} to achieve delay tolerance with a low communication budget by using stale information. OLCO\\textsubscript{3} introduces novel staleness compensation and compression compensation to combat the influence of staleness and compression error. Theoretical analysis shows that OLCO\\textsubscript{3} achieves the same sub-linear convergence rate as the vanilla synchronous stochastic gradient descent (SGD) method.
Extensive experiments on deep learning tasks verify the effectiveness of OLCO\\textsubscript{3} and its advantages over existing works.", "keywords": "Delay-tolerant;communication-efficient;distributed learning", "primary_area": "", "supplementary_material": "", "author": "An Xu;Xiao Yan;Hongchang Gao;Heng Huang", "authorids": "~An_Xu1;~Xiao_Yan2;~Hongchang_Gao1;~Heng_Huang1", "gender": "M;;M;M", "homepage": "https://anxuthu.github.io/;;https://www.cs.umd.edu/~heng/;https://yanxiaosunny.github.io/", "dblp": "47/8547;166/5141.html;03/281;07/2626-2", "google_scholar": "GWiQawQAAAAJ;;4OqLaDwAAAAJ;rzNoyOIAAAAJ", "orcid": ";;;0000-0002-2122-915X", "linkedin": "an-xu-918604157/;;;", "or_profile": "~An_Xu1;~Hongchang_Gao1;~Heng_Huang1;~Xiao_Yan1", "aff": ";Temple University;University of Pittsburgh;Southern University of Science and Technology", "aff_domain": ";temple.edu;pitt.edu;sustech.edu.cn", "position": ";Assistant Professor;Full Professor;Researcher", "bibtex": "@misc{\nxu2021delaytolerant,\ntitle={Delay-Tolerant Local {\\{}SGD{\\}} for Efficient Distributed Training},\nauthor={An Xu and Xiao Yan and Hongchang Gao and Heng Huang},\nyear={2021},\nurl={https://openreview.net/forum?id=E_U8Zvx7zrf}\n}", "github": "", "project": "", "reviewers": "AnonReviewer4;AnonReviewer3;AnonReviewer2;AnonReviewer5", "site": "https://openreview.net/forum?id=E_U8Zvx7zrf", "pdf_size": 0, "rating": "4;5;5;5", "confidence": "4;3;4;4", "wc_review": "746;343;193;335", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "0;0;0;0", "reply_reviewers": "0;0;0;0", "reply_authors": "0;0;0;0", "rating_avg": [ 4.75, 0.4330127018922193 ], "confidence_avg": [ 3.75, 0.4330127018922193 ], "wc_review_avg": [ 404.25, 206.13511952115292 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 0, 0 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 0, 0 ], "replies_avg": [ 5, 0 ], "authors#_avg": [ 4, 0 ], "corr_rating_confidence": -0.3333333333333333, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:u832P26TDg8J:scholar.google.com/&scioq=Delay-Tolerant+Local+SGD+for+Efficient+Distributed+Training&hl=en&as_sdt=0,33", "gs_version_total": 0, "aff_unique_index": "0;1;2", "aff_unique_norm": "Temple University;University of Pittsburgh;Southern University of Science and Technology", "aff_unique_dep": ";;", "aff_unique_url": "https://www.temple.edu;https://www.pitt.edu;https://www.sustech.edu.cn", "aff_unique_abbr": "Temple;Pitt;SUSTech", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0;1", "aff_country_unique": "United States;China" }, { "title": "Towards Nonlinear Disentanglement in Natural Data with Temporal Sparse Coding", "status": "Oral", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/3144", "id": "EbIDjBynYJ8", "poster": "", "openreview": "https://openreview.net/forum?id=EbIDjBynYJ8", "slides": "https://iclr.cc/virtual/2021/poster/3144", "video": "https://iclr.cc/virtual/2021/poster/3144", "author_site": "David Klindt, Lukas Schott, Yash Sharma, Ivan Ustyuzhaninov, Wieland Brendel, Matthias Bethge, Dylan Paiton", "tldr": "", "abstract": "Disentangling the underlying generative factors from complex data has so far been limited to carefully constructed scenarios. We propose a path towards natural data by first showing that the statistics of natural data provide enough structure to enable disentanglement, both theoretically and empirically. 
Specifically, we provide evidence that objects in natural movies undergo transitions that are typically small in magnitude with occasional large jumps, which is characteristic of a temporally sparse distribution. To address this finding we provide a novel proof that relies on a sparse prior on temporally adjacent observations to recover the true latent variables up to permutations and sign flips, directly providing a stronger result than previous work. We show that equipping practical estimation methods with our prior often surpasses the current state-of-the-art on several established benchmark datasets without any impractical assumptions, such as knowledge of the number of changing generative factors. Furthermore, we contribute two new benchmarks, Natural Sprites and KITTI Masks, which integrate the measured natural dynamics to enable disentanglement evaluation with more realistic datasets. We leverage these benchmarks to test our theory, demonstrating improved performance. We also identify non-obvious challenges for current methods in scaling to more natural domains. Taken together our work addresses key issues in disentanglement research for moving towards more natural settings. ", "keywords": "disentanglement;independent component analysis;natural scene statistics", "primary_area": "", "supplementary_material": "", "author": "David A. Klindt;Lukas Schott;Yash Sharma;Ivan Ustyuzhaninov;Wieland Brendel;Matthias Bethge;Dylan Paiton", "authorids": "~David_A._Klindt1;~Lukas_Schott2;~Yash_Sharma1;~Ivan_Ustyuzhaninov1;~Wieland_Brendel1;~Matthias_Bethge1;dpaiton@gmail.com", "gender": ";;;M;M;M;", "homepage": ";;http://www.yash-sharma.com;;;https://bethgelab.org;", "dblp": ";;121/9967-1;182/2480;37/11107;77/3005;", "google_scholar": ";;AlGCn8wAAAAJ;YGEMpYUAAAAJ;v-JL-hsAAAAJ;https://scholar.google.com/citations?hl=en;", "orcid": ";;;;;;", "linkedin": ";;yashjsharma/;;;;", "or_profile": "~David_A._Klindt1;~Lukas_Schott2;~Yash_Sharma1;~Ivan_Ustyuzhaninov1;~Wieland_Brendel1;~Matthias_Bethge1;dpaiton@gmail.com", "aff": ";;University of Tuebingen;University of T\u00fcbingen;University of Tuebingen;University of Tuebingen;", "aff_domain": ";;uni-tuebingen.de;uni-tuebingen.de;uni-tuebingen.de;uni-tuebingen.de;", "position": ";;PhD student;PhD student;Principal Researcher;Full Professor;", "bibtex": "@inproceedings{\nklindt2021towards,\ntitle={Towards Nonlinear Disentanglement in Natural Data with Temporal Sparse Coding},\nauthor={David A. 
Klindt and Lukas Schott and Yash Sharma and Ivan Ustyuzhaninov and Wieland Brendel and Matthias Bethge and Dylan Paiton},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=EbIDjBynYJ8}\n}", "github": "[![github](/images/github_icon.svg) bethgelab/slow_disentanglement](https://github.com/bethgelab/slow_disentanglement)", "project": "", "reviewers": "AnonReviewer3;AnonReviewer1;AnonReviewer2;AnonReviewer4", "pdf_size": 0, "rating": "7;8;9;9", "confidence": "3;4;3;4", "wc_review": "570;718;644;528", "wc_reply_reviewers": "363;0;36;0", "wc_reply_authors": "2201;410;926;460", "reply_reviewers": "1;0;1;0", "reply_authors": "4;1;2;1", "rating_avg": [ 8.25, 0.82915619758885 ], "confidence_avg": [ 3.5, 0.5 ], "wc_review_avg": [ 615.0, 72.53275122315436 ], "wc_reply_reviewers_avg": [ 99.75, 152.69638993767992 ], "wc_reply_authors_avg": [ 999.25, 722.4220978763038 ], "reply_reviewers_avg": [ 0.5, 0.5 ], "reply_authors_avg": [ 2.0, 1.224744871391589 ], "replies_avg": [ 16, 0 ], "authors#_avg": [ 7, 0 ], "corr_rating_confidence": 0.30151134457776363, "gs_citation": 154, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=6210780149209435477&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 7, "pdf": "https://openreview.net/pdf?id=EbIDjBynYJ8", "email": ";;uni-tuebingen.de;uni-tuebingen.de;uni-tuebingen.de;uni-tuebingen.de;", "author_num": 7, "aff_unique_index": "0;1;0;0", "aff_unique_norm": "University of Tuebingen;University of T\u00fcbingen", "aff_unique_dep": ";", "aff_unique_url": "https://www.uni-tuebingen.de/;https://www.uni-tuebingen.de/", "aff_unique_abbr": "Uni T\u00fcbingen;Uni T\u00fcbingen", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0;0;0", "aff_country_unique": "Germany" }, { "title": "Hyperbolic Neural Networks++", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/3337", "id": "Ec85b0tUwbA", "poster": "", "openreview": "https://openreview.net/forum?id=Ec85b0tUwbA", "slides": "https://iclr.cc/virtual/2021/poster/3337", "video": "https://iclr.cc/virtual/2021/poster/3337", "author_site": "Ryohei Shimizu, YUSUKE Mukuta, Tatsuya Harada", "tldr": "", "abstract": "Hyperbolic spaces, which have the capacity to embed tree structures without distortion owing to their exponential volume growth, have recently been applied to machine learning to better capture the hierarchical nature of data. In this study, we generalize the fundamental components of neural networks in a single hyperbolic geometry model, namely, the Poincar\u00e9 ball model. This novel methodology constructs a multinomial logistic regression, fully-connected layers, convolutional layers, and attention mechanisms under a unified mathematical interpretation, without increasing the parameters. 
Experiments show the superior parameter efficiency of our methods compared to conventional hyperbolic components, and stability and outperformance over their Euclidean counterparts.", "keywords": "Hyperbolic Geometry;Poincar\u00e9 Ball Model;Parameter-Reduced MLR;Geodesic-Aware FC Layer;Convolutional Layer;Attention Mechanism", "primary_area": "", "supplementary_material": "", "author": "Ryohei Shimizu;YUSUKE Mukuta;Tatsuya Harada", "authorids": "~Ryohei_Shimizu1;~YUSUKE_Mukuta1;~Tatsuya_Harada1", "gender": "M;;M", "homepage": ";https://www.mi.t.u-tokyo.ac.jp/mukuta/;https://www.mi.t.u-tokyo.ac.jp/harada/", "dblp": ";153/5464;14/5849", "google_scholar": "https://scholar.google.co.jp/citations?view_op=list_works;https://scholar.google.co.jp/citations?user=emo91rIAAAAJ;https://scholar.google.com/citations?hl=ja", "orcid": ";;", "linkedin": ";;", "or_profile": "~Ryohei_Shimizu1;~YUSUKE_Mukuta1;~Tatsuya_Harada1", "aff": "The University of Tokyo;The University of Tokyo;The University of Tokyo", "aff_domain": "tokyo.ac.jp;u-tokyo.ac.jp;u-tokyo.ac.jp", "position": "MS student;Lecturer;Full Professor", "bibtex": "@inproceedings{\nshimizu2021hyperbolic,\ntitle={Hyperbolic Neural Networks++},\nauthor={Ryohei Shimizu and YUSUKE Mukuta and Tatsuya Harada},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=Ec85b0tUwbA}\n}", "github": "[![github](/images/github_icon.svg) mil-tokyo/hyperbolic_nn_plusplus](https://github.com/mil-tokyo/hyperbolic_nn_plusplus)", "project": "", "reviewers": "AnonReviewer4;AnonReviewer1;AnonReviewer3;AnonReviewer2", "pdf_size": 0, "rating": "6;7;7;8", "confidence": "4;3;4;4", "wc_review": "179;752;317;835", "wc_reply_reviewers": "0;76;0;0", "wc_reply_authors": "292;1797;673;728", "reply_reviewers": "0;1;0;0", "reply_authors": "1;3;1;1", "rating_avg": [ 7.0, 0.7071067811865476 ], "confidence_avg": [ 3.75, 0.4330127018922193 ], "wc_review_avg": [ 520.75, 278.6291217730121 ], "wc_reply_reviewers_avg": [ 19.0, 32.90896534380867 ], "wc_reply_authors_avg": [ 872.5, 559.5446809683745 ], "reply_reviewers_avg": [ 0.25, 0.4330127018922193 ], "reply_authors_avg": [ 1.5, 0.8660254037844386 ], "replies_avg": [ 13, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 179, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=13702563246653838309&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 4, "pdf": "https://openreview.net/pdf?id=Ec85b0tUwbA", "email": "tokyo.ac.jp;u-tokyo.ac.jp;u-tokyo.ac.jp", "author_num": 3, "aff_unique_index": "0;0;0", "aff_unique_norm": "University of Tokyo", "aff_unique_dep": "", "aff_unique_url": "https://www.u-tokyo.ac.jp", "aff_unique_abbr": "UTokyo", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0;0", "aff_country_unique": "Japan" }, { "id": "EdXhmWvvQV", "title": "Center-wise Local Image Mixture For Contrastive Representation Learning", "track": "main", "status": "Reject", "tldr": "", "abstract": "Recent advances in unsupervised representation learning have experienced remarkable progress, especially with the achievements of contrastive learning, which regards each image as well its augmentations as a separate class, while does not consider the semantic similarity among images. This paper proposes a new kind of data augmentation, named Center-wise Local Image Mixture, to expand the neighborhood space of an image. CLIM encourages both local similarity and global aggregation while pulling similar images. 
This is achieved by searching local similar samples of an image, and only selecting images that are closer to the corresponding cluster center, which we denote as center-wise local selection. As a result, similar representations are progressively approaching the clusters, while do not break the local similarity. Furthermore, image mixture is used as a smoothing regularization to avoid overconfident the selected samples. Besides, we introduce multi-resolution augmentation, which enables the representation to be scale invariant. Integrating the two augmentations produces better feature representation on several unsupervised benchmarks. Notably, we reach 75.5% top-1 accuracy with linear evaluation over ResNet-50, and 59.3% top-1 accuracy when fine-tuned with only 1% labels, as well as consistently outperforming supervised pretraining on several downstream transfer tasks.", "keywords": "Self-supervised Learning;Data Mixing;Contrastive Learning", "primary_area": "", "supplementary_material": "", "author": "Hao Li;XIAOPENG ZHANG;Ruoyu Sun;Hongkai Xiong;Qi Tian", "authorids": "~Hao_Li11;~XIAOPENG_ZHANG7;~Ruoyu_Sun2;~Hongkai_Xiong1;~Qi_Tian3", "gender": "M;M;M;M;M", "homepage": "https://github.com/lihao0374;https://sites.google.com/site/zxphistory/;http://www.com;http://min.sjtu.edu.cn;https://www.qitian1987.com/index.html", "dblp": ";;;21/3569;78/1467-1.html", "google_scholar": "j6Clad4AAAAJ;Ud6aBAcAAAAJ;;bB16iN4AAAAJ;https://scholar.google.com/citations?hl=en", "orcid": ";;;0000-0003-4552-0029;0000-0002-7252-5047", "linkedin": ";;;;", "or_profile": "~Hao_Li11;~XIAOPENG_ZHANG7;~Ruoyu_Sun2;~Hongkai_Xiong1;~Qi_Tian3", "aff": "Shanghai Jiaotong University;Huawei Technologies Ltd.;Shanghai Jiaotong University;Shanghai Jiaotong University;Huawei Technologies Ltd.", "aff_domain": "sjtu.edu.cn;huawei.com;sjtu.edu.cn;sjtu.edu.cn;huawei.com", "position": "MS student;Principal Researcher;MS student;Full Professor;Principal Researcher", "bibtex": "@misc{\nli2021centerwise,\ntitle={Center-wise Local Image Mixture For Contrastive Representation Learning},\nauthor={Hao Li and XIAOPENG ZHANG and Ruoyu Sun and Hongkai Xiong and Qi Tian},\nyear={2021},\nurl={https://openreview.net/forum?id=EdXhmWvvQV}\n}", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer3;AnonReviewer2;AnonReviewer4", "site": "https://openreview.net/forum?id=EdXhmWvvQV", "pdf_size": 0, "rating": "5;6;6;6", "confidence": "4;4;4;4", "wc_review": "614;436;454;798", "wc_reply_reviewers": "0;0;0;278", "wc_reply_authors": "570;418;497;1420", "reply_reviewers": "0;0;0;1", "reply_authors": "1;1;1;3", "rating_avg": [ 5.75, 0.4330127018922193 ], "confidence_avg": [ 4.0, 0.0 ], "wc_review_avg": [ 575.5, 145.9546162339513 ], "wc_reply_reviewers_avg": [ 69.5, 120.37753112603697 ], "wc_reply_authors_avg": [ 726.25, 404.12768712376044 ], "reply_reviewers_avg": [ 0.25, 0.4330127018922193 ], "reply_authors_avg": [ 1.5, 0.8660254037844386 ], "replies_avg": [ 13, 0 ], "authors#_avg": [ 5, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 8, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=16111269070606529187&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 4, "aff_unique_index": "0;1;0;0;1", "aff_unique_norm": "Shanghai Jiao Tong University;Huawei", "aff_unique_dep": ";Huawei Technologies", "aff_unique_url": "https://www.sjtu.edu.cn;https://www.huawei.com", "aff_unique_abbr": "SJTU;Huawei", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0;0;0;0", "aff_country_unique": "China" }, { "id": 
"EeeOTYhLlVm", "title": "EpidemiOptim: A Toolbox for the Optimization of Control Policies in Epidemiological Models", "track": "main", "status": "Reject", "tldr": "", "abstract": "Epidemiologists model the dynamics of epidemics in order to propose control strategies based on pharmaceutical and non-pharmaceutical interventions (contact limitation, lock down, vaccination, etc). Hand-designing such strategies is not trivial because of the number of possible interventions and the difficulty to predict long-term effects. This task can be cast as an optimization problem where state-of-the-art machine learning algorithms such as deep reinforcement learning might bring significant value. However, the specificity of each domain - epidemic modelling or solving optimization problems - requires strong collaborations between researchers from different fields of expertise.\nThis is why we introduce EpidemiOptim, a Python toolbox that facilitates collaborations between researchers in epidemiology and optimization. EpidemiOptim turns epidemiological models and cost functions into optimization problems via a standard interface commonly used by optimization practitioners (OpenAI Gym). Reinforcement learning algorithms based on Q-Learning with deep neural networks (DQN) and evolutionary algorithms (NSGA-II) are already implemented. We illustrate the use of EpidemiOptim to find optimal policies for dynamical on-off lock-down control under the optimization of death toll and economic recess using a Susceptible-Exposed-Infectious-Removed (SEIR) model for SARS-CoV-2/COVID-19. \nUsing EpidemiOptim and its interactive visualization platform in Jupyter notebooks, epidemiologists, optimization practitioners and others (e.g. economists) can easily compare epidemiological models, costs functions and optimization algorithms to address important choices to be made by health decision-makers.", "keywords": "epidemiology;covid19;reinforcement learning;evolutionary algorithms;multi-objective optimization;decision-making;toolbox", "primary_area": "", "supplementary_material": "", "author": "C\u00e9dric Colas;Boris Hejblum;S\u00e9bastien Rouillon;Rodolphe Thiebaut;Pierre-Yves Oudeyer;Cl\u00e9ment Moulin-Frier;M\u00e9lanie Prague", "authorids": "~C\u00e9dric_Colas1;boris.hejblum@u-bordeaux.fr;sebastien.rouillon@u-bordeaux.fr;rodolphe.thiebaut@inria.fr;~Pierre-Yves_Oudeyer1;~Cl\u00e9ment_Moulin-Frier2;melanie.prague@u-bordeaux.fr", "gender": "M;;;;M;M;", "homepage": "https://cedriccolas.com;;;;http://www.pyoudeyer.com;http://clement-moulin-frier.github.io/;", "dblp": "215/3872;;;;33/5513;124/0220;", "google_scholar": "https://scholar.google.fr/citations?user=VBz8gZ4AAAAJ;;;;https://scholar.google.fr/citations?user=gCqGj4sAAAAJ;rBnV60QAAAAJ;", "orcid": "0000-0003-0212-427X;;;;;;", "linkedin": ";;;;pierreyvesoudeyer/;;", "or_profile": "~C\u00e9dric_Colas1;boris.hejblum@u-bordeaux.fr;sebastien.rouillon@u-bordeaux.fr;rodolphe.thiebaut@inria.fr;~Pierre-Yves_Oudeyer1;~Cl\u00e9ment_Moulin-Frier2;melanie.prague@u-bordeaux.fr", "aff": "INRIA;;;;Microsoft;Inria;", "aff_domain": "inria.fr;;;;microsoft.com;inria.fr;", "position": "PhD student;;;;Visiting researcher;Associate Professor;", "bibtex": "@misc{\ncolas2021epidemioptim,\ntitle={EpidemiOptim: A Toolbox for the Optimization of Control Policies in Epidemiological Models},\nauthor={C{\\'e}dric Colas and Boris Hejblum and S{\\'e}bastien Rouillon and Rodolphe Thiebaut and Pierre-Yves Oudeyer and Cl{\\'e}ment Moulin-Frier and M{\\'e}lanie 
Prague},\nyear={2021},\nurl={https://openreview.net/forum?id=EeeOTYhLlVm}\n}", "github": "", "project": "", "reviewers": "AnonReviewer3;AnonReviewer1;AnonReviewer2", "site": "https://openreview.net/forum?id=EeeOTYhLlVm", "pdf_size": 0, "rating": "3;3;4", "confidence": "3;4;4", "wc_review": "309;243;198", "wc_reply_reviewers": "0;0;0", "wc_reply_authors": "310;0;0", "reply_reviewers": "0;0;0", "reply_authors": "1;0;0", "rating_avg": [ 3.3333333333333335, 0.4714045207910317 ], "confidence_avg": [ 3.6666666666666665, 0.4714045207910317 ], "wc_review_avg": [ 250.0, 45.58508528016593 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 103.33333333333333, 146.13540144521983 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 0.3333333333333333, 0.4714045207910317 ], "replies_avg": [ 6, 0 ], "authors#_avg": [ 7, 0 ], "corr_rating_confidence": 0.49999999999999983, "gs_citation": 24, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=17904877261633007251&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 15, "aff_unique_index": "0;1;0", "aff_unique_norm": "INRIA;Microsoft", "aff_unique_dep": ";Microsoft Corporation", "aff_unique_url": "https://www.inria.fr;https://www.microsoft.com", "aff_unique_abbr": "INRIA;Microsoft", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;1;0", "aff_country_unique": "France;United States" }, { "id": "Ef1nNHQHZ20", "title": "Layer-wise Adversarial Defense: An ODE Perspective", "track": "main", "status": "Reject", "tldr": "", "abstract": "Deep neural networks are observed to be fragile against adversarial attacks, which have dramatically limited their practical applicability. On improving model robustness, the adversarial training techniques have proven effective and gained increasing attention from research communities. Existing adversarial training approaches mainly focus on perturbations to inputs, while the effect of the perturbations in hidden layers remains underexplored. In this work, we propose layer-wise adversarial defense which improves adversarial training by a noticeable margin. The basic idea of our method is to strengthen all of the hidden layers with perturbations that are proportional to the back-propagated gradients. In order to study the layer-wise neural dynamics, we formulate our approach from the perspective of ordinary differential equations (ODEs) and build up its extended relationship with conventional adversarial training methods, which tightens the relationship between neural networks and ODEs. In the implementation, we propose two different training algorithms by discretizing the ODE model with the Lie-Trotter and the Strang-Marchuk splitting schemes from the operator-splitting theory. Experiments on CIFAR-10 and CIFAR-100 benchmarks show that our methods consistently improve adversarial model robustness on top of widely-used strong adversarial training techniques. 
", "keywords": "adversarial training;robustness;ODE", "primary_area": "", "supplementary_material": "", "author": "Zonghan Yang;Yang Liu;Chenglong Bao;Zuoqiang Shi", "authorids": "~Zonghan_Yang1;liuyang2011@tsinghua.edu.cn;~Chenglong_Bao3;~Zuoqiang_Shi1", "gender": "M;;M;M", "homepage": "https://minicheshire.github.io/;;https://matbc.github.io/;https://shizqi.github.io/", "dblp": "222/7860;;;18/1960", "google_scholar": "rt9HOIUAAAAJ;;;", "orcid": ";;;0000-0002-9122-0302", "linkedin": ";;;", "or_profile": "~Zonghan_Yang1;liuyang2011@tsinghua.edu.cn;~Chenglong_Bao3;~Zuoqiang_Shi1", "aff": "Department of Computer Science and Technology, Tsinghua University;;Tsinghua University;Tsinghua University", "aff_domain": "cs.tsinghua.edu.cn;;tsinghua.edu.cn;tsinghua.edu.cn", "position": "PhD student;;Assistant Professor;Associate Professor", "bibtex": "@misc{\nyang2021layerwise,\ntitle={Layer-wise Adversarial Defense: An {\\{}ODE{\\}} Perspective},\nauthor={Zonghan Yang and Yang Liu and Chenglong Bao and Zuoqiang Shi},\nyear={2021},\nurl={https://openreview.net/forum?id=Ef1nNHQHZ20}\n}", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer4;AnonReviewer2;AnonReviewer3", "site": "https://openreview.net/forum?id=Ef1nNHQHZ20", "pdf_size": 0, "rating": "4;5;5;5", "confidence": "4;4;4;3", "wc_review": "525;418;401;413", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "0;0;0;0", "reply_reviewers": "0;0;0;0", "reply_authors": "0;0;0;0", "rating_avg": [ 4.75, 0.4330127018922193 ], "confidence_avg": [ 3.75, 0.4330127018922193 ], "wc_review_avg": [ 439.25, 49.89175783634006 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 0, 0 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 0, 0 ], "replies_avg": [ 5, 0 ], "authors#_avg": [ 4, 0 ], "corr_rating_confidence": -0.3333333333333333, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:q3ECf1jPswoJ:scholar.google.com/&scioq=Layer-wise+Adversarial+Defense:+An+ODE+Perspective&hl=en&as_sdt=0,5", "gs_version_total": 0, "aff_unique_index": "0;0;0", "aff_unique_norm": "Tsinghua University", "aff_unique_dep": "Department of Computer Science and Technology", "aff_unique_url": "https://www.tsinghua.edu.cn", "aff_unique_abbr": "THU", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0;0", "aff_country_unique": "China" }, { "id": "Efiwpsy0ZE_", "title": "Contextual Graph Reasoning Networks", "track": "main", "status": "Withdraw", "tldr": "", "abstract": "Graph Reasoning has shown great potential recently in modeling long-range dependencies, which are crucial for various computer vision tasks. However, the graph representation learned by existing methods is not effective enough as the relation between feature and graph is under-explored. In this work, we propose a novel method named Contextual Graph Reasoning (CGR) that learns a context-aware relation between feature and graph. This is achieved by constructing the projection matrix based on a global set of descriptors during graph projection, and calibrating the evolved graph based on the self-attention of all nodes during graph reprojection. Therefore, contextual information is well explored in both graph projection and reprojection with our method. To verify the effectiveness of our method, we conduct extensive experiments on semantic segmentation, instance segmentation, and 2D human pose estimation. 
Our method consistently achieves remarkable improvements over state-of-the-art methods, demonstrating the effectiveness and generalization ability of our method.", "keywords": "graph reasoning;context-aware representation;long-range dependencies;semantic segmentation", "primary_area": "", "supplementary_material": "", "author": "Zhaoqing Wang;Jiaming Liu;Yangyuxuan Kang;Mingming Gong;Chuang Zhang;Ming Lu;Ming Wu", "authorids": "~Zhaoqing_Wang1;~Jiaming_Liu2;~Yangyuxuan_Kang1;~Mingming_Gong1;~Chuang_Zhang1;~Ming_Lu2;~Ming_Wu2", "gender": "M;M;M;M;M;;", "homepage": "https://derrickwang005.github.io/;https://github.com/liujiaming1996;;https://mingming-gong.github.io/;;;", "dblp": ";;282/6002.html;98/8479;;;", "google_scholar": "ZqOjPKQAAAAJ;cPki5sUAAAAJ;https://scholar.google.com/citations?hl=zh-CN;https://scholar.google.com.au/citations?user=6BmiCJIAAAAJ;;;", "orcid": ";0000-0002-6770-4390;0009-0009-4597-0100;0000-0001-7147-5589;0000-0002-1115-5580;;", "linkedin": "%E5%85%86%E5%8D%BF-%E7%8E%8B-ba58221b7/;;;;;;", "or_profile": "~Zhaoqing_Wang1;~Jiaming_Liu2;~Yangyuxuan_Kang1;~Mingming_Gong1;~Chuang_Zhang1;~Ming_Lu2;~Ming_Wu2", "aff": "The University of Sydney;Beijing University of Post and Telecommunication;Chinese Academy of Sciences, Chinese Academy of Sciences;University of Melbourne;Beijing University of Posts and Telecommunications;;", "aff_domain": "uni.sydney.edu.au;bupt.edu.cn;ios.ac.cn;unimelb.edu.au;bupt.edu.cn;;", "position": "MS student;MS student;PhD student;Assistant Professor;Full Professor;;", "bibtex": "", "github": "", "project": "", "reviewers": "AnonReviewer2;AnonReviewer3;AnonReviewer1", "site": "https://openreview.net/forum?id=Efiwpsy0ZE_", "pdf_size": 0, "rating": "4;5;5", "confidence": "4;4;4", "wc_review": "522;1016;234", "wc_reply_reviewers": "0;0;0", "wc_reply_authors": "0;0;0", "reply_reviewers": "0;0;0", "reply_authors": "0;0;0", "rating_avg": [ 4.666666666666667, 0.4714045207910317 ], "confidence_avg": [ 4.0, 0.0 ], "wc_review_avg": [ 590.6666666666666, 322.9213870622728 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 0, 0 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 0, 0 ], "replies_avg": [ 4, 0 ], "authors#_avg": [ 7, 0 ], "corr_rating_confidence": 0.0, "gs_citation": -1, "gs_cited_by_link": "", "gs_version_total": -1, "aff_unique_index": "0;1;2;3;1", "aff_unique_norm": "University of Sydney;Beijing University of Posts and Telecommunications;Chinese Academy of Sciences;University of Melbourne", "aff_unique_dep": ";;;", "aff_unique_url": "https://www.sydney.edu.au;http://www.bupt.edu.cn/;http://www.cas.cn;https://www.unimelb.edu.au", "aff_unique_abbr": "USYD;BUPT;CAS;UniMelb", "aff_campus_unique_index": "1;1", "aff_campus_unique": ";Beijing", "aff_country_unique_index": "0;1;1;0;1", "aff_country_unique": "Australia;China" }, { "id": "Ek7qrYhJMbn", "title": "Central Server Free Federated Learning over Single-sided Trust Social Networks", "track": "main", "status": "Reject", "tldr": "", "abstract": "Federated learning has become increasingly important for modern machine learning, especially for data privacy-sensitive scenarios. Existing federated learning mostly adopts the central server-based architecture or centralized architecture. However, in many social network scenarios, centralized federated learning is not applicable (e.g., a central agent or server connecting all users may not exist, or the communication cost to the central server is not affordable). 
In this paper, we consider a generic setting: 1) the central server may not exist, and 2) the social network is unidirectional or of single-sided trust (i.e., user A trusts user B but user B may not trust user A). We propose a central server free federated learning algorithm, named Online Push-Sum (OPS) method, to handle this challenging but generic scenario. A rigorous regret analysis is also provided, which shows interesting results on how users can benefit from communication with trusted users in the federated learning scenario. This work builds upon the fundamental algorithm framework and theoretical guarantees for federated learning in the generic social network scenario.", "keywords": "", "primary_area": "", "supplementary_material": "/attachment/6b5a2e7d2cadea09b2f2b20ff0e4084c82536d3b.zip", "author": "Chaoyang He;Conghui Tan;Hanlin Tang;Shuang Qiu;Ji Liu", "authorids": "~Chaoyang_He1;~Conghui_Tan1;htang14@ur.rochester.edu;~Shuang_Qiu2;~Ji_Liu1", "gender": "M;M;;M;M", "homepage": "http://chaoyanghe.com;;;https://shq-ml.github.io/;http://jiliu-ml.org", "dblp": "222/6721-1.html;180/5927;;;51/4433-2.html", "google_scholar": "2z2camUAAAAJ;https://scholar.google.com.hk/citations?user=LS3sGwcAAAAJ;;-Z7fY00AAAAJ;RRzVwKkAAAAJ", "orcid": ";;;;", "linkedin": ";;;;", "or_profile": "~Chaoyang_He1;~Conghui_Tan1;htang14@ur.rochester.edu;~Shuang_Qiu2;~Ji_Liu1", "aff": "University of Southern California;WeBank Co., Ltd.;;;", "aff_domain": "usc.edu;webank.com;;;", "position": "PhD student;Researcher;;;", "bibtex": "@misc{\nhe2021central,\ntitle={Central Server Free Federated Learning over Single-sided Trust Social Networks},\nauthor={Chaoyang He and Conghui Tan and Hanlin Tang and Shuang Qiu and Ji Liu},\nyear={2021},\nurl={https://openreview.net/forum?id=Ek7qrYhJMbn}\n}", "github": "", "project": "", "reviewers": "AnonReviewer2;AnonReviewer4;AnonReviewer1;AnonReviewer3", "site": "https://openreview.net/forum?id=Ek7qrYhJMbn", "pdf_size": 0, "rating": "4;4;5;8", "confidence": "3;4;3;5", "wc_review": "335;279;372;336", "wc_reply_reviewers": "59;0;0;0", "wc_reply_authors": "542;593;283;221", "reply_reviewers": "1;0;0;0", "reply_authors": "1;1;1;1", "rating_avg": [ 5.25, 1.6393596310755 ], "confidence_avg": [ 3.75, 0.82915619758885 ], "wc_review_avg": [ 330.5, 33.26033673912518 ], "wc_reply_reviewers_avg": [ 14.75, 25.54774941164094 ], "wc_reply_authors_avg": [ 409.75, 160.28314789771255 ], "reply_reviewers_avg": [ 0.25, 0.4330127018922193 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 10, 0 ], "authors#_avg": [ 5, 0 ], "corr_rating_confidence": 0.7816608327818948, "gs_citation": 94, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=5086361062867952357&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 3, "aff_unique_index": "0;1", "aff_unique_norm": "University of Southern California;WeBank", "aff_unique_dep": ";", "aff_unique_url": "https://www.usc.edu;https://www.webank.com", "aff_unique_abbr": "USC;WeBank", "aff_campus_unique_index": "0", "aff_campus_unique": "Los Angeles;", "aff_country_unique_index": "0;1", "aff_country_unique": "United States;China" }, { "id": "Enb37i4iMry", "title": "Unsupervised inference for optimizing deep feedforward neural network architecture", "track": "main", "status": "Desk Reject", "tldr": "", "abstract": "We propose an unsupervised inference algorithm to find optimal deep feedforward neural network architecture by modeling hierarchical representations of given data set. 
Our algorithm learns the optimal neural network architecture that represents a system that generates observed data, in a forward manner without backpropagation. We hypothesize that a neural network architecture, which models the hierarchical representations of given data, provides the optimal feedforward neural network architecture with competitive predictive performance in a supervised manner when compared with supervised-based neural network architecture optimization models. To prove the hypothesis, we evaluated the predictive performance with the feedforward neural network architectures with intensive experiments using various well-known benchmark data sets, such as SPECT, FMNIST, CIFAR-10, heart disease, German Credit, and Statlog heart data. Our algorithm requires much smaller search space for feasible computation than conventional supervised-based neural network architecture optimization methods. The proposed unsupervised inference approach can fully enjoy optimizing neural network architecture from small size of labeled data, when a huge unlabeled data is available. ", "keywords": "Neural network architecture optimization;unsupervised inference;deep feedforward neural network", "primary_area": "", "supplementary_material": "", "author": "Anonymous", "authorids": "ICLR.cc/2021/Conference/Paper2914/Authors", "gender": "", "homepage": "", "dblp": "", "google_scholar": "", "orcid": "", "linkedin": "", "or_profile": "", "aff": "", "aff_domain": "", "position": "", "bibtex": "@inproceedings{\nanonymous2021unsupervised,\ntitle={Unsupervised inference for optimizing deep feedforward neural network architecture},\nauthor={Anonymous},\nbooktitle={Submitted to International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=Enb37i4iMry},\nnote={under review}\n}", "github": "", "project": "", "reviewers": "", "site": "https://openreview.net/forum?id=Enb37i4iMry", "pdf_size": 0, "rating": "", "confidence": "", "wc_review": "", "wc_reply_reviewers": "", "wc_reply_authors": "", "reply_reviewers": "", "reply_authors": "", "rating_avg": [ 0, 0 ], "confidence_avg": [ 0, 0 ], "wc_review_avg": [ 0, 0 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 0, 0 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 0, 0 ], "replies_avg": [ 1, 0 ], "authors#_avg": [ 1, 0 ], "corr_rating_confidence": 0, "gs_citation": -1, "gs_cited_by_link": "", "gs_version_total": -1 }, { "title": "Neural gradients are near-lognormal: improved quantized and sparse training", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/2716", "id": "EoFNy62JGd", "poster": "", "openreview": "https://openreview.net/forum?id=EoFNy62JGd", "slides": "https://iclr.cc/virtual/2021/poster/2716", "video": "https://iclr.cc/virtual/2021/poster/2716", "author_site": "Brian Chmiel, Liad Ben-Uri, Moran Shkolnik, Elad Hoffer, Ron Banner, Daniel Soudry", "tldr": "", "abstract": "While training can mostly be accelerated by reducing the time needed to propagate neural gradients (loss gradients with respect to the intermediate neural layer outputs) back throughout the model, most previous works focus on the quantization/pruning of weights and activations. These methods are often not applicable to neural gradients, which have very different statistical properties. Distinguished from weights and activations, we find that the distribution of neural gradients is approximately lognormal. 
Considering this, we suggest two closed-form analytical methods to reduce the computational and memory burdens of neural gradients. The first method optimizes the floating-point format and scale of the gradients. The second method accurately sets sparsity thresholds for gradient pruning. Each method achieves state-of-the-art results on ImageNet. To the best of our knowledge, this paper is the first to (1) quantize the gradients to 6-bit floating-point formats, or (2) achieve up to 85% gradient sparsity --- in each case without accuracy degradation.\nReference implementation accompanies the paper in the supplementary material.", "keywords": "", "primary_area": "", "supplementary_material": "/attachment/8d06cc6f332ae81f2d7094e8e2bc9e6727a7f477.zip", "author": "Brian Chmiel;Liad Ben-Uri;Moran Shkolnik;Elad Hoffer;Ron Banner;Daniel Soudry", "authorids": "~Brian_Chmiel1;liadgo2@gmail.com;~Moran_Shkolnik1;~Elad_Hoffer1;~Ron_Banner1;~Daniel_Soudry1", "gender": "M;;F;M;M;M", "homepage": ";;;http://www.deeplearning.co.il;;https://soudry.github.io/", "dblp": "239/6051;;249/2235;156/0135;03/5857;126/1779", "google_scholar": "https://scholar.google.co.il/citations?user=2U8VtKsAAAAJ;;8x9rOboAAAAJ;https://scholar.google.co.il/citations?user=iEfTH7AAAAAJ;;https://scholar.google.co.il/citations?user=AEBWEm8AAAAJ", "orcid": ";;;;;0000-0001-9368-6352", "linkedin": "brian-chmiel-89653893/;;moran-shkolnik-b8b76132/;;https://il.linkedin.com/in/ron-banner-69403a51;daniel-soudry-2aa3a88/", "or_profile": "~Brian_Chmiel1;liadgo2@gmail.com;~Moran_Shkolnik1;~Elad_Hoffer1;~Ron_Banner1;~Daniel_Soudry1", "aff": "Technion - Israel Institute of Technology, Technion;;;Habana Labs (Intel);Intel;Technion - Israel Institute of Technology", "aff_domain": "campus.technion.ac.il;;;habana.ai;intel.com;technion.ac.il", "position": "PhD student;;;Researcher;Researcher;Assistant Professor", "bibtex": "@inproceedings{\nchmiel2021neural,\ntitle={Neural gradients are near-lognormal: improved quantized and sparse training},\nauthor={Brian Chmiel and Liad Ben-Uri and Moran Shkolnik and Elad Hoffer and Ron Banner and Daniel Soudry},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=EoFNy62JGd}\n}", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer2;AnonReviewer3;AnonReviewer4", "pdf_size": 0, "rating": "6;7;7;8", "confidence": "4;4;3;4", "wc_review": "257;502;427;184", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "197;442;512;344", "reply_reviewers": "0;0;0;0", "reply_authors": "1;1;1;1", "rating_avg": [ 7.0, 0.7071067811865476 ], "confidence_avg": [ 3.75, 0.4330127018922193 ], "wc_review_avg": [ 342.5, 127.4882347512899 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 373.75, 118.21246761657588 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 10, 0 ], "authors#_avg": [ 6, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 58, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=9381914834828756618&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 4, "pdf": "https://openreview.net/pdf?id=EoFNy62JGd", "email": "campus.technion.ac.il;;;habana.ai;intel.com;technion.ac.il", "author_num": 6, "aff_unique_index": "0;1;2;0", "aff_unique_norm": "Technion - Israel Institute of Technology;Habana Labs;Intel", "aff_unique_dep": ";;Intel Corporation", "aff_unique_url": "https://www.technion.ac.il;https://www.habana.ai;https://www.intel.com", "aff_unique_abbr": "Technion;Habana Labs;Intel", 
"aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;1;1;0", "aff_country_unique": "Israel;United States" }, { "id": "EoVmlONgI9e", "title": "The Emergence of Individuality in Multi-Agent Reinforcement Learning", "track": "main", "status": "Reject", "tldr": "", "abstract": "Individuality is essential in human society, which induces the division of labor and thus improves the efficiency and productivity. Similarly, it should also be a key to multi-agent cooperation. Inspired by that individuality is of being an individual separate from others, we propose a simple yet efficient method for the emergence of individuality (EOI) in multi-agent reinforcement learning (MARL). EOI learns a probabilistic classifier that predicts a probability distribution over agents given their observation and gives each agent an intrinsic reward of being correctly predicted by the classifier. The intrinsic reward encourages the agents to visit their own familiar observations, and learning the classifier by such observations makes the intrinsic reward signals stronger and in turn makes the agents more identifiable. To further enhance the intrinsic reward and promote the emergence of individuality, two regularizers are proposed to increase the discriminability of the classifier. We implement EOI on top of popular MARL algorithms. Empirically, we show that EOI outperforms existing methods in a variety of multi-agent cooperative scenarios.", "keywords": "", "primary_area": "", "supplementary_material": "", "author": "Jiechuan Jiang;Zongqing Lu", "authorids": "~Jiechuan_Jiang1;~Zongqing_Lu2", "gender": ";", "homepage": ";", "dblp": "220/4026;", "google_scholar": ";", "orcid": ";", "linkedin": ";", "or_profile": "~Jiechuan_Jiang1;~Zongqing_Lu2", "aff": ";", "aff_domain": ";", "position": ";", "bibtex": "@misc{\njiang2021the,\ntitle={The Emergence of Individuality in Multi-Agent Reinforcement Learning},\nauthor={Jiechuan Jiang and Zongqing Lu},\nyear={2021},\nurl={https://openreview.net/forum?id=EoVmlONgI9e}\n}", "github": "", "project": "", "reviewers": "AnonReviewer3;AnonReviewer1;AnonReviewer4;AnonReviewer2", "site": "https://openreview.net/forum?id=EoVmlONgI9e", "pdf_size": 0, "rating": "4;5;6;6", "confidence": "4;4;5;5", "wc_review": "570;530;776;361", "wc_reply_reviewers": "0;0;333;0", "wc_reply_authors": "228;298;335;248", "reply_reviewers": "0;0;1;0", "reply_authors": "1;1;1;1", "rating_avg": [ 5.25, 0.82915619758885 ], "confidence_avg": [ 4.5, 0.5 ], "wc_review_avg": [ 559.25, 147.69457505270802 ], "wc_reply_reviewers_avg": [ 83.25, 144.19322973010904 ], "wc_reply_authors_avg": [ 277.25, 41.97246120970273 ], "reply_reviewers_avg": [ 0.25, 0.4330127018922193 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 11, 0 ], "authors#_avg": [ 2, 0 ], "corr_rating_confidence": 0.9045340337332909, "gs_citation": 6, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=10249542640142595535&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 0 }, { "id": "EohGx2HgNsA", "title": "NASLib: A Modular and Flexible Neural Architecture Search Library", "track": "main", "status": "Withdraw", "tldr": "", "abstract": "Neural Architecture Search (NAS) is one of the focal points for the Deep Learning community, but reproducing NAS methods is extremely challenging due to numerous low-level implementation details. To alleviate this problem we introduce NASLib, a NAS library built upon PyTorch. 
This framework offers high-level abstractions for designing and reusing search spaces, interfaces to benchmarks and evaluation pipelines, enabling the implementation and extension of state-of-the-art NAS methods with a few lines of code. The modularized nature of NASlib allows researchers to easily innovate on individual components (e.g., define a new search space while reusing an optimizer and evaluation pipeline, or propose a new optimizer with existing search spaces). As a result, NASLib has the potential to facilitate NAS research by allowing fast advances and evaluations that are by design free of confounding factors. To demonstrate that NASLib is a sound library, we implement and achieve state-of-the-art results with one-shot NAS optimizers (DARTS and GDAS) over the DARTS search space and the popular NAS-Bench-201 benchmark. Last but not least, we showcase how easily novel approaches are coded in NASLib, by training DARTS on a hierarchical search space.", "keywords": "Neural Architecture Search;Automated Machine Learning;Deep Learning;Open-Source;Software;Python;PyTorch", "primary_area": "", "supplementary_material": "/attachment/57027c744ae5badec3c9c7fecc07af9246c85db5.zip", "author": "Michael Ruchte;Arber Zela;Julien Niklas Siems;Josif Grabocka;Frank Hutter", "authorids": "~Michael_Ruchte1;~Arber_Zela1;~Julien_Niklas_Siems1;~Josif_Grabocka1;~Frank_Hutter1", "gender": ";M;M;M;M", "homepage": "https://relea.informatik.uni-freiburg.de/people/michael-ruchte;https://ml.informatik.uni-freiburg.de/people/zela/index.html;https://juliensiems.github.io;https://www.utn.de/departments/department-engineering/machine-learning-lab/;http://ml.informatik.uni-freiburg.de/~hutter/", "dblp": ";;257/3075;117/4936;89/5383", "google_scholar": ";hD_6YioAAAAJ;https://scholar.google.de/citations?user=rKgTTh8AAAAJ;KRy27XcAAAAJ;https://scholar.google.de/citations?user=YUrxwrkAAAAJ", "orcid": ";;;;0000-0002-2037-3694", "linkedin": ";https://de.linkedin.com/in/arber-zela-ba85a2145;julien-niklas-siems/;;frank-hutter-9190b24b/", "or_profile": "~Michael_Ruchte1;~Arber_Zela1;~Julien_Niklas_Siems1;~Josif_Grabocka1;~Frank_Hutter1", "aff": "Universit\u00e4t Freiburg;University of Freiburg;Department of Informatics, University of Zurich, University of Zurich;Universit\u00e4t Freiburg;Albert-Ludwigs-Universit\u00e4t Freiburg", "aff_domain": "uni-freiburg.de;uni-freiburg.de;ifi.uzh.ch;uni-freiburg.de;uni-freiburg.de", "position": "PhD student;PhD student;Researcher;Assistant Professor;Full Professor", "bibtex": "", "github": "", "project": "", "reviewers": "AnonReviewer2;AnonReviewer3;AnonReviewer1;AnonReviewer4", "site": "https://openreview.net/forum?id=EohGx2HgNsA", "pdf_size": 0, "rating": "3;4;4;5", "confidence": "4;4;5;3", "wc_review": "367;196;376;427", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "586;292;413;463", "reply_reviewers": "0;0;0;0", "reply_authors": "1;1;1;1", "rating_avg": [ 4.0, 0.7071067811865476 ], "confidence_avg": [ 4.0, 0.7071067811865476 ], "wc_review_avg": [ 341.5, 87.06463116558871 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 438.5, 105.43837062473983 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 10, 0 ], "authors#_avg": [ 5, 0 ], "corr_rating_confidence": -0.5, "gs_citation": 25, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=16643295904635158894&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 0, "aff_unique_index": "0;0;1;0;2", "aff_unique_norm": "University of Freiburg;University of 
Zurich;Albert-Ludwigs-Universit\u00e4t Freiburg", "aff_unique_dep": ";Department of Informatics;", "aff_unique_url": "https://www.uni-freiburg.de;https://www.uzh.ch;https://www.uni-freiburg.de", "aff_unique_abbr": "Uni Freiburg;UZH;Albert-Ludwigs-Universit\u00e4t", "aff_campus_unique_index": "1", "aff_campus_unique": ";Freiburg", "aff_country_unique_index": "0;0;1;0;0", "aff_country_unique": "Germany;Switzerland" }, { "title": "Robust early-learning: Hindering the memorization of noisy labels", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/3129", "id": "Eql5b1_hTE4", "poster": "", "openreview": "https://openreview.net/forum?id=Eql5b1_hTE4", "slides": "https://iclr.cc/virtual/2021/poster/3129", "video": "https://iclr.cc/virtual/2021/poster/3129", "author_site": "Xiaobo Xia, Tongliang Liu, Bo Han, Chen Gong, Nannan Wang, Zongyuan Ge, Yi Chang", "tldr": "", "abstract": "The \\textit{memorization effects} of deep networks show that they will first memorize training data with clean labels and then those with noisy labels. The \\textit{early stopping} method therefore can be exploited for learning with noisy labels. However, the side effect brought by noisy labels will influence the memorization of clean labels before early stopping. In this paper, motivated by the \\textit{lottery ticket hypothesis} which shows that only partial parameters are important for generalization, we find that only partial parameters are important for fitting clean labels and generalize well, which we term as \\textit{critical parameters}; while the other parameters tend to fit noisy labels and cannot generalize well, which we term as \\textit{non-critical parameters}. Based on this, we propose \\textit{robust early-learning} to reduce the side effect of noisy labels before early stopping and thus enhance the memorization of clean labels. Specifically, in each iteration, we divide all parameters into the critical and non-critical ones, and then perform different update rules for different types of parameters. 
Extensive experiments on benchmark-simulated and real-world label-noise datasets demonstrate the superiority of the proposed method over the state-of-the-art label-noise learning methods.", "keywords": "", "primary_area": "", "supplementary_material": "", "author": "Xiaobo Xia;Tongliang Liu;Bo Han;Chen Gong;Nannan Wang;Zongyuan Ge;Yi Chang", "authorids": "~Xiaobo_Xia1;~Tongliang_Liu1;~Bo_Han1;~Chen_Gong5;~Nannan_Wang1;~Zongyuan_Ge1;yichang@jlu.edu.cn", "gender": "M;M;;M;M;M;", "homepage": "https://xiaoboxia.github.io/;https://tongliang-liu.github.io/;;http://www.escience.cn/people/chengong/index.html;;https://research.monash.edu/en/persons/zongyuan-ge;", "dblp": "242/8072;150/6667;;21/8587-2;10/8359-1;147/2757;", "google_scholar": "jRsugY0AAAAJ;https://scholar.google.com.au/citations?user=EiLdZ_YAAAAJ;;https://scholar.google.com.hk/citations?user=guttoBwAAAAJ;SRBn7oUAAAAJ;https://scholar.google.com.au/citations?user=Q0gUrcIAAAAJ;", "orcid": ";;;;;0000-0002-5880-8673;", "linkedin": ";;;;;;", "or_profile": "~Xiaobo_Xia1;~Tongliang_Liu1;~Bo_Han1;~Chen_Gong5;~Nannan_Wang1;~Zongyuan_Ge1;yichang@jlu.edu.cn", "aff": "The University of Sydney;University of Sydney;;Nanjing University of Science and Technology;Xidian University;Monash University;", "aff_domain": "sydney.edu.au;sydney.edu.au;;njust.edu.cn;xidian.edu.cn;monash.edu;", "position": "PhD student;Lecturer;;Full Professor;Full Professor;Assistant Professor;", "bibtex": "@inproceedings{\nxia2021robust,\ntitle={Robust early-learning: Hindering the memorization of noisy labels},\nauthor={Xiaobo Xia and Tongliang Liu and Bo Han and Chen Gong and Nannan Wang and Zongyuan Ge and Yi Chang},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=Eql5b1_hTE4}\n}", "github": "", "project": "", "reviewers": "AnonReviewer4;AnonReviewer2;AnonReviewer3;AnonReviewer1", "pdf_size": 0, "rating": "6;7;7;7", "confidence": "4;4;5;4", "wc_review": "164;530;371;428", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "686;170;230;120", "reply_reviewers": "0;0;0;0", "reply_authors": "1;1;1;1", "rating_avg": [ 6.75, 0.4330127018922193 ], "confidence_avg": [ 4.25, 0.4330127018922193 ], "wc_review_avg": [ 373.25, 133.56529302180263 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 301.5, 225.38134350473644 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 15, 0 ], "authors#_avg": [ 7, 0 ], "corr_rating_confidence": 0.3333333333333333, "gs_citation": 354, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=2902897074318028759&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 5, "pdf": "https://openreview.net/pdf?id=Eql5b1_hTE4", "email": "sydney.edu.au;sydney.edu.au;;njust.edu.cn;xidian.edu.cn;monash.edu;", "author_num": 7, "aff_unique_index": "0;0;1;2;3", "aff_unique_norm": "University of Sydney;Nanjing University of Science and Technology;Xidian University;Monash University", "aff_unique_dep": ";;;", "aff_unique_url": "https://www.sydney.edu.au;http://www.nust.edu.cn/;http://www.xidian.edu.cn/;https://www.monash.edu", "aff_unique_abbr": "USYD;NUST;Xidian;Monash", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0;1;1;0", "aff_country_unique": "Australia;China" }, { "title": "Support-set bottlenecks for video-text representation learning", "status": "Spotlight", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/2999", "id": "EqoXe2zmhrh", "poster": "", "openreview": 
"https://openreview.net/forum?id=EqoXe2zmhrh", "slides": "https://iclr.cc/virtual/2021/poster/2999", "video": "https://iclr.cc/virtual/2021/poster/2999", "author_site": "Mandela Patrick, Po-Yao Huang, Yuki Asano, Florian Metze, Alexander G Hauptmann, Joao F. Henriques, Andrea Vedaldi", "tldr": "", "abstract": "The dominant paradigm for learning video-text representations \u2013 noise contrastive learning \u2013 increases the similarity of the representations of pairs of samples that are known to be related, such as text and video from the same sample, and pushes away the representations of all other pairs. We posit that this last behaviour is too strict, enforcing dissimilar representations even for samples that are semantically-related \u2013 for example, visually similar videos or ones that share the same depicted action. In this paper, we propose a novel method that alleviates this by leveraging a generative model to naturally push these related samples together: each sample\u2019s caption must be reconstructed as a weighted combination of a support set of visual representations. This simple idea ensures that representations are not overly-specialized to individual samples, are reusable across the dataset, and results in representations that explicitly encode semantics shared between samples, unlike noise contrastive learning. Our proposed method outperforms others by a large margin on MSR-VTT, VATEX, ActivityNet, and MSVD for video-to-text and text-to-video retrieval.", "keywords": "video representation learning;multi-modal learning;video-text learning;contrastive learning", "primary_area": "", "supplementary_material": "", "author": "Mandela Patrick;Po-Yao Huang;Yuki Asano;Florian Metze;Alexander G Hauptmann;Joao F. Henriques;Andrea Vedaldi", "authorids": "~Mandela_Patrick1;~Po-Yao_Huang1;~Yuki_Asano1;~Florian_Metze1;~Alexander_G_Hauptmann1;~Joao_F._Henriques1;~Andrea_Vedaldi1", "gender": "M;M;M;M;M;M;M", "homepage": ";https://berniebear.github.io/;https://yukimasano.github.io/;http://www.cs.cmu.edu/~fmetze;;http://www.robots.ox.ac.uk/~joao/;https://www.robots.ox.ac.uk/~vedaldi/", "dblp": ";154/3943-1;239/8823;26/1652.html;h/AlexanderGHauptmann;31/8617.html;99/2825", "google_scholar": "https://scholar.google.com/citations?hl=en;E8K25LIAAAAJ;CdpLhlgAAAAJ;pSqVgOkAAAAJ;https://scholar.google.co.uk/citations?user=Py54GcEAAAAJ;aCQjyp0AAAAJ;bRT7t28AAAAJ", "orcid": ";;;0000-0002-6663-8600;;;0000-0003-1374-2858", "linkedin": ";;;florianmetze/;;;", "or_profile": "~Mandela_Patrick1;~Po-Yao_Huang1;~Yuki_Asano1;~Florian_Metze1;~Alexander_G_Hauptmann1;~Joao_F._Henriques1;~Andrea_Vedaldi1", "aff": "University of Oxford;Carnegie Mellon University;University of Oxford;Carnegie Mellon University;School of Computer Science, Carnegie Mellon University;University of Oxford;Meta", "aff_domain": "ox.ac.uk;cmu.edu;ox.ac.uk;cmu.edu;cs.cmu.edu;ox.ac.uk;meta.com", "position": "PhD student;PhD student;PhD student;Associate Professor;Full Professor;Principal Researcher;Researcher", "bibtex": "@inproceedings{\npatrick2021supportset,\ntitle={Support-set bottlenecks for video-text representation learning},\nauthor={Mandela Patrick and Po-Yao Huang and Yuki Asano and Florian Metze and Alexander G Hauptmann and Joao F. 
Henriques and Andrea Vedaldi},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=EqoXe2zmhrh}\n}", "github": "", "project": "", "reviewers": "AnonReviewer3;AnonReviewer4;AnonReviewer2;AnonReviewer1", "pdf_size": 0, "rating": "6;7;7;9", "confidence": "4;5;4;3", "wc_review": "458;807;525;214", "wc_reply_reviewers": "0;0;149;0", "wc_reply_authors": "758;811;885;187", "reply_reviewers": "0;0;2;0", "reply_authors": "1;1;2;1", "rating_avg": [ 7.25, 1.0897247358851685 ], "confidence_avg": [ 4.0, 0.7071067811865476 ], "wc_review_avg": [ 501.0, 211.20487683763366 ], "wc_reply_reviewers_avg": [ 37.25, 64.51889258194068 ], "wc_reply_authors_avg": [ 660.25, 276.9290297170017 ], "reply_reviewers_avg": [ 0.5, 0.8660254037844386 ], "reply_authors_avg": [ 1.25, 0.4330127018922193 ], "replies_avg": [ 18, 0 ], "authors#_avg": [ 7, 0 ], "corr_rating_confidence": -0.6488856845230502, "gs_citation": 302, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=12944320330800868480&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 9, "pdf": "https://openreview.net/pdf?id=EqoXe2zmhrh", "email": "ox.ac.uk;cmu.edu;ox.ac.uk;cmu.edu;cs.cmu.edu;ox.ac.uk;meta.com", "author_num": 7, "aff_unique_index": "0;1;0;1;1;0;2", "aff_unique_norm": "University of Oxford;Carnegie Mellon University;Meta", "aff_unique_dep": ";;Meta Platforms, Inc.", "aff_unique_url": "https://www.ox.ac.uk;https://www.cmu.edu;https://meta.com", "aff_unique_abbr": "Oxford;CMU;Meta", "aff_campus_unique_index": "1", "aff_campus_unique": ";Pittsburgh", "aff_country_unique_index": "0;1;0;1;1;0;1", "aff_country_unique": "United Kingdom;United States" }, { "id": "ErrNJYcVRmS", "title": "F^2ed-Learning: Good Fences Make Good Neighbors", "track": "main", "status": "Reject", "tldr": "", "abstract": "In this paper, we present F^2ed-Learning, the first federated learning protocol simultaneously defending against both semi-honest server and Byzantine malicious clients. Using a robust mean estimator called FilterL2, F^2ed-Learning is the first FL protocol with dimension-free estimation error against Byzantine malicious clients. Besides, F^2ed-Learning leverages secure aggregation to protect the clients from a semi-honest server who wants to infer the clients' information from the legitimate updates. The main challenge stems from the incompatibility between FilterL2 and secure aggregation. Specifically, to run FilterL2, the server needs to access individual updates from clients while secure aggregation hides those updates from it. We propose to split the clients into shards, securely aggregate each shard's updates and run FilterL2 on the updates from different shards. The evaluation shows that F^2ed-Learning consistently achieves optimal or sub-optimal performance under three attacks among five robust FL protocols. 
The code for evaluation is available in the supplementary material.", "keywords": "Byzantine-Robust Federated Learning;Secure Aggregation", "primary_area": "", "supplementary_material": "/attachment/73e061de1da3c705908a56b162d445d772948d5e.zip", "author": "Lun Wang;Qi Pang;Shuai Wang;Dawn Song", "authorids": "~Lun_Wang1;~Qi_Pang1;~Shuai_Wang7;~Dawn_Song1", "gender": ";;M;F", "homepage": "https://wanglun1996.github.io/;;https://home.cse.ust.hk/~shuaiw/;", "dblp": ";;42/1503-11;s/DXSong", "google_scholar": ";;;", "orcid": ";;;", "linkedin": ";;;", "or_profile": "~Lun_Wang1;~Qi_Pang1;~Shuai_Wang7;~Dawn_Song1", "aff": "University of California, Berkeley;;;University of California, Berkeley", "aff_domain": "berkeley.edu;;;berkeley.edu", "position": "PhD student;;;Full Professor", "bibtex": "@misc{\nwang2021fedlearning,\ntitle={F{\\textasciicircum}2ed-Learning: Good Fences Make Good Neighbors},\nauthor={Lun Wang and Qi Pang and Shuai Wang and Dawn Song},\nyear={2021},\nurl={https://openreview.net/forum?id=ErrNJYcVRmS}\n}", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer3;AnonReviewer4;AnonReviewer2", "site": "https://openreview.net/forum?id=ErrNJYcVRmS", "pdf_size": 0, "rating": "5;5;6;6", "confidence": "2;4;3;3", "wc_review": "259;938;252;739", "wc_reply_reviewers": "102;182;0;249", "wc_reply_authors": "985;1613;165;855", "reply_reviewers": "2;1;0;2", "reply_authors": "3;3;1;3", "rating_avg": [ 5.5, 0.5 ], "confidence_avg": [ 3.0, 0.7071067811865476 ], "wc_review_avg": [ 547.0, 299.8808096561032 ], "wc_reply_reviewers_avg": [ 133.25, 92.87996285528973 ], "wc_reply_authors_avg": [ 904.5, 514.238028543203 ], "reply_reviewers_avg": [ 1.25, 0.82915619758885 ], "reply_authors_avg": [ 2.5, 0.8660254037844386 ], "replies_avg": [ 20, 0 ], "authors#_avg": [ 4, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:Zr6-Uq5L82EJ:scholar.google.com/&scioq=F%5E2ed-Learning:+Good+Fences+Make+Good+Neighbors&hl=en&as_sdt=0,33", "gs_version_total": 0, "aff_unique_index": "0;0", "aff_unique_norm": "University of California, Berkeley", "aff_unique_dep": "", "aff_unique_url": "https://www.berkeley.edu", "aff_unique_abbr": "UC Berkeley", "aff_campus_unique_index": "0;0", "aff_campus_unique": "Berkeley", "aff_country_unique_index": "0;0", "aff_country_unique": "United States" }, { "id": "EsA9Nr9JHvy", "title": "The Heavy-Tail Phenomenon in SGD", "track": "main", "status": "Reject", "tldr": "", "abstract": "In recent years, various notions of capacity and complexity have been proposed for characterizing the generalization properties of stochastic gradient descent (SGD) in deep learning. Some of the popular notions that correlate well with the performance on unseen data are (i) the 'flatness' of the local minimum found by SGD, which is related to the eigenvalues of the Hessian, (ii) the ratio of the stepsize $\\eta$ to the batch size $b$, which essentially controls the magnitude of the stochastic gradient noise, and (iii) the 'tail-index', which measures the heaviness of the tails of the network weights at convergence. In this paper, we argue that these three seemingly unrelated perspectives for generalization are deeply linked to each other. We claim that depending on the structure of the Hessian of the loss at the minimum, and the choices of the algorithm parameters $\\eta$ and $b$, the SGD iterates will converge to a \\emph{heavy-tailed} stationary distribution. 
We rigorously prove this claim in the setting of quadratic optimization: we show that even in a simple linear regression problem with independent and identically distributed Gaussian data, the iterates can be heavy-tailed with infinite variance. We further characterize the behavior of the tails with respect to algorithm parameters, the dimension, and the curvature. We then translate our results into insights about the behavior of SGD in deep learning. We finally support our theory with experiments conducted on both synthetic data and fully connected neural networks.", "keywords": "heavy tails;stochastic gradient descent;deep learning", "primary_area": "", "supplementary_material": "/attachment/9b09ce47011738b3e4e560425cf6635ca81e8e66.zip", "author": "Mert Gurbuzbalaban;Umut Simsekli;Lingjiong Zhu", "authorids": "~Mert_Gurbuzbalaban1;~Umut_Simsekli1;~Lingjiong_Zhu1", "gender": ";M;M", "homepage": ";https://www.di.ens.fr/~simsekli/;", "dblp": "09/9185;https://dblp.org/pers/s/Simsekli:Umut.html;178/6958", "google_scholar": ";https://scholar.google.fr/citations?user=CuArAkgAAAAJ;Z9JkFaoAAAAJ", "orcid": ";;", "linkedin": ";;", "or_profile": "~Mert_Gurbuzbalaban1;~Umut_Simsekli1;~Lingjiong_Zhu1", "aff": "Rutgers University;INRIA;Florida State University", "aff_domain": "rutgers.edu;inria.fr;fsu.edu", "position": "Assistant Professor;Research Faculty;Associate Professor", "bibtex": "@misc{\ngurbuzbalaban2021the,\ntitle={The Heavy-Tail Phenomenon in {\\{}SGD{\\}}},\nauthor={Mert Gurbuzbalaban and Umut Simsekli and Lingjiong Zhu},\nyear={2021},\nurl={https://openreview.net/forum?id=EsA9Nr9JHvy}\n}", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer4;AnonReviewer3;AnonReviewer2", "site": "https://openreview.net/forum?id=EsA9Nr9JHvy", "pdf_size": 0, "rating": "5;5;6;7", "confidence": "4;3;3;4", "wc_review": "264;345;196;312", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "102;515;40;48", "reply_reviewers": "0;0;0;0", "reply_authors": "1;1;1;1", "rating_avg": [ 5.75, 0.82915619758885 ], "confidence_avg": [ 3.5, 0.5 ], "wc_review_avg": [ 279.25, 56.03291443428586 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 176.25, 197.02585490234523 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 11, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": 0.30151134457776363, "gs_citation": 164, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=11485380306468946114&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 10, "aff_unique_index": "0;1;2", "aff_unique_norm": "Rutgers University;INRIA;Florida State University", "aff_unique_dep": ";;", "aff_unique_url": "https://www.rutgers.edu;https://www.inria.fr;https://www.fsu.edu", "aff_unique_abbr": "Rutgers;INRIA;FSU", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;1;0", "aff_country_unique": "United States;France" }, { "id": "Ew0zR07CYRd", "title": "Bounded Myopic Adversaries for Deep Reinforcement Learning Agents", "track": "main", "status": "Reject", "tldr": "", "abstract": "Adversarial attacks against deep neural networks have been widely studied. Adversarial examples for deep reinforcement learning (DeepRL) have significant security implications, due to the deployment of these algorithms in many application domains. In this work we formalize an optimal myopic adversary for deep reinforcement learning agents. 
Our adversary attempts to find a bounded perturbation of the state which minimizes the value of the action taken by the agent. We show with experiments in various games in the Atari environment that our attack formulation achieves significantly larger impact as compared to the current state-of-the-art. Furthermore, this enables us to lower the bounds by several orders of magnitude on the perturbation needed to efficiently achieve significant impacts on DeepRL agents.", "keywords": "deep reinforcement learning;adversarial", "primary_area": "", "supplementary_material": "", "author": "Ezgi Korkmaz;Henrik Sandberg;Gyorgy Dan", "authorids": "~Ezgi_Korkmaz1;hsan@kth.se;gyuri@kth.se", "gender": "Unspecified;;", "homepage": "https://daylightframework.github.io/;;", "dblp": ";;", "google_scholar": ";;", "orcid": ";;", "linkedin": ";;", "or_profile": "~Ezgi_Korkmaz1;hsan@kth.se;gyuri@kth.se", "aff": "-;;", "aff_domain": "metu.edu;;", "position": "-;;", "bibtex": "@misc{\nkorkmaz2021bounded,\ntitle={Bounded Myopic Adversaries for Deep Reinforcement Learning Agents},\nauthor={Ezgi Korkmaz and Henrik Sandberg and Gyorgy Dan},\nyear={2021},\nurl={https://openreview.net/forum?id=Ew0zR07CYRd}\n}", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer4;AnonReviewer3;AnonReviewer2", "site": "https://openreview.net/forum?id=Ew0zR07CYRd", "pdf_size": 0, "rating": "5;6;6;6", "confidence": "4;4;3;4", "wc_review": "417;1615;236;228", "wc_reply_reviewers": "81;603;0;182", "wc_reply_authors": "353;1436;351;1327", "reply_reviewers": "1;2;0;1", "reply_authors": "2;3;1;2", "rating_avg": [ 5.75, 0.4330127018922193 ], "confidence_avg": [ 3.75, 0.4330127018922193 ], "wc_review_avg": [ 624.0, 577.1243366901105 ], "wc_reply_reviewers_avg": [ 216.5, 232.27408378895825 ], "wc_reply_authors_avg": [ 866.75, 516.191037795117 ], "reply_reviewers_avg": [ 1.0, 0.7071067811865476 ], "reply_authors_avg": [ 2.0, 0.7071067811865476 ], "replies_avg": [ 18, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": -0.3333333333333333, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:PQRPSzXy9FkJ:scholar.google.com/&scioq=Bounded+Myopic+Adversaries+for+Deep+Reinforcement+Learning+Agents&hl=en&as_sdt=0,33", "gs_version_total": 0 }, { "id": "EwsLcX5NRKr", "title": "Deep Active Learning for Object Detection with Mixture Density Networks", "track": "main", "status": "Withdraw", "tldr": "", "abstract": "Active learning aims to reduce the labeling costs by selecting only samples that are informative to improve the accuracy of the network. Few existing works have addressed the problem of active learning for object detection, and most of them estimate the informativeness of an image based only on the classification head, neglecting the influence of the localization head. In this paper, we propose a novel deep active learning approach for object detection. Our approach relies on mixture density networks to provide a mixture distribution for every output parameter. Based on these distributions, our approach is able to compute, separately and in a single forward pass of a single model, the epistemic and aleatoric uncertainty. We further propose a more efficient approach to reduce the computational cost of the mixture model. For active learning, we propose a scoring function that aggregates uncertainties from both the classification and the localization outputs of the network. 
Our extensive set of experiments on PASCAL VOC and COCO demonstrates that our modification to the object detection network yields better accuracy compared to the original one and, for active learning, our approach outperforms single-model based methods and performs on par with methods using multiple models while requiring significantly lower computational cost. In addition, we show that our approach scales to different object detection networks, and that datasets acquired actively using our approach transfer to different networks.", "keywords": "Active Learning;Object Detection", "primary_area": "", "supplementary_material": "", "author": "Jiwoong Choi;Ismail Elezi;Hyuk-Jae Lee;Clement Farabet;Jose M. Alvarez", "authorids": "~Jiwoong_Choi1;~Ismail_Elezi1;~Hyuk-Jae_Lee1;~Clement_Farabet1;~Jose_M._Alvarez2", "gender": "M;M;M;;", "homepage": ";https://therevanchist.github.io/;http://capp.snu.ac.kr/;;", "dblp": "79/11295;186/8256;;;", "google_scholar": "kiNYl_MAAAAJ;tpaCLrsAAAAJ;;;", "orcid": ";;;;", "linkedin": ";ismail-elezi-33958b32/?originalSubdomain=uk;;;", "or_profile": "~Jiwoong_Choi1;~Ismail_Elezi1;~Hyuk-Jae_Lee1;~Clement_Farabet1;~Jose_M._Alvarez2", "aff": "Seoul National University;Technische Universit\u00e4t M\u00fcnchen;Seoul National University;;", "aff_domain": "snu.ac.kr;tum.de;snu.ac.kr;;", "position": "PhD student;Postdoc;Full Professor;;", "bibtex": "", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer4;AnonReviewer3;AnonReviewer2", "site": "https://openreview.net/forum?id=EwsLcX5NRKr", "pdf_size": 0, "rating": "3;5;5;6", "confidence": "4;3;5;2", "wc_review": "426;227;211;66", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "0;0;0;0", "reply_reviewers": "0;0;0;0", "reply_authors": "0;0;0;0", "rating_avg": [ 4.75, 1.0897247358851685 ], "confidence_avg": [ 3.5, 1.118033988749895 ], "wc_review_avg": [ 232.5, 128.1181095708175 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 0, 0 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 0, 0 ], "replies_avg": [ 5, 0 ], "authors#_avg": [ 5, 0 ], "corr_rating_confidence": -0.5129891760425771, "gs_citation": 1, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=7674769250508015014&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 0, "aff_unique_index": "0;1;0", "aff_unique_norm": "Seoul National University;Technische Universit\u00e4t M\u00fcnchen", "aff_unique_dep": ";", "aff_unique_url": "https://www.snu.ac.kr;https://www.tum.de", "aff_unique_abbr": "SNU;TUM", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;1;0", "aff_country_unique": "South Korea;Germany" }, { "id": "EyDgK7q5vwJ", "title": "Streamlining EM into Auto-Encoder Networks", "track": "main", "status": "Reject", "tldr": "", "abstract": "We present a new deep neural network architecture, named EDGaM, for deep clustering. This architecture can seamlessly learn deep auto-encoders and capture common group features of complex inputs in the encoded latent space. The key idea is to introduce a differentiable Gaussian mixture neural network between an encoder and a decoder. In particular, EDGaM streamlines the iterative Expectation-Maximum (EM) algorithm of the Gaussian mixture models into network design and replaces the alternative update with a forward-backward optimization. Being differentiable, both network weights and clustering centroids in EDGaM can be learned simultaneously in an end-to-end manner through standard stochastic gradient descent. 
To avoid preserving too many sample-specific details, we use both the clustering centroid and the original latent embedding for decoding. Meanwhile, we distill the soft clustering assignment for each sample via entropy minimization such that a clear cluster structure is exhibited. Our experiments show that our method outperforms state-of-the-art unsupervised clustering techniques in terms of both efficiency and clustering performance. ", "keywords": "Deep Clustering;Differentiable EM", "primary_area": "", "supplementary_material": "/attachment/d647e0bab8dd0067bbb472688aef2d770bda3a3e.zip", "author": "Yuangang Pan;Ivor Tsang", "authorids": "~Yuangang_Pan2;~Ivor_Tsang1", "gender": "M;M", "homepage": "https://www.a-star.edu.sg/cfar/about-cfar/management/prof-ivor-tsang;https://yuangang-pan.github.io/", "dblp": "35/5873;215/4933", "google_scholar": "rJMOlVsAAAAJ;", "orcid": ";", "linkedin": ";", "or_profile": "~Ivor_W_Tsang1;~Yuangang_Pan1", "aff": "University of Technology Sydney;University of Technology Sydney", "aff_domain": "uts.edu.au;uts.edu.au", "position": "Full Professor;Postdoc", "bibtex": "", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer4;AnonReviewer3;AnonReviewer2", "site": "https://openreview.net/forum?id=EyDgK7q5vwJ", "pdf_size": 0, "rating": "5;6;6;7", "confidence": "5;5;4;4", "wc_review": "520;312;263;224", "wc_reply_reviewers": "0;120;0;0", "wc_reply_authors": "1679;888;295;719", "reply_reviewers": "0;1;0;0", "reply_authors": "3;2;1;1", "rating_avg": [ 6.0, 0.7071067811865476 ], "confidence_avg": [ 4.5, 0.5 ], "wc_review_avg": [ 329.75, 114.1805040276141 ], "wc_reply_reviewers_avg": [ 30.0, 51.96152422706632 ], "wc_reply_authors_avg": [ 895.25, 501.4181762760501 ], "reply_reviewers_avg": [ 0.25, 0.4330127018922193 ], "reply_authors_avg": [ 1.75, 0.82915619758885 ], "replies_avg": [ 14, 0 ], "authors#_avg": [ 2, 0 ], "corr_rating_confidence": -0.7071067811865476, "gs_citation": 2, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=1035299569802578638&as_sdt=400005&sciodt=0,14&hl=en", "gs_version_total": 0, "aff_unique_index": "0;0", "aff_unique_norm": "University of Technology Sydney", "aff_unique_dep": "", "aff_unique_url": "https://www.uts.edu.au", "aff_unique_abbr": "UTS", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0", "aff_country_unique": "Australia" }, { "title": "Rapid Task-Solving in Novel Environments", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/3211", "id": "F-mvpFpn_0q", "poster": "", "openreview": "https://openreview.net/forum?id=F-mvpFpn_0q", "slides": "https://iclr.cc/virtual/2021/poster/3211", "video": "https://iclr.cc/virtual/2021/poster/3211", "author_site": "Samuel Ritter, Ryan Faulkner, Laurent Sartran, Adam Santoro, Matthew Botvinick, David Raposo", "tldr": "", "abstract": "We propose the challenge of rapid task-solving in novel environments (RTS), wherein an agent must solve a series of tasks as rapidly as possible in an unfamiliar environment. An effective RTS agent must balance between exploring the unfamiliar environment and solving its current task, all while building a model of the new environment over which it can plan when faced with later tasks. While modern deep RL agents exhibit some of these abilities in isolation, none are suitable for the full RTS challenge. 
To enable progress toward RTS, we introduce two challenge domains: (1) a minimal RTS challenge called the Memory&Planning Game and (2) One-Shot StreetLearn Navigation, which introduces scale and complexity from real-world data. We demonstrate that state-of-the-art deep RL agents fail at RTS in both domains, and that this failure is due to an inability to plan over gathered knowledge. We develop Episodic Planning Networks (EPNs) and show that deep-RL agents with EPNs excel at RTS, outperforming the nearest baseline by factors of 2-3 and learning to navigate held-out StreetLearn maps within a single episode. We show that EPNs learn to execute a value iteration-like planning algorithm and that they generalize to situations beyond their training experience.", "keywords": "deep reinforcement learning;meta learning;deep learning;exploration;planning", "primary_area": "", "supplementary_material": "", "author": "Samuel Ritter;Ryan Faulkner;Laurent Sartran;Adam Santoro;Matthew Botvinick;David Raposo", "authorids": "~Samuel_Ritter1;~Ryan_Faulkner2;~Laurent_Sartran1;~Adam_Santoro1;~Matthew_Botvinick1;~David_Raposo1", "gender": "M;M;;M;;M", "homepage": "http://www.princeton.edu/~swritter/;;;;;", "dblp": "176/2015;159/6155;;180/5951;98/5712;52/11138", "google_scholar": "https://scholar.google.co.uk/citations?user=dg7wnfAAAAAJ;F0nxdKYAAAAJ;;;;", "orcid": ";;;;;", "linkedin": ";ryan-faulkner-49b6412/;;;;", "or_profile": "~Samuel_Ritter1;~Ryan_Faulkner2;~Laurent_Sartran1;~Adam_Santoro1;~Matthew_Botvinick1;~David_Raposo1", "aff": "Google DeepMind;Google DeepMind;Google DeepMind;Google;Google DeepMind;Google", "aff_domain": "deepmind.com;deepmind.com;deepmind.com;google.com;google.com;google.com", "position": "Research Scientist;Researcher;Research Engineer;Research Scientist;Researcher;Research Scientist", "bibtex": "@inproceedings{\nritter2021rapid,\ntitle={Rapid Task-Solving in Novel Environments},\nauthor={Samuel Ritter and Ryan Faulkner and Laurent Sartran and Adam Santoro and Matthew Botvinick and David Raposo},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=F-mvpFpn_0q}\n}", "github": "", "project": "", "reviewers": "AnonReviewer2;AnonReviewer4;AnonReviewer3;AnonReviewer1", "pdf_size": 0, "rating": "4;7;7;8", "confidence": "5;3;4;4", "wc_review": "222;1134;409;231", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "309;1900;419;451", "reply_reviewers": "0;0;0;0", "reply_authors": "1;3;1;1", "rating_avg": [ 6.5, 1.5 ], "confidence_avg": [ 4.0, 0.7071067811865476 ], "wc_review_avg": [ 499.0, 374.1249791179413 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 769.75, 654.672198508536 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.5, 0.8660254037844386 ], "replies_avg": [ 12, 0 ], "authors#_avg": [ 6, 0 ], "corr_rating_confidence": -0.7071067811865476, "gs_citation": 34, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=3513118620186502845&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 3, "pdf": "https://openreview.net/pdf?id=F-mvpFpn_0q", "email": "deepmind.com;deepmind.com;deepmind.com;google.com;google.com;google.com", "author_num": 6, "aff_unique_index": "0;0;0;0;0;0", "aff_unique_norm": "Google", "aff_unique_dep": "Google DeepMind", "aff_unique_url": "https://deepmind.com", "aff_unique_abbr": "DeepMind", "aff_campus_unique_index": "1;1", "aff_campus_unique": ";Mountain View", "aff_country_unique_index": "0;0;0;1;0;1", "aff_country_unique": "United Kingdom;United States" }, { 
"title": "Gradient Vaccine: Investigating and Improving Multi-task Optimization in Massively Multilingual Models", "status": "Spotlight", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/2550", "id": "F1vEjWK-lH_", "poster": "", "openreview": "https://openreview.net/forum?id=F1vEjWK-lH_", "slides": "https://iclr.cc/virtual/2021/poster/2550", "video": "https://iclr.cc/virtual/2021/poster/2550", "author_site": "Zirui Wang, Yulia Tsvetkov, Orhan Firat, Yuan Cao", "tldr": "", "abstract": "Massively multilingual models subsuming tens or even hundreds of languages pose great challenges to multi-task optimization. While it is a common practice to apply a language-agnostic procedure optimizing a joint multilingual task objective, how to properly characterize and take advantage of its underlying problem structure for improving optimization efficiency remains under-explored. In this paper, we attempt to peek into the black-box of multilingual optimization through the lens of loss function geometry. We find that gradient similarity measured along the optimization trajectory is an important signal, which correlates well with not only language proximity but also the overall model performance. Such observation helps us to identify a critical limitation of existing gradient-based multi-task learning methods, and thus we derive a simple and scalable optimization procedure, named Gradient Vaccine, which encourages more geometrically aligned parameter updates for close tasks. Empirically, our method obtains significant model performance gains on multilingual machine translation and XTREME benchmark tasks for multilingual language models. Our work reveals the importance of properly measuring and utilizing language proximity in multilingual optimization, and has broader implications for multi-task learning beyond multilingual modeling.", "keywords": "Multi-task Learning;Multilingual Modeling", "primary_area": "", "supplementary_material": "", "author": "Zirui Wang;Yulia Tsvetkov;Orhan Firat;Yuan Cao", "authorids": "~Zirui_Wang1;~Yulia_Tsvetkov1;~Orhan_Firat1;~Yuan_Cao2", "gender": "M;F;M;M", "homepage": ";https://homes.cs.washington.edu/~yuliats/;;", "dblp": ";75/8157;120/2225;52/4472-7.html", "google_scholar": "GgD-B68AAAAJ;SEDPkrsAAAAJ;https://scholar.google.com.tr/citations?user=dLaR9lgAAAAJ;Q82vvqcAAAAJ", "orcid": ";0000-0002-4634-7128;;0000-0002-1267-8930", "linkedin": ";;;", "or_profile": "~Zirui_Wang1;~Yulia_Tsvetkov1;~Orhan_Firat1;~Yuan_Cao2", "aff": "School of Computer Science, Carnegie Mellon University;School of Computer Science, Carnegie Mellon University;Google;Google DeepMind", "aff_domain": "cs.cmu.edu;cs.cmu.edu;google.com;google.com", "position": "PhD student;Assistant Professor;Research Scientist;Research scientist", "bibtex": "@inproceedings{\nwang2021gradient,\ntitle={Gradient Vaccine: Investigating and Improving Multi-task Optimization in Massively Multilingual Models},\nauthor={Zirui Wang and Yulia Tsvetkov and Orhan Firat and Yuan Cao},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=F1vEjWK-lH_}\n}", "github": "", "project": "", "reviewers": "AnonReviewer2;AnonReviewer3;AnonReviewer1;AnonReviewer4", "pdf_size": 0, "rating": "6;6;7;8", "confidence": "3;3;4;4", "wc_review": "201;336;293;979", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "371;442;201;680", "reply_reviewers": "0;0;0;0", "reply_authors": "1;1;1;1", "rating_avg": [ 6.75, 0.82915619758885 ], "confidence_avg": [ 3.5, 0.5 ], 
"wc_review_avg": [ 452.25, 308.004362793776 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 423.5, 172.04432568381904 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 9, 0 ], "authors#_avg": [ 4, 0 ], "corr_rating_confidence": 0.9045340337332909, "gs_citation": 215, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=6938665088862905379&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 7, "pdf": "https://openreview.net/pdf?id=F1vEjWK-lH_", "email": "cs.cmu.edu;cs.cmu.edu;google.com;google.com", "author_num": 4, "aff_unique_index": "0;0;1;1", "aff_unique_norm": "Carnegie Mellon University;Google", "aff_unique_dep": "School of Computer Science;Google", "aff_unique_url": "https://www.cmu.edu;https://www.google.com", "aff_unique_abbr": "CMU;Google", "aff_campus_unique_index": "0;0;1", "aff_campus_unique": "Pittsburgh;Mountain View;", "aff_country_unique_index": "0;0;0;1", "aff_country_unique": "United States;United Kingdom" }, { "title": "CPR: Classifier-Projection Regularization for Continual Learning", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/3353", "id": "F2v4aqEL6ze", "poster": "", "openreview": "https://openreview.net/forum?id=F2v4aqEL6ze", "slides": "https://iclr.cc/virtual/2021/poster/3353", "video": "https://iclr.cc/virtual/2021/poster/3353", "author_site": "Sungmin Cha, Hsiang Hsu, Taebaek Hwang, Flavio Calmon, Taesup Moon", "tldr": "", "abstract": "We propose a general, yet simple patch that can be applied to existing regularization-based continual learning methods called classifier-projection regularization (CPR). Inspired by both recent results on neural networks with wide local minima and information theory, CPR adds an additional regularization term that maximizes the entropy of a classifier's output probability. We demonstrate that this additional term can be interpreted as a projection of the conditional probability given by a classifier's output to the uniform distribution. By applying the Pythagorean theorem for KL divergence, we then prove that this projection may (in theory) improve the performance of continual learning methods. In our extensive experimental results, we apply CPR to several state-of-the-art regularization-based continual learning methods and benchmark performance on popular image recognition datasets. Our results demonstrate that CPR indeed promotes a wide local minima and significantly improves both accuracy and plasticity while simultaneously mitigating the catastrophic forgetting of baseline continual learning methods. 
The codes and scripts for this work are available at https://github.com/csm9493/CPR_CL.", "keywords": "continual learning;regularization;wide local minima", "primary_area": "", "supplementary_material": "/attachment/585af03a26a1c58a0bcddd802bc97e69188caec6.zip", "author": "Sungmin Cha;Hsiang Hsu;Taebaek Hwang;Flavio Calmon;Taesup Moon", "authorids": "~Sungmin_Cha1;~Hsiang_Hsu1;gxq9106@gmail.com;~Flavio_Calmon1;~Taesup_Moon1", "gender": "M;M;;;", "homepage": "https://sites.google.com/view/sungmin-cha/;https://hsianghsu.github.io;;http://people.seas.harvard.edu/~flavio/;https://mindlab-snu.github.io/people/pi/", "dblp": "206/6287;;;89/4611;05/4084", "google_scholar": "i0PPhfAAAAAJ;https://scholar.google.com.tw/citations?user=JRl3iYIAAAAJ;;P8N_YH4AAAAJ;lQlioBoAAAAJ", "orcid": ";0000-0001-8084-3929;;;0000-0002-9257-6503", "linkedin": ";;;;", "or_profile": "~Sungmin_Cha1;~Hsiang_Hsu1;gxq9106@gmail.com;~Flavio_Calmon1;~Taesup_Moon1", "aff": "Sungkyunkwan University;Harvard University;;Harvard University;Sungkyunkwan University", "aff_domain": "skku.edu;harvard.edu;;harvard.edu;skku.edu", "position": "PhD student;PhD student;;Assistant Professor;Associate Professor", "bibtex": "@inproceedings{\ncha2021cpr,\ntitle={{\\{}CPR{\\}}: Classifier-Projection Regularization for Continual Learning},\nauthor={Sungmin Cha and Hsiang Hsu and Taebaek Hwang and Flavio Calmon and Taesup Moon},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=F2v4aqEL6ze}\n}", "github": "[![github](/images/github_icon.svg) csm9493/CPR_CL](https://github.com/csm9493/CPR_CL)", "project": "", "reviewers": "AnonReviewer1;AnonReviewer2;AnonReviewer4;AnonReviewer3", "pdf_size": 0, "rating": "4;6;6;7", "confidence": "4;4;5;4", "wc_review": "714;327;347;185", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "0;0;0;0", "reply_reviewers": "0;0;0;0", "reply_authors": "0;0;0;0", "rating_avg": [ 5.75, 1.0897247358851685 ], "confidence_avg": [ 4.25, 0.4330127018922193 ], "wc_review_avg": [ 393.25, 195.43333262266188 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 0, 0 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 0, 0 ], "replies_avg": [ 5, 0 ], "authors#_avg": [ 5, 0 ], "corr_rating_confidence": 0.13245323570650439, "gs_citation": 95, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=17725325187082298099&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 12, "pdf": "https://openreview.net/pdf?id=F2v4aqEL6ze", "email": "skku.edu;harvard.edu;;harvard.edu;skku.edu", "author_num": 5, "aff_unique_index": "0;1;1;0", "aff_unique_norm": "Sungkyunkwan University;Harvard University", "aff_unique_dep": ";", "aff_unique_url": "https://www.skku.edu;https://www.harvard.edu", "aff_unique_abbr": "SKKU;Harvard", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;1;1;0", "aff_country_unique": "South Korea;United States" }, { "title": "Coupled Oscillatory Recurrent Neural Network (coRNN): An accurate and (gradient) stable architecture for learning long time dependencies", "status": "Oral", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/3335", "id": "F3s69XzWOia", "poster": "", "openreview": "https://openreview.net/forum?id=F3s69XzWOia", "slides": "https://iclr.cc/virtual/2021/poster/3335", "video": "https://iclr.cc/virtual/2021/poster/3335", "author_site": "T. 
Konstantin Rusch, Siddhartha Mishra", "tldr": "", "abstract": "Circuits of biological neurons, such as in the functional parts of the brain can be modeled as networks of coupled oscillators. Inspired by the ability of these systems to express a rich set of outputs while keeping (gradients of) state variables bounded, we propose a novel architecture for recurrent neural networks. Our proposed RNN is based on a time-discretization of a system of second-order ordinary differential equations, modeling networks of controlled nonlinear oscillators. We prove precise bounds on the gradients of the hidden states, leading to the mitigation of the exploding and vanishing gradient problem for this RNN. Experiments show that the proposed RNN is comparable in performance to the state of the art on a variety of benchmarks, demonstrating the potential of this architecture to provide stable and accurate RNNs for processing complex sequential data.", "keywords": "RNNs;Oscillators;Gradient stability;Long-term dependencies", "primary_area": "", "supplementary_material": "/attachment/e35e7d17e21181e18c5b221800fba862ae5eda59.zip", "author": "T. Konstantin Rusch;Siddhartha Mishra", "authorids": "~T._Konstantin_Rusch1;~Siddhartha_Mishra1", "gender": ";M", "homepage": "https://konstantinrusch.com;http://www.sam.math.ethz.ch/", "dblp": "266/1519;07/2856.html", "google_scholar": "9LajlSsAAAAJ;FmEqyNcAAAAJ", "orcid": ";", "linkedin": ";", "or_profile": "~T._Konstantin_Rusch1;~Siddhartha_Mishra1", "aff": "Swiss Federal Institute of Technology;Swiss Federal Institute of Technology", "aff_domain": "ethz.ch;ethz.ch", "position": "PhD student;Full Professor", "bibtex": "@inproceedings{\nrusch2021coupled,\ntitle={Coupled Oscillatory Recurrent Neural Network (co{\\{}RNN{\\}}): An accurate and (gradient) stable architecture for learning long time dependencies},\nauthor={T. 
Konstantin Rusch and Siddhartha Mishra},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=F3s69XzWOia}\n}", "github": "[![github](/images/github_icon.svg) tk-rusch/coRNN](https://github.com/tk-rusch/coRNN)", "project": "", "reviewers": "AnonReviewer1;AnonReviewer3;AnonReviewer4;AnonReviewer2", "pdf_size": 0, "rating": "7;7;7;8", "confidence": "3;3;3;5", "wc_review": "363;786;385;612", "wc_reply_reviewers": "0;214;0;0", "wc_reply_authors": "391;1527;1307;995", "reply_reviewers": "0;2;0;0", "reply_authors": "1;4;2;2", "rating_avg": [ 7.25, 0.4330127018922193 ], "confidence_avg": [ 3.5, 0.8660254037844386 ], "wc_review_avg": [ 536.5, 173.92886476947982 ], "wc_reply_reviewers_avg": [ 53.5, 92.66471820493493 ], "wc_reply_authors_avg": [ 1055.0, 427.42952635493026 ], "reply_reviewers_avg": [ 0.5, 0.8660254037844386 ], "reply_authors_avg": [ 2.25, 1.0897247358851685 ], "replies_avg": [ 17, 0 ], "authors#_avg": [ 2, 0 ], "corr_rating_confidence": 1.0, "gs_citation": 118, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=12873705644376791624&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 10, "pdf": "https://openreview.net/pdf?id=F3s69XzWOia", "email": "ethz.ch;ethz.ch", "author_num": 2, "aff_unique_index": "0;0", "aff_unique_norm": "Swiss Federal Institute of Technology", "aff_unique_dep": "", "aff_unique_url": "https://www.ethz.ch", "aff_unique_abbr": "ETH Zurich", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0", "aff_country_unique": "Switzerland" }, { "id": "F438zjb-XaM", "title": "Crowd-sourced Phrase-Based Tokenization for Low-Resourced Neural Machine Translation: The case of Fon Language", "track": "main", "status": "Reject", "tldr": "", "abstract": "Building effective neural machine translation (NMT) models for very low-resourced and morphologically rich African indigenous languages is an open challenge. Besides the issue of finding available resources for them, a lot of work is put into preprocessing and tokenization. Recent studies have shown that standard tokenization methods do not always adequately deal with the grammatical, diacritical, and tonal properties of some African languages. That, coupled with the extremely low availability of training samples, hinders the production of reliable NMT models. In this paper, using Fon language as a case study, we revisit standard tokenization methods and introduce Word-Expressions-Based (WEB) tokenization, a human-involved super-words tokenization strategy to create a better representative vocabulary for training.", "keywords": "nmt;nlp;neural machine translation;natural language processing;deep learning;machine learning;machine translation;mt", "primary_area": "", "supplementary_material": "", "author": "Bonaventure F. P. 
Dossou;Chris Chinenye Emezue", "authorids": "~Bonaventure_F._P._Dossou1;~Chris_Chinenye_Emezue1", "gender": "M;M", "homepage": "https://twitter.com/bonadossou;https://twitter.com/ChrisEmezue", "dblp": "261/9506;261/9858", "google_scholar": "2J581k0AAAAJ;PBHOsekAAAAJ", "orcid": ";0000-0002-3533-6829", "linkedin": "bonaventuredossou/;chris-emezue-4878471a9/", "or_profile": "~Bonaventure_F._P._Dossou1;~Chris_Chinenye_Emezue1", "aff": "Jacobs University Bremen;", "aff_domain": "jacobs-university.de;", "position": "MS student;", "bibtex": "@misc{\ndossou2021crowdsourced,\ntitle={Crowd-sourced Phrase-Based Tokenization for Low-Resourced Neural Machine Translation: The case of Fon Language},\nauthor={Bonaventure F. P. Dossou and Chris Chinenye Emezue},\nyear={2021},\nurl={https://openreview.net/forum?id=F438zjb-XaM}\n}", "github": "", "project": "", "reviewers": "AnonReviewer3;AnonReviewer1;AnonReviewer2", "site": "https://openreview.net/forum?id=F438zjb-XaM", "pdf_size": 0, "rating": "3;4;5", "confidence": "3;4;5", "wc_review": "352;338;875", "wc_reply_reviewers": "0;0;0", "wc_reply_authors": "0;0;0", "reply_reviewers": "0;0;0", "reply_authors": "0;0;0", "rating_avg": [ 4.0, 0.816496580927726 ], "confidence_avg": [ 4.0, 0.816496580927726 ], "wc_review_avg": [ 521.6666666666666, 249.90976149180105 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 0, 0 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 0, 0 ], "replies_avg": [ 4, 0 ], "authors#_avg": [ 2, 0 ], "corr_rating_confidence": 1.0, "gs_citation": 7, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=1749449466836117086&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 5, "aff_unique_index": "0", "aff_unique_norm": "Jacobs University", "aff_unique_dep": "", "aff_unique_url": "https://www.jacobs-university.de", "aff_unique_abbr": "JUB", "aff_campus_unique_index": "0", "aff_campus_unique": "Bremen", "aff_country_unique_index": "0", "aff_country_unique": "Germany" }, { "id": "F8lXvXpZdrL", "title": "Reintroducing Straight-Through Estimators as Principled Methods for Stochastic Binary Networks", "track": "main", "status": "Reject", "tldr": "", "abstract": "Training neural networks with binary weights and activations is a challenging problem due to the lack of gradients and difficulty of optimization over discrete weights.\nMany successful experimental results have been achieved with empirical straight-through (ST) approaches, proposing a variety of ad-hoc rules for propagating gradients through non-differentiable activations and updating discrete weights. At the same time, ST methods can be truly derived as estimators in the stochastic binary network (SBN) model with Bernoulli weights. We advance these derivations to a more complete and systematic study. We analyze properties, estimation accuracy, obtain different forms of correct ST estimators for activations and weights, explain existing empirical approaches and their shortcomings, explain how latent weights arise from the mirror descent method when optimizing over probabilities. 
This allows us to reintroduce these once-empirical ST methods as sound approximations, to apply them with clarity, and to develop further improvements.", "keywords": "straight-through;binary;stochastic binary;mirror descent", "primary_area": "", "supplementary_material": "", "author": "Alexander Shekhovtsov;Viktor Yanush", "authorids": "~Alexander_Shekhovtsov1;yanushviktor@gmail.com", "gender": "M;", "homepage": "http://cmp.felk.cvut.cz/~shekhovt/;", "dblp": "61/5386;", "google_scholar": "https://scholar.google.cz/citations?hl=en;", "orcid": ";", "linkedin": ";", "or_profile": "~Alexander_Shekhovtsov1;yanushviktor@gmail.com", "aff": "Czech Technical University in Prague;", "aff_domain": "cvut.cz;", "position": "Assistant Professor;", "bibtex": "@misc{\nshekhovtsov2021reintroducing,\ntitle={Reintroducing Straight-Through Estimators as Principled Methods for Stochastic Binary Networks},\nauthor={Alexander Shekhovtsov and Viktor Yanush},\nyear={2021},\nurl={https://openreview.net/forum?id=F8lXvXpZdrL}\n}", "github": "", "project": "", "reviewers": "AnonReviewer3;AnonReviewer1;AnonReviewer4;AnonReviewer2", "site": "https://openreview.net/forum?id=F8lXvXpZdrL", "pdf_size": 0, "rating": "5;5;7;7", "confidence": "3;3;2;2", "wc_review": "220;326;524;186", "wc_reply_reviewers": "150;0;32;0", "wc_reply_authors": "1108;820;568;207", "reply_reviewers": "1;0;1;0", "reply_authors": "3;2;1;1", "rating_avg": [ 6.0, 1.0 ], "confidence_avg": [ 2.5, 0.5 ], "wc_review_avg": [ 314.0, 131.78011989674314 ], "wc_reply_reviewers_avg": [ 45.5, 61.731272463800714 ], "wc_reply_authors_avg": [ 675.75, 331.27962131709825 ], "reply_reviewers_avg": [ 0.5, 0.5 ], "reply_authors_avg": [ 1.75, 0.82915619758885 ], "replies_avg": [ 15, 0 ], "authors#_avg": [ 2, 0 ], "corr_rating_confidence": -1.0, "gs_citation": 18, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=173263409787168618&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 6, "aff_unique_index": "0", "aff_unique_norm": "Czech Technical University", "aff_unique_dep": "", "aff_unique_url": "https://www.ctu.cz", "aff_unique_abbr": "CTU", "aff_campus_unique_index": "0", "aff_campus_unique": "Prague", "aff_country_unique_index": "0", "aff_country_unique": "Czech Republic" }, { "title": "Contrastive Syn-to-Real Generalization", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/2949", "id": "F8whUO8HNbP", "poster": "", "openreview": "https://openreview.net/forum?id=F8whUO8HNbP", "slides": "https://iclr.cc/virtual/2021/poster/2949", "video": "https://iclr.cc/virtual/2021/poster/2949", "author_site": "Wuyang Chen, Zhiding Yu, Shalini De Mello, Sifei Liu, Jose M. Alvarez, Zhangyang Wang, Anima Anandkumar", "tldr": "", "abstract": "Training on synthetic data can be beneficial for label or data-scarce scenarios. However, synthetically trained models often suffer from poor generalization in real domains due to domain gaps. In this work, we make a key observation that the diversity of the learned feature embeddings plays an important role in the generalization performance. To this end, we propose contrastive synthetic-to-real generalization (CSG), a novel framework that leverages the pre-trained ImageNet knowledge to prevent overfitting to the synthetic domain, while promoting the diversity of feature embeddings as an inductive bias to improve generalization. In addition, we enhance the proposed CSG framework with attentional pooling (A-pool) to let the model focus on semantically important regions and further improve its generalization. 
We demonstrate the effectiveness of CSG on various synthetic training tasks, exhibiting state-of-the-art performance on zero-shot domain generalization.", "keywords": "synthetic-to-real generalization;domain generalization", "primary_area": "", "supplementary_material": "", "author": "Wuyang Chen;Zhiding Yu;Shalini De Mello;Sifei Liu;Jose M. Alvarez;Zhangyang Wang;Anima Anandkumar", "authorids": "~Wuyang_Chen1;~Zhiding_Yu1;~Shalini_De_Mello1;~Sifei_Liu2;~Jose_M._Alvarez2;~Zhangyang_Wang1;~Anima_Anandkumar1", "gender": ";;Not Specified;F;;M;", "homepage": ";;https://research.nvidia.com/person/shalini-de-mello;https://www.sifeiliu.net;;https://vita-group.github.io;", "dblp": ";;206/7364;118/1301;;119/4026;", "google_scholar": ";;xQM4BlMAAAAJ;j4pcHV4AAAAJ;;pxFyKAIAAAAJ;", "orcid": ";;;;;;", "linkedin": ";;shalini-de-mello-02b8251/;;;;", "or_profile": "~Wuyang_Chen1;~Zhiding_Yu1;~Shalini_De_Mello1;~Sifei_Liu2;~Jose_M._Alvarez2;~Zhangyang_Wang1;~Anima_Anandkumar1", "aff": ";;NVIDIA;NVIDIA;;University of Texas, Austin;", "aff_domain": ";;nvidia.com;nvidia.com;;utexas.edu;", "position": ";;Principal Researcher;Researcher;;Assistant Professor;", "bibtex": "@inproceedings{\nchen2021contrastive,\ntitle={Contrastive Syn-to-Real Generalization},\nauthor={Wuyang Chen and Zhiding Yu and Shalini De Mello and Sifei Liu and Jose M. Alvarez and Zhangyang Wang and Anima Anandkumar},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=F8whUO8HNbP}\n}", "github": "[![github](/images/github_icon.svg) NVlabs/CSG](https://github.com/NVlabs/CSG) + [![Papers with Code](/images/pwc_icon.svg) 1 community implementation](https://paperswithcode.com/paper/?openreview=F8whUO8HNbP)", "project": "", "reviewers": "AnonReviewer2;AnonReviewer1;AnonReviewer3;AnonReviewer4", "pdf_size": 0, "rating": "6;6;6;7", "confidence": "4;4;4;4", "wc_review": "291;301;315;447", "wc_reply_reviewers": "0;115;0;18", "wc_reply_authors": "448;395;405;280", "reply_reviewers": "0;1;0;1", "reply_authors": "1;2;1;1", "rating_avg": [ 6.25, 0.4330127018922193 ], "confidence_avg": [ 4.0, 0.0 ], "wc_review_avg": [ 338.5, 63.21985447626402 ], "wc_reply_reviewers_avg": [ 33.25, 47.76701267611363 ], "wc_reply_authors_avg": [ 382.0, 62.165102750659074 ], "reply_reviewers_avg": [ 0.5, 0.5 ], "reply_authors_avg": [ 1.25, 0.4330127018922193 ], "replies_avg": [ 13, 0 ], "authors#_avg": [ 7, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 59, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=14950252501736329080&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 7, "pdf": "https://openreview.net/pdf?id=F8whUO8HNbP", "email": ";;nvidia.com;nvidia.com;;utexas.edu;", "author_num": 7, "aff_unique_index": "0;0;1", "aff_unique_norm": "NVIDIA;University of Texas at Austin", "aff_unique_dep": "NVIDIA Corporation;", "aff_unique_url": "https://www.nvidia.com;https://www.utexas.edu", "aff_unique_abbr": "NVIDIA;UT Austin", "aff_campus_unique_index": "1", "aff_campus_unique": ";Austin", "aff_country_unique_index": "0;0;0", "aff_country_unique": "United States" }, { "id": "F8xpAPm_ZKS", "title": "Model-Free Counterfactual Credit Assignment", "track": "main", "status": "Reject", "tldr": "", "abstract": "Credit assignment in reinforcement learning is the problem of measuring an action\u2019s influence on future rewards. 
\nIn particular, this requires separating \\emph{skill} from \\emph{luck}, ie.\\ disentangling the effect of an action on rewards from that of external factors and subsequent actions. To achieve this, we adapt the notion of counterfactuals from causality theory to a model-free RL setup. \nThe key idea is to condition value functions on \\emph{future} events, by learning to extract relevant information from a trajectory. We then propose to use these as future-conditional baselines and critics in policy gradient algorithms and we develop a valid, practical variant with provably lower variance, while achieving unbiasedness by constraining the hindsight information not to contain information about the agent\u2019s actions. We demonstrate the efficacy and validity of our algorithm on a number of illustrative problems.", "keywords": "credit assignment;model-free RL;causality;hindsight", "primary_area": "", "supplementary_material": "", "author": "Thomas Mesnard;Theophane Weber;Fabio Viola;Shantanu Thakoor;Alaa Saade;Anna Harutyunyan;Will Dabney;Tom Stepleton;Nicolas Heess;Marcus Hutter;Lars Holger Buesing;Remi Munos", "authorids": "mesnard@google.com;~Theophane_Weber1;~Fabio_Viola2;thakoor@google.com;alaas@google.com;~Anna_Harutyunyan1;~Will_Dabney1;~Tom_Stepleton1;~Nicolas_Heess1;~Marcus_Hutter1;~Lars_Holger_Buesing1;~Remi_Munos1", "gender": ";M;;;;;M;;;;M;M", "homepage": ";http://www.thphn.com/;;;;;;;;http://www.hutter1.net/;;http://researchers.lille.inria.fr/~munos/", "dblp": ";;;;;121/3997;https://dblp.uni-trier.de/pers/hd/d/Dabney:Will;82/6271.html;76/9181;h/MarcusHutter;https://dblp.uni-trier.de/pers/hd/b/Buesing:Lars;69/6815", "google_scholar": ";LZxqcX4AAAAJ;;;;;https://scholar.google.co.uk/citations?user=dR-7QW8AAAAJ;;79k7bGEAAAAJ;https://scholar.google.com.tw/citations?user=7hmCntEAAAAJ;1h_mxPMAAAAJ;https://scholar.google.com/citations?hl=en", "orcid": ";;;;;;;;;0000-0002-3263-4097;;", "linkedin": ";;;;;;;stepleton;;hutter1/;;", "or_profile": "mesnard@google.com;~Theophane_Weber1;~Fabio_Viola2;thakoor@google.com;alaas@google.com;~Anna_Harutyunyan1;~Will_Dabney1;~Tom_Stepleton1;~Nicolas_Heess1;~Marcus_Hutter1;~Lars_Holger_Buesing1;~Remi_Munos1", "aff": ";;;;;Google DeepMind;Google DeepMind;Google DeepMind;Google DeepMind;Australian National University;Deepmind;Google DeepMind", "aff_domain": ";;;;;google.com;google.com;google.com;google.com;anu.edu.au;google.com;google.com", "position": ";;;;;Research Scientist;Research Scientist;Researcher;Research Scientist;Full Professor;Postdoc;Research scientist", "bibtex": "@misc{\nmesnard2021modelfree,\ntitle={Model-Free Counterfactual Credit Assignment},\nauthor={Thomas Mesnard and Theophane Weber and Fabio Viola and Shantanu Thakoor and Alaa Saade and Anna Harutyunyan and Will Dabney and Tom Stepleton and Nicolas Heess and Marcus Hutter and Lars Holger Buesing and Remi Munos},\nyear={2021},\nurl={https://openreview.net/forum?id=F8xpAPm_ZKS}\n}", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer2;AnonReviewer4;AnonReviewer3", "site": "https://openreview.net/forum?id=F8xpAPm_ZKS", "pdf_size": 0, "rating": "3;5;5;6", "confidence": "3;4;3;3", "wc_review": "1067;274;424;682", "wc_reply_reviewers": "0;220;0;0", "wc_reply_authors": "2665;596;229;778", "reply_reviewers": "0;2;0;0", "reply_authors": "5;3;1;1", "rating_avg": [ 4.75, 1.0897247358851685 ], "confidence_avg": [ 3.25, 0.4330127018922193 ], "wc_review_avg": [ 611.75, 300.6296517311624 ], "wc_reply_reviewers_avg": [ 55.0, 95.26279441628824 ], "wc_reply_authors_avg": [ 
1067.0, 943.5584242642318 ], "reply_reviewers_avg": [ 0.5, 0.8660254037844386 ], "reply_authors_avg": [ 2.5, 1.6583123951777 ], "replies_avg": [ 19, 0 ], "authors#_avg": [ 12, 0 ], "corr_rating_confidence": 0.13245323570650439, "gs_citation": 1, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=10184658778580576564&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 0, "aff_unique_index": "0;0;0;0;1;2;0", "aff_unique_norm": "Google;Australian National University;DeepMind", "aff_unique_dep": "Google DeepMind;;", "aff_unique_url": "https://deepmind.com;https://www.anu.edu.au;https://deepmind.com", "aff_unique_abbr": "DeepMind;ANU;DeepMind", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0;0;0;1;0;0", "aff_country_unique": "United Kingdom;Australia" }, { "id": "F9sPTWSKznC", "title": "DiP Benchmark Tests: Evaluation Benchmarks for Discourse Phenomena in MT", "track": "main", "status": "Reject", "tldr": "", "abstract": "Despite increasing instances of machine translation (MT) systems including extrasentential context information, the evidence for translation quality improvement is sparse, especially for discourse phenomena. Popular metrics like BLEU are not expressive or sensitive enough to capture quality improvements or drops that are minor in size but significant in perception. We introduce the first of their kind MT benchmark testsets that aim to track and hail improvements across four main discourse phenomena: anaphora, lexical consistency, coherence and readability, and discourse connective translation. We also introduce evaluation methods for these tasks, and evaluate several competitive baseline MT systems on the curated datasets. Surprisingly, we find that the complex context-aware models that we test do not improve discourse-related translations consistently across languages and phenomena. Our evaluation benchmark is available as a leaderboard at . 
", "keywords": "machine translation;discourse;evaluation;benchmark;testsets;leaderboard", "primary_area": "", "supplementary_material": "", "author": "Prathyusha Jwalapuram;Barbara Rychalska;Shafiq Joty;Dominika Basaj", "authorids": "~Prathyusha_Jwalapuram1;~Barbara_Rychalska1;~Shafiq_Joty1;~Dominika_Basaj1", "gender": "F;F;M;", "homepage": "https://pjwalapuram.com/;;https://raihanjoty.github.io/;", "dblp": "214/9948;186/7257;62/2078;227/2444", "google_scholar": "https://scholar.google.co.in/citations?hl=en;;hR249csAAAAJ;", "orcid": ";;;", "linkedin": "prathyusha-jwalapuram-094220154/;;;", "or_profile": "~Prathyusha_Jwalapuram1;~Barbara_Rychalska1;~Shafiq_Joty1;~Dominika_Basaj1", "aff": "Nanyang Technological University, Singapore;Warsaw University of Technology;Nanyang Technological University;", "aff_domain": "ntu.edu.sg;pw.edu.pl;ntu.edu.sg;", "position": "PhD student;PhD student;Assistant Professor;", "bibtex": "@misc{\njwalapuram2021dip,\ntitle={DiP Benchmark Tests: Evaluation Benchmarks for Discourse Phenomena in {\\{}MT{\\}}},\nauthor={Prathyusha Jwalapuram and Barbara Rychalska and Shafiq Joty and Dominika Basaj},\nyear={2021},\nurl={https://openreview.net/forum?id=F9sPTWSKznC}\n}", "github": "", "project": "", "reviewers": "AnonReviewer4;AnonReviewer3;AnonReviewer2;AnonReviewer1", "site": "https://openreview.net/forum?id=F9sPTWSKznC", "pdf_size": 0, "rating": "4;4;6;7", "confidence": "4;4;4;3", "wc_review": "473;618;254;194", "wc_reply_reviewers": "0;367;21;101", "wc_reply_authors": "1032;1794;474;652", "reply_reviewers": "0;1;1;1", "reply_authors": "2;3;2;2", "rating_avg": [ 5.25, 1.299038105676658 ], "confidence_avg": [ 3.75, 0.4330127018922193 ], "wc_review_avg": [ 384.75, 170.05495435299733 ], "wc_reply_reviewers_avg": [ 122.25, 146.2452990697479 ], "wc_reply_authors_avg": [ 988.0, 507.1153714885795 ], "reply_reviewers_avg": [ 0.75, 0.4330127018922193 ], "reply_authors_avg": [ 2.25, 0.4330127018922193 ], "replies_avg": [ 18, 0 ], "authors#_avg": [ 4, 0 ], "corr_rating_confidence": -0.7777777777777777, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:wHlYbis7bVQJ:scholar.google.com/&scioq=DiP+Benchmark+Tests:+Evaluation+Benchmarks+for+Discourse+Phenomena+in+MT&hl=en&as_sdt=0,33", "gs_version_total": 0, "aff_unique_index": "0;1;0", "aff_unique_norm": "Nanyang Technological University;Warsaw University of Technology", "aff_unique_dep": ";", "aff_unique_url": "https://www.ntu.edu.sg;https://www.pw.edu.pl", "aff_unique_abbr": "NTU;WUT", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;1;0", "aff_country_unique": "Singapore;Poland" }, { "title": "Do 2D GANs Know 3D Shape? Unsupervised 3D Shape Reconstruction from 2D Image GANs", "status": "Oral", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/2881", "id": "FGqiDsBUKL0", "poster": "", "openreview": "https://openreview.net/forum?id=FGqiDsBUKL0", "slides": "https://iclr.cc/virtual/2021/poster/2881", "video": "https://iclr.cc/virtual/2021/poster/2881", "author_site": "Xingang Pan, Bo DAI, Ziwei Liu, Chen Change Loy, Ping Luo", "tldr": "", "abstract": "Natural images are projections of 3D objects on a 2D image plane. While state-of-the-art 2D generative models like GANs show unprecedented quality in modeling the natural image manifold, it is unclear whether they implicitly capture the underlying 3D object structures. And if so, how could we exploit such knowledge to recover the 3D shapes of objects in the images? 
To answer these questions, in this work, we present the first attempt to directly mine 3D geometric cues from an off-the-shelf 2D GAN that is trained on RGB images only. Through our investigation, we found that such a pre-trained GAN indeed contains rich 3D knowledge and thus can be used to recover 3D shape from a single 2D image in an unsupervised manner. The core of our framework is an iterative strategy that explores and exploits diverse viewpoint and lighting variations in the GAN image manifold. The framework does not require 2D keypoint or 3D annotations, or strong assumptions on object shapes (e.g. shapes are symmetric), yet it successfully recovers 3D shapes with high precision for human faces, cats, cars, and buildings. The recovered 3D shapes immediately allow high-quality image editing like relighting and object rotation. We quantitatively demonstrate the effectiveness of our approach compared to previous methods in both 3D shape reconstruction and face rotation. Our code is available at https://github.com/XingangPan/GAN2Shape.", "keywords": "Generative Adversarial Network;3D Reconstruction", "primary_area": "", "supplementary_material": "/attachment/bd9ec6f44176927771844efe0eadc152e0aeab43.zip", "author": "Xingang Pan;Bo Dai;Ziwei Liu;Chen Change Loy;Ping Luo", "authorids": "~Xingang_Pan1;~Bo_Dai2;~Ziwei_Liu1;~Chen_Change_Loy2;~Ping_Luo2", "gender": "M;M;M;M;", "homepage": "https://xingangpan.github.io/;http://daibo.info/;https://liuziwei7.github.io/;https://www.mmlab-ntu.com/person/ccloy/index.html;http://luoping.me/", "dblp": "211/7940;64/2903-2;05/6300-2;01/5855;54/4989-2.html", "google_scholar": "https://scholar.google.com.hk/citations?user=uo0q9WgAAAAJ;https://scholar.google.com.hk/citations?user=KNWTvgEAAAAJ;https://scholar.google.com.hk/citations?user=lc45xlcAAAAJ;https://scholar.google.co.uk/citations?user=559LF80AAAAJ;https://scholar.google.com.hk/citations?hl=en", "orcid": "0000-0002-5825-9467;0000-0003-0777-9232;;0000-0001-5345-1591;0000-0002-6685-7950", "linkedin": ";;;;", "or_profile": "~Xingang_Pan1;~Bo_Dai2;~Ziwei_Liu1;~Chen_Change_Loy2;~Luo_Ping2", "aff": "The Chinese University of Hong Kong;Nanyang Technological University;Nanyang Technological University;Nanyang Technological University;The University of Hong Kong", "aff_domain": "cuhk.edu.hk;ntu.edu.sg;ntu.edu.sg;ntu.edu.sg;hku.hk", "position": "PhD student;Research Assistant Professor;Assistant Professor;Full Professor;Assistant Professor", "bibtex": "@inproceedings{\npan2021do,\ntitle={Do 2D {\\{}GAN{\\}}s Know 3D Shape? 
Unsupervised 3D Shape Reconstruction from 2D Image {\\{}GAN{\\}}s},\nauthor={Xingang Pan and Bo Dai and Ziwei Liu and Chen Change Loy and Ping Luo},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=FGqiDsBUKL0}\n}", "github": "[![github](/images/github_icon.svg) XingangPan/GAN2Shape](https://github.com/XingangPan/GAN2Shape)", "project": "", "reviewers": "AnonReviewer4;AnonReviewer2;AnonReviewer1", "pdf_size": 0, "rating": "7;8;8", "confidence": "3;4;5", "wc_review": "298;393;517", "wc_reply_reviewers": "0;0;0", "wc_reply_authors": "348;424;753", "reply_reviewers": "0;0;0", "reply_authors": "1;1;1", "rating_avg": [ 7.666666666666667, 0.4714045207910317 ], "confidence_avg": [ 4.0, 0.816496580927726 ], "wc_review_avg": [ 402.6666666666667, 89.6672862432126 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 508.3333333333333, 175.7656267748491 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 7, 0 ], "authors#_avg": [ 5, 0 ], "corr_rating_confidence": 0.8660254037844385, "gs_citation": 133, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=8733088455639387061&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 13, "pdf": "https://openreview.net/pdf?id=FGqiDsBUKL0", "email": "cuhk.edu.hk;ntu.edu.sg;ntu.edu.sg;ntu.edu.sg;hku.hk", "author_num": 5, "aff_unique_index": "0;1;1;1;2", "aff_unique_norm": "Chinese University of Hong Kong;Nanyang Technological University;University of Hong Kong", "aff_unique_dep": ";;", "aff_unique_url": "https://www.cuhk.edu.hk;https://www.ntu.edu.sg;https://www.hku.hk", "aff_unique_abbr": "CUHK;NTU;HKU", "aff_campus_unique_index": "0;0", "aff_campus_unique": "Hong Kong SAR;", "aff_country_unique_index": "0;1;1;1;0", "aff_country_unique": "China;Singapore" }, { "id": "FGvJvxn2wWO", "title": "Domain Adaptation with Morphologic Segmentation", "track": "main", "status": "Withdraw", "tldr": "", "abstract": "We present a novel domain adaptation framework that uses morphologic segmentation to translate images from arbitrary input domains (real and synthetic) into a uniform output domain. Our framework is based on an established image-to-image translation pipeline that allows us to first transform the input image into a generalized representation that encodes morphology and semantics \u2013 the edge-plus-segmentation map (EPS) \u2013 which is then transformed into an output domain. Images transformed into the output domain are photo-realistic and free of artifacts that are commonly present across different real (e.g. lens flare, motion blur, etc.) and synthetic (e.g. unrealistic textures, simplified geometry, etc.) data sets. Our goal is to establish a preprocessing step that unifies data from multiple sources into a common representation that facilitates training downstream tasks in computer vision. This way, neural networks for existing tasks can be trained on a larger variety of training data, while they are also less affected by overfitting to specific data sets. 
We showcase the effectiveness of our approach by qualitatively and quantitatively evaluating our method on four data sets of simulated and real data of urban scenes.", "keywords": "Domain Adaptation;Morphologic Segmentation;Image-to-image Translation", "primary_area": "", "supplementary_material": "/attachment/c881565b899fde4d20bc262be749760d420ed756.zip", "author": "Jonathan Klein;Soren Pirk;Dominik Michels", "authorids": "~Jonathan_Klein1;~Soren_Pirk2;~Dominik_Michels1", "gender": "M;M;Not Specified", "homepage": "https://www.kaust.edu.sa/en/study/faculty/dominik-michels;https://jonathank.de/research/;http://www.pirk.io", "dblp": "131/3147;22/5462;79/9280", "google_scholar": ";wzejV1EAAAAJ;X9AjIugAAAAJ", "orcid": ";0000-0001-6560-0988;0000-0003-1937-9797", "linkedin": ";;", "or_profile": "~Dominik_Michels1;~Jonathan_Klein2;~Soren_Pirk1", "aff": "KAUST;Rheinische Friedrich-Wilhelms Universit\u00e4t Bonn;Google Inc", "aff_domain": "kaust.edu.sa;uni-bonn.de;googel.com", "position": "Associate Professor;PhD student;Researcher", "bibtex": "", "github": "", "project": "", "reviewers": "AnonReviewer4;AnonReviewer2;AnonReviewer1;AnonReviewer5;AnonReviewer3", "site": "https://openreview.net/forum?id=FGvJvxn2wWO", "pdf_size": 0, "rating": "3;3;4;4;5", "confidence": "5;5;5;5;5", "wc_review": "242;320;482;241;330", "wc_reply_reviewers": "0;0;0;0;0", "wc_reply_authors": "0;0;0;0;0", "reply_reviewers": "0;0;0;0;0", "reply_authors": "0;0;0;0;0", "rating_avg": [ 3.8, 0.7483314773547882 ], "confidence_avg": [ 5.0, 0.0 ], "wc_review_avg": [ 323.0, 87.89084138862252 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 0, 0 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 0, 0 ], "replies_avg": [ 6, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 2, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=8694325797034221284&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 6, "aff_unique_index": "0;1;2", "aff_unique_norm": "King Abdullah University of Science and Technology;Rheinische Friedrich-Wilhelms Universit\u00e4t Bonn;Google", "aff_unique_dep": ";;Google", "aff_unique_url": "https://www.kaust.edu.sa;https://www.uni-bonn.de/;https://www.google.com", "aff_unique_abbr": "KAUST;Uni Bonn;Google", "aff_campus_unique_index": "1", "aff_campus_unique": ";Mountain View", "aff_country_unique_index": "0;1;2", "aff_country_unique": "Saudi Arabia;Germany;United States" }, { "id": "FKotzp6PZJw", "title": "On the Estimation Bias in Double Q-Learning", "track": "main", "status": "Reject", "tldr": "", "abstract": "Double Q-learning is a classical method for reducing overestimation bias, which is caused by taking maximum estimated values in the Bellman operator. Its variants in the deep Q-learning paradigm have shown great promise in producing reliable value prediction and improving learning performance. However, as shown by prior work, double Q-learning is not fully unbiased and still suffers from underestimation bias. In this paper, we show that such underestimation bias may lead to multiple non-optimal fixed points under an approximated Bellman operation. To address the concerns of converging to non-optimal stationary solutions, we propose a simple and effective approach as a partial fix for underestimation bias in double Q-learning. This approach leverages real returns to bound the target value. 
We extensively evaluate the proposed method in the Atari benchmark tasks and demonstrate its significant improvement over baseline algorithms.", "keywords": "Reinforcement learning;Q-learning;Estimation bias", "primary_area": "", "supplementary_material": "/attachment/c337d505a25653ed6034e58985c5948381649a2f.zip", "author": "Zhizhou Ren;Guangxiang Zhu;Beining Han;Jianglun Chen;Chongjie Zhang", "authorids": "~Zhizhou_Ren1;~Guangxiang_Zhu1;~Beining_Han1;~Jianglun_Chen2;~Chongjie_Zhang1", "gender": "M;M;M;;", "homepage": ";https://guangxiangzhu.github.io/;;;", "dblp": "https://dblp.uni-trier.de/pid/239/5714.html;206/6861;266/7819;;29/6693", "google_scholar": "xgpMeDgAAAAJ;pTS7LTkAAAAJ;LVjU7xIAAAAJ;;LjxqXycAAAAJ", "orcid": ";;;;", "linkedin": ";guangxiang-zhu-14baa9120/;%E8%B4%9D%E5%AE%81-%E9%9F%A9-b79204207/details/experience/;%E6%B1%9F%E4%BC%A6-%E9%99%88-00271416b/;", "or_profile": "~Zhizhou_Ren1;~Guangxiang_Zhu1;~Beining_Han1;~Jianglun_Chen2;~Chongjie_Zhang1", "aff": "University of Illinois, Urbana Champaign;;IIIS, Tsinghua University;Tsinghua University;Tsinghua University", "aff_domain": "illinois.edu;;mails.tsinghua.edu.cn;tsinghua.edu.cn;tsinghua.edu.cn", "position": "PhD student;;Undergrad student;Undergrad student;Assistant Professor", "bibtex": "@misc{\nren2021on,\ntitle={On the Estimation Bias in Double Q-Learning},\nauthor={Zhizhou Ren and Guangxiang Zhu and Beining Han and Jianglun Chen and Chongjie Zhang},\nyear={2021},\nurl={https://openreview.net/forum?id=FKotzp6PZJw}\n}", "github": "", "project": "", "reviewers": "AnonReviewer3;AnonReviewer2;AnonReviewer4;AnonReviewer1", "site": "https://openreview.net/forum?id=FKotzp6PZJw", "pdf_size": 0, "rating": "3;6;6;6", "confidence": "4;4;3;3", "wc_review": "322;451;184;408", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "876;547;739;601", "reply_reviewers": "0;0;0;0", "reply_authors": "2;1;2;1", "rating_avg": [ 5.25, 1.299038105676658 ], "confidence_avg": [ 3.5, 0.5 ], "wc_review_avg": [ 341.25, 101.97885810304017 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 690.75, 127.83265427894392 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.5, 0.5 ], "replies_avg": [ 14, 0 ], "authors#_avg": [ 5, 0 ], "corr_rating_confidence": -0.5773502691896257, "gs_citation": 27, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=6701423240345765419&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 8, "aff_unique_index": "0;1;1;1", "aff_unique_norm": "University of Illinois Urbana-Champaign;Tsinghua University", "aff_unique_dep": ";Institute for Interdisciplinary Information Sciences", "aff_unique_url": "https://illinois.edu;https://www.tsinghua.edu.cn", "aff_unique_abbr": "UIUC;THU", "aff_campus_unique_index": "0", "aff_campus_unique": "Urbana-Champaign;", "aff_country_unique_index": "0;1;1;1", "aff_country_unique": "United States;China" }, { "id": "FMdjYY6H8-Z", "title": "RETHINKING LOCAL LOW RANK MATRIX DETECTION:A MULTIPLE-FILTER BASED NEURAL NETWORK FRAMEWORK", "track": "main", "status": "Reject", "tldr": "", "abstract": "The matrix local low rank representation (MLLRR) is a critical dimension reduction technique widely used in recommendation systems, text mining and computer vision. In MLLRR, how to robustly identify the row and column indices that forma distinct low rank sub-matrix is a major challenge. In this work, we first organized the general MLLRR problem into three inter-connected sub-problems based on different low rank properties, namely, LLR-1C, LLR-1, and LLR-r. 
Existing solutions on MLLRR all leverage problem-specific assumptions and mainly focus on the LLR-1C problem, which prohibits generalizability and lacks the capacity to detect a substantial amount of true and interesting patterns. In this work, we developed a novel multiple-filter based neural network framework, namely FLLRM, which is the first of its kind to solve all three MLLRR problems. We systematically benchmarked FLLRM with state-of-the-art methods on an extensive set of synthetic data, empowered by a robustness evaluation of parameters and theoretical discussions. Experimental results showed that FLLRM outperforms all existing methods and enables a general solution to all three sub-problems. Experiments on real-world datasets also validated the effectiveness of FLLRM in identifying local low rank matrices corresponding to novel context-specific knowledge.", "keywords": "Matrix decomposition;Local Low Rank matrix detection;Representation learning;Subspace learning", "primary_area": "", "supplementary_material": "", "author": "Pengtao Dang;Wennan Chang;Haiqi Zhu;Changlin Wan;Tong Zhao;Tingbo Guo;Paul Salama;Sha Cao;Chi Zhang", "authorids": "~Pengtao_Dang1;chang534@purdue.edu;haiqzhu@iu.edu;~Changlin_Wan1;zhaoton@amazon.com;guotingbo.tbg@foxmail.com;~Paul_Salama1;robincaosha@gmail.com;czhang87@iu.edu", "gender": "M;;;M;;;M;;", "homepage": "https://zcslab.github.io/people/pengtao/;;;https://clwan.github.io/;;;;;", "dblp": "312/3705;;;15/158;;;;;", "google_scholar": "p1j1-YIAAAAJ;;;DISpxbgAAAAJ;;;r5wLPJkAAAAJ;;", "orcid": ";;;;;;0000-0002-7643-3879;;", "linkedin": ";;;;;;;;", "or_profile": "~Pengtao_Dang1;chang534@purdue.edu;haiqzhu@iu.edu;~Changlin_Wan1;zhaoton@amazon.com;guotingbo.tbg@foxmail.com;~Paul_Salama1;robincaosha@gmail.com;czhang87@iu.edu", "aff": "Purdue University;;;Purdue University;;;Indiana University/Purdue University at Indianapolis;;", "aff_domain": "purdue.edu;;;purdue.edu;;;iupui.edu;;", "position": "PhD student;;;PhD student;;;Full Professor;;", "bibtex": "", "github": "", "project": "", "reviewers": "AnonReviewer6;AnonReviewer2;AnonReviewer1", "site": "https://openreview.net/forum?id=FMdjYY6H8-Z", "pdf_size": 0, "rating": "3;4;5", "confidence": "3;3;5", "wc_review": "182;93;728", "wc_reply_reviewers": "0;0;0", "wc_reply_authors": "494;569;1661", "reply_reviewers": "0;0;0", "reply_authors": "1;1;3", "rating_avg": [ 4.0, 0.816496580927726 ], "confidence_avg": [ 3.6666666666666665, 0.9428090415820634 ], "wc_review_avg": [ 334.3333333333333, 280.72564700947595 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 908.0, 533.3310416617431 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.6666666666666667, 0.9428090415820634 ], "replies_avg": [ 11, 0 ], "authors#_avg": [ 9, 0 ], "corr_rating_confidence": 0.8660254037844387, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:xSwI165Das4J:scholar.google.com/&scioq=RETHINKING+LOCAL+LOW+RANK+MATRIX+DETECTION:A+MULTIPLE-FILTER+BASED+NEURAL+NETWORK+FRAMEWORK&hl=en&as_sdt=0,5", "gs_version_total": 0, "aff_unique_index": "0;0;1", "aff_unique_norm": "Purdue University;Indiana University-Purdue University Indianapolis", "aff_unique_dep": ";", "aff_unique_url": "https://www.purdue.edu;https://www.iupui.edu", "aff_unique_abbr": "Purdue;IUPUI", "aff_campus_unique_index": "1", "aff_campus_unique": ";Indianapolis", "aff_country_unique_index": "0;0;0", "aff_country_unique": "United States" }, { "id": "FN7_BUOG78e", "title": "Computing Preimages of Deep Neural Networks with 
Applications to Safety", "track": "main", "status": "Reject", "tldr": "", "abstract": "To apply an algorithm in a sensitive domain it is important to understand the set of input values that result in specific decisions. Deep neural networks suffer from an inherent instability that makes this difficult: different outputs can arise from very similar inputs. \n\nWe present a method to check that the decisions of a deep neural network are as intended by constructing the exact, analytical preimage of its predictions. Preimages generalize verification in the sense that they can be used to verify a wide class of properties, and answer much richer questions besides. We examine the functioning and failures of neural networks used in robotics, including an aircraft collision avoidance system, related to sequential decision making and extrapolation.\n\nOur method iterates backwards through the layers of piecewise linear deep neural networks. Uniquely, we compute \\emph{all} intermediate values that correspond to a prediction, propagating this calculation through layers using analytical formulae for layer preimages. \n\n", "keywords": "Deep neural networks;verification;interpretation;AI safety;ACAS", "primary_area": "", "supplementary_material": "/attachment/352fe11c1f55499314ea7e159bd13e090699c9a1.zip", "author": "Kyle Matoba;Fran\u00e7ois Fleuret", "authorids": "~Kyle_Matoba1;~Fran\u00e7ois_Fleuret2", "gender": ";M", "homepage": ";https://fleuret.org/francois/", "dblp": "https://dblp.uni-trier.de/pid/150/1860.html;90/5265", "google_scholar": ";https://scholar.google.ch/citations?user=Bj1tRlsAAAAJ", "orcid": ";0000-0001-9457-7393", "linkedin": ";francois-fleuret/", "or_profile": "~Kyle_Matoba1;~Francois_Fleuret1", "aff": "Swiss Federal Institute of Technology Lausanne;University of Geneva", "aff_domain": "epfl.ch;unige.ch", "position": "PhD student;Full Professor", "bibtex": "@misc{\nmatoba2021computing,\ntitle={Computing Preimages of Deep Neural Networks with Applications to Safety},\nauthor={Kyle Matoba and Fran{\\c{c}}ois Fleuret},\nyear={2021},\nurl={https://openreview.net/forum?id=FN7_BUOG78e}\n}", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer4;AnonReviewer3;AnonReviewer2", "site": "https://openreview.net/forum?id=FN7_BUOG78e", "pdf_size": 0, "rating": "2;3;3;4", "confidence": "5;4;4;3", "wc_review": "305;318;402;139", "wc_reply_reviewers": "0;0;184;0", "wc_reply_authors": "385;478;818;482", "reply_reviewers": "0;0;1;0", "reply_authors": "1;1;2;1", "rating_avg": [ 3.0, 0.7071067811865476 ], "confidence_avg": [ 4.0, 0.7071067811865476 ], "wc_review_avg": [ 291.0, 95.3283798246881 ], "wc_reply_reviewers_avg": [ 46.0, 79.67433714816836 ], "wc_reply_authors_avg": [ 540.75, 164.70788536071976 ], "reply_reviewers_avg": [ 0.25, 0.4330127018922193 ], "reply_authors_avg": [ 1.25, 0.4330127018922193 ], "replies_avg": [ 11, 0 ], "authors#_avg": [ 2, 0 ], "corr_rating_confidence": -1.0, "gs_citation": 1, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=12103325405301122212&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 0, "aff_unique_index": "0;1", "aff_unique_norm": "Swiss Federal Institute of Technology Lausanne;University of Geneva", "aff_unique_dep": ";", "aff_unique_url": "https://www.epfl.ch;https://www.unige.ch", "aff_unique_abbr": "EPFL;UNIGE", "aff_campus_unique_index": "0", "aff_campus_unique": "Lausanne;", "aff_country_unique_index": "0;0", "aff_country_unique": "Switzerland" }, { "id": "FOR2VqgJXb", "title": "Evaluating representations by the complexity 
of learning low-loss predictors", "track": "main", "status": "Reject", "tldr": "", "abstract": "We consider the problem of evaluating representations of data for use in solving a downstream task. We propose to measure the quality of a representation by the complexity of learning a predictor on top of the representation that achieves low loss on a task of interest. To this end, we introduce two measures: surplus description length (SDL) and $\\varepsilon$ sample complexity ($\\varepsilon$SC). To compare our methods to prior work, we also present a framework based on plotting the validation loss versus dataset size (the \"loss-data\" curve). Existing measures, such as mutual information and minimum description length, correspond to slices and integrals along the data-axis of the loss-data curve, while ours correspond to slices and integrals along the loss-axis. This analysis shows that prior methods measure properties of an evaluation dataset of a specified size, whereas our methods measure properties of a predictor with a specified loss. We conclude with experiments on real data to compare the behavior of these methods over datasets of varying size.", "keywords": "representation learning;representation evaluation;unsupervised learning;self-supervised learning", "primary_area": "", "supplementary_material": "/attachment/d9684bc189502d5980a66ce9532330f5130ba53e.zip", "author": "William F Whitney;Min Jae Song;David Brandfonbrener;Jaan Altosaar;Kyunghyun Cho", "authorids": "~William_F_Whitney1;~Min_Jae_Song1;~David_Brandfonbrener1;~Jaan_Altosaar1;~Kyunghyun_Cho1", "gender": ";M;M;M;M", "homepage": "http://willwhitney.com;https://mjsong32.github.io/;https://davidbrandfonbrener.github.io;http://jaan.io;http://kyunghyuncho.me", "dblp": "160/8671;169/9994;214/9461;http://dblp.uni-trier.de/pers/hd/a/Altosaar:Jaan;41/9736", "google_scholar": "aQcYWDMAAAAJ;6TIktJgAAAAJ;https://scholar.google.com/citations?hl=en;95Q3cPQAAAAJ;https://scholar.google.fi/citations?user=0RAmmIAAAAAJ", "orcid": ";;;0000-0003-1294-4159;", "linkedin": ";;;jaanaltosaar;", "or_profile": "~William_F_Whitney1;~Min_Jae_Song1;~David_Brandfonbrener1;~Jaan_Altosaar1;~Kyunghyun_Cho1", "aff": "New York University;New York University;New York University;Columbia University;New York University", "aff_domain": "nyu.edu;nyu.edu;nyu.edu;columbia.edu;nyu.edu", "position": "PhD student;PhD student;PhD student;Officer of Research;Associate Professor", "bibtex": "@misc{\nwhitney2021evaluating,\ntitle={Evaluating representations by the complexity of learning low-loss predictors},\nauthor={William F Whitney and Min Jae Song and David Brandfonbrener and Jaan Altosaar and Kyunghyun Cho},\nyear={2021},\nurl={https://openreview.net/forum?id=FOR2VqgJXb}\n}", "github": "", "project": "", "reviewers": "AnonReviewer4;AnonReviewer3;AnonReviewer1", "site": "https://openreview.net/forum?id=FOR2VqgJXb", "pdf_size": 0, "rating": "4;4;7", "confidence": "3;3;4", "wc_review": "156;311;379", "wc_reply_reviewers": "0;0;0", "wc_reply_authors": "426;663;335", "reply_reviewers": "0;0;0", "reply_authors": "1;1;1", "rating_avg": [ 5.0, 1.4142135623730951 ], "confidence_avg": [ 3.3333333333333335, 0.4714045207910317 ], "wc_review_avg": [ 282.0, 93.32023717643814 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 474.6666666666667, 138.25660522697962 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 7, 0 ], "authors#_avg": [ 5, 0 ], "corr_rating_confidence": 1.0, "gs_citation": 28, "gs_cited_by_link": 
"https://scholar.google.com/scholar?cites=16426044868989791183&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 7, "aff_unique_index": "0;0;0;1;0", "aff_unique_norm": "New York University;Columbia University", "aff_unique_dep": ";", "aff_unique_url": "https://www.nyu.edu;https://www.columbia.edu", "aff_unique_abbr": "NYU;Columbia", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0;0;0;0", "aff_country_unique": "United States" }, { "title": "A Critique of Self-Expressive Deep Subspace Clustering", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/2688", "id": "FOyuZ26emy", "poster": "", "openreview": "https://openreview.net/forum?id=FOyuZ26emy", "slides": "https://iclr.cc/virtual/2021/poster/2688", "video": "https://iclr.cc/virtual/2021/poster/2688", "author_site": "Benjamin Haeffele, Chong You, Rene Vidal", "tldr": "", "abstract": "Subspace clustering is an unsupervised clustering technique designed to cluster data that is supported on a union of linear subspaces, with each subspace defining a cluster with dimension lower than the ambient space. Many existing formulations for this problem are based on exploiting the self-expressive property of linear subspaces, where any point within a subspace can be represented as linear combination of other points within the subspace. To extend this approach to data supported on a union of non-linear manifolds, numerous studies have proposed learning an embedding of the original data using a neural network which is regularized by a self-expressive loss function on the data in the embedded space to encourage a union of linear subspaces prior on the data in the embedded space. Here we show that there are a number of potential flaws with this approach which have not been adequately addressed in prior work. In particular, we show the model formulation is often ill-posed in that it can lead to a degenerate embedding of the data, which need not correspond to a union of subspaces at all and is poorly suited for clustering. 
We validate our theoretical results experimentally and also repeat prior experiments reported in the literature, where we conclude that a significant portion of the previously claimed performance benefits can be attributed to an ad-hoc post processing step rather than the deep subspace clustering model.", "keywords": "Subspace clustering;Manifold clustering;Theory of deep learning;Autoencoders", "primary_area": "", "supplementary_material": "", "author": "Benjamin David Haeffele;Chong You;Rene Vidal", "authorids": "~Benjamin_David_Haeffele1;~Chong_You2;~Rene_Vidal1", "gender": ";M;", "homepage": ";https://sites.google.com/view/cyou;http://www.vision.jhu.edu", "dblp": ";164/7311;v/ReneVidal", "google_scholar": ";Mfrpm_IAAAAJ;https://scholar.google.com/citations?hl=en", "orcid": ";;", "linkedin": ";;rene-vidal-74844928/", "or_profile": "~Benjamin_David_Haeffele1;~Chong_You2;~Rene_Vidal1", "aff": ";University of California, Berkeley;Johns Hopkins University", "aff_domain": ";berkeley.edu;jhu.edu", "position": ";Postdoc;Professor", "bibtex": "@inproceedings{\nhaeffele2021a,\ntitle={A Critique of Self-Expressive Deep Subspace Clustering},\nauthor={Benjamin David Haeffele and Chong You and Rene Vidal},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=FOyuZ26emy}\n}", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer4;AnonReviewer3;AnonReviewer2", "pdf_size": 0, "rating": "7;7;7;7", "confidence": "3;4;4;3", "wc_review": "192;485;170;327", "wc_reply_reviewers": "0;46;0;0", "wc_reply_authors": "222;1027;11;455", "reply_reviewers": "0;1;0;0", "reply_authors": "1;2;1;1", "rating_avg": [ 7.0, 0.0 ], "confidence_avg": [ 3.5, 0.5 ], "wc_review_avg": [ 293.5, 125.84613621402923 ], "wc_reply_reviewers_avg": [ 11.5, 19.91858428704209 ], "wc_reply_authors_avg": [ 428.75, 379.4248113921914 ], "reply_reviewers_avg": [ 0.25, 0.4330127018922193 ], "reply_authors_avg": [ 1.25, 0.4330127018922193 ], "replies_avg": [ 12, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 40, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=17497727743010378755&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 4, "pdf": "https://openreview.net/pdf?id=FOyuZ26emy", "email": ";berkeley.edu;jhu.edu", "author_num": 3, "aff_unique_index": "0;1", "aff_unique_norm": "University of California, Berkeley;Johns Hopkins University", "aff_unique_dep": ";", "aff_unique_url": "https://www.berkeley.edu;https://www.jhu.edu", "aff_unique_abbr": "UC Berkeley;JHU", "aff_campus_unique_index": "0", "aff_campus_unique": "Berkeley;", "aff_country_unique_index": "0;0", "aff_country_unique": "United States" }, { "id": "FP9kKyNWwwE", "title": "Zero-shot Transfer Learning for Gray-box Hyper-parameter Optimization", "track": "main", "status": "Reject", "tldr": "", "abstract": "Zero-shot hyper-parameter optimization refers to the process of selecting hyper- parameter configurations that are expected to perform well for a given dataset upfront, without access to any observations of the losses of the target response. Existing zero-shot approaches are posed as initialization strategies for Bayesian Optimization and they often rely on engineered meta-features to measure dataset similarity, operating under the assumption that the responses of similar datasets behaves similarly with respect to the same hyper-parameters. 
Solutions for zero-shot HPO are embarrassingly parallelizable and thus can vastly reduce the required wallclock time of learning a single model. We propose a very simple HPO model called Gray-box Zero(0)-Shot Initialization (GROSI) as a conditional parametric surrogate that learns a universal response model by exploiting the relationship between the hyper-parameters and the dataset meta-features directly. In contrast to existing HPO solutions, we achieve transfer of knowledge without engineered meta-features, but rather through a shared model that is trained simultaneously across all datasets. We design and optimize a novel loss function that allows us to regress from the dataset/hyper-parameter pair onto the response. Experiments on 120 datasets demonstrate the strong performance of GROSI, compared to conventional initialization strategies. We also show that by fine-tuning GROSI to the target dataset, we can outperform state-of-the-art sequential HPO algorithms.", "keywords": "Hyper-parameter Optimization;Transfer Learning;Meta-learning", "primary_area": "", "supplementary_material": "/attachment/ba8d6cf45f1a4c4a9c6f3ded1f6a6ba9211975dd.zip", "author": "Hadi Samer Jomaa;Lars Schmidt-Thieme;Josif Grabocka", "authorids": "~Hadi_Samer_Jomaa1;~Lars_Schmidt-Thieme1;~Josif_Grabocka1", "gender": "M;M;M", "homepage": "https://www.ismll.uni-hildesheim.de/personen/hsjomaa.html;https://www.ismll.uni-hildesheim.de/personen/lst_en.html;https://www.utn.de/departments/department-engineering/machine-learning-lab/", "dblp": ";s/LarsSchmidtThieme;117/4936", "google_scholar": "QLSZWNkAAAAJ;https://scholar.google.de/citations?user=l3taTdYAAAAJ;KRy27XcAAAAJ", "orcid": ";0000-0001-5729-6023;", "linkedin": "hadisamerjomaa/;;", "or_profile": "~Hadi_Samer_Jomaa1;~Lars_Schmidt-Thieme1;~Josif_Grabocka1", "aff": "University of Hildesheim;University of Hildesheim;Universit\u00e4t Freiburg", "aff_domain": "uni-hildesheim.de;uni-hildesheim.de;uni-freiburg.de", "position": "PhD student;Full Professor;Assistant Professor", "bibtex": "@misc{\njomaa2021zeroshot,\ntitle={Zero-shot Transfer Learning for Gray-box Hyper-parameter Optimization},\nauthor={Hadi Samer Jomaa and Lars Schmidt-Thieme and Josif Grabocka},\nyear={2021},\nurl={https://openreview.net/forum?id=FP9kKyNWwwE}\n}", "github": "", "project": "", "reviewers": "AnonReviewer5;AnonReviewer1;AnonReviewer2;AnonReviewer4;AnonReviewer3", "site": "https://openreview.net/forum?id=FP9kKyNWwwE", "pdf_size": 0, "rating": "4;6;6;6;7", "confidence": "4;3;3;3;3", "wc_review": "875;195;201;278;592", "wc_reply_reviewers": "563;0;0;217;0", "wc_reply_authors": "1835;296;260;771;313", "reply_reviewers": "2;0;0;1;0", "reply_authors": "4;1;1;2;1", "rating_avg": [ 5.8, 0.9797958971132712 ], "confidence_avg": [ 3.2, 0.39999999999999997 ], "wc_review_avg": [ 428.2, 266.46680843962537 ], "wc_reply_reviewers_avg": [ 156.0, 220.17175113987716 ], "wc_reply_authors_avg": [ 695.0, 599.9543315953307 ], "reply_reviewers_avg": [ 0.6, 0.7999999999999999 ], "reply_authors_avg": [ 1.8, 1.1661903789690602 ], "replies_avg": [ 21, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": -0.9185586535436918, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:Bvzjb55qrKMJ:scholar.google.com/&scioq=Zero-shot+Transfer+Learning+for+Gray-box+Hyper-parameter+Optimization&hl=en&as_sdt=0,5", "gs_version_total": 0, "aff_unique_index": "0;0;1", "aff_unique_norm": "University of Hildesheim;University of Freiburg", "aff_unique_dep": ";", "aff_unique_url": 
"https://www.uni-hildesheim.de/;https://www.uni-freiburg.de", "aff_unique_abbr": ";Uni Freiburg", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0;0", "aff_country_unique": "Germany" }, { "id": "FPpZrRfz6Ss", "title": "To Learn Effective Features: Understanding the Task-Specific Adaptation of MAML", "track": "main", "status": "Reject", "tldr": "", "abstract": "Meta learning, an effective way for learning unseen tasks with few samples, is an important research\narea in machine learning.\nModel Agnostic Meta-Learning~(MAML)~(\\cite{finn2017model}) is one of the most well-known gradient-based meta learning algorithms, that learns\nthe meta-initialization through the inner and outer optimization loop.\nThe inner loop is to perform fast adaptation in several gradient update steps with the support datapoints, \nwhile the outer loop to generalize the updated model to the query datapoints.\nRecently, it has been argued that instead of rapid learning and adaptation, the learned meta-initialization through MAML\nhas already absorbed the high-quality features prior, where the task-specific head at training \nfacilitates the feature learning.\nIn this work, we investigate the impact of the task-specific adaptation of MAML and discuss the general formula for\nother gradient-based and metric-based meta-learning approaches.\nFrom our analysis, we further devise the Random Decision Planes~(RDP) algorithm to find a suitable linear classifier\nwithout any gradient descent step and the Meta Contrastive Learning~(MCL) algorithm to exploit the inter-samples relationship\ninstead of the expensive inner-loop adaptation. \nWe conduct sufficient experiments on various datasets to explore our proposed algorithms.", "keywords": "Meta-Learning;Few-Shot Learning;Meta-initialization;Task-specific Adaptation", "primary_area": "", "supplementary_material": "/attachment/12961dd2ce60145db21a15cbd61bb4ef8f8a8289.zip", "author": "Zhijie Lin;Zhou Zhao;Zhu Zhang;Huai Baoxing;Jing Yuan", "authorids": "~Zhijie_Lin1;~Zhou_Zhao2;~Zhu_Zhang3;huaibaoxing@huawei.com;nicholas.yuan@huawei.com", "gender": "M;M;M;;", "homepage": ";https://dblp.uni-trier.de/pid/75/7785.html?;;;", "dblp": ";75/7785;;;", "google_scholar": "xXMj6_EAAAAJ;https://scholar.google.com.hk/citations?user=IIoFY90AAAAJ;https://scholar.google.com.hk/citations?user=cjWy38wAAAAJ;;", "orcid": "0000-0003-3461-8952;0000-0001-6121-0384;;;", "linkedin": ";;;;", "or_profile": "~Zhijie_Lin1;~Zhou_Zhao2;~Zhu_Zhang3;huaibaoxing@huawei.com;nicholas.yuan@huawei.com", "aff": "Zhejiang University;Zhejiang University;Zhejiang University;;", "aff_domain": "zju.edu.cn;zju.edu.cn;zju.edu.cn;;", "position": "MS student;Associate Professor;MS student;;", "bibtex": "@misc{\nlin2021to,\ntitle={To Learn Effective Features: Understanding the Task-Specific Adaptation of {\\{}MAML{\\}}},\nauthor={Zhijie Lin and Zhou Zhao and Zhu Zhang and Huai Baoxing and Jing Yuan},\nyear={2021},\nurl={https://openreview.net/forum?id=FPpZrRfz6Ss}\n}", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer4;AnonReviewer2;AnonReviewer3", "site": "https://openreview.net/forum?id=FPpZrRfz6Ss", "pdf_size": 0, "rating": "3;4;5;5", "confidence": "5;3;3;4", "wc_review": "361;597;1110;801", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "585;664;1599;814", "reply_reviewers": "0;0;0;0", "reply_authors": "1;1;3;1", "rating_avg": [ 4.25, 0.82915619758885 ], "confidence_avg": [ 3.75, 0.82915619758885 ], "wc_review_avg": [ 717.25, 275.0639698324737 ], 
"wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 915.5, 403.0995534604324 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.5, 0.8660254037844386 ], "replies_avg": [ 11, 0 ], "authors#_avg": [ 5, 0 ], "corr_rating_confidence": -0.6363636363636364, "gs_citation": 4, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=9924037513447047777&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 0, "aff_unique_index": "0;0;0", "aff_unique_norm": "Zhejiang University", "aff_unique_dep": "", "aff_unique_url": "https://www.zju.edu.cn", "aff_unique_abbr": "ZJU", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0;0", "aff_country_unique": "China" }, { "id": "FTit3PiAw4", "title": "Training Federated GANs with Theoretical Guarantees: A Universal Aggregation Approach", "track": "main", "status": "Reject", "tldr": "", "abstract": "Recently, Generative Adversarial Networks (GANs) have demonstrated their potential in federated learning, i.e., learning a centralized model from data privately hosted by multiple sites. A federated GAN jointly trains a centralized generator and multiple private discriminators hosted at different sites. A major theoretical challenge for the federated GAN is the heterogeneity of the local data distributions. Traditional approaches cannot guarantee to learn the target distribution, which is a mixture of the highly different local distributions. This paper tackles this theoretical challenge, and for the first time, provides a provably correct framework for federated GAN. We propose a new approach called Universal Aggregation, which simulates a centralized discriminator via carefully aggregating the mixture of all private discriminators. We prove that a generator trained with this simulated centralized discriminator can learn the desired target distribution. Through synthetic and real datasets, we show that our method can learn the mixture of largely different distributions, when existing federated GAN methods fail to.", "keywords": "Federated Learning;GAN;Deep Learning", "primary_area": "", "supplementary_material": "", "author": "Yikai Zhang;Hui Qu;Huidong Liu;Qi Chang;Dimitris N. Metaxas;Chao Chen", "authorids": "~Yikai_Zhang1;~Hui_Qu1;~Huidong_Liu1;~Qi_Chang1;~Dimitris_N._Metaxas1;~Chao_Chen1", "gender": ";M;M;M;;M", "homepage": ";;https://harryliew.github.io/;https://www.linkedin.com/in/tommy-qichang/;;https://chaochen.github.io/", "dblp": ";;174/9885;;;66/3019-12", "google_scholar": ";47vBQD4nVCoC;https://scholar.google.com/citations?hl=en;;;J-iIIFAAAAAJ", "orcid": ";;;;;0000-0003-1703-6483", "linkedin": ";hui-qu/;;tommy-qichang/;;", "or_profile": "~Yikai_Zhang1;~Hui_Qu1;~Huidong_Liu1;~Qi_Chang1;~Dimitris_N._Metaxas1;~Chao_Chen1", "aff": ";Rutgers University;State University of New York, Stony Brook;Rutgers University;;State University of New York, Stony Brook", "aff_domain": ";rutgers.edu;stonybrook.edu;rutgers.edu;;stonybrook.edu", "position": ";PhD student;PhD student;PhD student;;Assistant Professor", "bibtex": "@misc{\nzhang2021training,\ntitle={Training Federated {\\{}GAN{\\}}s with Theoretical Guarantees: A Universal Aggregation Approach},\nauthor={Yikai Zhang and Hui Qu and Huidong Liu and Qi Chang and Dimitris N. 
Metaxas and Chao Chen},\nyear={2021},\nurl={https://openreview.net/forum?id=FTit3PiAw4}\n}", "github": "", "project": "", "reviewers": "AnonReviewer4;AnonReviewer1;AnonReviewer3;AnonReviewer2", "site": "https://openreview.net/forum?id=FTit3PiAw4", "pdf_size": 0, "rating": "3;5;6;6", "confidence": "3;4;4;4", "wc_review": "685;91;327;428", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "889;265;269;489", "reply_reviewers": "0;0;0;0", "reply_authors": "2;1;1;1", "rating_avg": [ 5.0, 1.224744871391589 ], "confidence_avg": [ 3.75, 0.4330127018922193 ], "wc_review_avg": [ 382.75, 213.08962316358813 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 478.0, 254.0137791538089 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.25, 0.4330127018922193 ], "replies_avg": [ 10, 0 ], "authors#_avg": [ 6, 0 ], "corr_rating_confidence": 0.9428090415820632, "gs_citation": 21, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=17054528838446539672&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 3, "aff_unique_index": "0;1;0;1", "aff_unique_norm": "Rutgers University;State University of New York", "aff_unique_dep": ";", "aff_unique_url": "https://www.rutgers.edu;https://www.stonybrook.edu", "aff_unique_abbr": "Rutgers;SUNY Stony Brook", "aff_campus_unique_index": "1;1", "aff_campus_unique": ";Stony Brook", "aff_country_unique_index": "0;0;0;0", "aff_country_unique": "United States" }, { "id": "FU5IpSznDKd", "title": "Ranking Neural Checkpoints", "track": "main", "status": "Withdraw", "tldr": "", "abstract": "This paper is concerned with ranking many pre-trained deep neural networks (DNNs), called checkpoints, for the transfer learning to a downstream task. Thanks to the broad use of DNNs, we may easily collect hundreds of checkpoints from various sources. Which of them transfers the best to our downstream task of interest? Striving to answer this question thoroughly, we establish a neural checkpoint ranking benchmark (\\benchmark) and study some intuitive ranking measures. These measures are generic, applying to the checkpoints of different output types without knowing how the checkpoints are pre-trained on which dataset. They also incur low computation costs, making them practically meaningful. Our results suggest that the linear separability of the features extracted by the checkpoints is a strong indicator of transferability. 
We also arrive at a new ranking measure, $\\mathcal{N}$LEEP, which gives rise to the best performance in the experiments.", "keywords": "checkpoint selection;transfer learning;task transferability;network generalization prediction", "primary_area": "", "supplementary_material": "", "author": "YANDONG LI;Xuhui Jia;Ruoxin Sang;Yukun Zhu;Bradley Green;Liqiang Wang;Boqing Gong", "authorids": "~YANDONG_LI1;~Xuhui_Jia1;rxsang@google.com;~Yukun_Zhu1;~Bradley_Green3;~Liqiang_Wang1;~Boqing_Gong1", "gender": "M;M;;M;;M;M", "homepage": "https://cold-winter.github.io/;https://scholar.google.com/citations?view_op=search_authors&mauthors=xuhui+jia&hl=en&oi=ao;;;;http://www.cs.ucf.edu/~lwang;http://boqinggong.info", "dblp": ";116/8360;;18/10777;;;29/7457", "google_scholar": "kRLb6PkAAAAJ;https://scholar.google.com/citations?view_op=search_authors;;;;mZKxB10AAAAJ;lv9ZeVUAAAAJ", "orcid": "0000-0003-2448-1294;;;;;;", "linkedin": ";;;;brad-green-b0247915/;;boqing-gong-46aa5821/", "or_profile": "~YANDONG_LI1;~Xuhui_Jia1;rxsang@google.com;~Yukun_Zhu1;~Bradley_Green3;~Liqiang_Wang1;~Boqing_Gong1", "aff": "University of Central Florida;Google;;Google;Google;University of Central Florida;Google", "aff_domain": "ucf.edu;google.com;;google.com;google.com;ucf.edu;google.com", "position": "PhD student;Researcher;;SWE;Director of Research;Full Professor;Research Scientist", "bibtex": "", "github": "", "project": "", "reviewers": "AnonReviewer4;AnonReviewer1;AnonReviewer2;AnonReviewer3", "site": "https://openreview.net/forum?id=FU5IpSznDKd", "pdf_size": 0, "rating": "4;5;5;6", "confidence": "3;5;4;4", "wc_review": "478;466;282;353", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "0;0;0;0", "reply_reviewers": "0;0;0;0", "reply_authors": "0;0;0;0", "rating_avg": [ 5.0, 0.7071067811865476 ], "confidence_avg": [ 4.0, 0.7071067811865476 ], "wc_review_avg": [ 394.75, 81.33687663046817 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 0, 0 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 0, 0 ], "replies_avg": [ 5, 0 ], "authors#_avg": [ 7, 0 ], "corr_rating_confidence": 0.5, "gs_citation": 61, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=3088841370184707243&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 7, "aff_unique_index": "0;1;1;1;0;1", "aff_unique_norm": "University of Central Florida;Google", "aff_unique_dep": ";Google", "aff_unique_url": "https://www.ucf.edu;https://www.google.com", "aff_unique_abbr": "UCF;Google", "aff_campus_unique_index": "1;1;1;1", "aff_campus_unique": ";Mountain View", "aff_country_unique_index": "0;0;0;0;0;0", "aff_country_unique": "United States" }, { "id": "FUdBF49WRV1", "title": "Directional graph networks", "track": "main", "status": "Reject", "tldr": "", "abstract": "In order to overcome the expressive limitations of graph neural networks (GNNs), we propose the first method that exploits vector flows over graphs to develop globally consistent directional and asymmetric aggregation functions. \nWe show that our directional graph networks (DGNs) generalize convolutional neural networks (CNNs) when applied on a grid. Whereas recent theoretical works focus on understanding local neighbourhoods, local structures and local isomorphism with no global information flow, our novel theoretical framework allows directional convolutional kernels in any graph.\nFirst, by defining a vector field in the graph, we develop a method of applying directional derivatives and smoothing by projecting node-specific messages into the field. 
\nThen we propose the use of the Laplacian eigenvectors as such vector field, and we show that the method generalizes CNNs on an $n$-dimensional grid, and is provably more discriminative than standard GNNs regarding the Weisfeiler-Lehman 1-WL test.\nFinally, we bring the power of CNN data augmentation to graphs by providing a means of doing reflection, rotation and distortion on the underlying directional field. We evaluate our method on different standard benchmarks and see a relative error reduction of 8% on the CIFAR10 graph dataset and 11% to 32% on the molecular ZINC dataset. An important outcome of this work is that it enables to translate any physical or biological problems with intrinsic directional axes into a graph network formalism with an embedded directional field. ", "keywords": "graph;neural networks;deep learning;spectral theory;directional aggregation;over-smoothing", "primary_area": "", "supplementary_material": "", "author": "Dominique Beaini;Saro Passaro;Vincent Letourneau;William L. Hamilton;Gabriele Corso;Pietro Li\u00f2", "authorids": "~Dominique_Beaini1;sp976@cam.ac.uk;vincent@invivoai.com;~William_L._Hamilton1;gc579@cam.ac.uk;~Pietro_Li\u00f21", "gender": "M;;;;;", "homepage": ";;;;;", "dblp": "201/8526;;;137/3314;;l/PietroLio", "google_scholar": "https://scholar.google.ca/citations?hl=en;;;;;", "orcid": "0000-0002-4613-9388;;;;;", "linkedin": "dbeaini/;;;;;", "or_profile": "~Dominique_Beaini1;sp976@cam.ac.uk;vincent@invivoai.com;~William_L._Hamilton1;gc579@cam.ac.uk;~Pietro_Li\u00f21", "aff": "Valence Discovery;;;McGill University;;", "aff_domain": "valencediscovery.com;;;mcgill.ca;;", "position": "Principal Researcher;;;Assistant Professor;;", "bibtex": "@misc{\nbeaini2021directional,\ntitle={Directional graph networks},\nauthor={Dominique Beaini and Saro Passaro and Vincent Letourneau and William L. 
Hamilton and Gabriele Corso and Pietro Li{\\`o}},\nyear={2021},\nurl={https://openreview.net/forum?id=FUdBF49WRV1}\n}", "github": "", "project": "", "reviewers": "AnonReviewer2;AnonReviewer1;AnonReviewer4;AnonReviewer3", "site": "https://openreview.net/forum?id=FUdBF49WRV1", "pdf_size": 0, "rating": "4;5;5;7", "confidence": "5;3;2;4", "wc_review": "305;1310;368;601", "wc_reply_reviewers": "0;325;0;0", "wc_reply_authors": "792;935;435;515", "reply_reviewers": "0;1;0;0", "reply_authors": "1;2;1;1", "rating_avg": [ 5.25, 1.0897247358851685 ], "confidence_avg": [ 3.5, 1.118033988749895 ], "wc_review_avg": [ 646.0, 398.90036349945836 ], "wc_reply_reviewers_avg": [ 81.25, 140.72912811497127 ], "wc_reply_authors_avg": [ 669.25, 202.7046805083691 ], "reply_reviewers_avg": [ 0.25, 0.4330127018922193 ], "reply_authors_avg": [ 1.25, 0.4330127018922193 ], "replies_avg": [ 12, 0 ], "authors#_avg": [ 6, 0 ], "corr_rating_confidence": -0.10259783520851541, "gs_citation": 225, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=6256455976929564913&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 9, "aff_unique_index": "0;1", "aff_unique_norm": "Valence Discovery;McGill University", "aff_unique_dep": ";", "aff_unique_url": ";https://www.mcgill.ca", "aff_unique_abbr": ";McGill", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "1", "aff_country_unique": ";Canada" }, { "id": "FUtMxDTJ_h", "title": "Symmetry Control Neural Networks", "track": "main", "status": "Reject", "tldr": "", "abstract": "This paper continues the quest for designing the optimal physics bias for neural networks predicting the dynamics of systems when the underlying dynamics shall be inferred from the data directly. The description of physical systems is greatly simplified when the underlying symmetries of the system are taken into account. In classical systems described via Hamiltonian dynamics this is achieved by using appropriate coordinates, so-called cyclic coordinates, which reveal conserved quantities directly. Without changing the Hamiltonian, these coordinates can be obtained via canonical transformations. We show that such coordinates can be searched for automatically with appropriate loss functions which naturally arise from Hamiltonian dynamics. As a proof of principle, we test our method on standard classical physics systems using synthetic and experimental data where our network identifies the conserved quantities in an unsupervised way and find improved performance on predicting the dynamics of the system compared to networks biasing just to the Hamiltonian. Effectively, these new coordinates guarantee that motion takes place on symmetry orbits in phase space, i.e.~appropriate lower dimensional sub-spaces of phase space. 
By fitting analytic formulae we recover that our networks are utilising conserved quantities such as (angular) momentum.", "keywords": "Inductive (symmetry) Bias;Predictive Models;Hamiltonian Dynamics;Physics", "primary_area": "", "supplementary_material": "/attachment/0bc78e7858a73b7249304794336df135f123fb59.zip", "author": "Marc Syvaeri;Sven Krippendorf", "authorids": "~Marc_Syvaeri1;~Sven_Krippendorf1", "gender": ";M", "homepage": ";https://www.theorie.physik.uni-muenchen.de/lsluest/members/asc/sci_mem/krippendorf_sven/index.html", "dblp": ";", "google_scholar": ";kHK80lUAAAAJ", "orcid": ";", "linkedin": "marc-syv%C3%A4ri-4283ab126/;", "or_profile": "~Marc_Syvaeri1;~Sven_Krippendorf1", "aff": "University of Munich, Institut f\u00fcr Physik;LMU Munich", "aff_domain": "campus.lmu.de;lmu.de", "position": "PhD student;Postdoc", "bibtex": "@misc{\nsyvaeri2021symmetry,\ntitle={Symmetry Control Neural Networks},\nauthor={Marc Syvaeri and Sven Krippendorf},\nyear={2021},\nurl={https://openreview.net/forum?id=FUtMxDTJ_h}\n}", "github": "", "project": "", "reviewers": "AnonReviewer4;AnonReviewer1;AnonReviewer2;AnonReviewer3", "site": "https://openreview.net/forum?id=FUtMxDTJ_h", "pdf_size": 0, "rating": "4;5;5;5", "confidence": "4;3;4;5", "wc_review": "228;582;213;715", "wc_reply_reviewers": "0;0;0;46", "wc_reply_authors": "532;651;452;388", "reply_reviewers": "0;0;0;1", "reply_authors": "1;1;1;1", "rating_avg": [ 4.75, 0.4330127018922193 ], "confidence_avg": [ 4.0, 0.7071067811865476 ], "wc_review_avg": [ 434.5, 219.16945498860008 ], "wc_reply_reviewers_avg": [ 11.5, 19.91858428704209 ], "wc_reply_authors_avg": [ 505.75, 98.15899092798377 ], "reply_reviewers_avg": [ 0.25, 0.4330127018922193 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 11, 0 ], "authors#_avg": [ 2, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:nfFv4he4BSwJ:scholar.google.com/&scioq=Symmetry+Control+Neural+Networks&hl=en&as_sdt=0,5", "gs_version_total": 0, "aff_unique_index": "0;1", "aff_unique_norm": "University of Munich;Ludwig Maximilian University of Munich", "aff_unique_dep": "Institut f\u00fcr Physik;", "aff_unique_url": "https://www.uni-muenchen.de;https://www.lmu.de", "aff_unique_abbr": "LMU;LMU", "aff_campus_unique_index": "1", "aff_campus_unique": ";Munich", "aff_country_unique_index": "0;0", "aff_country_unique": "Germany" }, { "id": "FVhZIBWqykk", "title": "Resurrecting Submodularity for Neural Text Generation", "track": "main", "status": "Withdraw", "tldr": "", "abstract": "Submodularity is a desirable property for a variety of objectives in content selection where the current neural encoder-decoder framework is inadequate. We define a class of novel attention mechanisms with submodular functions and in turn, prove the submodularity of the effective neural coverage. The resulting attention module offers an architecturally simple and empirically effective method to improve the coverage of neural text generation. We run experiments on three directed text generation tasks with different levels of recovering rate, across two modalities, three different neural model architectures and two training strategy variations. The results and analyses demonstrate that our method generalizes well across these settings, produces texts of good quality, outperforms comparable baselines and achieves state-of-the-art performance. 
", "keywords": "submodularity;text generation;attention", "primary_area": "", "supplementary_material": "", "author": "SIMENG HAN;Xiang Lin;Shafiq Joty", "authorids": "~SIMENG_HAN1;~Xiang_Lin2;~Shafiq_Joty1", "gender": "F;M;M", "homepage": "https://shirleyhan6.github.io/;https://shawnlimn.github.io;https://raihanjoty.github.io/", "dblp": ";29/6347;62/2078", "google_scholar": "D0dpploAAAAJ;R4ZlMwIAAAAJ;hR249csAAAAJ", "orcid": ";;", "linkedin": "simeng-sophia-han-746135159/;;", "or_profile": "~SIMENG_HAN1;~Xiang_Lin2;~Shafiq_Joty1", "aff": "Nanyang Technological University;Nanyang Technological University;Nanyang Technological University", "aff_domain": "ntu.edu;ntu.edu.sg;ntu.edu.sg", "position": "Undergrad student;PhD student;Assistant Professor", "bibtex": "", "github": "", "project": "", "reviewers": "AnonReviewer2;AnonReviewer4;AnonReviewer3;AnonReviewer1", "site": "https://openreview.net/forum?id=FVhZIBWqykk", "pdf_size": 0, "rating": "3;4;6;6", "confidence": "4;4;3;3", "wc_review": "646;385;381;212", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "0;0;0;0", "reply_reviewers": "0;0;0;0", "reply_authors": "0;0;0;0", "rating_avg": [ 4.75, 1.299038105676658 ], "confidence_avg": [ 3.5, 0.5 ], "wc_review_avg": [ 406.0, 155.16281771094518 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 0, 0 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 0, 0 ], "replies_avg": [ 5, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": -0.9622504486493761, "gs_citation": 1, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=81257007375256021&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 3, "aff_unique_index": "0;0;0", "aff_unique_norm": "Nanyang Technological University", "aff_unique_dep": "", "aff_unique_url": "https://www.ntu.edu.sg", "aff_unique_abbr": "NTU", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0;0", "aff_country_unique": "Singapore" }, { "title": "Isometric Transformation Invariant and Equivariant Graph Convolutional Networks", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/3192", "id": "FX0vR39SJ5q", "poster": "", "openreview": "https://openreview.net/forum?id=FX0vR39SJ5q", "slides": "https://iclr.cc/virtual/2021/poster/3192", "video": "https://iclr.cc/virtual/2021/poster/3192", "author_site": "Masanobu Horie, Naoki Morita, Toshiaki Hishinuma, Yu Ihara, Naoto Mitsume", "tldr": "", "abstract": "Graphs are one of the most important data structures for representing pairwise relations between objects. Specifically, a graph embedded in a Euclidean space is essential to solving real problems, such as physical simulations. A crucial requirement for applying graphs in Euclidean spaces to physical simulations is learning and inferring the isometric transformation invariant and equivariant features in a computationally efficient manner. In this paper, we propose a set of transformation invariant and equivariant models based on graph convolutional networks, called IsoGCNs. We demonstrate that the proposed model has a competitive performance compared to state-of-the-art methods on tasks related to geometrical and physical simulation data. 
Moreover, the proposed model can scale up to graphs with 1M vertices and conduct an inference faster than a conventional finite element analysis, which the existing equivariant models cannot achieve.", "keywords": "Machine Learning;Graph Neural Network;Invariance;Equivariance;Simulation;Mesh", "primary_area": "", "supplementary_material": "/attachment/ef3f7b520cfdc434ce7f780bdede1c39abd179b8.zip", "author": "Masanobu Horie;Naoki Morita;Toshiaki Hishinuma;Yu Ihara;Naoto Mitsume", "authorids": "~Masanobu_Horie1;morita@ricos.co.jp;hishinuma@ricos.co.jp;ihara@ricos.co.jp;mitsume@kz.tsukuba.ac.jp", "gender": "M;;;;", "homepage": "https://yellowshippo.github.io/;;;;", "dblp": "264/9957;;;;", "google_scholar": "https://scholar.google.com/citations?hl=en;;;;", "orcid": ";;;;", "linkedin": ";;;;", "or_profile": "~Masanobu_Horie1;morita@ricos.co.jp;hishinuma@ricos.co.jp;ihara@ricos.co.jp;mitsume@kz.tsukuba.ac.jp", "aff": "University of Tsukuba;;;;", "aff_domain": "tsukuba.ac.jp;;;;", "position": "PhD student;;;;", "bibtex": "@inproceedings{\nhorie2021isometric,\ntitle={Isometric Transformation Invariant and Equivariant Graph Convolutional Networks},\nauthor={Masanobu Horie and Naoki Morita and Toshiaki Hishinuma and Yu Ihara and Naoto Mitsume},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=FX0vR39SJ5q}\n}", "github": "[![github](/images/github_icon.svg) yellowshippo/isogcn-iclr2021](https://github.com/yellowshippo/isogcn-iclr2021)", "project": "", "reviewers": "AnonReviewer3;AnonReviewer4;AnonReviewer5", "pdf_size": 0, "rating": "5;6;7", "confidence": "3;3;4", "wc_review": "187;443;966", "wc_reply_reviewers": "0;0;0", "wc_reply_authors": "780;726;1153", "reply_reviewers": "0;0;0", "reply_authors": "1;1;2", "rating_avg": [ 6.0, 0.816496580927726 ], "confidence_avg": [ 3.3333333333333335, 0.4714045207910317 ], "wc_review_avg": [ 532.0, 324.1923297468135 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 886.3333333333334, 189.84613653049556 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.3333333333333333, 0.4714045207910317 ], "replies_avg": [ 9, 0 ], "authors#_avg": [ 5, 0 ], "corr_rating_confidence": 0.8660254037844385, "gs_citation": 41, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=8837825832802039712&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 10, "pdf": "https://openreview.net/pdf?id=FX0vR39SJ5q", "email": "tsukuba.ac.jp;;;;", "author_num": 5, "aff_unique_index": "0", "aff_unique_norm": "University of Tsukuba", "aff_unique_dep": "", "aff_unique_url": "https://www.tsukuba.ac.jp", "aff_unique_abbr": "UT", "aff_country_unique_index": "0", "aff_country_unique": "Japan" }, { "title": "Optimal Conversion of Conventional Artificial Neural Networks to Spiking Neural Networks", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/2644", "id": "FZ1oTwcXchK", "poster": "", "openreview": "https://openreview.net/forum?id=FZ1oTwcXchK", "slides": "https://iclr.cc/virtual/2021/poster/2644", "video": "https://iclr.cc/virtual/2021/poster/2644", "author_site": "Shikuang Deng, Shi Gu", "tldr": "", "abstract": "Spiking neural networks (SNNs) are biology-inspired artificial neural networks (ANNs) that comprise of spiking neurons to process asynchronous discrete signals. While more efficient in power consumption and inference speed on the neuromorphic hardware, SNNs are usually difficult to train directly from scratch with spikes due to the discreteness. 
As an alternative, many efforts have been devoted to converting conventional ANNs into SNNs by copying the weights from ANNs and adjusting the spiking threshold potential of neurons in SNNs. Researchers have designed new SNN architectures and conversion algorithms to diminish the conversion error. However, an effective conversion should address the difference between the SNN and ANN architectures with an efficient approximation of the loss function, which is missing in the field. In this work, we analyze the conversion error by recursive reduction to layer-wise summation and propose a novel strategic pipeline that transfers the weights to the target SNN by combining threshold balance and soft-reset mechanisms. This pipeline enables almost no accuracy loss between the converted SNNs and conventional ANNs with only $\\sim1/10$ of the typical SNN simulation time. Our method is promising to get implanted onto embedded platforms with better support of SNNs with limited energy and memory. Codes are available at https://github.com/Jackn0/snn_optimal_conversion_pipeline.", "keywords": "spiking neural network;weight balance;second-order approximation", "primary_area": "", "supplementary_material": "", "author": "Shikuang Deng;Shi Gu", "authorids": "~Shikuang_Deng1;~Shi_Gu1", "gender": "M;", "homepage": "https://www.guslab.org/;https://nangongwubu.github.io/", "dblp": "286/8188;175/1269", "google_scholar": "rtlmA3gAAAAJ;9_jlOXUAAAAJ", "orcid": ";0000-0003-2303-6770", "linkedin": ";", "or_profile": "~Shikuang_Deng1;~Shi_Gu1", "aff": "University of Electronic Science and Technology of China;University of Electronic Science and Technology of China, Tsinghua University", "aff_domain": "uestc.edu.cn;uestc.edu.cn", "position": "PhD student;Full Professor", "bibtex": "@inproceedings{\ndeng2021optimal,\ntitle={Optimal Conversion of Conventional Artificial Neural Networks to Spiking Neural Networks},\nauthor={Shikuang Deng and Shi Gu},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=FZ1oTwcXchK}\n}", "github": "[![github](/images/github_icon.svg) Jackn0/snn_optimal_conversion_pipeline](https://github.com/Jackn0/snn_optimal_conversion_pipeline)", "project": "", "reviewers": "AnonReviewer3;AnonReviewer4;AnonReviewer2", "pdf_size": 0, "rating": "5;7;7", "confidence": "3;3;3", "wc_review": "511;144;276", "wc_reply_reviewers": "495;0;67", "wc_reply_authors": "1208;482;502", "reply_reviewers": "1;0;1", "reply_authors": "2;1;2", "rating_avg": [ 6.333333333333333, 0.9428090415820634 ], "confidence_avg": [ 3.0, 0.0 ], "wc_review_avg": [ 310.3333333333333, 151.7812753786038 ], "wc_reply_reviewers_avg": [ 187.33333333333334, 219.26594709520117 ], "wc_reply_authors_avg": [ 730.6666666666666, 337.6243803729556 ], "reply_reviewers_avg": [ 0.6666666666666666, 0.4714045207910317 ], "reply_authors_avg": [ 1.6666666666666667, 0.4714045207910317 ], "replies_avg": [ 12, 0 ], "authors#_avg": [ 2, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 271, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=1643416764815138161&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 3, "pdf": "https://openreview.net/pdf?id=FZ1oTwcXchK", "email": "uestc.edu.cn;uestc.edu.cn", "author_num": 2, "aff_unique_index": "0;0", "aff_unique_norm": "University of Electronic Science and Technology of China", "aff_unique_dep": "", "aff_unique_url": "https://www.uestc.edu.cn", "aff_unique_abbr": "UESTC", "aff_campus_unique_index": "", "aff_campus_unique": "", 
"aff_country_unique_index": "0;0", "aff_country_unique": "China" }, { "id": "F_txysyDFbw", "title": "Online Limited Memory Neural-Linear Bandits", "track": "main", "status": "Reject", "tldr": "", "abstract": "We study neural-linear bandits for solving problems where both exploration and representation learning play an important role. Neural-linear bandits leverage the representation power of deep neural networks and combine it with efficient exploration mechanisms, designed for linear contextual bandits, on top of the last hidden layer. Since the representation is optimized during learning, information regarding exploration with \u201cold\u201d features is lost. We propose the first limited memory neural- linear bandit that is resilient to this catastrophic forgetting phenomenon by solving a semi-definite program. We then approximate the semi-definite program using stochastic gradient descent to make the algorithm practical and adjusted for online usage. We perform simulations on a variety of data sets, including regression, classification, and sentiment analysis. In addition, we evaluate our algorithm in a challenging uplink rate-control application. The bandit controls the transmission rates of data segments over cellular links to achieve optimal throughput. We observe that our algorithm achieves superior performance and shows resilience to catastrophic forgetting.", "keywords": "", "primary_area": "", "supplementary_material": "", "author": "Tom Zahavy;Ofir Nabati;Leor Cohen;Shie Mannor", "authorids": "~Tom_Zahavy2;~Ofir_Nabati1;liorcohen5@gmail.com;~Shie_Mannor2", "gender": "M;M;;M", "homepage": "http://tomzahavy.wixsite.com/zahavy;https://rlrl.net.technion.ac.il/;;https://shie.net.technion.ac.il", "dblp": "149/0142;;;20/1669", "google_scholar": "https://scholar.google.co.il/citations?user=9dXN6cMAAAAJ;;;https://scholar.google.com.tw/citations?user=q1HlbIUAAAAJ", "orcid": ";;;", "linkedin": "tomzahavy/;;;", "or_profile": "~Tom_Zahavy2;~Ofir_Nabati1;liorcohen5@gmail.com;~Shie_Mannor2", "aff": "Google DeepMind;Technion - Israel Institute of Technology, Technion;;Technion - Israel Institute of Technology, Technion", "aff_domain": "deepmind.com;campus.technion;;technion.il", "position": "Research Scientist;PhD student;;Full Professor", "bibtex": "@misc{\nzahavy2021online,\ntitle={Online Limited Memory Neural-Linear Bandits},\nauthor={Tom Zahavy and Ofir Nabati and Leor Cohen and Shie Mannor},\nyear={2021},\nurl={https://openreview.net/forum?id=F_txysyDFbw}\n}", "github": "", "project": "", "reviewers": "AnonReviewer2;AnonReviewer1;AnonReviewer3", "site": "https://openreview.net/forum?id=F_txysyDFbw", "pdf_size": 0, "rating": "3;5;5", "confidence": "4;3;3", "wc_review": "503;673;194", "wc_reply_reviewers": "0;0;0", "wc_reply_authors": "0;0;0", "reply_reviewers": "0;0;0", "reply_authors": "0;0;0", "rating_avg": [ 4.333333333333333, 0.9428090415820634 ], "confidence_avg": [ 3.3333333333333335, 0.4714045207910317 ], "wc_review_avg": [ 456.6666666666667, 198.27646243454 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 0, 0 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 0, 0 ], "replies_avg": [ 6, 0 ], "authors#_avg": [ 4, 0 ], "corr_rating_confidence": -0.9999999999999998, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:T8zfZMjVu40J:scholar.google.com/&scioq=Online+Limited+Memory+Neural-Linear+Bandits&hl=en&as_sdt=0,5", "gs_version_total": 0, "aff_unique_index": "0;1;1", "aff_unique_norm": "Google;Technion - Israel Institute of 
Technology", "aff_unique_dep": "Google DeepMind;", "aff_unique_url": "https://deepmind.com;https://www.technion.ac.il", "aff_unique_abbr": "DeepMind;Technion", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;1;1", "aff_country_unique": "United Kingdom;Israel" }, { "id": "Fa3a14yX8zA", "title": "Joint Descent: Training and Tuning Simultaneously", "track": "main", "status": "Withdraw", "tldr": "", "abstract": "Typically in machine learning, training and tuning are done in an alternating manner: for a fixed set of hyperparameters $y$, we apply gradient descent to our objective $f(x, y)$ over trainable variables $x$ until convergence; then, we apply a tuning step over $y$ to find another promising setting of hyperparameters. Because the full training cycle is completed before a tuning step is applied, the optimization procedure greatly emphasizes the gradient step, which seems justified as first-order methods provides a faster convergence rate. In this paper, we argue that an equal emphasis on training and tuning lead to faster convergence both theoretically and empirically. We present Joint Descent (JD) and a novel theoretical analysis of acceleration via an unbiased gradient estimate to give an optimal iteration complexity of $O(\\sqrt{\\kappa}n_y\\log(n/\\epsilon))$, where $\\kappa$ is the condition number and $n_y$ is the dimension of $y$. This provably improves upon the naive classical bound and implies that we essentially train for free if we apply equal emphasis on training and tuning steps. Empirically, we observe that an unbiased gradient estimate achieves the best convergence results, supporting our theory. ", "keywords": "First Order Optimization;Zeroth Order Optimization", "primary_area": "", "supplementary_material": "/attachment/987a23fb177e7871516408d349e275e7611f37cc.zip", "author": "Qiuyi Zhang", "authorids": "~Qiuyi_Zhang1", "gender": "M", "homepage": "https://qiuyiz.github.io", "dblp": "133/8559", "google_scholar": "mE11hO8AAAAJ", "orcid": "", "linkedin": "", "or_profile": "~Qiuyi_Zhang1", "aff": "Google", "aff_domain": "google.com", "position": "Researcher", "bibtex": "", "github": "", "project": "", "reviewers": "AnonReviewer2;AnonReviewer3;AnonReviewer1;AnonReviewer4", "site": "https://openreview.net/forum?id=Fa3a14yX8zA", "pdf_size": 0, "rating": "4;4;5;6", "confidence": "3;5;4;2", "wc_review": "254;1446;405;190", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "0;0;0;0", "reply_reviewers": "0;0;0;0", "reply_authors": "0;0;0;0", "rating_avg": [ 4.75, 0.82915619758885 ], "confidence_avg": [ 3.5, 1.118033988749895 ], "wc_review_avg": [ 573.75, 509.6078762146441 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 0, 0 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 0, 0 ], "replies_avg": [ 5, 0 ], "authors#_avg": [ 1, 0 ], "corr_rating_confidence": -0.674199862463242, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:nP03HPRWM5EJ:scholar.google.com/&scioq=Joint+Descent:+Training+and+Tuning+Simultaneously&hl=en&as_sdt=0,14", "gs_version_total": 0, "aff_unique_index": "0", "aff_unique_norm": "Google", "aff_unique_dep": "Google", "aff_unique_url": "https://www.google.com", "aff_unique_abbr": "Google", "aff_campus_unique_index": "0", "aff_campus_unique": "Mountain View", "aff_country_unique_index": "0", "aff_country_unique": "United States" }, { "id": "Fblk4_Fd7ao", "title": "Exploring Zero-Shot Emergent Communication in Embodied Multi-Agent Populations", "track": "main", 
"status": "Reject", "tldr": "", "abstract": "Effective communication is an important skill for enabling information exchange and cooperation in multi-agent settings. Indeed, emergent communication is now a vibrant field of research, with common settings involving discrete cheap-talk channels. One limitation of this setting is that it does not allow for the emergent protocols to generalize beyond the training partners. \n Furthermore, so far emergent communication has primarily focused on the use of symbolic channels. In this work, we extend this line of work to a new modality, by studying agents that learn to communicate via actuating their joints in a 3D environment. We show that under realistic assumptions, a non-uniform distribution of intents and a common-knowledge energy cost, these agents can find protocols that generalize to novel partners. We also explore and analyze specific difficulties associated with finding these solutions in practice. Finally, we propose and evaluate initial training improvements to address these challenges, involving both specific training curricula and providing the latent feature that can be coordinated on during training.", "keywords": "emergent communication;multi-agent communication;multi-agent reinforcement learning", "primary_area": "", "supplementary_material": "", "author": "Kalesha Bullard;Franziska Meier;Douwe Kiela;Joelle Pineau;Jakob Nicolaus Foerster", "authorids": "~Kalesha_Bullard1;~Franziska_Meier2;~Douwe_Kiela1;~Joelle_Pineau1;~Jakob_Nicolaus_Foerster1", "gender": "F;;M;F;M", "homepage": "http://www.kaleshabullard.com;;https://douwekiela.github.io;http://www.cs.mcgill.ca/~jpineau;https://www.jakobfoerster.com", "dblp": "153/7408;;136/9140;p/JoellePineau;176/5095", "google_scholar": "QehMdGIAAAAJ;;Q0piorUAAAAJ;https://scholar.google.ca/citations?user=CEt6_mMAAAAJ;6z4lQzMAAAAJ", "orcid": ";;;;", "linkedin": ";;;;", "or_profile": "~Kalesha_Bullard1;~Franziska_Meier2;~Douwe_Kiela1;~Joelle_Pineau1;~Jakob_Nicolaus_Foerster1", "aff": "Facebook AI Research;;Facebook AI Research;Meta Facebook;Facebook AI Research", "aff_domain": "fb.com;;fb.com;fb.com;fb.com", "position": "Postdoc;;Research Scientist;Researcher Manager;Research Scientist", "bibtex": "@misc{\nbullard2021exploring,\ntitle={Exploring Zero-Shot Emergent Communication in Embodied Multi-Agent Populations},\nauthor={Kalesha Bullard and Franziska Meier and Douwe Kiela and Joelle Pineau and Jakob Nicolaus Foerster},\nyear={2021},\nurl={https://openreview.net/forum?id=Fblk4_Fd7ao}\n}", "github": "", "project": "", "reviewers": "AnonReviewer2;AnonReviewer3;AnonReviewer4;AnonReviewer1", "site": "https://openreview.net/forum?id=Fblk4_Fd7ao", "pdf_size": 0, "rating": "5;6;6;6", "confidence": "4;3;1;3", "wc_review": "2052;491;171;211", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "1255;640;129;260", "reply_reviewers": "0;0;0;0", "reply_authors": "2;1;1;1", "rating_avg": [ 5.75, 0.4330127018922193 ], "confidence_avg": [ 2.75, 1.0897247358851685 ], "wc_review_avg": [ 731.25, 772.4378211221923 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 571.0, 437.23620618608425 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.25, 0.4330127018922193 ], "replies_avg": [ 11, 0 ], "authors#_avg": [ 5, 0 ], "corr_rating_confidence": -0.6622661785325219, "gs_citation": 18, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=12757967769081401379&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 4, "aff_unique_index": "0;0;0;0", "aff_unique_norm": "Meta", 
"aff_unique_dep": "Facebook AI Research", "aff_unique_url": "https://research.facebook.com", "aff_unique_abbr": "FAIR", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0;0;0", "aff_country_unique": "United States" }, { "id": "FcfH5Pskt2G", "title": "Clearing the Path for Truly Semantic Representation Learning", "track": "main", "status": "Reject", "tldr": "", "abstract": "The performance of $\\beta$-Variational-Autoencoders ($\\beta$-VAEs) and their variants on learning semantically meaningful, disentangled representations is unparalleled. On the other hand, there are theoretical arguments suggesting impossibility of unsupervised disentanglement. In this work, we show that small perturbations of existing datasets hide the convenient correlation structure that is easily exploited by VAE-based architectures. To demonstrate this, we construct modified versions of the standard datasets on which (i) the generative factors are perfectly preserved; (ii) each image undergoes a transformation barely visible to the human eye; (iii) the leading disentanglement architectures fail to produce disentangled representations. We intend for these datasets to play a role in separating correlation-based models from those that discover the true causal structure.\n\nThe construction of the modifications is non-trivial and relies on recent progress on mechanistic understanding of $\\beta$-VAEs and their connection to PCA, while also providing additional insights that might be of stand-alone interest.", "keywords": "Representation Learning;Disentanglement;Unsupervised Learning;Semantic Representations;VAE;Causal Representations;PCA", "primary_area": "", "supplementary_material": "", "author": "Dominik Zietlow;Michal Rolinek;Georg Martius", "authorids": "~Dominik_Zietlow1;~Michal_Rolinek2;~Georg_Martius1", "gender": ";M;M", "homepage": ";;https://uni-tuebingen.de/de/264672", "dblp": "232/2075;159/1618;47/2706", "google_scholar": "jkIx0f8AAAAJ;DVdSTFQAAAAJ;https://scholar.google.de/citations?user=b-JF-UIAAAAJ", "orcid": ";;", "linkedin": ";;", "or_profile": "~Dominik_Zietlow1;~Michal_Rolinek2;~Georg_Martius1", "aff": "University of Tuebingen;;Max Planck Institute for Intelligent Systems", "aff_domain": "uni-tuebingen.de;;tuebingen.mpg.de", "position": "PhD student;;Assistant Professor", "bibtex": "@misc{\nzietlow2021clearing,\ntitle={Clearing the Path for Truly Semantic Representation Learning},\nauthor={Dominik Zietlow and Michal Rolinek and Georg Martius},\nyear={2021},\nurl={https://openreview.net/forum?id=FcfH5Pskt2G}\n}", "github": "", "project": "", "reviewers": "AnonReviewer4;AnonReviewer1;AnonReviewer3;AnonReviewer2", "site": "https://openreview.net/forum?id=FcfH5Pskt2G", "pdf_size": 0, "rating": "3;4;5;5", "confidence": "2;5;2;4", "wc_review": "599;342;458;482", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "0;0;0;0", "reply_reviewers": "0;0;0;0", "reply_authors": "0;0;0;0", "rating_avg": [ 4.25, 0.82915619758885 ], "confidence_avg": [ 3.25, 1.299038105676658 ], "wc_review_avg": [ 470.25, 91.25890367520311 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 0, 0 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 0, 0 ], "replies_avg": [ 6, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": 0.17407765595569782, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:xH35NpetKnEJ:scholar.google.com/&scioq=Clearing+the+Path+for+Truly+Semantic+Representation+Learning&hl=en&as_sdt=0,5", "gs_version_total": 0, 
"aff_unique_index": "0;1", "aff_unique_norm": "University of Tuebingen;Max Planck Institute for Intelligent Systems", "aff_unique_dep": ";Intelligent Systems", "aff_unique_url": "https://www.uni-tuebingen.de/;https://www.mpi-is.mpg.de", "aff_unique_abbr": "Uni T\u00fcbingen;MPI-IS", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0", "aff_country_unique": "Germany" }, { "id": "FlhlcARywRz", "title": "Learning a unified label space", "track": "main", "status": "Reject", "tldr": "", "abstract": "How do we build a general and broad object detection system? We use all labels of all concepts ever annotated. These labels span many diverse datasets with potentially inconsistent semantic labels. In this paper, we show how to integrate these datasets and their semantic taxonomies in a completely automated fashion. Once integrated, we train an off-the-shelf object detector on the union of the datasets. This unified recognition system performs as well as dataset-specific models on each training domain, but generalizes much better to new unseen domains. Entries based on the presented methodology ranked first in the object detection and instance segmentation tracks of the ECCV 2020 Robust Vision Challenge.", "keywords": "object detection;image recognition;computer vision", "primary_area": "", "supplementary_material": "", "author": "Xingyi Zhou;Vladlen Koltun;Philipp Kraehenbuehl", "authorids": "~Xingyi_Zhou2;~Vladlen_Koltun1;~Philipp_Kraehenbuehl1", "gender": "M;M;M", "homepage": "http://xingyizhou.xyz;http://vladlen.info/;http://www.philkr.net/", "dblp": "182/2328;66/5458.html;43/7592", "google_scholar": "47n-0mwAAAAJ;kg4bCpgAAAAJ;https://scholar.google.com.tw/citations?user=dzOd2hgAAAAJ", "orcid": "0000-0002-0914-8525;0000-0003-0858-0970;", "linkedin": "xingyi-zhou-21925290/;vladlenkoltun/;", "or_profile": "~Xingyi_Zhou2;~Vladlen_Koltun1;~Philipp_Kraehenbuehl1", "aff": "Intel;Intel;University of Texas, Austin", "aff_domain": "intel.com;intel.com;utexas.edu", "position": "Intern;Chief Scientist for Intelligent Systems;Assistant Professor", "bibtex": "@misc{\nzhou2021learning,\ntitle={Learning a unified label space},\nauthor={Xingyi Zhou and Vladlen Koltun and Philipp Kraehenbuehl},\nyear={2021},\nurl={https://openreview.net/forum?id=FlhlcARywRz}\n}", "github": "", "project": "", "reviewers": "AnonReviewer3;AnonReviewer4;AnonReviewer1;AnonReviewer2", "site": "https://openreview.net/forum?id=FlhlcARywRz", "pdf_size": 0, "rating": "4;6;7;7", "confidence": "4;4;4;5", "wc_review": "1021;465;768;374", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "950;388;479;335", "reply_reviewers": "0;0;0;0", "reply_authors": "2;1;1;1", "rating_avg": [ 6.0, 1.224744871391589 ], "confidence_avg": [ 4.25, 0.4330127018922193 ], "wc_review_avg": [ 657.0, 255.81731763115647 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 538.0, 243.37933355155693 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.25, 0.4330127018922193 ], "replies_avg": [ 10, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": 0.4714045207910316, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:905tgdxzbCsJ:scholar.google.com/&scioq=Learning+a+unified+label+space&hl=en&as_sdt=0,5", "gs_version_total": 0, "aff_unique_index": "0;0;1", "aff_unique_norm": "Intel;University of Texas at Austin", "aff_unique_dep": "Intel Corporation;", "aff_unique_url": "https://www.intel.com;https://www.utexas.edu", "aff_unique_abbr": "Intel;UT Austin", 
"aff_campus_unique_index": "1", "aff_campus_unique": ";Austin", "aff_country_unique_index": "0;0;0", "aff_country_unique": "United States" }, { "title": "Offline Model-Based Optimization via Normalized Maximum Likelihood Estimation", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/2793", "id": "FmMKSO4e8JK", "poster": "", "openreview": "https://openreview.net/forum?id=FmMKSO4e8JK", "slides": "https://iclr.cc/virtual/2021/poster/2793", "video": "https://iclr.cc/virtual/2021/poster/2793", "author_site": "Justin Fu, Sergey Levine", "tldr": "", "abstract": "In this work we consider data-driven optimization problems where one must maximize a function given only queries at a fixed set of points. This problem setting emerges in many domains where function evaluation is a complex and expensive process, such as in the design of materials, vehicles, or neural network architectures. Because the available data typically only covers a small manifold of the possible space of inputs, a principal challenge is to be able to construct algorithms that can reason about uncertainty and out-of-distribution values, since a naive optimizer can easily exploit an estimated model to return adversarial inputs. We propose to tackle the MBO problem by leveraging the normalized maximum-likelihood (NML) estimator, which provides a principled approach to handling uncertainty and out-of-distribution inputs. While in the standard formulation NML is intractable, we propose a tractable approximation that allows us to scale our method to high-capacity neural network models. We demonstrate that our method can effectively optimize high-dimensional design problems in a variety of disciplines such as chemistry, biology, and materials engineering.", "keywords": "model-based optimization;normalized maximum likelihood", "primary_area": "", "supplementary_material": "", "author": "Justin Fu;Sergey Levine", "authorids": "~Justin_Fu1;~Sergey_Levine1", "gender": ";M", "homepage": ";https://people.eecs.berkeley.edu/~svlevine/", "dblp": ";80/7594", "google_scholar": "T9To2C0AAAAJ;8R35rCwAAAAJ", "orcid": ";", "linkedin": ";", "or_profile": "~Justin_Fu1;~Sergey_Levine1", "aff": "Berkeley;Google", "aff_domain": "berkeley.edu;google.com", "position": "PhD student;Research Scientist", "bibtex": "@inproceedings{\nfu2021offline,\ntitle={Offline Model-Based Optimization via Normalized Maximum Likelihood Estimation},\nauthor={Justin Fu and Sergey Levine},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=FmMKSO4e8JK}\n}", "github": "", "project": "", "reviewers": "AnonReviewer3;AnonReviewer1;AnonReviewer2", "pdf_size": 0, "rating": "6;6;8", "confidence": "3;4;4", "wc_review": "309;506;1752", "wc_reply_reviewers": "0;0;999", "wc_reply_authors": "310;769;1076", "reply_reviewers": "0;0;2", "reply_authors": "2;2;2", "rating_avg": [ 6.666666666666667, 0.9428090415820634 ], "confidence_avg": [ 3.6666666666666665, 0.4714045207910317 ], "wc_review_avg": [ 855.6666666666666, 638.8856618275987 ], "wc_reply_reviewers_avg": [ 333.0, 470.93311627024065 ], "wc_reply_authors_avg": [ 718.3333333333334, 314.76375620808415 ], "reply_reviewers_avg": [ 0.6666666666666666, 0.9428090415820634 ], "reply_authors_avg": [ 2.0, 0.0 ], "replies_avg": [ 12, 0 ], "authors#_avg": [ 2, 0 ], "corr_rating_confidence": 0.49999999999999983, "gs_citation": 62, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=3425661009896752277&as_sdt=2005&sciodt=0,5&hl=en", 
"gs_version_total": 4, "pdf": "https://openreview.net/pdf?id=FmMKSO4e8JK", "email": "berkeley.edu;google.com", "author_num": 2, "aff_unique_index": "0;1", "aff_unique_norm": "University of California, Berkeley;Google", "aff_unique_dep": ";Google", "aff_unique_url": "https://www.berkeley.edu;https://www.google.com", "aff_unique_abbr": "UC Berkeley;Google", "aff_campus_unique_index": "0;1", "aff_campus_unique": "Berkeley;Mountain View", "aff_country_unique_index": "0;0", "aff_country_unique": "United States" }, { "title": "Linear Mode Connectivity in Multitask and Continual Learning", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/2876", "id": "Fmg_fQYUejf", "poster": "", "openreview": "https://openreview.net/forum?id=Fmg_fQYUejf", "slides": "https://iclr.cc/virtual/2021/poster/2876", "video": "https://iclr.cc/virtual/2021/poster/2876", "author_site": "Seyed Iman Mirzadeh, Mehrdad Farajtabar, Dilan Gorur, Razvan Pascanu, Hassan Ghasemzadeh", "tldr": "", "abstract": "Continual (sequential) training and multitask (simultaneous) training are often attempting to solve the same overall objective: to find a solution that performs well on all considered tasks. The main difference is in the training regimes, where continual learning can only have access to one task at a time, which for neural networks typically leads to catastrophic forgetting. That is, the solution found for a subsequent task does not perform well on the previous ones anymore. \n However, the relationship between the different minima that the two training regimes arrive at is not well understood. What sets them apart? Is there a local structure that could explain the difference in performance achieved by the two different schemes? \n Motivated by recent work showing that different minima of the same task are typically connected by very simple curves of low error, we investigate whether multitask and continual solutions are similarly connected. We empirically find that indeed such connectivity can be reliably achieved and, more interestingly, it can be done by a linear path, conditioned on having the same initialization for both. We thoroughly analyze this observation and discuss its significance for the continual learning process.\n Furthermore, we exploit this finding to propose an effective algorithm that constrains the sequentially learned minima to behave as the multitask solution. 
We show that our method outperforms several state of the art continual learning algorithms on various vision benchmarks.", "keywords": "continual learning;catastrophic forgetting;mode connectivity;multitask learning", "primary_area": "", "supplementary_material": "", "author": "Seyed Iman Mirzadeh;Mehrdad Farajtabar;Dilan Gorur;Razvan Pascanu;Hassan Ghasemzadeh", "authorids": "seyediman.mirzadeh@wsu.edu;~Mehrdad_Farajtabar1;~Dilan_Gorur1;~Razvan_Pascanu1;~Hassan_Ghasemzadeh1", "gender": ";M;;M;M", "homepage": ";https://www.cc.gatech.edu/~mfarajta/;;https://razp.info;https://ghasemzadeh.com/authors/hassan-ghasemzadeh/", "dblp": ";21/9988;g/DilanGorur;65/8368.html;62/6023-1", "google_scholar": ";shkKxnQAAAAJ;;https://scholar.google.ca/citations?user=eSPY8LwAAAAJ;https://scholar.google.com.tw/citations?user=29Tc_lEAAAAJ", "orcid": ";;;;0000-0002-1844-1416", "linkedin": ";;dilan-gorur-6298124a;;hassan-ghasemzadeh-13a26a57/", "or_profile": "seyediman.mirzadeh@wsu.edu;~Mehrdad_Farajtabar1;~Dilan_Gorur1;~Razvan_Pascanu1;~Hassan_Ghasemzadeh1", "aff": ";Google;Google;Google DeepMind;Washington State University", "aff_domain": ";google.com;google.com;google.com;wsu.edu", "position": ";Research Scientist;Researcher;Research Scientist;Associate Professor", "bibtex": "@inproceedings{\nmirzadeh2021linear,\ntitle={Linear Mode Connectivity in Multitask and Continual Learning},\nauthor={Seyed Iman Mirzadeh and Mehrdad Farajtabar and Dilan Gorur and Razvan Pascanu and Hassan Ghasemzadeh},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=Fmg_fQYUejf}\n}", "github": "[![github](/images/github_icon.svg) imirzadeh/MC-SGD](https://github.com/imirzadeh/MC-SGD)", "project": "", "reviewers": "AnonReviewer1;AnonReviewer2;AnonReviewer4", "pdf_size": 0, "rating": "7;7;7", "confidence": "5;4;5", "wc_review": "451;1029;323", "wc_reply_reviewers": "0;119;0", "wc_reply_authors": "383;735;285", "reply_reviewers": "0;1;0", "reply_authors": "1;1;1", "rating_avg": [ 7.0, 0.0 ], "confidence_avg": [ 4.666666666666667, 0.4714045207910317 ], "wc_review_avg": [ 601.0, 307.1199548493498 ], "wc_reply_reviewers_avg": [ 39.666666666666664, 56.09713797413277 ], "wc_reply_authors_avg": [ 467.6666666666667, 193.2206568206987 ], "reply_reviewers_avg": [ 0.3333333333333333, 0.4714045207910317 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 9, 0 ], "authors#_avg": [ 5, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 155, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=10468811797723946398&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 5, "pdf": "https://openreview.net/pdf?id=Fmg_fQYUejf", "email": ";google.com;google.com;google.com;wsu.edu", "author_num": 5, "aff_unique_index": "0;0;0;1", "aff_unique_norm": "Google;Washington State University", "aff_unique_dep": "Google;", "aff_unique_url": "https://www.google.com;https://wsu.edu", "aff_unique_abbr": "Google;WSU", "aff_campus_unique_index": "0;0", "aff_campus_unique": "Mountain View;", "aff_country_unique_index": "0;0;1;0", "aff_country_unique": "United States;United Kingdom" }, { "id": "Fn5wiAq2SR", "title": "Adversarial Training using Contrastive Divergence", "track": "main", "status": "Reject", "tldr": "", "abstract": "To protect the security of machine learning models against adversarial examples, adversarial training becomes the most popular and powerful strategy against various adversarial attacks by injecting adversarial examples into training data. 
However, it is time-consuming and requires high computation complexity to generate suitable adversarial examples for ensuring the robustness of models, which impedes the spread and application of adversarial training. In this work, we reformulate adversarial training as a combination of stationary distribution exploring, sampling, and training. Each updating of parameters of DNN is based on several transitions from the data samples as the initial states in a Hamiltonian system. Inspired by our new paradigm, we design a new generative method for adversarial training by using Contrastive Divergence (ATCD), which approaches the equilibrium distribution of adversarial examples with only few iterations by building from small modifications of the standard Contrastive Divergence (CD). Our adversarial training algorithm achieves much higher robustness than any other state-of-the-art adversarial training acceleration method on the ImageNet, CIFAR-10, and MNIST datasets and reaches a balance between performance and efficiency.", "keywords": "Adversarial Training;Contrastive Divergence", "primary_area": "", "supplementary_material": "", "author": "Hongjun Wang;Guanbin Li;Liang Lin", "authorids": "~Hongjun_Wang2;~Guanbin_Li2;~Liang_Lin1", "gender": "M;M;M", "homepage": "https://whj363636.github.io/;http://guanbinli.com;http://www.linliang.net", "dblp": "65/3627-5;126/4457;", "google_scholar": "DNi-nB0AAAAJ;2A2Bx2UAAAAJ;https://scholar.google.com.hk/citations?user=Nav8m8gAAAAJ", "orcid": ";0000-0002-2486-2890;", "linkedin": ";;", "or_profile": "~Hongjun_Wang2;~Guanbin_Li2;~Liang_Lin1", "aff": "SUN YAT-SEN UNIVERSITY;SUN YAT-SEN UNIVERSITY;SUN YAT-SEN UNIVERSITY", "aff_domain": "sysu.edu.cn;sysu.edu.cn;sysu.edu.cn", "position": "MS student;Associate Professor;Full Professor", "bibtex": "@misc{\nwang2021adversarial,\ntitle={Adversarial Training using Contrastive Divergence},\nauthor={Hongjun Wang and Guanbin Li and Liang Lin},\nyear={2021},\nurl={https://openreview.net/forum?id=Fn5wiAq2SR}\n}", "github": "", "project": "", "reviewers": "AnonReviewer2;AnonReviewer1;AnonReviewer4", "site": "https://openreview.net/forum?id=Fn5wiAq2SR", "pdf_size": 0, "rating": "5;5;6", "confidence": "3;4;2", "wc_review": "204;380;224", "wc_reply_reviewers": "0;0;0", "wc_reply_authors": "344;271;208", "reply_reviewers": "0;0;0", "reply_authors": "1;1;1", "rating_avg": [ 5.333333333333333, 0.4714045207910317 ], "confidence_avg": [ 3.0, 0.816496580927726 ], "wc_review_avg": [ 269.3333333333333, 78.67796529030362 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 274.3333333333333, 55.57177541002467 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 7, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": -0.8660254037844385, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:k4GCbNK_4TgJ:scholar.google.com/&scioq=Adversarial+Training+using+Contrastive+Divergence&hl=en&as_sdt=0,5", "gs_version_total": 0, "aff_unique_index": "0;0;0", "aff_unique_norm": "Sun Yat-sen University", "aff_unique_dep": "", "aff_unique_url": "http://www.sysu.edu.cn", "aff_unique_abbr": "SYSU", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0;0", "aff_country_unique": "China" }, { "id": "Fo6S5-3Dx_", "title": "Deep Evolutionary Learning for Molecular Design", "track": "main", "status": "Reject", "tldr": "", "abstract": "In this paper, we propose a deep evolutionary learning (DEL) process that integrates fragment-based deep 
generative model and multi-objective evolutionary computation for molecular design. Our approach enables (1) evolutionary operations in the latent space of the generative model, rather than the structural space, to generate novel promising molecular structures for the next evolutionary generation, and (2) generative model fine-tuning using newly generated high-quality samples. Thus, DEL implements a data-model co-evolution concept which improves both sample population and generative model learning. Experiments on two public datasets indicate that sample population obtained by DEL exhibits improved property distributions, and dominates samples generated by multi-objective Bayesian optimization algorithms.", "keywords": "Deep Evolutionary Learning;Fragment-Based Drug Design;Deep Generative Model;Drug Design;Multi-objective Optimization", "primary_area": "", "supplementary_material": "", "author": "Yifeng Li;Hsu Kiang Ooi;Alain Tchagang", "authorids": "~Yifeng_Li1;~Hsu_Kiang_Ooi1;alain.tchagang@nrc-cnrc.gc.ca", "gender": "M;M;", "homepage": "https://sites.google.com/view/yifengli;;", "dblp": "65/2432-1;;", "google_scholar": "https://scholar.google.ca/citations?user=HTE-3E4AAAAJ;;", "orcid": ";;", "linkedin": ";jamesooi/;", "or_profile": "~Yifeng_Li1;~Hsu_Kiang_Ooi1;alain.tchagang@nrc-cnrc.gc.ca", "aff": "Brock University;National Research Council Canada;", "aff_domain": "brocku.ca;nrc.ca;", "position": "Assistant Professor;Research Officer;", "bibtex": "@misc{\nli2021deep,\ntitle={Deep Evolutionary Learning for Molecular Design},\nauthor={Yifeng Li and Hsu Kiang Ooi and Alain Tchagang},\nyear={2021},\nurl={https://openreview.net/forum?id=Fo6S5-3Dx_}\n}", "github": "", "project": "", "reviewers": "AnonReviewer3;AnonReviewer2;AnonReviewer1;AnonReviewer4", "site": "https://openreview.net/forum?id=Fo6S5-3Dx_", "pdf_size": 0, "rating": "4;4;4;4", "confidence": "3;3;4;4", "wc_review": "415;703;408;1457", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "117;146;70;467", "reply_reviewers": "0;0;0;0", "reply_authors": "1;1;1;1", "rating_avg": [ 4.0, 0.0 ], "confidence_avg": [ 3.5, 0.5 ], "wc_review_avg": [ 745.75, 427.54378430752564 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 200.0, 156.5199667774051 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 9, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 35, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=10604104274877332396&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 0, "aff_unique_index": "0;1", "aff_unique_norm": "Brock University;National Research Council Canada", "aff_unique_dep": ";", "aff_unique_url": "https://www.brocku.ca;https://www.nrc-cnrc.gc.ca", "aff_unique_abbr": "Brock;NRC-CNRC", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0", "aff_country_unique": "Canada" }, { "id": "FoM-RnF6SNe", "title": "Evaluating Agents Without Rewards", "track": "main", "status": "Reject", "tldr": "", "abstract": "Reinforcement learning has enabled agents to solve challenging control tasks from raw image inputs. However, manually crafting reward functions can be time consuming, expensive, and prone to human error. Competing objectives have been proposed for agents to learn without external supervision, such as artificial input entropy, information gain, and empowerment. Estimating these objectives can be challenging and it remains unclear how well they reflect task rewards or human behavior. 
We study these objectives across seven agents and three Atari games. Retrospectively computing the objectives from the agent's lifetime of experience simplifies accurate estimation. We find that all three objectives correlate more strongly with a human behavior similarity metric than with task reward. Moreover, input entropy and information gain both correlate more strongly with human similarity than task reward does.", "keywords": "reinforcement learning;task-agnostic;agent evaluation;exploration;information gain;empowerment;curiosity", "primary_area": "", "supplementary_material": "/attachment/11ab865d9cc77cfaee62981ec00fd01ed1adbc25.zip", "author": "Brendon Matusch;Jimmy Ba;Danijar Hafner", "authorids": "~Brendon_Matusch1;~Jimmy_Ba1;~Danijar_Hafner1", "gender": "M;M;", "homepage": ";http://jimmylba.github.io;https://danijar.com", "dblp": ";https://dblp.org/pers/b/Ba:Jimmy.html;184/8088", "google_scholar": ";https://scholar.google.ca/citations?user=ymzxRhAAAAAJ;VINmGpYAAAAJ", "orcid": ";;0000-0002-9534-7271", "linkedin": "brendon-matusch-0aa302167;;", "or_profile": "~Brendon_Matusch1;~Jimmy_Ba1;~Danijar_Hafner1", "aff": "Stanford University;Department of Computer Science, University of Toronto;University of Toronto", "aff_domain": "stanford.edu;cs.toronto.edu;cs.toronto", "position": "Undergrad student;Assistant Professor;PhD student", "bibtex": "@misc{\nmatusch2021evaluating,\ntitle={Evaluating Agents Without Rewards},\nauthor={Brendon Matusch and Jimmy Ba and Danijar Hafner},\nyear={2021},\nurl={https://openreview.net/forum?id=FoM-RnF6SNe}\n}", "github": "", "project": "", "reviewers": "AnonReviewer3;AnonReviewer4;AnonReviewer1;AnonReviewer2", "site": "https://openreview.net/forum?id=FoM-RnF6SNe", "pdf_size": 0, "rating": "3;4;4;4", "confidence": "4;4;3;5", "wc_review": "1336;1667;330;612", "wc_reply_reviewers": "0;337;222;314", "wc_reply_authors": "686;1425;553;704", "reply_reviewers": "0;1;1;1", "reply_authors": "1;2;2;2", "rating_avg": [ 3.75, 0.4330127018922193 ], "confidence_avg": [ 4.0, 0.7071067811865476 ], "wc_review_avg": [ 986.25, 537.6971150192271 ], "wc_reply_reviewers_avg": [ 218.25, 133.15099511456907 ], "wc_reply_authors_avg": [ 842.0, 341.6101579285956 ], "reply_reviewers_avg": [ 0.75, 0.4330127018922193 ], "reply_authors_avg": [ 1.75, 0.4330127018922193 ], "replies_avg": [ 16, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 18, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=3565551577173653299&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 3, "aff_unique_index": "0;1;1", "aff_unique_norm": "Stanford University;University of Toronto", "aff_unique_dep": ";Department of Computer Science", "aff_unique_url": "https://www.stanford.edu;https://www.utoronto.ca", "aff_unique_abbr": "Stanford;U of T", "aff_campus_unique_index": "0;1", "aff_campus_unique": "Stanford;Toronto;", "aff_country_unique_index": "0;1;1", "aff_country_unique": "United States;Canada" }, { "id": "FsLTUzZlsgT", "title": "Learning Curves for Analysis of Deep Networks", "track": "main", "status": "Reject", "tldr": "", "abstract": "A learning curve models a classifier\u2019s test error as a function of the number of training samples. Prior works show that learning curves can be used to select model parameters and extrapolate performance. We investigate how to use learning curves to analyze the impact of design choices, such as pre-training, architecture, and data augmentation. 
We propose a method to robustly estimate learning curves, abstract their parameters into error and data-reliance, and evaluate the effectiveness of different parameterizations. We also provide several interesting observations based on learning curves for a variety of image classification models.", "keywords": "learning curve;deep network;analysis;asymptotic error;learning efficiency;power law", "primary_area": "", "supplementary_material": "", "author": "Derek Hoiem;Tanmay Gupta;Zhizhong Li;Michal M Shlapentokh-Rothman", "authorids": "~Derek_Hoiem1;~Tanmay_Gupta1;~Zhizhong_Li1;~Michal_M_Shlapentokh-Rothman1", "gender": "M;M;M;F", "homepage": "http://dhoiem.cs.illinois.edu/;http://tanmaygupta.info/;http://zli115.web.engr.illinois.edu/;https://michalmsr.web.illinois.edu/", "dblp": "08/6948;62/1086;;269/4751", "google_scholar": "8Sfj7q8AAAAJ;https://scholar.google.co.in/citations?user=zblQKM8AAAAJ;qIdGcLUAAAAJ;x9szIWsAAAAJ", "orcid": ";;0000-0002-6068-7209;", "linkedin": ";;;michal-shlapentokh-rothman/", "or_profile": "~Derek_Hoiem1;~Tanmay_Gupta1;~Zhizhong_Li1;~Michal_M_Shlapentokh-Rothman1", "aff": "Reconstruct;Allen Institute for Artificial Intelligence;Amazon;University of Illinois, Urbana Champaign", "aff_domain": "reconstructinc.com;allenai.org;amazon.com;illinois.edu", "position": "Chief Scientist;Research Scientist;Applied Scientist;PhD student", "bibtex": "@misc{\nhoiem2021learning,\ntitle={Learning Curves for Analysis of Deep Networks},\nauthor={Derek Hoiem and Tanmay Gupta and Zhizhong Li and Michal M Shlapentokh-Rothman},\nyear={2021},\nurl={https://openreview.net/forum?id=FsLTUzZlsgT}\n}", "github": "", "project": "", "reviewers": "AnonReviewer2;AnonReviewer4;AnonReviewer1;AnonReviewer3", "site": "https://openreview.net/forum?id=FsLTUzZlsgT", "pdf_size": 0, "rating": "4;6;7;7", "confidence": "3;4;3;4", "wc_review": "759;643;261;344", "wc_reply_reviewers": "388;314;0;106", "wc_reply_authors": "1182;1120;85;218", "reply_reviewers": "1;2;0;1", "reply_authors": "3;3;1;1", "rating_avg": [ 6.0, 1.224744871391589 ], "confidence_avg": [ 3.5, 0.5 ], "wc_review_avg": [ 501.75, 205.5326920467885 ], "wc_reply_reviewers_avg": [ 202.0, 155.85249436566616 ], "wc_reply_authors_avg": [ 651.25, 502.43575459953087 ], "reply_reviewers_avg": [ 1.0, 0.7071067811865476 ], "reply_authors_avg": [ 2.0, 1.0 ], "replies_avg": [ 20, 0 ], "authors#_avg": [ 4, 0 ], "corr_rating_confidence": 0.40824829046386296, "gs_citation": 31, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=16115346742858062645&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 8, "aff_unique_index": "1;2;3", "aff_unique_norm": ";Allen Institute for Artificial Intelligence;Amazon;University of Illinois Urbana-Champaign", "aff_unique_dep": ";;Amazon.com, Inc.;", "aff_unique_url": ";https://allenai.org;https://www.amazon.com;https://illinois.edu", "aff_unique_abbr": ";AI2;Amazon;UIUC", "aff_campus_unique_index": "1", "aff_campus_unique": ";Urbana-Champaign", "aff_country_unique_index": "1;1;1", "aff_country_unique": ";United States" }, { "id": "FyucNzzMba-", "title": "Forward Prediction for Physical Reasoning", "track": "main", "status": "Reject", "tldr": "", "abstract": "Physical reasoning requires forward prediction: the ability to forecast what will happen next given some initial world state. We study the performance of state-of-the-art forward-prediction models in the complex physical-reasoning tasks of the PHYRE benchmark (Bakhtin et al., 2019). 
We do so by incorporating models that operate on object or pixel-based representations of the world into simple physical-reasoning agents. We find that forward-prediction models can improve physical-reasoning performance, particularly on complex tasks that involve many objects. However, we also find that these improvements are contingent on the test tasks being small variations of train tasks, and that generalization to completely new task templates is challenging. Surprisingly, we observe that forward predictors with better pixel accuracy do not necessarily lead to better physical-reasoning performance. Nevertheless, our best models set a new state-of-the-art on the PHYRE benchmark.", "keywords": "Forward prediction;physical reasoning", "primary_area": "", "supplementary_material": "", "author": "Rohit Girdhar;Laura Gustafson;Aaron B. Adcock;Laurens van der Maaten", "authorids": "~Rohit_Girdhar5;~Laura_Gustafson1;~Aaron_B._Adcock1;~Laurens_van_der_Maaten3", "gender": "M;;;M", "homepage": "http://rohitgirdhar.github.io;;;https://lvdmaaten.github.io/", "dblp": "161/2631;;;53/2650.html", "google_scholar": "https://scholar.google.co.in/citations?user=7cuwdr8AAAAJ;c8IpF9gAAAAJ;;6GDfcqEAAAAJ", "orcid": ";;;", "linkedin": "rohit-girdhar-53382881/;;;", "or_profile": "~Rohit_Girdhar5;~Laura_Gustafson1;~Aaron_B._Adcock1;~Laurens_van_der_Maaten1", "aff": "Meta, Inc.;Meta;;Meta", "aff_domain": "meta.com;meta.com;;meta.com", "position": "Research Scientist;Research Engineer;;Research Scientist", "bibtex": "@misc{\ngirdhar2021forward,\ntitle={Forward Prediction for Physical Reasoning},\nauthor={Rohit Girdhar and Laura Gustafson and Aaron B. Adcock and Laurens van der Maaten},\nyear={2021},\nurl={https://openreview.net/forum?id=FyucNzzMba-}\n}", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer3;AnonReviewer2;AnonReviewer5;AnonReviewer4", "site": "https://openreview.net/forum?id=FyucNzzMba-", "pdf_size": 0, "rating": "5;5;5;5;6", "confidence": "4;3;4;3;5", "wc_review": "676;392;777;924;355", "wc_reply_reviewers": "0;0;0;0;0", "wc_reply_authors": "794;380;541;766;77", "reply_reviewers": "0;0;0;0;0", "reply_authors": "1;1;1;1;1", "rating_avg": [ 5.2, 0.39999999999999997 ], "confidence_avg": [ 3.8, 0.7483314773547882 ], "wc_review_avg": [ 624.8, 220.13395921574661 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 511.6, 265.1524844311288 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 12, 0 ], "authors#_avg": [ 4, 0 ], "corr_rating_confidence": 0.8017837257372733, "gs_citation": 18, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=13511587896370675862&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 6, "aff_unique_index": "0;0;0", "aff_unique_norm": "Meta", "aff_unique_dep": "Meta Platforms, Inc.", "aff_unique_url": "https://www.meta.com", "aff_unique_abbr": "Meta", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0;0", "aff_country_unique": "United States" }, { "id": "FzGiUKN4aBp", "title": "Out-of-Distribution Generalization with Maximal Invariant Predictor", "track": "main", "status": "Withdraw", "tldr": "", "abstract": "Out-of-Distribution (OOD) generalization is a problem of seeking the predictor function whose performance in the worst environment is optimal. 
\nThis paper makes both theoretical and algorithmic contributions to the OOD problem.\nWe consider a set of all invariant features conditioned to which the target variable and the environment variable becomes independent, and theoretically prove that one can seek an OOD optimal predictor by looking for the mutual-information maximizing feature amongst the invariant features. \nWe establish this result as \\textit{Maximal Invariant Predictor condition}. \nOur theoretical work is closely related to approaches like Invariant Risk Minimization and Invariant Rationalization.\nWe also derive from our theory the \\textit{Inter Gradient Alignment}(IGA) algorithm that uses a parametrization trick to conduct \\textit{feature searching} and \\textit{predictor training} at once. \nWe develop an extension of the Colored-MNIST that can more accurately represent the pathological OOD situation than the original version, and demonstrate the superiority of IGA over previous methods on both the original and the extended version of Colored-MNIST. ", "keywords": "out-of-distribution generalization;extrapolation", "primary_area": "", "supplementary_material": "/attachment/aabe839b0953e6ef3c5e99385fba34c46107aa64.zip", "author": "Masanori Koyama;Shoichiro Yamaguchi", "authorids": "~Masanori_Koyama1;~Shoichiro_Yamaguchi1", "gender": ";M", "homepage": ";", "dblp": "151/6113;76/9374", "google_scholar": ";", "orcid": ";", "linkedin": ";", "or_profile": "~Masanori_Koyama1;~Shoichiro_Yamaguchi1", "aff": "Preferred Networks, Inc.;Preferred Networks, Inc.", "aff_domain": "preferred.jp;preferred.jp", "position": "Researcher;Researcher", "bibtex": "", "github": "", "project": "", "reviewers": "AnonReviewer4;AnonReviewer2;AnonReviewer3;AnonReviewer1", "site": "https://openreview.net/forum?id=FzGiUKN4aBp", "pdf_size": 0, "rating": "3;4;5;5", "confidence": "4;3;3;2", "wc_review": "518;349;315;212", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "0;0;0;0", "reply_reviewers": "0;0;0;0", "reply_authors": "0;0;0;0", "rating_avg": [ 4.25, 0.82915619758885 ], "confidence_avg": [ 3.0, 0.7071067811865476 ], "wc_review_avg": [ 348.5, 110.09654853808996 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 0, 0 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 0, 0 ], "replies_avg": [ 7, 0 ], "authors#_avg": [ 2, 0 ], "corr_rating_confidence": -0.8528028654224417, "gs_citation": 96, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=3739927832644196561&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 0, "aff_unique_index": "0;0", "aff_unique_norm": "Preferred Networks, Inc.", "aff_unique_dep": "", "aff_unique_url": "https://www.preferred-networks.com", "aff_unique_abbr": "PFN", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0", "aff_country_unique": "Japan" }, { "id": "G0VouKj9HUG", "title": "On Learning Read-once DNFs With Neural Networks", "track": "main", "status": "Reject", "tldr": "", "abstract": "Learning functions over Boolean variables is a fundamental problem in machine learning. But not much is known about learning such functions by neural networks. Because learning these functions in the distribution free setting is NP-Hard, they are unlikely to be efficiently learnable by networks in this case. However, assuming the inputs are sampled from the uniform distribution, an important subset of functions that are known to be efficiently learnable is read-once DNFs. 
Here we focus on this setting where the functions are learned by a convex neural network and gradient descent. \nWe first observe empirically that the learned neurons are aligned with the terms of the DNF, despite the fact that there are many zero-error networks that do not have this property. Thus, the learning process has a clear inductive bias towards such logical formulas. To gain a better theoretical understanding of this phenomenon we focus on minimizing the population risk. We show that this risk can be minimized by multiple networks: from ones that memorize data to ones that compactly represent the DNF. We then set out to understand why gradient descent ``\"chooses\" the compact representation. \nWe use a computer assisted proof to prove the inductive bias for relatively small DNFs, and use it to design a process for reconstructing the DNF from the learned network. We then continue to provide theoretical insights on the learning process and the loss surface to better understand the resulting inductive bias. For example, we show that the neurons in solutions with minimum $l_2$-norm of the weights are also aligned with the terms of the DNF. Finally, we empirically show that our results are validated in the empirical case for high dimensional DNFs, more general network architectures and tabular datasets.", "keywords": "neural network;DNF;read-once;inductive bias;reconstruction;alignment", "primary_area": "", "supplementary_material": "/attachment/fa9a8a5dad6a531766ef4a13850ffef164fb1f73.zip", "author": "Ido Bronstein;Alon Brutzkus;Amir Globerson", "authorids": "~Ido_Bronstein2;~Alon_Brutzkus1;~Amir_Globerson1", "gender": "M;M;M", "homepage": ";http://www.cs.tau.ac.il/~gamir/;https://www.linkedin.com/in/ido-bronstein-0722ab1b8/", "dblp": "161/7411;08/4162.html;", "google_scholar": "m1wmXdgAAAAJ;https://scholar.google.com.tw/citations?user=5JserkUAAAAJ;", "orcid": ";;", "linkedin": ";;ido-bronstein-0722ab1b8/", "or_profile": "~Alon_Brutzkus1;~Amir_Globerson1;~ido_bronstein1", "aff": "Tel Aviv University;Tel Aviv University;", "aff_domain": "tau.ac.il;tau.ac.il;", "position": "PhD student;Associate Professor;", "bibtex": "@misc{\nbronstein2021on,\ntitle={On Learning Read-once {\\{}DNF{\\}}s With Neural Networks},\nauthor={Ido Bronstein and Alon Brutzkus and Amir Globerson},\nyear={2021},\nurl={https://openreview.net/forum?id=G0VouKj9HUG}\n}", "github": "", "project": "", "reviewers": "AnonReviewer2;AnonReviewer1;AnonReviewer4", "site": "https://openreview.net/forum?id=G0VouKj9HUG", "pdf_size": 0, "rating": "4;5;7", "confidence": "3;4;2", "wc_review": "194;495;378", "wc_reply_reviewers": "0;0;0", "wc_reply_authors": "548;734;414", "reply_reviewers": "0;0;0", "reply_authors": "1;1;1", "rating_avg": [ 5.333333333333333, 1.247219128924647 ], "confidence_avg": [ 3.0, 0.816496580927726 ], "wc_review_avg": [ 355.6666666666667, 123.89332328885021 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 565.3333333333334, 131.21314297313697 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 7, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": -0.6546536707079772, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:xIbWuf_MyN0J:scholar.google.com/&scioq=On+Learning+Read-once+DNFs+With+Neural+Networks&hl=en&as_sdt=0,33", "gs_version_total": 0, "aff_unique_index": "0;0", "aff_unique_norm": "Tel Aviv University", "aff_unique_dep": "", "aff_unique_url": "https://www.tau.ac.il", "aff_unique_abbr": "TAU", 
"aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0", "aff_country_unique": "Israel" }, { "id": "G1KjzLWU4ci", "title": "Logit As Auxiliary Weak-supervision for More Reliable and Accurate Prediction", "track": "main", "status": "Withdraw", "tldr": "", "abstract": "When a person identifies objects, he or she can think by associating objects to many classes and conclude by taking inter-class relations into account. This cognitive system can make a more reliable prediction. Inspired by these observations, we propose a new network training strategy to consider inter-class relations, namely LogitMix. Specifically, we use recent data augmentation techniques (e.g., Mixup, Manifold Mixup, or Cutmix) as baselines for generating mixed samples. Then, LogitMix suggests using the mixed logit (ie., the mixture of two logits) as an auxiliary training objective. Because using logit before softmax activation preserves rich class relationships, it can serve as a weak-supervision signal concerning inter-class relations. Our experimental results demonstrate that LogitMix achieves state-of-the-art performance among recent data augmentation techniques in terms of both calibration error and prediction accuracy. The source code is attached as the supplementary material.", "keywords": "Inter-class correlation;Human cognitive system;Weak supervision;Calibration;Regularization", "primary_area": "", "supplementary_material": "/attachment/9485b30ce9093fe3b90b897c33f039fc860ea323.zip", "author": "Duhyeon Bang;Yunho Jeon;Jin-Hwa Kim;Jiwon Kim;Hyunjung Shim", "authorids": "~Duhyeon_Bang1;~Yunho_Jeon1;~Jin-Hwa_Kim1;~Jiwon_Kim3;~Hyunjung_Shim1", "gender": ";M;Unspecified;M;F", "homepage": ";https://effailab.hanbat.ac.kr;http://wityworks.com;;https://sites.google.com/view/cvml-kaist", "dblp": "182/0549;126/4768;48/258;;72/4620", "google_scholar": ";-FEJAZAAAAAJ;https://scholar.google.co.kr/citations?user=3f2wPekAAAAJ;https://scholar.google.co.kr/citations?user=xhvzHFAAAAAJ;KB5XZGIAAAAJ", "orcid": ";0000-0001-8043-480X;0000-0002-0423-0415;;", "linkedin": ";yh-jeon;;;", "or_profile": "~Duhyeon_Bang1;~Yunho_Jeon1;~Jin-Hwa_Kim1;~Jiwon_Kim3;~Hyunjung_Shim1", "aff": "Yonsei university;mofl Inc.;SK Telecom;SK Telecom;Yonsei University", "aff_domain": "yonsei.ac.kr;mofl.ai;sk.com;sk.com;yonsei.ac.kr", "position": "PhD student;Researcher;Research Scientist;Vice President;Associate Professor", "bibtex": "", "github": "", "project": "", "reviewers": "AnonReviewer4;AnonReviewer1;AnonReviewer2;AnonReviewer3", "site": "https://openreview.net/forum?id=G1KjzLWU4ci", "pdf_size": 0, "rating": "3;4;5;7", "confidence": "4;4;4;2", "wc_review": "628;461;504;155", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "0;0;0;0", "reply_reviewers": "0;0;0;0", "reply_authors": "0;0;0;0", "rating_avg": [ 4.75, 1.479019945774904 ], "confidence_avg": [ 3.5, 0.8660254037844386 ], "wc_review_avg": [ 437.0, 173.9755729980505 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 0, 0 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 0, 0 ], "replies_avg": [ 5, 0 ], "authors#_avg": [ 5, 0 ], "corr_rating_confidence": -0.8783100656536799, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:TNh4Fgmp6bUJ:scholar.google.com/&scioq=Logit+As+Auxiliary+Weak-supervision+for+More+Reliable+and+Accurate+Prediction&hl=en&as_sdt=0,33", "gs_version_total": 0, "aff_unique_index": "0;1;2;2;0", "aff_unique_norm": "Yonsei University;mofl Inc.;SK Telecom", "aff_unique_dep": ";;", 
"aff_unique_url": "https://www.yonsei.ac.kr;;https://www.sktelecom.com", "aff_unique_abbr": "Yonsei;;SKT", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;1;0;0;0", "aff_country_unique": "South Korea;United States" }, { "id": "G67PtYbCImX", "title": "Similarity Search for Efficient Active Learning and Search of Rare Concepts", "track": "main", "status": "Reject", "tldr": "", "abstract": "Many active learning and search approaches are intractable for industrial settings with billions of unlabeled examples. Existing approaches, such as uncertainty sampling or information density, search globally for the optimal examples to label, scaling linearly or even quadratically with the unlabeled data. However, in practice, data is often heavily skewed; only a small fraction of collected data will be relevant for a given learning task. For example, when identifying rare classes, detecting malicious content, or debugging model performance, positive examples can appear in less than 1% of the data. In this work, we exploit this skew in large training datasets to reduce the number of unlabeled examples considered in each selection round by only looking at the nearest neighbors to the labeled examples. Empirically, we observe that learned representations can effectively cluster unseen concepts, making active learning very effective and substantially reducing the number of viable unlabeled examples. We evaluate several selection strategies in this setting on three large-scale computer vision datasets: ImageNet, OpenImages, and a proprietary dataset of 10 billion images from a large internet company. For rare classes, active learning methods need as little as 0.31% of the labeled data to match the average precision of full supervision. By limiting the selection strategies to the immediate neighbors of the labeled data as candidates for labeling, we process as little as 0.1% of the unlabeled data while achieving similar reductions in labeling costs as the traditional global approach. This process of expanding the candidate pool with the nearest neighbors of the labeled set can be done efficiently and reduces the computational complexity of selection by orders of magnitude. ", "keywords": "active learning;active search", "primary_area": "", "supplementary_material": "", "author": "Cody Coleman;Edward Chou;Sean Culatana;Peter Bailis;Alexander C. Berg;Roshan Sumbaly;Matei Zaharia;I. 
Zeki Yalniz", "authorids": "~Cody_Coleman1;ejchou@fb.com;seanchang.stat@gmail.com;~Peter_Bailis1;~Alexander_C._Berg1;rsumbaly@gmail.com;~Matei_Zaharia1;~I._Zeki_Yalniz1", "gender": "M;;;;M;;M;M", "homepage": "http://www.codycoleman.com/;;;;http://acberg.com;;https://cs.stanford.edu/~matei/;https://research.fb.com/people/yalniz-i-zeki/", "dblp": "https://dblp.uni-trier.de/pers/hd/c/Coleman:Cody;;;;http://dblp.uni-trier.de/pers/hd/b/Berg:Alexander_C=;;36/2133;91/8461", "google_scholar": "https://scholar.google.com/citations?hl=en;;;;jjEht8wAAAAJ;;I1EvjZsAAAAJ;xq7MwfgAAAAJ", "orcid": ";;;;;;0000-0002-7547-7204;", "linkedin": ";;;;;;mateizaharia/;i-zeki-yalniz-88094555", "or_profile": "~Cody_Coleman1;ejchou@fb.com;seanchang.stat@gmail.com;~Peter_Bailis1;~Alexander_C._Berg1;rsumbaly@gmail.com;~Matei_Zaharia1;~I._Zeki_Yalniz1", "aff": "Stanford University;;;;;;Stanford University;Meta", "aff_domain": "stanford.edu;;;;;;stanford.edu;meta.com", "position": "PhD student;;;;;;Assistant Professor;Research Scientist", "bibtex": "@misc{\ncoleman2021similarity,\ntitle={Similarity Search for Efficient Active Learning and Search of Rare Concepts},\nauthor={Cody Coleman and Edward Chou and Sean Culatana and Peter Bailis and Alexander C. Berg and Roshan Sumbaly and Matei Zaharia and I. Zeki Yalniz},\nyear={2021},\nurl={https://openreview.net/forum?id=G67PtYbCImX}\n}", "github": "", "project": "", "reviewers": "AnonReviewer4;AnonReviewer1;AnonReviewer3", "site": "https://openreview.net/forum?id=G67PtYbCImX", "pdf_size": 0, "rating": "4;5;8", "confidence": "4;5;3", "wc_review": "893;737;355", "wc_reply_reviewers": "0;0;0", "wc_reply_authors": "709;849;205", "reply_reviewers": "0;0;0", "reply_authors": "2;2;2", "rating_avg": [ 5.666666666666667, 1.699673171197595 ], "confidence_avg": [ 4.0, 0.816496580927726 ], "wc_review_avg": [ 661.6666666666666, 226.0049163673707 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 587.6666666666666, 276.55660461387566 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 2.0, 0.0 ], "replies_avg": [ 10, 0 ], "authors#_avg": [ 8, 0 ], "corr_rating_confidence": -0.7205766921228921, "gs_citation": 42, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=2571011099367124842&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 8, "aff_unique_index": "0;0;1", "aff_unique_norm": "Stanford University;Meta", "aff_unique_dep": ";Meta Platforms, Inc.", "aff_unique_url": "https://www.stanford.edu;https://meta.com", "aff_unique_abbr": "Stanford;Meta", "aff_campus_unique_index": "0;0", "aff_campus_unique": "Stanford;", "aff_country_unique_index": "0;0;0", "aff_country_unique": "United States" }, { "id": "G70Z8ds32C9", "title": "Deep Networks from the Principle of Rate Reduction", "track": "main", "status": "Reject", "tldr": "", "abstract": "This work attempts to interpret modern deep (convolutional) networks from the principles of rate reduction and (shift) invariant classification. We show that the basic iterative gradient ascent scheme for maximizing the rate reduction of learned features naturally leads to a deep network, one iteration per layer. The architectures, operators (linear or nonlinear), and parameters of the network are all explicitly constructed layer-by-layer in a forward propagation fashion. All components of this ``white box'' network have precise optimization, statistical, and geometric interpretation. 
Our preliminary experiments indicate that such a network can already learn a good discriminative deep representation without any back propagation training. Moreover, all linear operators of the so-derived network naturally become multi-channel convolutions when we enforce classification to be rigorously shift-invariant. The derivation also indicates that such a convolutional network is significantly more efficient to learn and construct in the spectral domain.", "keywords": "", "primary_area": "", "supplementary_material": "", "author": "Kwan Ho Ryan Chan;Yaodong Yu;Chong You;Haozhi Qi;John Wright;Yi Ma", "authorids": "ryanchankh@berkeley.edu;~Yaodong_Yu4;~Chong_You2;~Haozhi_Qi1;~John_Wright1;~Yi_Ma4", "gender": ";M;M;M;;M", "homepage": ";https://yaodongyu.github.io;https://sites.google.com/view/cyou;https://haozhi.io/;http://www.columbia.edu/~jw2966;http://people.eecs.berkeley.edu/~yima/", "dblp": ";;164/7311;190/7802;;", "google_scholar": ";bZ9oyW8AAAAJ;Mfrpm_IAAAAJ;https://scholar.google.com.hk/citations?user=iyVHKkcAAAAJ;nujTx04AAAAJ;https://scholar.google.com.hk/citations?user=XqLiBQMAAAAJ", "orcid": ";;;;;", "linkedin": ";;;;;", "or_profile": "ryanchankh@berkeley.edu;~Yaodong_Yu4;~Chong_You2;~Haozhi_Qi1;~John_Wright1;~Yi_Ma4", "aff": ";Electrical Engineering & Computer Science Department, University of California Berkeley;University of California, Berkeley;University of California, Berkeley;Columbia University;University of California, Berkeley", "aff_domain": ";eecs.berkeley.edu;berkeley.edu;berkeley.edu;columbia.edu;berkeley.edu", "position": ";PhD student;Postdoc;PhD student;Associate Professor;Full Professor", "bibtex": "@misc{\nchan2021deep,\ntitle={Deep Networks from the Principle of Rate Reduction},\nauthor={Kwan Ho Ryan Chan and Yaodong Yu and Chong You and Haozhi Qi and John Wright and Yi Ma},\nyear={2021},\nurl={https://openreview.net/forum?id=G70Z8ds32C9}\n}", "github": "", "project": "", "reviewers": "AnonReviewer5;AnonReviewer4;AnonReviewer2;AnonReviewer1;AnonReviewer3", "site": "https://openreview.net/forum?id=G70Z8ds32C9", "pdf_size": 0, "rating": "4;6;6;6;9", "confidence": "3;3;2;4;3", "wc_review": "789;462;127;371;96", "wc_reply_reviewers": "447;0;0;0;0", "wc_reply_authors": "1327;989;64;567;69", "reply_reviewers": "1;0;0;0;0", "reply_authors": "3;2;1;1;1", "rating_avg": [ 6.2, 1.6 ], "confidence_avg": [ 3.0, 0.6324555320336759 ], "wc_review_avg": [ 369.0, 252.24829038072784 ], "wc_reply_reviewers_avg": [ 89.4, 178.79999999999998 ], "wc_reply_authors_avg": [ 603.2, 500.02895916136697 ], "reply_reviewers_avg": [ 0.2, 0.4000000000000001 ], "reply_authors_avg": [ 1.6, 0.8 ], "replies_avg": [ 15, 0 ], "authors#_avg": [ 6, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 20, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=11409871484234320854&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 4, "aff_unique_index": "0;0;0;1;0", "aff_unique_norm": "University of California, Berkeley;Columbia University", "aff_unique_dep": "Electrical Engineering & Computer Science Department;", "aff_unique_url": "https://www.berkeley.edu;https://www.columbia.edu", "aff_unique_abbr": "UC Berkeley;Columbia", "aff_campus_unique_index": "0;0;0;0", "aff_campus_unique": "Berkeley;", "aff_country_unique_index": "0;0;0;0;0", "aff_country_unique": "United States" }, { "id": "GA87kjyd-f", "title": "A Unified Paths Perspective for Pruning at Initialization", "track": "main", "status": "Reject", "tldr": "", "abstract": "A number of recent approaches have been proposed for pruning 
neural network parameters at initialization with the goal of reducing the size and computational burden of models while minimally affecting their training dynamics and generalization performance. While each of these approaches have some amount of well-founded motivation, a rigorous analysis of the effect of these pruning methods on network training dynamics and their formal relationship to each other has thus far received little attention. Leveraging recent theoretical approximations provided by the Neural Tangent Kernel, we unify a number of popular approaches for pruning at initialization under a single path-centric framework. We introduce the Path Kernel as the data-independent factor in a decomposition of the Neural Tangent Kernel and show the global structure of the Path Kernel can be computed efficiently. This Path Kernel decomposition separates the architectural effects from the data-dependent effects within the Neural Tangent Kernel, providing a means to predict the convergence dynamics of a network from its architecture alone. We analyze the use of this structure in approximating training and generalization performance of networks in the absence of data across a number of initialization pruning approaches. Observing the relationship between input data and paths and the relationship between the Path Kernel and its natural norm, we additionally propose two augmentations of the SynFlow algorithm for pruning at initialization.", "keywords": "Pruning;Paths;Neural Networks;Neural Tangent Kernel", "primary_area": "", "supplementary_material": "/attachment/d31fbc9d7c1ee04ebef5e3f4868f455038d355eb.zip", "author": "Thomas Gebhart;Udit Saxena;Paul R. Schrater", "authorids": "~Thomas_Gebhart1;udit.umass@gmail.com;~Paul_R._Schrater1", "gender": "M;;M", "homepage": "http://gebhartom.com;;", "dblp": ";;s/PaulRSchrater", "google_scholar": ";;https://scholar.google.com.tw/citations?user=_IIpR0EAAAAJ", "orcid": ";;", "linkedin": ";;", "or_profile": "~Thomas_Gebhart1;udit.umass@gmail.com;~Paul_R._Schrater1", "aff": "University of Minnesota, Minneapolis;;University of Minnesota ", "aff_domain": "umn.edu;;umn.edu", "position": "PhD student;;Full Professor", "bibtex": "@misc{\ngebhart2021a,\ntitle={A Unified Paths Perspective for Pruning at Initialization},\nauthor={Thomas Gebhart and Udit Saxena and Paul R. 
Schrater},\nyear={2021},\nurl={https://openreview.net/forum?id=GA87kjyd-f}\n}", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer3;AnonReviewer4;AnonReviewer2", "site": "https://openreview.net/forum?id=GA87kjyd-f", "pdf_size": 0, "rating": "4;4;6;6", "confidence": "3;5;3;3", "wc_review": "273;630;254;165", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "323;688;346;417", "reply_reviewers": "0;0;0;0", "reply_authors": "1;1;1;1", "rating_avg": [ 5.0, 1.0 ], "confidence_avg": [ 3.5, 0.8660254037844386 ], "wc_review_avg": [ 330.5, 177.6576764454607 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 443.5, 145.35215856670308 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 10, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": -0.5773502691896257, "gs_citation": 20, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=271718386989508171&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 4, "aff_unique_index": "0;0", "aff_unique_norm": "University of Minnesota", "aff_unique_dep": "", "aff_unique_url": "https://www.minnesota.edu", "aff_unique_abbr": "UMN", "aff_campus_unique_index": "0", "aff_campus_unique": "Minneapolis;", "aff_country_unique_index": "0;0", "aff_country_unique": "United States" }, { "id": "GBjukBaBLXK", "title": "Conditional Coverage Estimation for High-quality Prediction Intervals", "track": "main", "status": "Reject", "tldr": "", "abstract": "Deep learning has achieved state-of-the-art performance to generate high-quality prediction intervals (PIs) for uncertainty quantification in regression tasks. The high-quality criterion requires PIs to be as narrow as possible, whilst maintaining a pre-specified level of data (marginal) coverage. However, most existing works for high-quality PIs lack accurate information on conditional coverage, which may cause unreliable predictions if it is significantly smaller than the marginal coverage. To address this problem, we propose a novel end-to-end framework which could output high-quality PIs and simultaneously provide their conditional coverage estimation. In doing so, we design a new loss function that is both easy-to-implement and theoretically justified via an exponential concentration bound. Our evaluation on real-world benchmark datasets and synthetic examples shows that our approach not only outperforms the state-of-the-arts on high-quality PIs in terms of average PI width, but also accurately estimates conditional coverage information that is useful in assessing model uncertainty. 
", "keywords": "", "primary_area": "", "supplementary_material": "", "author": "Ziyi Huang;Henry Lam;Haofeng Zhang", "authorids": "~Ziyi_Huang1;~Henry_Lam2;~Haofeng_Zhang1", "gender": "F;;", "homepage": "https://structurefunctionlab.ee.columbia.edu/people/ziyi-huang;;", "dblp": ";;", "google_scholar": "KWfiGJUAAAAJ;;", "orcid": "0000-0001-6985-0298;;", "linkedin": "ziyi-huang-083683135/;;", "or_profile": "~Ziyi_Huang1;~Henry_Lam2;~Haofeng_Zhang1", "aff": "Columbia University;;", "aff_domain": "columbia.edu;;", "position": "Researcher;;", "bibtex": "@misc{\nhuang2021conditional,\ntitle={Conditional Coverage Estimation for High-quality Prediction Intervals},\nauthor={Ziyi Huang and Henry Lam and Haofeng Zhang},\nyear={2021},\nurl={https://openreview.net/forum?id=GBjukBaBLXK}\n}", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer3;AnonReviewer2;AnonReviewer4", "site": "https://openreview.net/forum?id=GBjukBaBLXK", "pdf_size": 0, "rating": "4;4;7;8", "confidence": "3;4;4;4", "wc_review": "1082;1421;782;183", "wc_reply_reviewers": "434;424;0;0", "wc_reply_authors": "1142;3655;1078;113", "reply_reviewers": "2;1;0;0", "reply_authors": "2;8;2;1", "rating_avg": [ 5.75, 1.7853571071357126 ], "confidence_avg": [ 3.75, 0.4330127018922193 ], "wc_review_avg": [ 867.0, 455.0335152491517 ], "wc_reply_reviewers_avg": [ 214.5, 214.52913555039558 ], "wc_reply_authors_avg": [ 1497.0, 1310.9162826054148 ], "reply_reviewers_avg": [ 0.75, 0.82915619758885 ], "reply_authors_avg": [ 3.25, 2.7726341266023544 ], "replies_avg": [ 24, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": 0.5659164584181102, "gs_citation": 1, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=543787524127245693&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 4, "aff_unique_index": "0", "aff_unique_norm": "Columbia University", "aff_unique_dep": "", "aff_unique_url": "https://www.columbia.edu", "aff_unique_abbr": "Columbia", "aff_country_unique_index": "0", "aff_country_unique": "United States" }, { "id": "GCXq4UHH7h4", "title": "Selective Sensing: A Data-driven Nonuniform Subsampling Approach for Computation-free On-Sensor Data Dimensionality Reduction", "track": "main", "status": "Reject", "tldr": "", "abstract": "Designing an on-sensor data dimensionality reduction scheme for efficient signal sensing has always been a challenging task. Compressive sensing is a state-of-the-art sensing technique used for on-sensor data dimensionality reduction. However, the undesired computational complexity involved in the sensing stage of compressive sensing limits its practical application in resource-constrained sensor devices or high-data-rate sensor devices dealing with high-dimensional signals. In this paper, we propose a selective sensing framework that adopts the novel concept of data-driven nonuniform subsampling to reduce the dimensionality of acquired signals while retaining the information of interest in a computation-free fashion. Selective sensing adopts a co-optimization methodology to co-train a selective sensing operator with a subsequent information decoding neural network. We take image as the sensing modality and reconstruction as the information decoding task to demonstrate the 1st proof-of-concept of selective sensing. 
The experiment results on CIFAR10, Set5 and Set14 datasets show that selective sensing can achieve an average reconstruction accuracy improvement in terms of PSNR/SSIM by 3.73dB/0.07 and 9.43dB/0.16 over compressive sensing and uniform subsampling counterparts across the compression ratios of 4-32x, respectively. Source code is available at https://figshare.com/s/519a923fae8f386d7f5b", "keywords": "Compressive sensing;nonuniform subsampling;machine learning", "primary_area": "", "supplementary_material": "/attachment/a2e17ed5fce6bd9e0e60eb552f93e51c04ef5b45.zip", "author": "Zhikang Zhang;Kai Xu;Fengbo Ren", "authorids": "~Zhikang_Zhang1;kaixu@asu.edu;~Fengbo_Ren1", "gender": ";;M", "homepage": ";;https://cidse.engineering.asu.edu/directory/ren-fengbo/", "dblp": ";;23/11242", "google_scholar": ";;f-wp99AAAAAJ", "orcid": ";;", "linkedin": ";;", "or_profile": "~Zhikang_Zhang1;kaixu@asu.edu;~Fengbo_Ren1", "aff": ";;Arizona State University", "aff_domain": ";;asu.edu", "position": ";;Assistant Professor", "bibtex": "@misc{\nzhang2021selective,\ntitle={Selective Sensing: A Data-driven Nonuniform Subsampling Approach for Computation-free On-Sensor Data Dimensionality Reduction},\nauthor={Zhikang Zhang and Kai Xu and Fengbo Ren},\nyear={2021},\nurl={https://openreview.net/forum?id=GCXq4UHH7h4}\n}", "github": "", "project": "", "reviewers": "AnonReviewer2;AnonReviewer4;AnonReviewer3;AnonReviewer1", "site": "https://openreview.net/forum?id=GCXq4UHH7h4", "pdf_size": 0, "rating": "4;4;4;5", "confidence": "5;5;5;4", "wc_review": "415;187;231;367", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "1019;698;712;1107", "reply_reviewers": "0;0;0;0", "reply_authors": "2;1;1;2", "rating_avg": [ 4.25, 0.4330127018922193 ], "confidence_avg": [ 4.75, 0.4330127018922193 ], "wc_review_avg": [ 300.0, 93.8669270829721 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 884.0, 181.75120357235602 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.5, 0.5 ], "replies_avg": [ 13, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": -1.0, "gs_citation": 1, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=13457559426435520675&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 5, "aff_unique_index": "0", "aff_unique_norm": "Arizona State University", "aff_unique_dep": "", "aff_unique_url": "https://www.asu.edu", "aff_unique_abbr": "ASU", "aff_country_unique_index": "0", "aff_country_unique": "United States" }, { "id": "GEpTemgn7cq", "title": "Dependency Structure Discovery from Interventions", "track": "main", "status": "Reject", "tldr": "", "abstract": "Promising results have driven a recent surge of interest in continuous optimization methods for Bayesian network structure learning from observational data. However, there are theoretical limitations on the identifiability of underlying structures obtained from observational data alone. Interventional data provides much richer information about the underlying data-generating process. However, the extension and application of methods designed for observational data to include interventions is not straightforward and remains an open problem. In this paper we provide a general framework based on continuous optimization and neural networks to create models for the combination of observational and interventional data. The proposed method is applicable even in the challenging and realistic case that the identity of the intervened upon variable is unknown. 
We examine the proposed method in the setting of graph recovery both de novo and from a partially-known edge set. We establish strong benchmark results on several structure learning tasks, including structure recovery of both synthetic graphs as well as standard graphs from the Bayesian Network Repository.", "keywords": "structure learning;deep learning;continuous;optimization", "primary_area": "", "supplementary_material": "/attachment/84d7442e68e6d53f1364007c6bd50440c0cbc65a.zip", "author": "Nan Rosemary Ke;Olexa Bilaniuk;Anirudh Goyal;Stefan Bauer;Bernhard Sch\u00f6lkopf;Michael Curtis Mozer;Hugo Larochelle;Christopher Pal;Yoshua Bengio", "authorids": "~Nan_Rosemary_Ke1;~Olexa_Bilaniuk1;~Anirudh_Goyal1;~Stefan_Bauer1;~Bernhard_Sch\u00f6lkopf1;~Michael_Curtis_Mozer1;~Hugo_Larochelle1;~Christopher_Pal1;~Yoshua_Bengio1", "gender": "F;M;M;;;M;M;;M", "homepage": "https://nke001.github.io/;;https://anirudh9119.github.io/;https://cifar.ca/bios/stefan-bauer/;;https://www.cs.colorado.edu/~mozer;https://mila.quebec/en/directory/hugo-larochelle;https://scholar.google.ca/citations?user=1ScWJOoAAAAJ&hl=en&oi=ao;http://yoshuabengio.org", "dblp": "120/5291;158/5760;172/1039;;;m/MichaelCMozer;86/3862.html;45/1217;56/953", "google_scholar": "https://scholar.google.ca/citations?user=dxwPYhQAAAAJ;;krrh6OUAAAAJ;O-oICE8AAAAJ;;lmjR_qMAAAAJ;https://scholar.google.ca/citations?user=U89FHq4AAAAJ;https://scholar.google.ca/citations?user=1ScWJOoAAAAJ;kukA0LcAAAAJ", "orcid": ";;;;;;;;", "linkedin": ";;;;;;;;yoshuabengio/?originalSubdomain=ca", "or_profile": "~Nan_Rosemary_Ke1;~Olexa_Bilaniuk1;~Anirudh_Goyal1;~Stefan_Bauer1;~Bernhard_Sch\u00f6lkopf1;~Michael_Curtis_Mozer1;~Hugo_Larochelle1;~Christopher_Pal1;~Yoshua_Bengio1", "aff": "Mila;;University of Montreal;Max Planck Institute for Intelligent Systems, Max-Planck Institute;;Google DeepMind;Universit\u00e9 de Sherbrooke;Polytechnique Montreal;University of Montreal", "aff_domain": "mila.quebec;;umontreal.ca;tuebingen.mpg.de;;google.com;usherbrooke.ca;polymtl.ca;umontreal.ca", "position": "PhD student;;PhD student;Research Group Leader;;Research Scientist;Adjunct Professor;Full Professor;Full Professor", "bibtex": "@misc{\nke2021dependency,\ntitle={Dependency Structure Discovery from Interventions},\nauthor={Nan Rosemary Ke and Olexa Bilaniuk and Anirudh Goyal and Stefan Bauer and Bernhard Sch{\\\"o}lkopf and Michael Curtis Mozer and Hugo Larochelle and Christopher Pal and Yoshua Bengio},\nyear={2021},\nurl={https://openreview.net/forum?id=GEpTemgn7cq}\n}", "github": "", "project": "", "reviewers": "AnonReviewer4;AnonReviewer1;AnonReviewer3;AnonReviewer2", "site": "https://openreview.net/forum?id=GEpTemgn7cq", "pdf_size": 0, "rating": "4;4;5;6", "confidence": "4;5;4;3", "wc_review": "397;1486;142;206", "wc_reply_reviewers": "0;388;0;0", "wc_reply_authors": "768;2297;563;297", "reply_reviewers": "0;1;0;0", "reply_authors": "2;6;3;1", "rating_avg": [ 4.75, 0.82915619758885 ], "confidence_avg": [ 4.0, 0.7071067811865476 ], "wc_review_avg": [ 557.75, 544.0736967544011 ], "wc_reply_reviewers_avg": [ 97.0, 168.0089283341811 ], "wc_reply_authors_avg": [ 981.25, 777.7860808088558 ], "reply_reviewers_avg": [ 0.25, 0.4330127018922193 ], "reply_authors_avg": [ 3.0, 1.8708286933869707 ], "replies_avg": [ 18, 0 ], "authors#_avg": [ 9, 0 ], "corr_rating_confidence": -0.8528028654224417, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:mZGFhgTU9d4J:scholar.google.com/&scioq=Dependency+Structure+Discovery+from+Interventions&hl=en&as_sdt=0,5", 
"gs_version_total": 0, "aff_unique_index": "0;1;2;3;4;5;1", "aff_unique_norm": "Mila;University of Montreal;Max Planck Institute for Intelligent Systems;Google;Universit\u00e9 de Sherbrooke;Polytechnique Montreal", "aff_unique_dep": "Quebec Artificial Intelligence Institute;;Intelligent Systems;Google DeepMind;;", "aff_unique_url": "https://mila.quebec;https://wwwumontreal.ca;https://www.mpi-is.mpg.de;https://deepmind.com;https://www.usherbrooke.ca;https://www.polymtl.ca", "aff_unique_abbr": "Mila;UM;MPI-IS;DeepMind;UdeS;PolyMTL", "aff_campus_unique_index": "1", "aff_campus_unique": ";Montreal", "aff_country_unique_index": "0;0;1;2;0;0;0", "aff_country_unique": "Canada;Germany;United Kingdom" }, { "title": "Federated Learning via Posterior Averaging: A New Perspective and Practical Algorithms", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/2670", "id": "GFsU8a0sGB", "poster": "", "openreview": "https://openreview.net/forum?id=GFsU8a0sGB", "slides": "https://iclr.cc/virtual/2021/poster/2670", "video": "https://iclr.cc/virtual/2021/poster/2670", "author_site": "Maruan Al-Shedivat, Jennifer Gillenwater, Eric P Xing, Afshin Rostamizadeh", "tldr": "", "abstract": "Federated learning is typically approached as an optimization problem, where the goal is to minimize a global loss function by distributing computation across client devices that possess local data and specify different parts of the global objective. We present an alternative perspective and formulate federated learning as a posterior inference problem, where the goal is to infer a global posterior distribution by having client devices each infer the posterior of their local data. While exact inference is often intractable, this perspective provides a principled way to search for global optima in federated settings. Further, starting with the analysis of federated quadratic objectives, we develop a computation- and communication-efficient approximate posterior inference algorithm\u2014federated posterior averaging (FedPA). Our algorithm uses MCMC for approximate inference of local posteriors on the clients and efficiently communicates their statistics to the server, where the latter uses them to refine a global estimate of the posterior mode. 
Finally, we show that FedPA generalizes federated averaging (FedAvg), can similarly benefit from adaptive optimizers, and yields state-of-the-art results on four realistic and challenging benchmarks, converging faster, to better optima.", "keywords": "federated learning;posterior inference;MCMC", "primary_area": "", "supplementary_material": "", "author": "Maruan Al-Shedivat;Jennifer Gillenwater;Eric Xing;Afshin Rostamizadeh", "authorids": "~Maruan_Al-Shedivat1;~Jennifer_Gillenwater1;~Eric_Xing1;~Afshin_Rostamizadeh1", "gender": "M;F;M;", "homepage": "http://maruan.alshedivat.com;http://jgillenw.com;http://www.cs.cmu.edu/~epxing/;", "dblp": "149/1273;73/3828;36/3855;97/4479", "google_scholar": "iUe4TdgAAAAJ;5lUnZgsAAAAJ;https://scholar.google.com.tw/citations?user=5pKTRxEAAAAJ;", "orcid": ";;;", "linkedin": ";;;", "or_profile": "~Maruan_Al-Shedivat1;~Jennifer_Gillenwater1;~Eric_Xing1;~Afshin_Rostamizadeh1", "aff": "Carnegie Mellon University;Google;School of Computer Science, Carnegie Mellon University;Google", "aff_domain": "cmu.edu;google.com;cs.cmu.edu;google.com", "position": "PhD student;Research Scientist;Full Professor;Researcher", "bibtex": "@inproceedings{\nal-shedivat2021federated,\ntitle={Federated Learning via Posterior Averaging: A New Perspective and Practical Algorithms},\nauthor={Maruan Al-Shedivat and Jennifer Gillenwater and Eric Xing and Afshin Rostamizadeh},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=GFsU8a0sGB}\n}", "github": "[![github](/images/github_icon.svg) alshedivat/fedpa](https://github.com/alshedivat/fedpa)", "project": "", "reviewers": "AnonReviewer2;AnonReviewer3;AnonReviewer1", "pdf_size": 0, "rating": "6;6;7", "confidence": "4;2;3", "wc_review": "356;216;442", "wc_reply_reviewers": "0;0;0", "wc_reply_authors": "878;368;476", "reply_reviewers": "0;0;0", "reply_authors": "2;1;1", "rating_avg": [ 6.333333333333333, 0.4714045207910317 ], "confidence_avg": [ 3.0, 0.816496580927726 ], "wc_review_avg": [ 338.0, 93.13789060670564 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 574.0, 219.43563976710803 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.3333333333333333, 0.4714045207910317 ], "replies_avg": [ 10, 0 ], "authors#_avg": [ 4, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 146, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=2486025806014234529&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 8, "pdf": "https://openreview.net/pdf?id=GFsU8a0sGB", "email": "cmu.edu;google.com;cs.cmu.edu;google.com", "author_num": 4, "aff_unique_index": "0;1;0;1", "aff_unique_norm": "Carnegie Mellon University;Google", "aff_unique_dep": ";Google", "aff_unique_url": "https://www.cmu.edu;https://www.google.com", "aff_unique_abbr": "CMU;Google", "aff_campus_unique_index": "1;2;1", "aff_campus_unique": ";Mountain View;Pittsburgh", "aff_country_unique_index": "0;0;0;0", "aff_country_unique": "United States" }, { "title": "A Geometric Analysis of Deep Generative Image Models and Its Applications", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/3366", "id": "GH7QRzUDdXG", "poster": "", "openreview": "https://openreview.net/forum?id=GH7QRzUDdXG", "slides": "https://iclr.cc/virtual/2021/poster/3366", "video": "https://iclr.cc/virtual/2021/poster/3366", "author_site": "Binxu Wang, Carlos Ponce", "tldr": "", "abstract": "Generative adversarial networks (GANs) have emerged as a powerful unsupervised method to model the 
statistical patterns of real-world data sets, such as natural images. These networks are trained to map random inputs in their latent space to new samples representative of the learned data. However, the structure of the latent space is hard to intuit due to its high dimensionality and the non-linearity of the generator, which limits the usefulness of the models. Understanding the latent space requires a way to identify input codes for existing real-world images (inversion), and a way to identify directions with known image transformations (interpretability). Here, we use a geometric framework to address both issues simultaneously. We develop an architecture-agnostic method to compute the Riemannian metric of the image manifold created by GANs. The eigen-decomposition of the metric isolates axes that account for different levels of image variability. An empirical analysis of several pretrained GANs shows that image variation around each position is concentrated along surprisingly few major axes (the space is highly anisotropic) and the directions that create this large variation are similar at different positions in the space (the space is homogeneous). We show that many of the top eigenvectors correspond to interpretable transforms in the image space, with a substantial part of eigenspace corresponding to minor transforms which could be compressed out. This geometric understanding unifies key previous results related to GAN interpretability. We show that the use of this metric allows for more efficient optimization in the latent space (e.g. GAN inversion) and facilitates unsupervised discovery of interpretable axes. Our results illustrate that defining the geometry of the GAN image manifold can serve as a general framework for understanding GANs. ", "keywords": "Deep generative model;Interpretability;GAN;Differential Geometry;Optimization;Model Inversion;Feature Visualization", "primary_area": "", "supplementary_material": "/attachment/8a4efa3e6f5f6879ac469fabc9169971acf116ed.zip", "author": "Binxu Wang;Carlos R Ponce", "authorids": "~Binxu_Wang1;~Carlos_R_Ponce1", "gender": "F;M", "homepage": "https://scholar.harvard.edu/binxuw/home;", "dblp": "216/9752;", "google_scholar": "8-njUc8AAAAJ;", "orcid": "0000-0002-2741-169X;0000-0002-9887-3234", "linkedin": ";", "or_profile": "~Binxu_Wang1;~Carlos_R_Ponce1", "aff": "Washington University, St. Louis;Washington University, St. 
Louis", "aff_domain": "wustl.edu;wustl.edu", "position": "PhD student;Assistant Professor", "bibtex": "@inproceedings{\nwang2021a,\ntitle={A Geometric Analysis of Deep Generative Image Models and Its Applications},\nauthor={Binxu Wang and Carlos R Ponce},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=GH7QRzUDdXG}\n}", "github": "[![github](/images/github_icon.svg) Animadversio/GAN-Geometry](https://github.com/Animadversio/GAN-Geometry)", "project": "", "reviewers": "AnonReviewer1;AnonReviewer2;AnonReviewer4;AnonReviewer3", "pdf_size": 0, "rating": "5;5;6;6", "confidence": "4;4;4;3", "wc_review": "886;382;425;361", "wc_reply_reviewers": "0;302;0;0", "wc_reply_authors": "2073;2206;644;914", "reply_reviewers": "0;1;0;0", "reply_authors": "3;4;1;2", "rating_avg": [ 5.5, 0.5 ], "confidence_avg": [ 3.75, 0.4330127018922193 ], "wc_review_avg": [ 513.5, 216.29667126426148 ], "wc_reply_reviewers_avg": [ 75.5, 130.76983597145025 ], "wc_reply_authors_avg": [ 1459.25, 688.5228300499556 ], "reply_reviewers_avg": [ 0.25, 0.4330127018922193 ], "reply_authors_avg": [ 2.5, 1.118033988749895 ], "replies_avg": [ 17, 0 ], "authors#_avg": [ 2, 0 ], "corr_rating_confidence": -0.5773502691896257, "gs_citation": -1, "gs_cited_by_link": "", "gs_version_total": -1, "pdf": "https://openreview.net/pdf?id=GH7QRzUDdXG", "email": "wustl.edu;wustl.edu", "author_num": 2, "aff_unique_index": "0;0", "aff_unique_norm": "Washington University in St. Louis", "aff_unique_dep": "", "aff_unique_url": "https://wustl.edu", "aff_unique_abbr": "WUSTL", "aff_campus_unique_index": "0;0", "aff_campus_unique": "St. Louis", "aff_country_unique_index": "0;0", "aff_country_unique": "United States" }, { "id": "GHCu1utcBvX", "title": "Transferability of Compositionality", "track": "main", "status": "Reject", "tldr": "", "abstract": "Compositional generalization is the algebraic capacity to understand and produce large amount of novel combinations from known components. It is a key element of human intelligence for out-of-distribution generalization. To equip neural networks with such ability, many algorithms have been proposed to extract compositional representations from the training distribution. However, it has not been discussed whether the trained model can still extract such representations in the test distribution. In this paper, we argue that the extraction ability does not transfer naturally, because the extraction network suffers from the divergence of distributions. To address this problem, we propose to use an auxiliary reconstruction network with regularized hidden representations as input, and optimize the representations during inference. The proposed approach significantly improves accuracy, showing more than a 20% absolute increase in various experiments compared with baselines. To our best knowledge, this is the first work to focus on the transferability of compositionality, and it is orthogonal to existing efforts of learning compositional representations in training distribution. 
We hope this work will help to advance compositional generalization and artificial intelligence research.", "keywords": "Compositionality", "primary_area": "", "supplementary_material": "/attachment/d2262ea1053f19c20bea2a523de01583340af765.zip", "author": "Yuanpeng Li;Liang Zhao;Joel Hestness;Ka Yee Lun;Kenneth Church;Mohamed Elhoseiny", "authorids": "~Yuanpeng_Li2;~Liang_Zhao2;~Joel_Hestness2;kayeelun@gmail.com;~Kenneth_Church1;~Mohamed_Elhoseiny1", "gender": "M;F;;;;M", "homepage": ";;;;;http://www.mohamed-elhoseiny.com", "dblp": ";63/5422-6;;;;125/2894", "google_scholar": ";9xMR_iQAAAAJ;;;;iRBUTOAAAAAJ", "orcid": ";;;;;0000-0001-9659-1551", "linkedin": ";liang-zhao-434b2664/;;;;mohamed-elhoseiny-8a836215/", "or_profile": "~Yuanpeng_Li2;~Liang_Zhao2;~Joel_Hestness2;kayeelun@gmail.com;~Kenneth_Church1;~Mohamed_Elhoseiny1", "aff": ";Samsung Research America;;;;KAUST", "aff_domain": ";samsung.com;;;;kaust.edu.sa", "position": ";Staff Researcher;;;;Associate Professor", "bibtex": "@misc{\nli2021transferability,\ntitle={Transferability of Compositionality},\nauthor={Yuanpeng Li and Liang Zhao and Joel Hestness and Ka Yee Lun and Kenneth Church and Mohamed Elhoseiny},\nyear={2021},\nurl={https://openreview.net/forum?id=GHCu1utcBvX}\n}", "github": "", "project": "", "reviewers": "AnonReviewer2;AnonReviewer4;AnonReviewer3;AnonReviewer1", "site": "https://openreview.net/forum?id=GHCu1utcBvX", "pdf_size": 0, "rating": "2;3;3;4", "confidence": "5;5;3;3", "wc_review": "361;666;364;539", "wc_reply_reviewers": "0;0;131;0", "wc_reply_authors": "410;476;403;363", "reply_reviewers": "0;0;1;0", "reply_authors": "1;1;2;1", "rating_avg": [ 3.0, 0.7071067811865476 ], "confidence_avg": [ 4.0, 1.0 ], "wc_review_avg": [ 482.5, 128.12981698262118 ], "wc_reply_reviewers_avg": [ 32.75, 56.72466394788073 ], "wc_reply_authors_avg": [ 413.0, 40.55243519198323 ], "reply_reviewers_avg": [ 0.25, 0.4330127018922193 ], "reply_authors_avg": [ 1.25, 0.4330127018922193 ], "replies_avg": [ 11, 0 ], "authors#_avg": [ 6, 0 ], "corr_rating_confidence": -0.7071067811865476, "gs_citation": 1, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=7316700808213125536&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 0, "aff_unique_index": "0;1", "aff_unique_norm": "Samsung;King Abdullah University of Science and Technology", "aff_unique_dep": "Samsung Research America;", "aff_unique_url": "https://www.samsung.com/us/careers/research/;https://www.kaust.edu.sa", "aff_unique_abbr": "SRA;KAUST", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;1", "aff_country_unique": "United States;Saudi Arabia" }, { "id": "GIeGTl8EYx", "title": "Deep Graph Neural Networks with Shallow Subgraph Samplers", "track": "main", "status": "Reject", "tldr": "", "abstract": "While Graph Neural Networks (GNNs) are powerful models for learning representations on graphs, most state-of-the-art models do not have significant accuracy gain beyond two to three layers. Deep GNNs fundamentally need to address: 1). expressivity challenge due to oversmoothing, and 2). computation challenge due to neighborhood explosion. We propose a simple \"deep GNN, shallow sampler\" design principle to improve both the GNN accuracy and efficiency --- to generate representation of a target node, we use a deep GNN to pass messages only within a shallow, localized subgraph. A properly sampled subgraph may exclude irrelevant or even noisy nodes, and still preserve the critical neighbor features and graph structures. 
The deep GNN then smooths the informative local signals to enhance feature learning, rather than oversmoothing the global graph signals into just \"white noise\". We theoretically justify why the combination of deep GNNs with shallow samplers yields the best learning performance. We then propose various sampling algorithms and neural architecture extensions to achieve good empirical results. Experiments on five large graphs show that our models achieve significantly higher accuracy and efficiency, compared with state-of-the-art. ", "keywords": "Graph Neural Networks;Graph Sampling;Network Embedding", "primary_area": "", "supplementary_material": "", "author": "Hanqing Zeng;Muhan Zhang;Yinglong Xia;Ajitesh Srivastava;Rajgopal Kannan;Viktor Prasanna;Long Jin;Andrey Malevich;Ren Chen", "authorids": "~Hanqing_Zeng1;~Muhan_Zhang1;yxia@fb.com;~Ajitesh_Srivastava1;rajgopak@usc.edu;~Viktor_Prasanna1;longjin@fb.com;amalevich@fb.com;renchen@fb.com", "gender": "M;M;;M;;M;;;", "homepage": "https://hanqingzeng.com;https://muhanzhang.github.io/;;https://www.ajitesh-srivastava.com/;;https://sites.usc.edu/prasanna;;;", "dblp": "136/2474;157/5518;;77/9528;;p/ViktorKPrasanna;;;", "google_scholar": "ubUx3R0AAAAJ;https://scholar.google.com.hk/citations?user=OBBqkosAAAAJ;;5NtH-JcAAAAJ;;https://scholar.google.com.tw/citations?user=4FQXSP8AAAAJ;;;", "orcid": ";0000-0002-7680-6401;;;;0000-0002-1609-8589;;;", "linkedin": "hanqing-zeng-a9477995/;jerry-muhan-zhang-a33a1777/;;;;;;;", "or_profile": "~Hanqing_Zeng1;~Muhan_Zhang1;yxia@fb.com;~Ajitesh_Srivastava1;rajgopak@usc.edu;~Viktor_Prasanna1;longjin@fb.com;amalevich@fb.com;renchen@fb.com", "aff": "Facebook AI;Meta Facebook;;University of Southern California;;University of Southern California;;;", "aff_domain": "fb.com;fb.com;;usc.edu;;usc.edu;;;", "position": "Intern;Research Scientist;;Assistant Professor;;Full Professor;;;", "bibtex": "@misc{\nzeng2021deep,\ntitle={Deep Graph Neural Networks with Shallow Subgraph Samplers},\nauthor={Hanqing Zeng and Muhan Zhang and Yinglong Xia and Ajitesh Srivastava and Rajgopal Kannan and Viktor Prasanna and Long Jin and Andrey Malevich and Ren Chen},\nyear={2021},\nurl={https://openreview.net/forum?id=GIeGTl8EYx}\n}", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer2;AnonReviewer3;AnonReviewer4", "site": "https://openreview.net/forum?id=GIeGTl8EYx", "pdf_size": 0, "rating": "5;5;6;7", "confidence": "2;4;4;3", "wc_review": "279;559;387;548", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "1252;1591;1124;1262", "reply_reviewers": "0;0;0;0", "reply_authors": "3;4;3;3", "rating_avg": [ 5.75, 0.82915619758885 ], "confidence_avg": [ 3.25, 0.82915619758885 ], "wc_review_avg": [ 443.25, 116.73982825068744 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 1307.25, 172.62296342028196 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 3.25, 0.4330127018922193 ], "replies_avg": [ 20, 0 ], "authors#_avg": [ 9, 0 ], "corr_rating_confidence": 0.0909090909090909, "gs_citation": 33, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=7158620918758044890&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 3, "aff_unique_index": "0;0;1;1", "aff_unique_norm": "Meta;University of Southern California", "aff_unique_dep": "Facebook AI;", "aff_unique_url": "https://www.facebook.com;https://www.usc.edu", "aff_unique_abbr": "Facebook AI;USC", "aff_campus_unique_index": "1;1", "aff_campus_unique": ";Los Angeles", "aff_country_unique_index": "0;0;0;0", "aff_country_unique": "United States" }, { 
"id": "GJkTaYTmzVS", "title": "Play to Grade: Grading Interactive Coding Games as Classifying Markov Decision Process", "track": "main", "status": "Reject", "tldr": "", "abstract": "Contemporary coding education often present students with the task of developing programs that have user interaction and complex dynamic systems, such as mouse based games. While pedagogically compelling, grading such student programs requires dynamic user inputs, therefore they are difficult to grade by unit tests. In this paper we formalize the challenge of grading interactive programs as a task of classifying Markov Decision Processes (MDPs). Each student's program fully specifies an MDP where the agent needs to operate and decide, under reasonable generalization, if the dynamics and reward model of the input MDP conforms to a set of latent MDPs. We demonstrate that by experiencing a handful of latent MDPs millions of times, we can use the agent to sample trajectories from the input MDP and use a classifier to determine membership. Our method drastically reduces the amount of data needed to train an automatic grading system for interactive code assignments and present a challenge to state-of-the-art reinforcement learning generalization methods. Together with Code.org, we curated a dataset of 700k student submissions, one of the largest dataset of anonymized student submissions to a single assignment. This Code.org assignment had no previous solution for automatically providing correctness feedback to students and as such this contribution could lead to meaningful improvement in educational experience.", "keywords": "Deep Reinforcement Learning;Education;Automated Grading;Program Testing", "primary_area": "", "supplementary_material": "/attachment/99f4277caebc237dacf997bbdcd8a325a9d0e682.zip", "author": "Allen Nie;Emma Brunskill;Chris Piech", "authorids": "~Allen_Nie1;~Emma_Brunskill2;chrisjpiech@gmail.com", "gender": "M;;", "homepage": "https://anie.me;;", "dblp": "207/7996;;", "google_scholar": "r90OelAAAAAJ;;", "orcid": ";;", "linkedin": ";;", "or_profile": "~Allen_Nie1;~Emma_Brunskill2;chrisjpiech@gmail.com", "aff": "Stanford University;;", "aff_domain": "stanford.edu;;", "position": "PhD student;;", "bibtex": "@misc{\nnie2021play,\ntitle={Play to Grade: Grading Interactive Coding Games as Classifying Markov Decision Process},\nauthor={Allen Nie and Emma Brunskill and Chris Piech},\nyear={2021},\nurl={https://openreview.net/forum?id=GJkTaYTmzVS}\n}", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer3;AnonReviewer2", "site": "https://openreview.net/forum?id=GJkTaYTmzVS", "pdf_size": 0, "rating": "3;4;5", "confidence": "4;4;4", "wc_review": "751;919;551", "wc_reply_reviewers": "0;0;0", "wc_reply_authors": "0;0;0", "reply_reviewers": "0;0;0", "reply_authors": "0;0;0", "rating_avg": [ 4.0, 0.816496580927726 ], "confidence_avg": [ 4.0, 0.0 ], "wc_review_avg": [ 740.3333333333334, 150.4245842791515 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 0, 0 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 0, 0 ], "replies_avg": [ 4, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:qurTUYfB12UJ:scholar.google.com/&scioq=Play+to+Grade:+Grading+Interactive+Coding+Games+as+Classifying+Markov+Decision+Process&hl=en&as_sdt=0,33", "gs_version_total": 0, "aff_unique_index": "0", "aff_unique_norm": "Stanford University", "aff_unique_dep": "", "aff_unique_url": 
"https://www.stanford.edu", "aff_unique_abbr": "Stanford", "aff_campus_unique_index": "0", "aff_campus_unique": "Stanford", "aff_country_unique_index": "0", "aff_country_unique": "United States" }, { "id": "GJkY3ptA3vJ", "title": "Towards Robust Textual Representations with Disentangled Contrastive Learning", "track": "main", "status": "Withdraw", "tldr": "", "abstract": "Although the self-supervised pre-training of transformer models has resulted in the revolutionizing of natural language processing (NLP) applications and the achievement of state-of-the-art results with regard to various benchmarks, this process is still vulnerable to small and imperceptible permutations originating from legitimate inputs. Intuitively, the representations should be similar in the feature space with subtle input permutations, while large variations occur with different meanings. This motivates us to investigate the learning of robust textual representation in a contrastive manner. However, it is non-trivial to obtain opposing semantic instances for textual samples. In this study, we propose a disentangled contrastive learning method that separately optimizes the uniformity and alignment of representations without negative sampling. Specifically, we introduce the concept of momentum representation consistency to align features and leverage power normalization while conforming the uniformity. Our experimental results for the NLP benchmarks demonstrate that our approach can obtain better results compared with the baselines, as well as achieve promising improvements with invariance tests and adversarial attacks.", "keywords": "Robustness;Contrastive Learning;Textual Representation Learning;Natural Language Processing", "primary_area": "", "supplementary_material": "/attachment/4edcd9d3d77a4b3ee9f9bb882cb0748b1406d6c3.zip", "author": "Ningyu Zhang;Xiang Chen;Xin Xie;Shumin Deng;Yantao Jia;Zonggang Yuan;Huajun Chen", "authorids": "~Ningyu_Zhang1;~Xiang_Chen5;~Xin_Xie2;~Shumin_Deng1;~Yantao_Jia1;~Zonggang_Yuan1;~Huajun_Chen1", "gender": "M;M;M;F;M;M;M", "homepage": "https://person.zju.edu.cn/en/ningyu;https://faculty.nuaa.edu.cn/ChenXiang/zh_CN/index.htm;http://www.cheasim.com;https://231sm.github.io/;http://www.bigdatalab.ac.cn/~jyt/;;", "dblp": "139/4181-1.html;64/3062-16;;213/1853;;94/5089;", "google_scholar": "xQDOPvsAAAAJ;pXivdn8AAAAJ;;3am3hL4AAAAJ;;;", "orcid": "0000-0002-1970-0678;0000-0002-2594-0600;;;;;", "linkedin": "ningyuzhang/;;;;;;bruce-yuan-78878019/", "or_profile": "~Ningyu_Zhang1;~Xiang_Chen5;~Xin_Xie2;~Shumin_Deng1;~Yantao_Jia1;~Huajun_Chen1;~Yuan_Zonggang1", "aff": "Zhejiang University;Zhejiang University;Zhejiang University;Zhejiang University;Huawei Technologies Ltd.;Zhejiang University;", "aff_domain": "zju.edu.cn;zju.edu.cn;zju.edu.cn;zju.edu.cn;huawei.com;zju.edu.cn;", "position": "Assistant Professor;PhD student;MS student;PhD student;Principal Researcher;Full Professor;", "bibtex": "", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer3;AnonReviewer2;AnonReviewer4", "site": "https://openreview.net/forum?id=GJkY3ptA3vJ", "pdf_size": 0, "rating": "3;3;4;5", "confidence": "3;4;4;4", "wc_review": "355;254;491;316", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "0;0;0;0", "reply_reviewers": "0;0;0;0", "reply_authors": "0;0;0;0", "rating_avg": [ 3.75, 0.82915619758885 ], "confidence_avg": [ 3.75, 0.4330127018922193 ], "wc_review_avg": [ 354.0, 86.910873888139 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 0, 0 ], "reply_reviewers_avg": [ 0, 0 ], 
"reply_authors_avg": [ 0, 0 ], "replies_avg": [ 5, 0 ], "authors#_avg": [ 7, 0 ], "corr_rating_confidence": 0.5222329678670935, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:rSO2r2-dQ2EJ:scholar.google.com/&scioq=Towards+Robust+Textual+Representations+with+Disentangled+Contrastive+Learning&hl=en&as_sdt=0,5", "gs_version_total": 0, "aff_unique_index": "0;0;0;0;1;0", "aff_unique_norm": "Zhejiang University;Huawei", "aff_unique_dep": ";Huawei Technologies", "aff_unique_url": "https://www.zju.edu.cn;https://www.huawei.com", "aff_unique_abbr": "ZJU;Huawei", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0;0;0;0;0", "aff_country_unique": "China" }, { "id": "GJnpCsLQThe", "title": "Gradient Descent Ascent for Min-Max Problems on Riemannian Manifolds", "track": "main", "status": "Reject", "tldr": "", "abstract": "In the paper, we study a class of useful non-convex minimax optimization problems on Riemanian manifolds and propose a class of Riemanian gradient descent ascent algorithms to solve these minimax problems. Specifically, we propose a new\nRiemannian gradient descent ascent (RGDA) algorithm for the deterministic minimax optimization.\nMoreover, we prove that the RGDA has a sample complexity of $O(\\kappa^2\\epsilon^{-2})$ for finding an $\\epsilon$-stationary point of the nonconvex strongly-concave minimax problems, where $\\kappa$ denotes the condition number.\nAt the same time, we introduce a Riemannian stochastic gradient descent ascent (RSGDA) algorithm for the stochastic minimax optimization. In the theoretical analysis, we prove that the RSGDA can achieve a sample complexity of $O(\\kappa^4\\epsilon^{-4})$.\nTo further reduce the sample complexity, we propose a novel momentum variance-reduced Riemannian stochastic gradient descent ascent (MVR-RSGDA) algorithm based on a new momentum variance-reduced technique of STORM. We prove that the MVR-RSGDA algorithm achieves a lower sample complexity of $\\tilde{O}(\\kappa^{4}\\epsilon^{-3})$ without large batches, which reaches near the best known sample complexity for its Euclidean counterparts. 
Extensive experimental results on the robust deep neural networks training over Stiefel manifold demonstrate the efficiency of our proposed algorithms.", "keywords": "Min-Max Optimization;Riemannian Manifold;Robust Training", "primary_area": "", "supplementary_material": "", "author": "Feihu Huang;Shangqian Gao;Heng Huang", "authorids": "~Feihu_Huang1;~Shangqian_Gao1;~Heng_Huang1", "gender": "M;;M", "homepage": ";;https://www.cs.umd.edu/~heng/", "dblp": "169/6247;195/2523;03/281", "google_scholar": "tRQwlHUAAAAJ;9mNI83oAAAAJ;4OqLaDwAAAAJ", "orcid": "0000-0003-0806-6074;;", "linkedin": ";;", "or_profile": "~Feihu_Huang1;~Shangqian_Gao1;~Heng_Huang1", "aff": "University of Pittsburgh;University of Pittsburgh;University of Pittsburgh", "aff_domain": "pitt.edu;pitt.edu;pitt.edu", "position": "Senior Postdoc;PhD student;Full Professor", "bibtex": "@misc{\nhuang2021gradient,\ntitle={Gradient Descent Ascent for Min-Max Problems on Riemannian Manifolds},\nauthor={Feihu Huang and Shangqian Gao and Heng Huang},\nyear={2021},\nurl={https://openreview.net/forum?id=GJnpCsLQThe}\n}", "github": "", "project": "", "reviewers": "AnonReviewer2;AnonReviewer3;AnonReviewer4;AnonReviewer1", "site": "https://openreview.net/forum?id=GJnpCsLQThe", "pdf_size": 0, "rating": "4;4;5;7", "confidence": "4;3;4;4", "wc_review": "617;254;647;285", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "607;754;759;387", "reply_reviewers": "0;0;0;0", "reply_authors": "1;1;1;1", "rating_avg": [ 5.0, 1.224744871391589 ], "confidence_avg": [ 3.75, 0.4330127018922193 ], "wc_review_avg": [ 450.75, 181.89059211515035 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 626.75, 151.28842487117115 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 9, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": 0.4714045207910316, "gs_citation": 36, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=17791197849128419805&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 9, "aff_unique_index": "0;0;0", "aff_unique_norm": "University of Pittsburgh", "aff_unique_dep": "", "aff_unique_url": "https://www.pitt.edu", "aff_unique_abbr": "Pitt", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0;0", "aff_country_unique": "United States" }, { "title": "Unsupervised Object Keypoint Learning using Local Spatial Predictability", "status": "Spotlight", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/3023", "id": "GJwMHetHc73", "poster": "", "openreview": "https://openreview.net/forum?id=GJwMHetHc73", "slides": "https://iclr.cc/virtual/2021/poster/3023", "video": "https://iclr.cc/virtual/2021/poster/3023", "author_site": "Anand Gopalakrishnan, Sjoerd van Steenkiste, J\u00fcrgen Schmidhuber", "tldr": "", "abstract": "We propose PermaKey, a novel approach to representation learning based on object keypoints. It leverages the predictability of local image regions from spatial neighborhoods to identify salient regions that correspond to object parts, which are then converted to keypoints. Unlike prior approaches, it utilizes predictability to discover object keypoints, an intrinsic property of objects. This ensures that it does not overly bias keypoints to focus on characteristics that are not unique to objects, such as movement, shape, colour etc. We demonstrate the efficacy of PermaKey on Atari where it learns keypoints corresponding to the most salient object parts and is robust to certain visual distractors. 
Further, on downstream RL tasks in the Atari domain we demonstrate how agents equipped with our keypoints outperform those using competing alternatives, even on challenging environments with moving backgrounds or distractor objects.\n", "keywords": "unsupervised representation learning;object-keypoint representations;visual saliency", "primary_area": "", "supplementary_material": "", "author": "Anand Gopalakrishnan;Sjoerd van Steenkiste;J\u00fcrgen Schmidhuber", "authorids": "~Anand_Gopalakrishnan1;~Sjoerd_van_Steenkiste1;~J\u00fcrgen_Schmidhuber1", "gender": "M;M;M", "homepage": "https://agopal42.github.io/;http://www.sjoerdvansteenkiste.com/;http://people.idsia.ch/~juergen/", "dblp": "191/1040;183/9326;s/JurgenSchmidhuber", "google_scholar": "SsbgJ1UAAAAJ;i-AStBYAAAAJ;https://scholar.google.ch/citations?user=gLnCTgIAAAAJ", "orcid": ";;", "linkedin": ";;", "or_profile": "~Anand_Gopalakrishnan1;~Sjoerd_van_Steenkiste1;~J\u00fcrgen_Schmidhuber1", "aff": "Dalle Molle Institute for Artificial Intelligence Research;Dalle Molle Institute for Artificial Intelligence Research (IDSIA);IDSIA", "aff_domain": "idsia.ch;idsia.ch;idsia.ch", "position": "PhD student;Postdoc;Scientific Director", "bibtex": "@inproceedings{\ngopalakrishnan2021unsupervised,\ntitle={Unsupervised Object Keypoint Learning using Local Spatial Predictability},\nauthor={Anand Gopalakrishnan and Sjoerd van Steenkiste and J{\\\"u}rgen Schmidhuber},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=GJwMHetHc73}\n}", "github": "[![github](/images/github_icon.svg) agopal42/permakey](https://github.com/agopal42/permakey)", "project": "", "reviewers": "AnonReviewer2;AnonReviewer1;AnonReviewer4", "pdf_size": 0, "rating": "6;7;9", "confidence": "1;4;3", "wc_review": "404;603;482", "wc_reply_reviewers": "0;0;0", "wc_reply_authors": "623;1705;988", "reply_reviewers": "0;0;0", "reply_authors": "1;3;2", "rating_avg": [ 7.333333333333333, 1.247219128924647 ], "confidence_avg": [ 2.6666666666666665, 1.247219128924647 ], "wc_review_avg": [ 496.3333333333333, 81.87117251443813 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 1105.3333333333333, 449.4487982209122 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 2.0, 0.816496580927726 ], "replies_avg": [ 11, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": 0.49999999999999994, "gs_citation": 32, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=2846223975982040461&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 9, "pdf": "https://openreview.net/pdf?id=GJwMHetHc73", "email": "idsia.ch;idsia.ch;idsia.ch", "author_num": 3, "aff_unique_index": "0;0;1", "aff_unique_norm": "Dalle Molle Institute for Artificial Intelligence Research;Institute of Digital Technologies", "aff_unique_dep": "Artificial Intelligence Research;", "aff_unique_url": "http://www.dallemolle.ch/;https://www.idsia.ch", "aff_unique_abbr": "DMI;IDSIA", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0;0", "aff_country_unique": "Switzerland" }, { "id": "GKLLd9FOe5l", "title": "Online Testing of Subgroup Treatment Effects Based on Value Difference", "track": "main", "status": "Reject", "tldr": "", "abstract": "Online A/B testing plays a critical role in high-tech industry to guide product development and accelerate innovation. It performs a null hypothesis statistical test to determine which variant is better. 
However, a typical A/B test presents two problems: (i) a fixed-horizon framework inflates the false positive errors under continuous monitoring; (ii) the homogeneous effects assumption fails to identify a subgroup with a beneficial treatment effect. In this paper, we propose a sequential test for subgroup treatment effects based on value difference, named SUBTLE, to address these two problems simultaneously. The SUBTLE allows the experimenters to \"peek\" the results during the experiment without harming the statistical guarantees. It assumes heterogeneous treatment effects and aims to test if some subgroup of the population will benefit from the investigative treatment. If the testing result indicates the existence of such subgroup, a subgroup will be identified using a readily available estimated optimal treatment rule. We examine the empirical performance of our proposed test on both simulations and a real data set. The results show that the SUBTLE has high detection power with controlled type I error at any time, is more robust to noise covariates, and can achieve early stopping compared with the corresponding fixed-horizon test.", "keywords": "online A/B testing;subgroup treatment effects testing;continuous monitoring;supervised representation learning;classification", "primary_area": "", "supplementary_material": "", "author": "Miao Yu;Wenbin Lu;Rui Song", "authorids": "~Miao_Yu5;~Wenbin_Lu1;~Rui_Song2", "gender": "F;M;", "homepage": "https://www.linkedin.com/in/miao-yu-4a5349125/;https://statistics.sciences.ncsu.edu/people/wlu4/;https://song-ray.github.io/", "dblp": ";;01/2743-6.html", "google_scholar": "qq0bbcgAAAAJ;;", "orcid": ";;0000-0003-1875-2115", "linkedin": ";;", "or_profile": "~Miao_Yu5;~Wenbin_Lu1;~Rui_Song2", "aff": "North Carolina State University;North Carolina State University;North Carolina State University", "aff_domain": "ncsu.edu;ncsu.edu;ncsu.edu", "position": "PhD student;Full Professor;Full Professor", "bibtex": "@misc{\nyu2021online,\ntitle={Online Testing of Subgroup Treatment Effects Based on Value Difference},\nauthor={Miao Yu and Wenbin Lu and Rui Song},\nyear={2021},\nurl={https://openreview.net/forum?id=GKLLd9FOe5l}\n}", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer2;AnonReviewer4;AnonReviewer3", "site": "https://openreview.net/forum?id=GKLLd9FOe5l", "pdf_size": 0, "rating": "3;5;7;7", "confidence": "4;4;3;5", "wc_review": "247;510;389;88", "wc_reply_reviewers": "35;0;0;0", "wc_reply_authors": "560;757;466;92", "reply_reviewers": "1;0;0;0", "reply_authors": "1;1;1;1", "rating_avg": [ 5.5, 1.6583123951777 ], "confidence_avg": [ 4.0, 0.7071067811865476 ], "wc_review_avg": [ 308.5, 157.70621420857202 ], "wc_reply_reviewers_avg": [ 8.75, 15.155444566227676 ], "wc_reply_authors_avg": [ 468.75, 241.53816986141135 ], "reply_reviewers_avg": [ 0.25, 0.4330127018922193 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 10, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 2, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=18372573824484735241&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 5, "aff_unique_index": "0;0;0", "aff_unique_norm": "North Carolina State University", "aff_unique_dep": "", "aff_unique_url": "https://www.ncsu.edu", "aff_unique_abbr": "NCSU", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0;0", "aff_country_unique": "United States" }, { "title": "A Design Space Study for LISTA and Beyond", "status": "Poster", "track": "main", "site": 
"https://iclr.cc/virtual/2021/poster/2808", "id": "GMgHyUPrXa", "poster": "", "openreview": "https://openreview.net/forum?id=GMgHyUPrXa", "slides": "https://iclr.cc/virtual/2021/poster/2808", "video": "https://iclr.cc/virtual/2021/poster/2808", "author_site": "Tianjian Meng, Xiaohan Chen, Yifan Jiang, Zhangyang Wang", "tldr": "", "abstract": "In recent years, great success has been witnessed in building problem-specific deep networks from unrolling iterative algorithms, for solving inverse problems and beyond. Unrolling is believed to incorporate the model-based prior with the learning capacity of deep learning. This paper revisits \\textit{the role of unrolling as a design approach for deep networks}: to what extent its resulting special architecture is superior, and can we find better? Using LISTA for sparse recovery as a representative example, we conduct the first thorough \\textit{design space study} for the unrolled models. Among all possible variations, we focus on extensively varying the connectivity patterns and neuron types, leading to a gigantic design space arising from LISTA. To efficiently explore this space and identify top performers, we leverage the emerging tool of neural architecture search (NAS). We carefully examine the searched top architectures in a number of settings, and are able to discover networks that consistently better than LISTA. We further present more visualization and analysis to ``open the black box\", and find that the searched top architectures demonstrate highly consistent and potentially transferable patterns. We hope our study to spark more reflections and explorations on how to better mingle model-based optimization prior and data-driven learning.", "keywords": "", "primary_area": "", "supplementary_material": "", "author": "Tianjian Meng;Xiaohan Chen;Yifan Jiang;Zhangyang Wang", "authorids": "~Tianjian_Meng2;~Xiaohan_Chen1;~Yifan_Jiang2;~Zhangyang_Wang1", "gender": "M;M;M;M", "homepage": "http://xiaohanchen.com;https://yifanjiang19.github.io/;https://vita-group.github.io;https://mengtianjian.github.io/", "dblp": "94/3802;81/7246-1;119/4026;237/9822", "google_scholar": "https://scholar.google.com/citations?authuser=1;PMeFEOIAAAAJ;pxFyKAIAAAAJ;CWNV8TQAAAAJ", "orcid": "0000-0002-0360-0402;;;", "linkedin": "xiaohan-chen-400b00147/;;;mengtianjian/", "or_profile": "~Xiaohan_Chen1;~Yifan_Jiang2;~Zhangyang_Wang1;~Tianjian_Meng1", "aff": "University of Texas, Austin;University of Texas, Austin;University of Texas, Austin;Google Brain", "aff_domain": "utexas.edu;utexas.edu;utexas.edu;google.com", "position": "PhD student;PhD student;Assistant Professor;Software Engineer", "bibtex": "@inproceedings{\nmeng2021a,\ntitle={A Design Space Study for {\\{}LISTA{\\}} and Beyond},\nauthor={Tianjian Meng and Xiaohan Chen and Yifan Jiang and Zhangyang Wang},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=GMgHyUPrXa}\n}", "github": "", "project": "", "reviewers": "AnonReviewer4;AnonReviewer1;AnonReviewer3;AnonReviewer2", "pdf_size": 0, "rating": "4;6;7;8", "confidence": "3;4;4;4", "wc_review": "497;267;150;681", "wc_reply_reviewers": "0;0;0;158", "wc_reply_authors": "1236;988;312;606", "reply_reviewers": "0;0;0;1", "reply_authors": "2;2;1;1", "rating_avg": [ 6.25, 1.479019945774904 ], "confidence_avg": [ 3.75, 0.4330127018922193 ], "wc_review_avg": [ 398.75, 205.27588143763992 ], "wc_reply_reviewers_avg": [ 39.5, 68.41600689897065 ], "wc_reply_authors_avg": [ 785.5, 353.68736194554646 ], 
"reply_reviewers_avg": [ 0.25, 0.4330127018922193 ], "reply_authors_avg": [ 1.5, 0.5 ], "replies_avg": [ 12, 0 ], "authors#_avg": [ 4, 0 ], "corr_rating_confidence": 0.8783100656536799, "gs_citation": 3, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=6072113213462339634&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 4, "pdf": "https://openreview.net/pdf?id=GMgHyUPrXa", "email": "utexas.edu;utexas.edu;utexas.edu;google.com", "author_num": 4, "aff_unique_index": "0;0;0;1", "aff_unique_norm": "University of Texas at Austin;Google", "aff_unique_dep": ";Google Brain", "aff_unique_url": "https://www.utexas.edu;https://brain.google.com", "aff_unique_abbr": "UT Austin;Google Brain", "aff_campus_unique_index": "0;0;0;1", "aff_campus_unique": "Austin;Mountain View", "aff_country_unique_index": "0;0;0;0", "aff_country_unique": "United States" }, { "id": "GNv-TyWu3PY", "title": "Robust Learning for Congestion-Aware Routing", "track": "main", "status": "Reject", "tldr": "", "abstract": "We consider the problem of routing users through a network with unknown congestion functions over an infinite time horizon. On each time step $t$, the algorithm receives a routing request and must select a valid path. For each edge $e$ in the selected path, the algorithm incurs a cost $c_e^t = f_e(x_e^t) + \\eta_e^t$, where $x_e^t$ is the flow on edge $e$ at time $t$, $f_e$ is the congestion function, and $\\eta_e^t$ is a noise sample drawn from an unknown distribution. The algorithm observes $c_e^t$, and can use this observation in future routing decisions. The routing requests are supplied adversarially. \n\nWe present an algorithm with cumulative regret $\\tilde{O}(|E| t^{2/3})$, where the regret on each time step is defined as the difference between the total cost incurred by our chosen path and the minimum cost among all valid paths. Our algorithm has space complexity $O(|E| t^{1/3})$ and time complexity $O(|E| \\log t)$. 
We also validate our algorithm empirically using graphs from New York City road networks.", "keywords": "routing algorithms;adversarial learning;congestion functions", "primary_area": "", "supplementary_material": "", "author": "Sreenivas Gollapudi;Kostas Kollias;Benjamin Plaut;Ameya Velingker", "authorids": "~Sreenivas_Gollapudi2;kostaskollias@google.com;~Benjamin_Plaut2;~Ameya_Velingker1", "gender": "M;;M;M", "homepage": "https://www.sreenivasgollapudi.com;;https://cs.stanford.edu/people/bplaut/;http://www.ameyavelingker.com", "dblp": "https://dblp.uni-trier.de/pers/g/Gollapudi:Sreenivas.html;;178/8624;117/3666.html", "google_scholar": "Ysd-WJgAAAAJ;;6ndk_nAAAAAJ;6dFFudUAAAAJ", "orcid": ";;;", "linkedin": ";;;ameya-velingker-5811b711", "or_profile": "~Sreenivas_Gollapudi2;kostaskollias@google.com;~Benjamin_Plaut2;~Ameya_Velingker1", "aff": "Google;;Stanford University;Google", "aff_domain": "google.com;;stanford.edu;google.com", "position": "Researcher;;PhD student;Research Scientist", "bibtex": "@misc{\ngollapudi2021robust,\ntitle={Robust Learning for Congestion-Aware Routing},\nauthor={Sreenivas Gollapudi and Kostas Kollias and Benjamin Plaut and Ameya Velingker},\nyear={2021},\nurl={https://openreview.net/forum?id=GNv-TyWu3PY}\n}", "github": "", "project": "", "reviewers": "AnonReviewer2;AnonReviewer1;AnonReviewer4;AnonReviewer3", "site": "https://openreview.net/forum?id=GNv-TyWu3PY", "pdf_size": 0, "rating": "3;5;7;8", "confidence": "4;4;3;4", "wc_review": "759;447;257;446", "wc_reply_reviewers": "2485;0;0;0", "wc_reply_authors": "1733;533;29;318", "reply_reviewers": "5;0;0;0", "reply_authors": "4;1;1;1", "rating_avg": [ 5.75, 1.920286436967152 ], "confidence_avg": [ 3.75, 0.4330127018922193 ], "wc_review_avg": [ 477.25, 180.1282529199681 ], "wc_reply_reviewers_avg": [ 621.25, 1076.036564202165 ], "wc_reply_authors_avg": [ 653.25, 648.5369592397954 ], "reply_reviewers_avg": [ 1.25, 2.165063509461097 ], "reply_authors_avg": [ 1.75, 1.299038105676658 ], "replies_avg": [ 17, 0 ], "authors#_avg": [ 4, 0 ], "corr_rating_confidence": -0.3758230140014144, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:baFhrVrw4hIJ:scholar.google.com/&scioq=Robust+Learning+for+Congestion-Aware+Routing&hl=en&as_sdt=0,33", "gs_version_total": 0, "aff_unique_index": "0;1;0", "aff_unique_norm": "Google;Stanford University", "aff_unique_dep": "Google;", "aff_unique_url": "https://www.google.com;https://www.stanford.edu", "aff_unique_abbr": "Google;Stanford", "aff_campus_unique_index": "0;1;0", "aff_campus_unique": "Mountain View;Stanford", "aff_country_unique_index": "0;0;0", "aff_country_unique": "United States" }, { "id": "GPuvhWrEdUn", "title": "MixCon: Adjusting the Separability of Data Representations for Harder Data Recovery", "track": "main", "status": "Reject", "tldr": "", "abstract": "To address the issue that deep neural networks (DNNs) are vulnerable to model inversion attacks, we design an objective function to adjust the separability of the hidden data representations as a way to control the trade-off between data utility and vulnerability to inversion attacks. Our method is motivated by the theoretical insights of data separability in neural networking training and results on the hardness of model inversion. Empirically, we show that there exist sweet-spots by adjusting the separability of data representation, such that it is difficult to recover data during inference while maintaining data utility. 
", "keywords": "Data Recovery;Data Separability;Distributed Deep Learning", "primary_area": "", "supplementary_material": "/attachment/1124c0ad1c7456ef69d177d0ed709720ce169b9c.zip", "author": "Xiaoxiao Li;Yangsibo Huang;Binghui Peng;Zhao Song;Kai Li", "authorids": "~Xiaoxiao_Li1;~Yangsibo_Huang2;~Binghui_Peng1;~Zhao_Song3;~Kai_Li8", "gender": "Unspecified;F;M;M;M", "homepage": "https://xxlya.github.io/;https://hazelsuko07.github.io/yangsibo/;http://www.cs.columbia.edu/~binghuip/;https://www.youtube.com/@zhaosong2031;https://www.cs.princeton.edu/~li/", "dblp": "71/8042;;210/2619;76/4051-2;l/KaiLi1.html", "google_scholar": "sdENOQ4AAAAJ;NMPUDa0AAAAJ;twlFI3sAAAAJ;yDZct7UAAAAJ;9MSpWOUAAAAJ", "orcid": ";;;;", "linkedin": ";;;;", "or_profile": "~Xiaoxiao_Li1;~Yangsibo_Huang2;~Binghui_Peng1;~Zhao_Song3;~Kai_Li8", "aff": "Princeton University;Princeton University;Columbia University;Princeton University;Princeton University", "aff_domain": "princeton.edu;princeton.edu;columbia.edu;princeton.edu;princeton.edu", "position": "Postdoc;PhD student;PhD student;Postdoc;Full Professor", "bibtex": "@misc{\nli2021mixcon,\ntitle={MixCon: Adjusting the Separability of Data Representations for Harder Data Recovery},\nauthor={Xiaoxiao Li and Yangsibo Huang and Binghui Peng and Zhao Song and Kai Li},\nyear={2021},\nurl={https://openreview.net/forum?id=GPuvhWrEdUn}\n}", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer2;AnonReviewer4", "site": "https://openreview.net/forum?id=GPuvhWrEdUn", "pdf_size": 0, "rating": "5;5;5", "confidence": "4;3;3", "wc_review": "446;403;541", "wc_reply_reviewers": "0;44;0", "wc_reply_authors": "658;837;1197", "reply_reviewers": "0;1;0", "reply_authors": "1;2;2", "rating_avg": [ 5.0, 0.0 ], "confidence_avg": [ 3.3333333333333335, 0.4714045207910317 ], "wc_review_avg": [ 463.3333333333333, 57.656068390258994 ], "wc_reply_reviewers_avg": [ 14.666666666666666, 20.741798914805393 ], "wc_reply_authors_avg": [ 897.3333333333334, 224.1433073331038 ], "reply_reviewers_avg": [ 0.3333333333333333, 0.4714045207910317 ], "reply_authors_avg": [ 1.6666666666666667, 0.4714045207910317 ], "replies_avg": [ 11, 0 ], "authors#_avg": [ 5, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 1, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=16679921858837389924&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 3, "aff_unique_index": "0;0;1;0;0", "aff_unique_norm": "Princeton University;Columbia University", "aff_unique_dep": ";", "aff_unique_url": "https://www.princeton.edu;https://www.columbia.edu", "aff_unique_abbr": "Princeton;Columbia", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0;0;0;0", "aff_country_unique": "United States" }, { "id": "GRbZ91LKIya", "title": "Token-Level Contrast for Video and Language Alignment", "track": "main", "status": "Withdraw", "tldr": "", "abstract": "Building video and language understanding models requires grounding linguistic concepts and video contents into a shared space. Most of previous works learn a holistic alignment between them while neglecting the token-level grounding. Masked token prediction can be used to learn token-level multi-modal representation, but it does not necessarily force lexical grounding on perception and also introduce a domain-shift between pretraining and fine-tuning. This paper introduces a simple token-level contrastive loss (ToCo) informed by syntactic classes (e.g., nouns and verbs) to force the model to prioritize grounding concrete semantic bearing words. 
ToCo does not mask inputs but poses both local (contextual token) and global (lexical type) pressures for multi-modal alignment in a contrastive manner. Our approach enables a simple vanilla BERT-based multimodal transformer to compete with or outperform existing heavily engineered multi-loss or large models on three benchmarks (YouCook2, MSR-VTT and CrossTask). Further, it is plug-n-play such that gains are made in both pretraining and downstream tasks solely, regardless of the underlying visual or textual feature representations. ", "keywords": "token-level contrastive loss;video and language alignment;video retrieval;multi-modal representation learning", "primary_area": "", "supplementary_material": "", "author": "Jianwei Yang;Yonatan Bisk;Jianfeng Gao", "authorids": "~Jianwei_Yang1;~Yonatan_Bisk1;~Jianfeng_Gao1", "gender": "M;M;M", "homepage": "http://www.YonatanBisk.com;https://www.microsoft.com/en-us/research/people/jfgao/;https://jwyang.github.io/", "dblp": "38/9282;92/5339;", "google_scholar": "bWoGh8UAAAAJ;https://scholar.google.com/citations?hl=en;Cl9byD8AAAAJ", "orcid": "0000-0002-2111-9081;;", "linkedin": "yonatanbisk/;;", "or_profile": "~Yonatan_Bisk1;~Jianfeng_Gao1;~Jianwei_Yang2", "aff": "Carnegie Mellon University;Microsoft Research;Microsoft", "aff_domain": "cmu.edu;microsoft.com;microsoft.com", "position": "Assistant Professor;Principal Researcher;Researcher", "bibtex": "", "github": "", "project": "", "reviewers": "AnonReviewer4;AnonReviewer2;AnonReviewer1;AnonReviewer3", "site": "https://openreview.net/forum?id=GRbZ91LKIya", "pdf_size": 0, "rating": "4;4;5;6", "confidence": "4;4;4;4", "wc_review": "542;835;805;733", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "0;0;0;0", "reply_reviewers": "0;0;0;0", "reply_authors": "0;0;0;0", "rating_avg": [ 4.75, 0.82915619758885 ], "confidence_avg": [ 4.0, 0.0 ], "wc_review_avg": [ 728.75, 114.013979406036 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 0, 0 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 0, 0 ], "replies_avg": [ 5, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:0kAjqBoVhVsJ:scholar.google.com/&scioq=Token-Level+Contrast+for+Video+and+Language+Alignment&hl=en&as_sdt=0,33", "gs_version_total": 0, "aff_unique_index": "0;1;1", "aff_unique_norm": "Carnegie Mellon University;Microsoft", "aff_unique_dep": ";Microsoft Research", "aff_unique_url": "https://www.cmu.edu;https://www.microsoft.com/en-us/research", "aff_unique_abbr": "CMU;MSR", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0;0", "aff_country_unique": "United States" }, { "id": "GSTrduvZSjT", "title": "Adaptive Gradient Methods Converge Faster with Over-Parameterization (and you can do a line-search)", "track": "main", "status": "Reject", "tldr": "", "abstract": "Adaptive gradient methods are typically used for training over-parameterized models capable of exactly fitting the data; we thus study their convergence in this interpolation setting. Under an interpolation assumption, we prove that AMSGrad with a constant step-size and momentum can converge to the minimizer at the faster $O(1/T)$ rate for smooth, convex functions. Furthermore, in this setting, we show that AdaGrad can achieve an $O(1)$ regret in the online convex optimization framework. When interpolation is only approximately satisfied, we show that constant step-size AMSGrad converges to a neighbourhood of the solution. 
On the other hand, we prove that AdaGrad is robust to the violation of interpolation and converges to the minimizer at the optimal rate. However, we demonstrate that even for simple, convex problems satisfying interpolation, the empirical performance of these methods heavily depends on the step-size and requires tuning. We alleviate this problem by using stochastic line-search (SLS) and Polyak's step-sizes (SPS) to help these methods adapt to the function's local smoothness. By using these techniques, we prove that AdaGrad and AMSGrad do not require knowledge of problem-dependent constants and retain the convergence guarantees of their constant step-size counterparts. Experimentally, we show that these techniques help improve the convergence and generalization performance across tasks, from binary classification with kernel mappings to classification with deep neural networks.", "keywords": "Adaptive gradient methods;Over-parameterization;Stochastic line-search;Momentum", "primary_area": "", "supplementary_material": "/attachment/52a9215fec6726e01cd794e7b27a8c1d81720c5e.zip", "author": "Sharan Vaswani;Issam H. Laradji;Frederik Kunstner;Si Yi Meng;Mark Schmidt;Simon Lacoste-Julien", "authorids": "~Sharan_Vaswani1;~Issam_H._Laradji1;~Frederik_Kunstner1;~Si_Yi_Meng1;~Mark_Schmidt1;~Simon_Lacoste-Julien1", "gender": "M;M;;;M;F", "homepage": "http://vaswanis.github.io;https://issamlaradji.github.io/;https://fkunstner.github.io/;;http://www.iro.umontreal.ca/~slacoste/;https://www.cs.cornell.edu/~siyimeng/", "dblp": "136/5916;142/0043;230/3921;35/2638;94/446.html;250/9468", "google_scholar": "https://scholar.google.ca/citations?user=bDb2zWwAAAAJ;https://scholar.google.ca/citations?user=8vRS7F0AAAAJ;EhpYjPAAAAAJ;https://scholar.google.com/citations?hl=en;oejm5IUAAAAJ;https://scholar.google.ca/citations?user=Fey3yDgAAAAJ", "orcid": ";;;;0000-0001-6485-6180;", "linkedin": "sharan-vaswani-05b8ab35/;issam-laradji-67ba1a99/;;;simon-lacoste-julien-355b9a3;", "or_profile": "~Sharan_Vaswani1;~Issam_H._Laradji1;~Frederik_Kunstner1;~Mark_Schmidt1;~Simon_Lacoste-Julien1;~Si_Yi_Meng2", "aff": "University of Alberta;Element AI;University of British Columbia;University of British Columbia;Samsung - SAIT AI Lab, Montreal;Cornell University", "aff_domain": "ualberta.ca;elementai.com;cs.ubc.ca;ubc.ca;samsung.com;cornell.edu", "position": "Postdoc;Researcher;PhD student;Assistant Professor;VP Lab Director;PhD student", "bibtex": "@misc{\nvaswani2021adaptive,\ntitle={Adaptive Gradient Methods Converge Faster with Over-Parameterization (and you can do a line-search)},\nauthor={Sharan Vaswani and Issam H. 
Laradji and Frederik Kunstner and Si Yi Meng and Mark Schmidt and Simon Lacoste-Julien},\nyear={2021},\nurl={https://openreview.net/forum?id=GSTrduvZSjT}\n}", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer4;AnonReviewer2;AnonReviewer3", "site": "https://openreview.net/forum?id=GSTrduvZSjT", "pdf_size": 0, "rating": "5;5;6;7", "confidence": "4;4;4;4", "wc_review": "294;456;218;693", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "296;1036;342;559", "reply_reviewers": "0;0;0;0", "reply_authors": "1;2;1;1", "rating_avg": [ 5.75, 0.82915619758885 ], "confidence_avg": [ 4.0, 0.0 ], "wc_review_avg": [ 415.25, 181.9441878708963 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 558.25, 293.1658020642926 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.25, 0.4330127018922193 ], "replies_avg": [ 11, 0 ], "authors#_avg": [ 6, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 24, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=6092367410257667906&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 0, "aff_unique_index": "0;1;2;2;3;4", "aff_unique_norm": "University of Alberta;Element AI;University of British Columbia;Samsung;Cornell University", "aff_unique_dep": ";;;SAIT AI Lab;", "aff_unique_url": "https://www.ualberta.ca;https://www.elementai.com;https://www.ubc.ca;https://www.samsung.com;https://www.cornell.edu", "aff_unique_abbr": "UAlberta;Element AI;UBC;Samsung;Cornell", "aff_campus_unique_index": "1", "aff_campus_unique": ";Montreal", "aff_country_unique_index": "0;0;0;0;0;1", "aff_country_unique": "Canada;United States" }, { "title": "DynaTune: Dynamic Tensor Program Optimization in Deep Neural Network Compilation", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/2753", "id": "GTGb3M_KcUl", "poster": "", "openreview": "https://openreview.net/forum?id=GTGb3M_KcUl", "slides": "https://iclr.cc/virtual/2021/poster/2753", "video": "https://iclr.cc/virtual/2021/poster/2753", "author_site": "Minjia Zhang, Menghao Li, Chi Wang, Mingqin Li", "tldr": "", "abstract": "Recently, the DL compiler, together with Learning to Compile has proven to be a powerful technique for optimizing deep learning models. However, existing methods focus on accelerating the convergence speed of the individual tensor operator rather than the convergence speed of the entire model, which results in long optimization time to obtain a desired latency.\n\nIn this paper, we present a new method called DynaTune, which provides significantly faster convergence speed to optimize a DNN model. In particular, we consider a Multi-Armed Bandit (MAB) model for the tensor program optimization problem. We use UCB to handle the decision-making of time-slot-based optimization, and we devise a Bayesian belief model that allows predicting the potential performance gain of each operator with uncertainty quantification, which guides the optimization process. We evaluate and compare DynaTune with the state-of-the-art DL compiler. The experiment results show that DynaTune is 1.2--2.4 times faster to achieve the same optimization quality for a range of models across different hardware architectures. 
", "keywords": "Efficient Deep Learning Inference;Scalability;Code Compilation;Bayesian Inference", "primary_area": "", "supplementary_material": "", "author": "Minjia Zhang;Menghao Li;Chi Wang;Mingqin Li", "authorids": "~Minjia_Zhang1;t-meli@microsoft.com;~Chi_Wang3;mingqli@microsoft.com", "gender": "M;;M;", "homepage": "https://minjiazhang.github.io/;;http://chiwang.cc;", "dblp": "58/9033;;09/404-1;", "google_scholar": "https://scholar.google.com/citations?hl=en;;https://scholar.google.com/citations?hl=en;", "orcid": "0000-0002-8165-166X;;;", "linkedin": "minjia-zhang-05857226/;;chi-wang-autogen/;", "or_profile": "~Minjia_Zhang1;t-meli@microsoft.com;~Chi_Wang3;mingqli@microsoft.com", "aff": "Microsoft ;;Microsoft Research;", "aff_domain": "microsoft.com;;microsoft.com;", "position": "Principle Researcher;;Principal Researcher;", "bibtex": "@inproceedings{\nzhang2021dynatune,\ntitle={DynaTune: Dynamic Tensor Program Optimization in Deep Neural Network Compilation},\nauthor={Minjia Zhang and Menghao Li and Chi Wang and Mingqin Li},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=GTGb3M_KcUl}\n}", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer2;AnonReviewer3;AnonReviewer4", "pdf_size": 0, "rating": "6;7;7;7", "confidence": "5;4;1;2", "wc_review": "369;248;333;207", "wc_reply_reviewers": "108;125;33;17", "wc_reply_authors": "1030;383;841;643", "reply_reviewers": "1;1;1;1", "reply_authors": "2;1;2;1", "rating_avg": [ 6.75, 0.4330127018922193 ], "confidence_avg": [ 3.0, 1.5811388300841898 ], "wc_review_avg": [ 289.25, 64.69302512636119 ], "wc_reply_reviewers_avg": [ 70.75, 46.48857386498321 ], "wc_reply_authors_avg": [ 724.25, 239.87848486264875 ], "reply_reviewers_avg": [ 1.0, 0.0 ], "reply_authors_avg": [ 1.5, 0.5 ], "replies_avg": [ 16, 0 ], "authors#_avg": [ 4, 0 ], "corr_rating_confidence": -0.7302967433402215, "gs_citation": 16, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=13097215821522135018&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 5, "pdf": "https://openreview.net/pdf?id=GTGb3M_KcUl", "email": "microsoft.com;;microsoft.com;", "author_num": 4, "aff_unique_index": "0;0", "aff_unique_norm": "Microsoft", "aff_unique_dep": "Microsoft Corporation", "aff_unique_url": "https://www.microsoft.com", "aff_unique_abbr": "Microsoft", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0", "aff_country_unique": "United States" }, { "id": "GThGi8P9Vz", "title": "A Unified Framework for Proximal Methods", "track": "main", "status": "Withdraw", "tldr": "", "abstract": "We study the training of regularized neural networks where the regularizer can be non-smooth and non-convex.", "keywords": "Stochastic Optimization;Deep Learning;Proximal Gradient Descent", "primary_area": "", "supplementary_material": "/attachment/ac045fe3996d16f30acf28a6ef8d0b3ec3ffe7f2.zip", "author": "Jihun Yun;Aurelie Lozano;Eunho Yang", "authorids": "~Jihun_Yun2;~Aurelie_Lozano1;~Eunho_Yang1", "gender": "M;F;M", "homepage": "https://github.com/abcdxyzpqrst;https://research.ibm.com/people/aurelie-lozano;https://sites.google.com/site/hleehome2/", "dblp": "241/9676;06/274;96/2621", "google_scholar": "ELv5qfEAAAAJ;4wTGaDsAAAAJ;", "orcid": ";;", "linkedin": ";;", "or_profile": "~Jihun_Yun2;~Aurelie_Lozano1;~Eunho_Yang1", "aff": "Korea Advanced Institute of Science & Technology;IBM Research;Korea Advanced Institute of Science & Technology", "aff_domain": 
"kaist.ac.kr;us.ibm.com;kaist.ac.kr", "position": "PhD student;Principal Researcher;Associate Professor", "bibtex": "", "github": "", "project": "", "reviewers": "AnonReviewer2;AnonReviewer4;AnonReviewer1;AnonReviewer3", "site": "https://openreview.net/forum?id=GThGi8P9Vz", "pdf_size": 0, "rating": "4;5;5;6", "confidence": "5;4;4;4", "wc_review": "288;226;564;318", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "714;482;309;300", "reply_reviewers": "0;0;0;0", "reply_authors": "2;1;1;1", "rating_avg": [ 5.0, 0.7071067811865476 ], "confidence_avg": [ 4.25, 0.4330127018922193 ], "wc_review_avg": [ 349.0, 128.48735346328837 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 451.25, 168.14781443717905 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.25, 0.4330127018922193 ], "replies_avg": [ 11, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": -0.816496580927726, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:Z3EuZ8oWySwJ:scholar.google.com/&scioq=A+Unified+Framework+for+Proximal+Methods&hl=en&as_sdt=0,33", "gs_version_total": 0, "aff_unique_index": "0;1;0", "aff_unique_norm": "Korea Advanced Institute of Science and Technology;IBM", "aff_unique_dep": ";IBM Research", "aff_unique_url": "https://www.kaist.ac.kr;https://www.ibm.com/research", "aff_unique_abbr": "KAIST;IBM", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;1;0", "aff_country_unique": "South Korea;United States" }, { "id": "GVNGAaY2Dr1", "title": "Multi-Agent Collaboration via Reward Attribution Decomposition", "track": "main", "status": "Reject", "tldr": "", "abstract": "Recent advances in multi-agent reinforcement learning (MARL) have achieved super-human performance in games like Quake 3 and Dota 2. Unfortunately, these techniques require orders-of-magnitude more training rounds than humans and don't generalize to new agent configurations even on the same game. In this work, we propose Collaborative Q-learning (CollaQ) that achieves state-of-the-art performance in the StarCraft multi-agent challenge and supports ad hoc team play. We first formulate multi-agent collaboration as a joint optimization on reward assignment and show that each agent has an approximately optimal policy that decomposes into two parts: one part that only relies on the agent's own state, and the other part that is related to states of nearby agents. Following this novel finding, CollaQ decomposes the Q-function of each agent into a self term and an interactive term, with a Multi-Agent Reward Attribution (MARA) loss that regularizes the training. CollaQ is evaluated on various StarCraft maps and shows that it outperforms existing state-of-the-art techniques (i.e., QMIX, QTRAN, and VDN) by improving the win rate by 40% with the same number of samples. In the more challenging ad hoc team play setting (i.e., reweight/add/remove units without re-training or finetuning), CollaQ outperforms previous SoTA by over 30%. ", "keywords": "multi-agent reinforcement leanring;ad hoc team play", "primary_area": "", "supplementary_material": "/attachment/2c09f3f7f528eb100ca46dda5cb93719c424886b.zip", "author": "Tianjun Zhang;Huazhe Xu;Xiaolong Wang;Yi Wu;Kurt Keutzer;Joseph E. 
Gonzalez;Yuandong Tian", "authorids": "~Tianjun_Zhang1;~Huazhe_Xu1;~Xiaolong_Wang3;~Yi_Wu1;~Kurt_Keutzer1;~Joseph_E._Gonzalez1;~Yuandong_Tian1", "gender": ";M;M;M;M;M;M", "homepage": "https://tianjunz.github.io;http://hxu.rocks;https://xiaolonw.github.io/;https://jxwuyi.weebly.com;https://people.eecs.berkeley.edu/~keutzer/;http://eecs.berkeley.edu/~jegonzal;http://yuandong-tian.com", "dblp": ";164/9006;91/952-4;;k/KurtKeutzer.html;61/8262;t/YuandongTian", "google_scholar": "UE9jz_MAAAAJ;t9HPFawAAAAJ;Y8O9N_0AAAAJ;dusV5HMAAAAJ;ID9QePIAAAAJ;https://scholar.google.com.tw/citations?user=gM2WW9UAAAAJ;0mgEF28AAAAJ", "orcid": ";;;;0000-0003-3868-8501;0000-0003-2921-956X;0000-0003-4202-4847", "linkedin": ";;;;kurtkeutzer/;;yuandongtian", "or_profile": "~Tianjun_Zhang1;~Huazhe_Xu1;~Xiaolong_Wang3;~Yi_Wu1;~Kurt_Keutzer1;~Joseph_E._Gonzalez1;~Yuandong_Tian1", "aff": "University of California, Berkeley;University of California, Berkeley;University of California, San Diego;Tsinghua University;University of California, Berkeley;University of California, Berkeley;Meta AI (FAIR)", "aff_domain": "berkeley.edu;berkeley.edu;ucsd.edu;tsinghua.edu.cn;berkeley.edu;berkeley.edu;meta.com", "position": "PhD student;Ph.D. Student;Assistant Professor;Assistant Professor;Full Professor;Assistant Professor;Research Scientist", "bibtex": "@misc{\nzhang2021multiagent,\ntitle={Multi-Agent Collaboration via Reward Attribution Decomposition},\nauthor={Tianjun Zhang and Huazhe Xu and Xiaolong Wang and Yi Wu and Kurt Keutzer and Joseph E. Gonzalez and Yuandong Tian},\nyear={2021},\nurl={https://openreview.net/forum?id=GVNGAaY2Dr1}\n}", "github": "", "project": "", "reviewers": "AnonReviewer4;AnonReviewer3;AnonReviewer1;AnonReviewer2", "site": "https://openreview.net/forum?id=GVNGAaY2Dr1", "pdf_size": 0, "rating": "5;6;6;7", "confidence": "2;3;4;3", "wc_review": "904;381;224;197", "wc_reply_reviewers": "360;0;0;0", "wc_reply_authors": "633;329;153;45", "reply_reviewers": "1;0;0;0", "reply_authors": "2;1;1;1", "rating_avg": [ 6.0, 0.7071067811865476 ], "confidence_avg": [ 3.0, 0.7071067811865476 ], "wc_review_avg": [ 426.5, 284.49648503979796 ], "wc_reply_reviewers_avg": [ 90.0, 155.88457268119896 ], "wc_reply_authors_avg": [ 290.0, 222.46572769754894 ], "reply_reviewers_avg": [ 0.25, 0.4330127018922193 ], "reply_authors_avg": [ 1.25, 0.4330127018922193 ], "replies_avg": [ 13, 0 ], "authors#_avg": [ 7, 0 ], "corr_rating_confidence": 0.5, "gs_citation": 45, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=1500168595958678717&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 3, "aff_unique_index": "0;0;1;2;0;0;3", "aff_unique_norm": "University of California, Berkeley;University of California, San Diego;Tsinghua University;Meta", "aff_unique_dep": ";;;Facebook AI Research (FAIR)", "aff_unique_url": "https://www.berkeley.edu;https://www.ucsd.edu;https://www.tsinghua.edu.cn;https://ai.facebook.com", "aff_unique_abbr": "UC Berkeley;UCSD;THU;Meta AI", "aff_campus_unique_index": "0;0;1;0;0", "aff_campus_unique": "Berkeley;San Diego;", "aff_country_unique_index": "0;0;0;1;0;0;0", "aff_country_unique": "United States;China" }, { "id": "GWMFRQXSbw", "title": "Adaptive Learning Rates with Maximum Variation Averaging", "track": "main", "status": "Withdraw", "tldr": "", "abstract": "Adaptive gradient methods such as RMSProp and Adam use exponential moving estimate of the squared gradient to compute coordinate-wise adaptive step sizes, achieving better convergence than SGD in face of ill-conditioned or noisy objectives. 
However, Adam can have undesirable convergence behavior due to unstable or extreme adaptive learning rates. Methods such as AMSGrad and AdaBound have been proposed to stabilize the adaptive learning rates of Adam in the later stage of training, but they do not outperform Adam in some practical tasks such as training Transformers. In this paper, we propose an adaptive learning rate principle, in which the running mean of squared gradient is replaced by a weighted mean, with weights chosen to maximize the estimated variance of each coordinate. This gives a worst-case estimate for the local gradient variance, taking smaller steps when large curvatures or noisy gradients are present, which leads to more desirable convergence behavior than Adam. We prove the proposed algorithm converges under mild assumptions for nonconvex stochastic optimization problems, and demonstrate the improved efficacy of our adaptive averaging approach on image classification, machine translation and natural language understanding tasks. Moreover, our method overcomes the non-convergence issue of Adam in BERT pretraining at large batch sizes, while achieving better test performance than LAMB in the same setting.", "keywords": "Adaptive Step Size;Large Batch Optimization;Transformers", "primary_area": "", "supplementary_material": "", "author": "Chen Zhu;Yu Cheng;Zhe Gan;Furong Huang;Jingjing Liu;Tom Goldstein", "authorids": "~Chen_Zhu2;~Yu_Cheng1;~Zhe_Gan1;~Furong_Huang1;~Jingjing_Liu2;~Tom_Goldstein1", "gender": "M;M;M;F;;M", "homepage": "http://www.cs.umd.edu/~chenzhu/;https://ych133.github.io;http://zhegan27.github.io/;https://furong-huang.com;https://air.tsinghua.edu.cn/en/info/1046/1194.htm#:~:text=Jingjing%20Liu%20is%20Professor%2C%20Principal,CVPR%2C%20ACL%2C%20etc.);https://www.cs.umd.edu/~tomg/", "dblp": "59/10522-1.html;96/3060-1.html;41/7845;72/8513;30/3008-1;25/8184", "google_scholar": "m-om5O8AAAAJ;https://scholar.google.com/citations?hl=en;E64XWyMAAAAJ;13yyuCcAAAAJ;BzJ_GboAAAAJ;KmSuVtgAAAAJ", "orcid": ";;;;;", "linkedin": ";chengyu05/;zhe-gan-a2229a78/;;jingjing-liu-65703431/;", "or_profile": "~Chen_Zhu2;~Yu_Cheng1;~Zhe_Gan1;~Furong_Huang1;~Jingjing_Liu2;~Tom_Goldstein1", "aff": "Department of Computer Science, University of Maryland, College Park;Microsoft Research;Microsoft;University of Maryland;Microsoft;University of Maryland, College Park", "aff_domain": "cs.umd.edu;microsoft.com;microsoft.com;cs.umd.edu;microsoft.com;umd.edu", "position": "PhD student;Principal Researcher;Principal Researcher;Assistant Professor;Sr Principal Research Manager;Associate Professor", "bibtex": "", "github": "", "project": "", "reviewers": "AnonReviewer4;AnonReviewer1;AnonReviewer3;AnonReviewer2", "site": "https://openreview.net/forum?id=GWMFRQXSbw", "pdf_size": 0, "rating": "3;4;4;4", "confidence": "4;4;4;5", "wc_review": "666;740;814;226", "wc_reply_reviewers": "234;0;0;0", "wc_reply_authors": "513;666;480;283", "reply_reviewers": "1;0;0;0", "reply_authors": "1;1;1;1", "rating_avg": [ 3.75, 0.4330127018922193 ], "confidence_avg": [ 4.25, 0.4330127018922193 ], "wc_review_avg": [ 611.5, 228.6367205852988 ], "wc_reply_reviewers_avg": [ 58.5, 101.32497224277932 ], "wc_reply_authors_avg": [ 485.5, 136.35706802362685 ], "reply_reviewers_avg": [ 0.25, 0.4330127018922193 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 10, 0 ], "authors#_avg": [ 6, 0 ], "corr_rating_confidence": 0.3333333333333333, "gs_citation": 2, "gs_cited_by_link": 
"https://scholar.google.com/scholar?cites=288194687698311407&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 2, "aff_unique_index": "0;1;1;2;1;2", "aff_unique_norm": "University of Maryland, College Park;Microsoft;University of Maryland", "aff_unique_dep": "Department of Computer Science;Microsoft Research;", "aff_unique_url": "https://www/umd.edu;https://www.microsoft.com/en-us/research;https://www/umd.edu", "aff_unique_abbr": "UMD;MSR;UMD", "aff_campus_unique_index": "0;0", "aff_campus_unique": "College Park;", "aff_country_unique_index": "0;0;0;0;0;0", "aff_country_unique": "United States" }, { "id": "GXJPLbB5P-y", "title": "Simplifying Models with Unlabeled Output Data", "track": "main", "status": "Reject", "tldr": "", "abstract": "We focus on prediction problems with high-dimensional outputs that are subject to output validity constraints, e.g. a pseudocode-to-code translation task where the code must compile. For these problems, labeled input-output pairs are expensive to obtain, but \"unlabeled\" outputs, i.e. outputs without corresponding inputs, are freely available and provide information about output validity (e.g. code on GitHub). In this paper, we present predict-and-denoise, a framework that can leverage unlabeled outputs. Specifically, we first train a denoiser to map possibly invalid outputs to valid outputs using synthetic perturbations of the unlabeled outputs. Second, we train a predictor composed with this fixed denoiser. We show theoretically that for a family of functions with a high-dimensional discrete valid output space, composing with a denoiser reduces the complexity of a 2-layer ReLU network needed to represent the function and that this complexity gap can be arbitrarily large. We evaluate the framework empirically on several datasets, including image generation from attributes and pseudocode-to-code translation. 
On the SPoC pseudocode-to-code dataset, our framework improves the proportion of code outputs that pass all test cases by 3-5% over a baseline Transformer.", "keywords": "semi-supervised learning;structured prediction", "primary_area": "", "supplementary_material": "/attachment/4ec4bb3f00369ecf766ebdaa57128062c7e3a860.zip", "author": "Sang Michael Xie;Tengyu Ma;Percy Liang", "authorids": "~Sang_Michael_Xie1;~Tengyu_Ma1;~Percy_Liang1", "gender": ";M;", "homepage": "https://cs.stanford.edu/~eix/;http://ai.stanford.edu/~tengyuma/;https://cs.stanford.edu/~pliang/", "dblp": "220/3987;54/9061;04/1701", "google_scholar": "EBNa5IEAAAAJ;i38QlUwAAAAJ;pouyVyUAAAAJ", "orcid": ";;", "linkedin": ";;", "or_profile": "~Sang_Michael_Xie1;~Tengyu_Ma1;~Percy_Liang1", "aff": "Stanford University;Facebook AI Research;Stanford University", "aff_domain": "stanford.edu;fb.com;stanford.edu", "position": "PhD student;Visiting Scientist;Associate Professor", "bibtex": "@misc{\nxie2021simplifying,\ntitle={Simplifying Models with Unlabeled Output Data},\nauthor={Sang Michael Xie and Tengyu Ma and Percy Liang},\nyear={2021},\nurl={https://openreview.net/forum?id=GXJPLbB5P-y}\n}", "github": "", "project": "", "reviewers": "AnonReviewer3;AnonReviewer1;AnonReviewer2", "site": "https://openreview.net/forum?id=GXJPLbB5P-y", "pdf_size": 0, "rating": "6;6;6", "confidence": "3;4;3", "wc_review": "426;260;292", "wc_reply_reviewers": "199;0;0", "wc_reply_authors": "798;491;81", "reply_reviewers": "1;0;0", "reply_authors": "2;1;1", "rating_avg": [ 6.0, 0.0 ], "confidence_avg": [ 3.3333333333333335, 0.4714045207910317 ], "wc_review_avg": [ 326.0, 71.90734779330042 ], "wc_reply_reviewers_avg": [ 66.33333333333333, 93.80949963741531 ], "wc_reply_authors_avg": [ 456.6666666666667, 293.7190645649153 ], "reply_reviewers_avg": [ 0.3333333333333333, 0.4714045207910317 ], "reply_authors_avg": [ 1.3333333333333333, 0.4714045207910317 ], "replies_avg": [ 10, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 1, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=6710465900875164194&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 0, "aff_unique_index": "0;1;0", "aff_unique_norm": "Stanford University;Meta", "aff_unique_dep": ";Facebook AI Research", "aff_unique_url": "https://www.stanford.edu;https://research.facebook.com", "aff_unique_abbr": "Stanford;FAIR", "aff_campus_unique_index": "0;0", "aff_campus_unique": "Stanford;", "aff_country_unique_index": "0;0;0", "aff_country_unique": "United States" }, { "title": "Image Augmentation Is All You Need: Regularizing Deep Reinforcement Learning from Pixels", "status": "Spotlight", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/3188", "id": "GY6-6sTvGaf", "poster": "", "openreview": "https://openreview.net/forum?id=GY6-6sTvGaf", "slides": "https://iclr.cc/virtual/2021/poster/3188", "video": "https://iclr.cc/virtual/2021/poster/3188", "author_site": "Denis Yarats, Ilya Kostrikov, Rob Fergus", "tldr": "", "abstract": "We propose a simple data augmentation technique that can be applied to standard model-free reinforcement learning algorithms, enabling robust learning directly from pixels without the need for auxiliary losses or pre-training. The approach leverages input perturbations commonly used in computer vision tasks to transform input examples, as well as regularizing the value function and policy. Existing model-free approaches, such as Soft Actor-Critic (SAC), are not able to train deep networks effectively from image pixels. 
However, the addition of our augmentation method dramatically improves SAC\u2019s performance, enabling it to reach state-of-the-art performance on the DeepMind control suite, surpassing model-based (Hafner et al., 2019; Lee et al., 2019; Hafner et al., 2018) methods and recently proposed contrastive learning (Srinivas et al., 2020). Our approach, which we dub DrQ: Data-regularized Q, can be combined with any model-free reinforcement learning algorithm. We further demonstrate this by applying it to DQN and significantly improve its data-efficiency on the Atari 100k benchmark.", "keywords": "", "primary_area": "", "supplementary_material": "/attachment/2c42ae7c30a17bec6500f00a82b788a9708009f8.zip", "author": "Denis Yarats;Ilya Kostrikov;Rob Fergus", "authorids": "~Denis_Yarats1;~Ilya_Kostrikov1;~Rob_Fergus1", "gender": "M;M;M", "homepage": "http://denis-yarats.info/;;http://cs.nyu.edu/fergus/", "dblp": "200/8142;https://dblp.org/pers/k/Kostrikov:Ilya.html;77/3763", "google_scholar": "7kaXqgMAAAAJ;PTS2AOgAAAAJ;https://scholar.google.com.tw/citations?user=GgQ9GEkAAAAJ", "orcid": ";;", "linkedin": ";;", "or_profile": "~Denis_Yarats1;~Ilya_Kostrikov1;~Rob_Fergus1", "aff": "New York University;New York University;Google", "aff_domain": "cs.nyu.edu;nyu.edu;google.com", "position": "PhD student;PhD student;Research scientist", "bibtex": "@inproceedings{\nyarats2021image,\ntitle={Image Augmentation Is All You Need: Regularizing Deep Reinforcement Learning from Pixels},\nauthor={Denis Yarats and Ilya Kostrikov and Rob Fergus},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=GY6-6sTvGaf}\n}", "github": "[![github](/images/github_icon.svg) denisyarats/drq](https://github.com/denisyarats/drq) + [![Papers with Code](/images/pwc_icon.svg) 3 community implementations](https://paperswithcode.com/paper/?openreview=GY6-6sTvGaf)", "project": "", "reviewers": "AnonReviewer4;AnonReviewer3;AnonReviewer1;AnonReviewer2", "pdf_size": 0, "rating": "7;7;7;7", "confidence": "4;3;5;5", "wc_review": "275;116;330;235", "wc_reply_reviewers": "0;0;23;0", "wc_reply_authors": "776;168;363;147", "reply_reviewers": "0;0;1;0", "reply_authors": "1;1;1;1", "rating_avg": [ 7.0, 0.0 ], "confidence_avg": [ 4.25, 0.82915619758885 ], "wc_review_avg": [ 239.0, 78.61615610038436 ], "wc_reply_reviewers_avg": [ 5.75, 9.959292143521045 ], "wc_reply_authors_avg": [ 363.5, 252.61086674963133 ], "reply_reviewers_avg": [ 0.25, 0.4330127018922193 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 11, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 533, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=1389162255773390362&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 6, "pdf": "https://openreview.net/pdf?id=GY6-6sTvGaf", "email": "cs.nyu.edu;nyu.edu;google.com", "author_num": 3, "aff_unique_index": "0;0;1", "aff_unique_norm": "New York University;Google", "aff_unique_dep": ";Google", "aff_unique_url": "https://www.nyu.edu;https://www.google.com", "aff_unique_abbr": "NYU;Google", "aff_campus_unique_index": "1", "aff_campus_unique": ";Mountain View", "aff_country_unique_index": "0;0;0", "aff_country_unique": "United States" }, { "id": "GafvgJTFkgb", "title": "A Technical and Normative Investigation of Social Bias Amplification", "track": "main", "status": "Reject", "tldr": "", "abstract": "The conversation around the fairness of machine learning models is growing and evolving. 
In this work, we focus on the issue of bias amplification: the tendency of models trained from data containing social biases to further amplify these biases. This problem is brought about by the algorithm, on top of the level of bias already present in the data. We make two main contributions regarding its measurement. First, building off of Zhao et al. (2017), we introduce and analyze a new, decoupled metric for measuring bias amplification, $\\text{BiasAmp}_{\\rightarrow}$, which possesses a number of attractive properties, including the ability to pinpoint the cause of bias amplification. Second, we thoroughly analyze and discuss the normative implications of this metric. We provide suggestions about its measurement by cautioning against predicting sensitive attributes, encouraging the use of confidence intervals due to fluctuations in the fairness of models across runs, and discussing what bias amplification means in the context of domains where labels either don't exist at test time or correspond to uncertain future events. Throughout this paper, we work to provide a deeply interrogative look at the technical measurement of bias amplification, guided by our normative ideas of what we want it to encompass.", "keywords": "bias amplification;fairness;societal considerations", "primary_area": "", "supplementary_material": "", "author": "Angelina Wang;Olga Russakovsky", "authorids": "~Angelina_Wang1;~Olga_Russakovsky1", "gender": "F;F", "homepage": "https://angelina-wang.github.io/;http://cs.princeton.edu/~olgarus", "dblp": "210/1014.html;52/6883", "google_scholar": "cGemfcYAAAAJ;TB5OwW8AAAAJ", "orcid": ";0000-0001-5272-3241", "linkedin": "angelina-wang-282816108/;", "or_profile": "~Angelina_Wang1;~Olga_Russakovsky1", "aff": "Princeton University;Princeton University", "aff_domain": "princeton.edu;princeton.edu", "position": "PhD student;Assistant Professor", "bibtex": "@misc{\nwang2021a,\ntitle={A Technical and Normative Investigation of Social Bias Amplification},\nauthor={Angelina Wang and Olga Russakovsky},\nyear={2021},\nurl={https://openreview.net/forum?id=GafvgJTFkgb}\n}", "github": "", "project": "", "reviewers": "AnonReviewer4;AnonReviewer3;AnonReviewer1", "site": "https://openreview.net/forum?id=GafvgJTFkgb", "pdf_size": 0, "rating": "5;5;7", "confidence": "4;4;4", "wc_review": "357;505;272", "wc_reply_reviewers": "0;0;0", "wc_reply_authors": "398;517;160", "reply_reviewers": "0;0;0", "reply_authors": "1;1;1", "rating_avg": [ 5.666666666666667, 0.9428090415820634 ], "confidence_avg": [ 4.0, 0.0 ], "wc_review_avg": [ 378.0, 96.2739147779224 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 358.3333333333333, 148.419076342033 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 8, 0 ], "authors#_avg": [ 2, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:8ntEqbNGKxgJ:scholar.google.com/&scioq=A+Technical+and+Normative+Investigation+of+Social+Bias+Amplification&hl=en&as_sdt=0,33", "gs_version_total": 0, "aff_unique_index": "0;0", "aff_unique_norm": "Princeton University", "aff_unique_dep": "", "aff_unique_url": "https://www.princeton.edu", "aff_unique_abbr": "Princeton", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0", "aff_country_unique": "United States" }, { "id": "GbCkSfstOIA", "title": "Semi-Supervised Learning via Clustering Representation Space", "track": "main", "status": "Reject", "tldr": "", "abstract": "We 
proposed a novel loss function that combines supervised learning with clustering in deep neural networks. Taking advantage of the data distribution and the existence of some labeled data, we construct a meaningful latent space. Our loss function consists of three parts, the quality of the clustering result, the margin between clusters, and the classification error of labeled instances. Our proposed model is trained to minimize our loss function by backpropagation, avoiding the need for pre-training or additional networks. This guides our network to classify labeled samples correctly while able to find good clusters simultaneously. We applied our proposed method on MNIST, USPS, ETH-80, and COIL-100; the comparison results confirm our model's outstanding performance over semi-supervised learning.", "keywords": "semi-supervised learning;deep learning;clustering;embedding latent space", "primary_area": "", "supplementary_material": "", "author": "Yen-Chieh Huang;Yuh-Jye Lee;Chih-Chi Wu;Yi-Wei Chiu;Yong-Xiang Lin;CHENG-YING LI;Po-Hung Ko", "authorids": "jeffpapapa@gmail.com;~Yuh-Jye_Lee1;~Chih-Chi_Wu1;~Yi-Wei_Chiu1;george851101@gmail.com;chuck30621@gmail.com;kphong19.iie08g@nctu.edu.tw", "gender": ";M;;M;;;", "homepage": ";https://jupiter.math.nycu.edu.tw/~yuhjye/;;https://github.com/yiwei0730;;;", "dblp": ";12/180.html;;;;;", "google_scholar": ";P-qgPcIAAAAJ;https://scholar.google.com.tw/citations?view_op=list_works;;;;", "orcid": ";;;;;;", "linkedin": ";;;;;;", "or_profile": "jeffpapapa@gmail.com;~Yuh-Jye_Lee1;~Chih-Chi_Wu1;~Yi-Wei_Chiu1;george851101@gmail.com;chuck30621@gmail.com;kphong19.iie08g@nctu.edu.tw", "aff": ";National Chiao Tung University;National Chiao Tung University;National Chiao Tung University;;;", "aff_domain": ";nctu.edu.tw;nctu.edu.tw;nctu.edu.tw;;;", "position": ";Full Professor;PhD student;MS student;;;", "bibtex": "@misc{\nhuang2021semisupervised,\ntitle={Semi-Supervised Learning via Clustering Representation Space},\nauthor={Yen-Chieh Huang and Yuh-Jye Lee and Chih-Chi Wu and Yi-Wei Chiu and Yong-Xiang Lin and CHENG-YING LI and Po-Hung Ko},\nyear={2021},\nurl={https://openreview.net/forum?id=GbCkSfstOIA}\n}", "github": "", "project": "", "reviewers": "AnonReviewer2;AnonReviewer1;AnonReviewer4;AnonReviewer3", "site": "https://openreview.net/forum?id=GbCkSfstOIA", "pdf_size": 0, "rating": "2;4;4;4", "confidence": "5;5;4;5", "wc_review": "388;457;832;613", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "0;0;0;0", "reply_reviewers": "0;0;0;0", "reply_authors": "0;0;0;0", "rating_avg": [ 3.5, 0.8660254037844386 ], "confidence_avg": [ 4.75, 0.4330127018922193 ], "wc_review_avg": [ 572.5, 170.55864094205253 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 0, 0 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 0, 0 ], "replies_avg": [ 5, 0 ], "authors#_avg": [ 7, 0 ], "corr_rating_confidence": -0.3333333333333333, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:Jg0VtrLGjl0J:scholar.google.com/&scioq=Semi-Supervised+Learning+via+Clustering+Representation+Space&hl=en&as_sdt=0,5", "gs_version_total": 0, "aff_unique_index": "0;0;0", "aff_unique_norm": "National Chiao Tung University", "aff_unique_dep": "", "aff_unique_url": "https://www.nctu.edu.tw", "aff_unique_abbr": "NCTU", "aff_campus_unique_index": "0;0;0", "aff_campus_unique": "Taiwan", "aff_country_unique_index": "0;0;0", "aff_country_unique": "China" }, { "id": "GboAslXYiBr", "title": "Semi-Supervised Speech-Language Joint Pre-Training for Spoken 
Language Understanding", "track": "main", "status": "Withdraw", "tldr": "", "abstract": "Spoken language understanding (SLU) requires a model to analyze input acoustic signals to understand its linguistic content and make predictions. To boost the models' performance, various pre-training methods have been proposed to utilize large-scale unlabeled text and speech data. However, the inherent disparities between the two modalities necessitate a mutual analysis. In this paper, we propose a novel semi-supervised learning method, AlignNet, to jointly pre-train the speech and language modules. Besides a self-supervised masked language modeling of the two individual modules, AlignNet aligns representations from paired speech and transcripts in a shared latent semantic space. Thus, during fine-tuning, the speech module alone can produce representations carrying both acoustic information and contextual semantic knowledge. Experimental results verify the effectiveness of our approach on various SLU tasks. For example, AlignNet improves the previous state-of-the-art accuracy on the Spoken SQuAD dataset by 6.2%.", "keywords": "joint pre-training;multimodal representation learning;spoken language understanding;speech representation learning", "primary_area": "", "supplementary_material": "", "author": "Yu-An Chung;Chenguang Zhu;Michael Zeng", "authorids": "~Yu-An_Chung1;~Chenguang_Zhu1;~Michael_Zeng1", "gender": ";M;M", "homepage": "http://people.csail.mit.edu/andyyuan/;;https://www.microsoft.com/en-us/research/people/nzeng/", "dblp": "https://dblp.org/pers/hd/c/Chung:Yu=An;48/7536-1.html;232/1866-1.html", "google_scholar": "DmIG_WoAAAAJ;1b2kKWoAAAAJ;", "orcid": "0000-0001-9451-7956;;", "linkedin": ";;michaelnanshanzeng/", "or_profile": "~Yu-An_Chung1;~Chenguang_Zhu1;~Michael_Zeng1", "aff": "Massachusetts Institute of Technology;;Microsoft", "aff_domain": "mit.edu;;microsoft.com", "position": "PhD student;;Partner Research Manager", "bibtex": "", "github": "", "project": "", "reviewers": "AnonReviewer2;AnonReviewer1;AnonReviewer3", "site": "https://openreview.net/forum?id=GboAslXYiBr", "pdf_size": 0, "rating": "4;5;5", "confidence": "4;4;3", "wc_review": "653;1043;670", "wc_reply_reviewers": "0;0;0", "wc_reply_authors": "0;0;0", "reply_reviewers": "0;0;0", "reply_authors": "0;0;0", "rating_avg": [ 4.666666666666667, 0.4714045207910317 ], "confidence_avg": [ 3.6666666666666665, 0.4714045207910317 ], "wc_review_avg": [ 788.6666666666666, 179.9746895785318 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 0, 0 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 0, 0 ], "replies_avg": [ 4, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": -0.4999999999999999, "gs_citation": 9, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=13057105555255630291&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 2, "aff_unique_index": "0;1", "aff_unique_norm": "Massachusetts Institute of Technology;Microsoft", "aff_unique_dep": ";Microsoft Corporation", "aff_unique_url": "https://web.mit.edu;https://www.microsoft.com", "aff_unique_abbr": "MIT;Microsoft", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0", "aff_country_unique": "United States" }, { "id": "Gc4MQq-JIgj", "title": "Reconnaissance for reinforcement learning with safety constraints", "track": "main", "status": "Reject", "tldr": "", "abstract": "Practical reinforcement learning problems are often formulated as constrained Markov decision process (CMDP) problems, in which the agent 
has to maximize the expected return while satisfying a set of prescribed safety constraints. In this study, we consider a situation in which the agent has access to the generative model which provides us with a next state sample for any given state-action pair, and propose a model to solve a CMDP problem by decomposing the CMDP into a pair of MDPs; \\textit{reconnaissance} MDP (R-MDP) and \\textit{planning} MDP (P-MDP). In R-MDP, we train threat function, the Q-function analogue of danger that can determine whether a given state-action pair is safe or not. In P-MDP, we train a reward-seeking policy while using a fixed threat function to determine the safeness of each action. With the help of generative model, we can efficiently train the threat function by preferentially sampling rare dangerous events. Once the threat function for a baseline policy is computed, we can solve other CMDP problems with different reward and different danger-constraint without the need to re-train the model. We also present an efficient approximation method for the threat function that can greatly reduce the difficulty of solving R-MDP. We will demonstrate the efficacy of our method over classical approaches in benchmark dataset and complex collision-free navigation tasks.", "keywords": "Reinforcement Learning;Safety constraints;Constrained Markov Decision Process", "primary_area": "", "supplementary_material": "/attachment/af9890c74d40d638c2550437f71f22cb3a9e8965.zip", "author": "Shin-ichi Maeda;Hayato Watahiki;Yi Ouyang;Shintarou Okada;Masanori Koyama", "authorids": "~Shin-ichi_Maeda2;~Hayato_Watahiki1;~Yi_Ouyang1;okada@preferred.jp;~Masanori_Koyama1", "gender": "M;M;;;", "homepage": "https://maeyon.github.io/publication/index.html;;;;", "dblp": "90/4637;https://dblp.uni-trier.de/pid/249/2540;;;151/6113", "google_scholar": "https://scholar.google.ca/citations?user=Fv-ifUQAAAAJ;;dw_Sj_YAAAAJ;;", "orcid": "0000-0002-3254-9722;;;;", "linkedin": ";;;;", "or_profile": "~Shin-ichi_Maeda2;~Hayato_Watahiki1;~Yi_Ouyang1;okada@preferred.jp;~Masanori_Koyama1", "aff": "Preferred Networks, Inc.;The University of Tokyo;Preferred Networks, Inc.;;Preferred Networks, Inc.", "aff_domain": "preferred.jp;u-tokyo.ac.jp;preferred.jp;;preferred.jp", "position": "Senior Researcher;MS student;Researcher;;Researcher", "bibtex": "@misc{\nmaeda2021reconnaissance,\ntitle={Reconnaissance for reinforcement learning with safety constraints},\nauthor={Shin-ichi Maeda and Hayato Watahiki and Yi Ouyang and Shintarou Okada and Masanori Koyama},\nyear={2021},\nurl={https://openreview.net/forum?id=Gc4MQq-JIgj}\n}", "github": "", "project": "", "reviewers": "AnonReviewer3;AnonReviewer1;AnonReviewer2", "site": "https://openreview.net/forum?id=Gc4MQq-JIgj", "pdf_size": 0, "rating": "4;5;7", "confidence": "4;2;3", "wc_review": "338;228;370", "wc_reply_reviewers": "0;0;0", "wc_reply_authors": "569;280;80", "reply_reviewers": "0;0;0", "reply_authors": "3;2;1", "rating_avg": [ 5.333333333333333, 1.247219128924647 ], "confidence_avg": [ 3.0, 0.816496580927726 ], "wc_review_avg": [ 312.0, 60.81666438293592 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 309.6666666666667, 200.73254732493072 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 2.0, 0.816496580927726 ], "replies_avg": [ 11, 0 ], "authors#_avg": [ 5, 0 ], "corr_rating_confidence": -0.3273268353539886, "gs_citation": 3, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=2225432065501408167&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 7, 
"aff_unique_index": "0;1;0;0", "aff_unique_norm": "Preferred Networks, Inc.;University of Tokyo", "aff_unique_dep": ";", "aff_unique_url": "https://www.preferred-networks.com;https://www.u-tokyo.ac.jp", "aff_unique_abbr": "PFN;UTokyo", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0;0;0", "aff_country_unique": "Japan" }, { "id": "GeOIKynj_V", "title": "Playing Atari with Capsule Networks: A systematic comparison of CNN and CapsNets-based agents.", "track": "main", "status": "Withdraw", "tldr": "", "abstract": "In recent years, Capsule Networks (CapsNets) have achieved promising results in tasks in the object recognition task thanks to their invariance characteristics towards pose and lighting.\nThey have been proposed as an alternative to relational insensitive and translation invariant Convolutional Neural Networks (CNN). \nIt has been empirically proven that CapsNets are capable of achieving competitive performance while requiring significantly fewer parameters.\nThis is a desirable characteristic for Deep reinforcement learning which is known to be sample-inefficient during training.\nIn this paper, we conduct a systematic analysis to explore the potential of CapsNets-based agents in the deep reinforcement learning setting.\nMore specifically, we compare the performance of a CNN-based agent with a CapsNets-based agent in a deep Q-network using the Atari suite as the testbed of our analysis. \nTo the best of our knowledge, this work constitutes the first CapsNets based deep reinforcement learning model to learn state-action value functions without the need of task-specific adaptation.\nOur results show that, in this setting, CapsNets-based architectures require 92% fewer parameters compared to their CNN-based counterparts.\nMoreover, despite their smaller size, the CapsNets-based agents provide significant boosts in performance (score), ranging between 10% - 77%.\nThis is supported by our empirical results which shows that CapsNets-based agents outperform the CNN-based agent, in a Double-DQN with Prioritized experience replay setting, in eight out of the nine selected environments.", "keywords": "", "primary_area": "", "supplementary_material": "", "author": "Akash Singh;Kevin Mets;Jose Oramas;Steven Latr\u00e9", "authorids": "~Akash_Singh3;~Kevin_Mets1;~Jose_Oramas1;steven.latre@uantwerpen.be", "gender": "M;M;M;", "homepage": "https://www.uantwerpen.be/en/staff/akash-singh/;;http://idlab.uantwerpen.be/~joramasmogrovejo;", "dblp": ";;47/9735;", "google_scholar": ";avinyLUAAAAJ;FurBYlUAAAAJ;", "orcid": ";0000-0002-4812-4841;0000-0002-8607-5067;", "linkedin": "akash-singh-a7b23519/;;https://linkedin.com/in/jos%C3%A9-oramas-m-3183501b;", "or_profile": "~Akash_Singh3;~Kevin_Mets1;~Jose_Oramas1;steven.latre@uantwerpen.be", "aff": "University of Antwerpen;University of Antwerp, IDLab, imec;University of Antwerp;", "aff_domain": "uantwerpen.be;uantwerpen.be;uantwerpen.be;", "position": "PhD student;Postdoc;Associate Professor;", "bibtex": "", "github": "", "project": "", "reviewers": "AnonReviewer2;AnonReviewer3;AnonReviewer4;AnonReviewer1", "site": "https://openreview.net/forum?id=GeOIKynj_V", "pdf_size": 0, "rating": "2;4;4;5", "confidence": "2;4;3;4", "wc_review": "151;255;269;220", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "0;0;0;0", "reply_reviewers": "0;0;0;0", "reply_authors": "0;0;0;0", "rating_avg": [ 3.75, 1.0897247358851685 ], "confidence_avg": [ 3.25, 0.82915619758885 ], "wc_review_avg": [ 223.75, 45.63647116068463 ], 
"wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 0, 0 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 0, 0 ], "replies_avg": [ 6, 0 ], "authors#_avg": [ 4, 0 ], "corr_rating_confidence": 0.899228803025897, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:n8LrZNcA-koJ:scholar.google.com/&scioq=Playing+Atari+with+Capsule+Networks:+A+systematic+comparison+of+CNN+and+CapsNets-based+agents.&hl=en&as_sdt=0,5", "gs_version_total": 0, "aff_unique_index": "0;0;0", "aff_unique_norm": "University of Antwerp", "aff_unique_dep": "", "aff_unique_url": "https://www.uantwerp.be", "aff_unique_abbr": "UA", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0;0", "aff_country_unique": "Belgium" }, { "id": "Ggx8fbKZ1-D", "title": "Adaptive Hierarchical Hyper-gradient Descent", "track": "main", "status": "Reject", "tldr": "", "abstract": "Adaptive learning rates can lead to faster convergence and better final performance\nfor deep learning models. There are several widely known human-designed adap-\ntive optimizers such as Adam and RMSProp, gradient based adaptive methods\nsuch as hyper-descent and L4, and meta learning approaches including learning\nto learn. However, the issue of balancing adaptiveness and over-parameterization\nis still a topic to be addressed. In this study, we investigate different levels of\nlearning rate adaptation based on the framework of hyper-gradient descent, and\nfurther propose a method that adaptively learns the model parameters for combin-\ning different levels of adaptations. Meanwhile, we show the relationship between\nadding regularization on over-parameterized learning rates and building combi-\nnations of different levels of adaptive learning rates. 
The experiments on several\nnetwork architectures including feed-forward networks, LeNet-5 and ResNet-18/34\nshow that the proposed multi-level adaptive approach can outperform baseline\nadaptive methods in a variety of circumstances with statistical significance.", "keywords": "learning rate adaptation;hyper-gradient descent;meta learning;optimisation;hierarchical system", "primary_area": "", "supplementary_material": "/attachment/069f8ab740e75c724c2e10b46ac817236928505f.zip", "author": "RENLONG JIE;Junbin Gao;Andrey Vasnev;Minh-Ngoc Tran", "authorids": "~RENLONG_JIE1;~Junbin_Gao1;andrey.vasnev@sydney.edu.au;~Minh-Ngoc_Tran1", "gender": "M;;;", "homepage": ";https://www.sydney.edu.au/business/about/our-people/academic-staff/junbin-gao.html;;https://sites.google.com/site/mntran26/home?authuser=0", "dblp": ";30/3983;;", "google_scholar": "qtBf5BAAAAAJ;https://scholar.google.com.au/citations?user=3-KJN8IAAAAJ;;https://scholar.google.com.au/citations?user=98A6Dq8AAAAJ", "orcid": ";0000-0001-9803-0256;;", "linkedin": ";;;", "or_profile": "~RENLONG_JIE1;~Junbin_Gao1;andrey.vasnev@sydney.edu.au;~Minh-Ngoc_Tran1", "aff": "Huawei Noah's Ark Lab;University of Sydney;;University of Sydney", "aff_domain": "huawei.com;sydney.edu.au;;sydney.edu.au", "position": "Postdoc;Full Professor;;Associate Professor", "bibtex": "@misc{\njie2021adaptive,\ntitle={Adaptive Hierarchical Hyper-gradient Descent},\nauthor={RENLONG JIE and Junbin Gao and Andrey Vasnev and Minh-Ngoc Tran},\nyear={2021},\nurl={https://openreview.net/forum?id=Ggx8fbKZ1-D}\n}", "github": "", "project": "", "reviewers": "AnonReviewer2;AnonReviewer4;AnonReviewer1;AnonReviewer3", "site": "https://openreview.net/forum?id=Ggx8fbKZ1-D", "pdf_size": 0, "rating": "5;5;5;5", "confidence": "2;4;3;3", "wc_review": "562;318;363;469", "wc_reply_reviewers": "27;0;0;0", "wc_reply_authors": "1132;648;944;944", "reply_reviewers": "1;0;0;0", "reply_authors": "3;1;2;2", "rating_avg": [ 5.0, 0.0 ], "confidence_avg": [ 3.0, 0.7071067811865476 ], "wc_review_avg": [ 428.0, 94.81824718903002 ], "wc_reply_reviewers_avg": [ 6.75, 11.691342951089922 ], "wc_reply_authors_avg": [ 917.0, 173.2368321114191 ], "reply_reviewers_avg": [ 0.25, 0.4330127018922193 ], "reply_authors_avg": [ 2.0, 0.7071067811865476 ], "replies_avg": [ 15, 0 ], "authors#_avg": [ 4, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 10, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=9194811298040440760&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 7, "aff_unique_index": "0;1;1", "aff_unique_norm": "Huawei;University of Sydney", "aff_unique_dep": "Noah's Ark Lab;", "aff_unique_url": "https://www.huawei.com;https://www.sydney.edu.au", "aff_unique_abbr": "Huawei;USYD", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;1;1", "aff_country_unique": "China;Australia" }, { "id": "GiEyS3CFHV_", "title": "Non-Asymptotic PAC-Bayes Bounds on Generalisation Error", "track": "main", "status": "Withdraw", "tldr": "", "abstract": "Constructing non-vacuous PAC-Bayes bounds on generalization errors for unbounded risk functionals, especially in the non-asymptotic regime, is an active area of research. However, current state of the art results are applicable only in some very specialized cases. In this work, we give an integrability condition which exactly characterizes when any risk functional, for a given data set and model space, admits such bounds using the Levy-Khintchine theorem. 
Further, we derive a Bahadur-Rao type exact asymptotic bound, which is much sharper than a traditional Chernoff type inequality, especially in the under-sampled regime. These bounds give us the flexibility to construct data or model-dependent consistency promoting updates to a data-free prior, which provably improves the generalization performance.", "keywords": "PAC-Bayes Bounds;Large Deviation Theory;Concentration Inequalities;Generalisation Error", "primary_area": "", "supplementary_material": "", "author": "Arijit Das", "authorids": "~Arijit_Das2", "gender": "M", "homepage": "", "dblp": "37/4064", "google_scholar": "https://scholar.google.com/citations?hl=de", "orcid": "", "linkedin": "dr-arijit-das", "or_profile": "~Arijit_Das2", "aff": "University of Cologne", "aff_domain": "uni-koeln.de", "position": "Principal Researcher", "bibtex": "", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer4;AnonReviewer2;AnonReviewer3;AnonReviewer5", "site": "https://openreview.net/forum?id=GiEyS3CFHV_", "pdf_size": 0, "rating": "4;4;4;5;5", "confidence": "2;2;4;3;3", "wc_review": "842;441;459;324;376", "wc_reply_reviewers": "0;0;0;0;0", "wc_reply_authors": "155;166;72;70;65", "reply_reviewers": "0;0;0;0;0", "reply_authors": "1;1;1;1;1", "rating_avg": [ 4.4, 0.48989794855663565 ], "confidence_avg": [ 2.8, 0.7483314773547882 ], "wc_review_avg": [ 488.4, 183.19672486155423 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 105.6, 45.01821853427787 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 11, 0 ], "authors#_avg": [ 1, 0 ], "corr_rating_confidence": 0.2182178902359924, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:GR3BPLr468gJ:scholar.google.com/&scioq=Non-Asymptotic+PAC-Bayes+Bounds+on+Generalisation+Error&hl=en&as_sdt=0,5", "gs_version_total": 0, "aff_unique_index": "0", "aff_unique_norm": "University of Cologne", "aff_unique_dep": "", "aff_unique_url": "https://www.uni-koeln.de/", "aff_unique_abbr": "UC", "aff_country_unique_index": "0", "aff_country_unique": "Germany" }, { "id": "Gj9aQfQEHRS", "title": "Transformers satisfy", "track": "main", "status": "Reject", "tldr": "", "abstract": "The Propositional Satisfiability Problem (SAT), and more generally, the Constraint Satisfaction Problem (CSP), are mathematical questions defined as finding an assignment to a set of objects that satisfies a series of constraints. The modern approach is trending to solve CSP through neural symbolic methods. Most recent works are sequential model-based and adopt neural embedding, i.e., reinforcement learning with neural graph networks, and graph recurrent neural networks. This work proposes a one-shot model derived from the eminent Transformer architecture for factor graph structure to solve the CSP problem. We define the heterogeneous attention mechanism based on meta-paths for the self-attention between literals, the cross-attention based on the bipartite graph links from literal to clauses, or vice versa. This model takes advantage of parallelism. 
Our model achieves high speed and very high accuracy on the factor graph for CSPs with arbitrary size.", "keywords": "constraint satisfaction problem;graph attention;transformers", "primary_area": "", "supplementary_material": "/attachment/2d2eb995198a745bdc926a972b6e92b7e827bd44.zip", "author": "Feng Shi;CHEN LI;Shijie Bian;Yiqiao Jin;Ziheng Xu;Tian Han;Song-Chun Zhu", "authorids": "~Feng_Shi1;~CHEN_LI14;~Shijie_Bian1;~Yiqiao_Jin1;~Ziheng_Xu1;~Tian_Han1;~Song-Chun_Zhu1", "gender": "M;M;M;M;;M;M", "homepage": ";https://github.com/CChenLi;;https://ahren09.github.io/;http://www.seas.ucla.edu/~zxu/;https://hthth0801.github.io/;https://zhusongchun.net/", "dblp": ";;;207/6631.html;;65/4065-1;10/10313", "google_scholar": ";;;eY85qm4AAAAJ;;Qtvu5t4AAAAJ;https://scholar.google.com.tw/citations?user=Al8dyb4AAAAJ", "orcid": ";;;0000-0002-6974-5970;;;", "linkedin": ";;shijie-bian-ab7a9b196/;ahren-jin/;;;", "or_profile": "~Feng_Shi1;~CHEN_LI14;~Shijie_Bian1;~Yiqiao_Jin1;~Ziheng_Xu1;~Tian_Han1;~Song-Chun_Zhu1", "aff": ";;University of California, Los Angeles;;University of California, Los Angeles;Stevens Institute of Technology;Peking University", "aff_domain": ";;ucla.edu;;ucla.edu;stevens.edu;pku.edu.cn", "position": ";;Undergrad student;;MS student;Assistant Professor;Full Professor", "bibtex": "@misc{\nshi2021transformers,\ntitle={Transformers satisfy},\nauthor={Feng Shi and CHEN LI and Shijie Bian and Yiqiao Jin and Ziheng Xu and Tian Han and Song-Chun Zhu},\nyear={2021},\nurl={https://openreview.net/forum?id=Gj9aQfQEHRS}\n}", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer2;AnonReviewer3;AnonReviewer4", "site": "https://openreview.net/forum?id=Gj9aQfQEHRS", "pdf_size": 0, "rating": "3;4;4;4", "confidence": "5;4;3;4", "wc_review": "739;431;643;720", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "0;0;0;0", "reply_reviewers": "0;0;0;0", "reply_authors": "0;0;0;0", "rating_avg": [ 3.75, 0.4330127018922193 ], "confidence_avg": [ 4.0, 0.7071067811865476 ], "wc_review_avg": [ 633.25, 122.17686974218975 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 0, 0 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 0, 0 ], "replies_avg": [ 5, 0 ], "authors#_avg": [ 7, 0 ], "corr_rating_confidence": -0.816496580927726, "gs_citation": 1, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=16877157664726842004&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 0, "aff_unique_index": "0;0;1;2", "aff_unique_norm": "University of California, Los Angeles;Stevens Institute of Technology;Peking University", "aff_unique_dep": ";;", "aff_unique_url": "https://www.ucla.edu;https://www.stevens.edu;http://www.pku.edu.cn", "aff_unique_abbr": "UCLA;SIT;Peking U", "aff_campus_unique_index": "0;0", "aff_campus_unique": "Los Angeles;", "aff_country_unique_index": "0;0;0;1", "aff_country_unique": "United States;China" }, { "id": "GjqcL-v0J2A", "title": "Mixture Representation Learning with Coupled Autoencoding Agents", "track": "main", "status": "Reject", "tldr": "", "abstract": "Jointly identifying a mixture of discrete and continuous factors of variability can help unravel complex phenomena. We study this problem by proposing an unsupervised framework called coupled mixture VAE (cpl-mixVAE), which utilizes multiple interacting autoencoding agents. The individual agents operate on augmented copies of training samples to learn mixture representations, while being encouraged to reach consensus on the categorical assignments. 
We provide theoretical justification to motivate the use of a multi-agent framework, and formulate it as a variational inference problem. We benchmark our approach on MNIST and dSprites, achieving state-of-the-art categorical assignments while preserving interpretability of the continuous factors. We then demonstrate the utility of this approach in jointly identifying cell types and type-specific, activity-regulated genes for a single-cell gene expression dataset profiling over 100 cortical neuron types.", "keywords": "Multi-agent network;representation learning;collective decision making;type-preserving data augmentation", "primary_area": "", "supplementary_material": "/attachment/1541087e36fd63ba402ae87622d890420217edd2.zip", "author": "Yeganeh Marghi;Rohan Gala;Uygar S\u00fcmb\u00fcl", "authorids": "~Yeganeh_Marghi1;~Rohan_Gala1;~Uygar_S\u00fcmb\u00fcl2", "gender": "F;;M", "homepage": ";https://rhngla.github.io;", "dblp": ";;30/8374", "google_scholar": "https://scholar.google.com/citations?hl=en;https://scholar.google.com/citations?hl=en;dhiRjJIAAAAJ", "orcid": "0000-0002-5802-7439;;", "linkedin": "yeganehmarghi/;;", "or_profile": "~Yeganeh_Marghi1;~Rohan_Gala1;~Uygar_Sumbul1", "aff": "Allen Institute;Allen Institute;Allen Institute", "aff_domain": "alleninstitute.org;alleninstitute.org;alleninstitute.org", "position": "Researcher;Scientist I;Assistant Investigator", "bibtex": "@misc{\nmarghi2021mixture,\ntitle={Mixture Representation Learning with Coupled Autoencoding Agents},\nauthor={Yeganeh Marghi and Rohan Gala and Uygar S{\\\"u}mb{\\\"u}l},\nyear={2021},\nurl={https://openreview.net/forum?id=GjqcL-v0J2A}\n}", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer2;AnonReviewer4;AnonReviewer3", "site": "https://openreview.net/forum?id=GjqcL-v0J2A", "pdf_size": 0, "rating": "5;5;6;6", "confidence": "3;3;5;3", "wc_review": "493;431;356;931", "wc_reply_reviewers": "0;0;0;17", "wc_reply_authors": "749;650;456;812", "reply_reviewers": "0;0;0;1", "reply_authors": "1;1;1;2", "rating_avg": [ 5.5, 0.5 ], "confidence_avg": [ 3.5, 0.8660254037844386 ], "wc_review_avg": [ 552.75, 223.70558218336888 ], "wc_reply_reviewers_avg": [ 4.25, 7.361215932167728 ], "wc_reply_authors_avg": [ 666.75, 134.68365713775373 ], "reply_reviewers_avg": [ 0.25, 0.4330127018922193 ], "reply_authors_avg": [ 1.25, 0.4330127018922193 ], "replies_avg": [ 13, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": 0.5773502691896257, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:PDYvg9FfdEkJ:scholar.google.com/&scioq=Mixture+Representation+Learning+with+Coupled+Autoencoding+Agents&hl=en&as_sdt=0,5", "gs_version_total": 0, "aff_unique_index": "0;0;0", "aff_unique_norm": "Allen Institute for Artificial Intelligence", "aff_unique_dep": "", "aff_unique_url": "https://allenai.org", "aff_unique_abbr": "AI2", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0;0", "aff_country_unique": "United States" }, { "id": "GtCq61UFDId", "title": "SoCal: Selective Oracle Questioning for Consistency-based Active Learning of Cardiac Signals", "track": "main", "status": "Reject", "tldr": "", "abstract": "The ubiquity and rate of collection of cardiac signals produce large, unlabelled datasets. Active learning (AL) can exploit such datasets by incorporating human annotators (oracles) to improve generalization performance. However, the over-reliance of existing algorithms on oracles continues to burden physicians. 
To minimize this burden, we propose SoCal, a consistency-based AL framework that dynamically determines whether to request a label from an oracle or to generate a pseudo-label instead. We show that our framework decreases the labelling burden while maintaining strong performance, even in the presence of a noisy oracle.", "keywords": "Active learning;consistency-training;cardiac signals;healthcare", "primary_area": "", "supplementary_material": "/attachment/e6a7d3570cffdeb175ec132d3737d9526a0a52ac.zip", "author": "Dani Kiyasseh;Tingting Zhu;David A. Clifton", "authorids": "~Dani_Kiyasseh1;tingting.zhu@eng.ox.ac.uk;~David_A._Clifton1", "gender": ";;M", "homepage": "https://danikiyasseh.github.io/;;http://www.eng.ox.ac.uk/chi", "dblp": ";;89/6424", "google_scholar": "UD1oO4MAAAAJ;;", "orcid": ";;", "linkedin": ";;", "or_profile": "~Dani_Kiyasseh1;tingting.zhu@eng.ox.ac.uk;~David_A._Clifton1", "aff": "University of Oxford;;University of Oxford", "aff_domain": "oxford.ac.uk;;ox.ac.uk", "position": "PhD student;;Full Professor", "bibtex": "@misc{\nkiyasseh2021socal,\ntitle={SoCal: Selective Oracle Questioning for Consistency-based Active Learning of Cardiac Signals},\nauthor={Dani Kiyasseh and Tingting Zhu and David A. Clifton},\nyear={2021},\nurl={https://openreview.net/forum?id=GtCq61UFDId}\n}", "github": "", "project": "", "reviewers": "AnonReviewer3;AnonReviewer4;AnonReviewer1;AnonReviewer2", "site": "https://openreview.net/forum?id=GtCq61UFDId", "pdf_size": 0, "rating": "4;4;5;5", "confidence": "4;3;4;3", "wc_review": "323;202;555;407", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "357;620;907;740", "reply_reviewers": "0;0;0;0", "reply_authors": "1;1;2;1", "rating_avg": [ 4.5, 0.5 ], "confidence_avg": [ 3.5, 0.5 ], "wc_review_avg": [ 371.75, 128.46667855907228 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 656.0, 200.4706961129232 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.25, 0.4330127018922193 ], "replies_avg": [ 10, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 3, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=944185574108691775&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 10, "aff_unique_index": "0;0", "aff_unique_norm": "University of Oxford", "aff_unique_dep": "", "aff_unique_url": "https://www.ox.ac.uk", "aff_unique_abbr": "Oxford", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0", "aff_country_unique": "United Kingdom" }, { "id": "GtiDFD1pxpz", "title": "Intelligent Matrix Exponentiation", "track": "main", "status": "Reject", "tldr": "", "abstract": "We present a novel machine learning architecture that uses a single high-dimensional nonlinearity consisting of the exponential of a single input-dependent matrix. The mathematical simplicity of this architecture allows a detailed analysis of its behaviour, providing robustness guarantees via Lipschitz bounds. Despite its simplicity, a single matrix exponential layer already provides universal approximation properties and can learn and extrapolate fundamental functions of the input, such as periodic structure or geometric invariants. 
This architecture outperforms other general-purpose architectures on benchmark problems, including CIFAR-10, using fewer parameters.", "keywords": "matrix exponential;tensor methods;supervised learning;domain extrapolation;certified robustness", "primary_area": "", "supplementary_material": "/attachment/71d9a38d59c322a3b6aa3002f69a08827ee68783.zip", "author": "Thomas Fischbacher;Iulia Maria Comsa;Krzysztof Potempa;Moritz Firsching;Luca Versari;Jyrki Alakuijala", "authorids": "~Thomas_Fischbacher1;~Iulia_Maria_Comsa1;krzysztof.potempa@gmail.com;~Moritz_Firsching1;~Luca_Versari1;~Jyrki_Alakuijala1", "gender": ";F;;;M;", "homepage": "https://research.google/;;;https://mo271.github.io/;;https://research.google/people/105344/", "dblp": ";245/2444.html;;;184/0419;", "google_scholar": ";wGfxg1YAAAAJ;;;;SnJNeR4AAAAJ", "orcid": ";0000-0002-1322-4164;;;;", "linkedin": ";;;;;", "or_profile": "~Thomas_Fischbacher1;~Iulia_Maria_Comsa1;krzysztof.potempa@gmail.com;~Moritz_Firsching1;~Luca_Versari1;~Jyrki_Alakuijala1", "aff": ";Google Research;;;Google;Google", "aff_domain": ";google.com;;;google.com;google.com", "position": ";Researcher;;;Researcher;Senior Staff Software Engineer", "bibtex": "@misc{\nfischbacher2021intelligent,\ntitle={Intelligent Matrix Exponentiation},\nauthor={Thomas Fischbacher and Iulia Maria Comsa and Krzysztof Potempa and Moritz Firsching and Luca Versari and Jyrki Alakuijala},\nyear={2021},\nurl={https://openreview.net/forum?id=GtiDFD1pxpz}\n}", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer3;AnonReviewer2;AnonReviewer4", "site": "https://openreview.net/forum?id=GtiDFD1pxpz", "pdf_size": 0, "rating": "4;5;5;5", "confidence": "4;3;3;4", "wc_review": "458;869;538;333", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "1017;948;440;769", "reply_reviewers": "0;0;0;0", "reply_authors": "2;2;1;1", "rating_avg": [ 4.75, 0.4330127018922193 ], "confidence_avg": [ 3.5, 0.5 ], "wc_review_avg": [ 549.5, 198.40425902686667 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 793.5, 223.26273759855226 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.5, 0.5 ], "replies_avg": [ 11, 0 ], "authors#_avg": [ 6, 0 ], "corr_rating_confidence": -0.5773502691896257, "gs_citation": 3, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=13131695789655334720&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 5, "aff_unique_index": "0;0;0", "aff_unique_norm": "Google", "aff_unique_dep": "Google Research", "aff_unique_url": "https://research.google", "aff_unique_abbr": "Google Research", "aff_campus_unique_index": "0;0;0", "aff_campus_unique": "Mountain View", "aff_country_unique_index": "0;0;0", "aff_country_unique": "United States" }, { "title": "Learning Manifold Patch-Based Representations of Man-Made Shapes", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/2959", "id": "Gu5WqN9J3Fn", "poster": "", "openreview": "https://openreview.net/forum?id=Gu5WqN9J3Fn", "slides": "https://iclr.cc/virtual/2021/poster/2959", "video": "https://iclr.cc/virtual/2021/poster/2959", "author_site": "Dmitriy Smirnov, Mikhail Bessmeltsev, Justin Solomon", "tldr": "", "abstract": "Choosing the right representation for geometry is crucial for making 3D models compatible with existing applications. Focusing on piecewise-smooth man-made shapes, we propose a new representation that is usable in conventional CAD modeling pipelines and can also be learned by deep neural networks. 
We demonstrate its benefits by applying it to the task of sketch-based modeling. Given a raster image, our system infers a set of parametric surfaces that realize the input in 3D. To capture piecewise smooth geometry, we learn a special shape representation: a deformable parametric template composed of Coons patches. Naively training such a system, however, is hampered by non-manifold artifacts in the parametric shapes and by a lack of data. To address this, we introduce loss functions that bias the network to output non-self-intersecting shapes and implement them as part of a fully self-supervised system, automatically generating both shape templates and synthetic training data. We develop a testbed for sketch-based modeling, demonstrate shape interpolation, and provide comparison to related work.", "keywords": "3D shape representations;CAD modeling;sketch-based modeling;computer graphics;computer vision;deep learning", "primary_area": "", "supplementary_material": "", "author": "Dmitriy Smirnov;Mikhail Bessmeltsev;Justin Solomon", "authorids": "~Dmitriy_Smirnov1;~Mikhail_Bessmeltsev1;~Justin_Solomon1", "gender": "M;M;M", "homepage": "https://dsmirnov.me;http://www-labs.iro.umontreal.ca/~bmpix/;http://people.csail.mit.edu/jsolomon/", "dblp": "181/4626-1.html;86/5662;80/5094", "google_scholar": "Dq0Fom8AAAAJ;EUzwk5cAAAAJ;pImSVwoAAAAJ", "orcid": ";;0000-0002-7701-7586", "linkedin": "dmsmir/;;justin-solomon-8a587914/", "or_profile": "~Dmitriy_Smirnov1;~Mikhail_Bessmeltsev1;~Justin_Solomon1", "aff": "Massachusetts Institute of Technology;University of Montreal;Massachusetts Institute of Technology", "aff_domain": "mit.edu;umontreal.ca;mit.edu", "position": "PhD student;Assistant Professor;Associate Professor", "bibtex": "@inproceedings{\nsmirnov2021learning,\ntitle={Learning Manifold Patch-Based Representations of Man-Made Shapes},\nauthor={Dmitriy Smirnov and Mikhail Bessmeltsev and Justin Solomon},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=Gu5WqN9J3Fn}\n}", "github": "[![github](/images/github_icon.svg) dmsm/LearningPatches](https://github.com/dmsm/LearningPatches)", "project": "", "reviewers": "AnonReviewer4;AnonReviewer2;AnonReviewer3;AnonReviewer1", "pdf_size": 0, "rating": "4;6;7;7", "confidence": "5;4;5;3", "wc_review": "387;329;705;477", "wc_reply_reviewers": "0;91;209;95", "wc_reply_authors": "536;525;918;218", "reply_reviewers": "0;1;2;1", "reply_authors": "1;2;3;2", "rating_avg": [ 6.0, 1.224744871391589 ], "confidence_avg": [ 4.25, 0.82915619758885 ], "wc_review_avg": [ 474.5, 143.14590458689344 ], "wc_reply_reviewers_avg": [ 98.75, 74.12953190193501 ], "wc_reply_authors_avg": [ 549.25, 248.22708857012364 ], "reply_reviewers_avg": [ 1.0, 0.7071067811865476 ], "reply_authors_avg": [ 2.0, 0.7071067811865476 ], "replies_avg": [ 18, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": -0.4923659639173309, "gs_citation": 40, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=9102520552228338739&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 4, "pdf": "https://openreview.net/pdf?id=Gu5WqN9J3Fn", "email": "mit.edu;umontreal.ca;mit.edu", "author_num": 3, "aff_unique_index": "0;1;0", "aff_unique_norm": "Massachusetts Institute of Technology;University of Montreal", "aff_unique_dep": ";", "aff_unique_url": "https://web.mit.edu;https://wwwumontreal.ca", "aff_unique_abbr": "MIT;UM", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;1;0", "aff_country_unique": 
"United States;Canada" }, { "id": "GvqjmSwUxkY", "title": "Rethinking the Truly Unsupervised Image-to-Image Translation", "track": "main", "status": "Reject", "tldr": "", "abstract": "Every recent image-to-image translation model uses either image-level (i.e. input-output pairs) or set-level (i.e. domain labels) supervision at a minimum. However, even the set-level supervision can be a serious bottleneck for data collection in practice. In this paper, we tackle image-to-image translation in a fully unsupervised setting, i.e., neither paired images nor domain labels. To this end, we propose a truly unsupervised image-to-image translation model (TUNIT) that simultaneously learns to separate image domains and translate input images into the estimated domains. \nExperimental results show that our model achieves comparable or even better performance than the set-level supervised model trained with full labels, generalizes well on various datasets, and is robust against the choice of hyperparameters (e.g. the preset number of pseudo domains). In addition, TUNIT extends well to the semi-supervised scenario with various amount of labels provided. ", "keywords": "unsupervised approach;image-to-image translation;representation learning", "primary_area": "", "supplementary_material": "", "author": "Kyungjune Baek;Yunjey Choi;Youngjung Uh;Jaejun Yoo;Hyunjung Shim", "authorids": "~Kyungjune_Baek1;~Yunjey_Choi3;~Youngjung_Uh2;~Jaejun_Yoo1;~Hyunjung_Shim1", "gender": "M;;M;F;M", "homepage": "https://friedronaldo.github.io/;https://vilab.yonsei.ac.kr/member/professor;;https://sites.google.com/view/cvml-kaist;https://yunjey.github.io/", "dblp": "223/5659;57/10511;141/8878-1;72/4620;210/0980", "google_scholar": "jC6P1pQAAAAJ;BWBGrEEAAAAJ;https://scholar.google.co.kr/citations?user=7NBlQw4AAAAJ;KB5XZGIAAAAJ;v_4lOaAAAAAJ", "orcid": ";;0000-0001-5252-9668;;", "linkedin": ";youngjung-uh-78b459b5/;jaejunyoo/;;", "or_profile": "~Kyungjune_Baek1;~Youngjung_Uh2;~Jaejun_Yoo1;~Hyunjung_Shim1;~yunjey_choi1", "aff": "Yonsei University;Yonsei University;Swiss Federal Institute of Technology Lausanne;Yonsei University;NAVER", "aff_domain": "yonsei.ac.kr;yonsei.ac.kr;epfl.ch;yonsei.ac.kr;navercorp.com", "position": "PhD student;Associate Professor;Postdoc;Associate Professor;Research Scientist", "bibtex": "@misc{\nbaek2021rethinking,\ntitle={Rethinking the Truly Unsupervised Image-to-Image Translation},\nauthor={Kyungjune Baek and Yunjey Choi and Youngjung Uh and Jaejun Yoo and Hyunjung Shim},\nyear={2021},\nurl={https://openreview.net/forum?id=GvqjmSwUxkY}\n}", "github": "", "project": "", "reviewers": "AnonReviewer3;AnonReviewer4;AnonReviewer1;AnonReviewer2", "site": "https://openreview.net/forum?id=GvqjmSwUxkY", "pdf_size": 0, "rating": "5;6;6;6", "confidence": "3;4;5;5", "wc_review": "514;862;392;292", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "1206;915;713;567", "reply_reviewers": "0;0;0;0", "reply_authors": "2;2;2;1", "rating_avg": [ 5.75, 0.4330127018922193 ], "confidence_avg": [ 4.25, 0.82915619758885 ], "wc_review_avg": [ 515.0, 215.21384713814305 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 850.25, 239.69707445023187 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.75, 0.4330127018922193 ], "replies_avg": [ 13, 0 ], "authors#_avg": [ 5, 0 ], "corr_rating_confidence": 0.8703882797784891, "gs_citation": 139, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=10233348208595137703&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 10, "aff_unique_index": 
"0;0;1;0;2", "aff_unique_norm": "Yonsei University;Swiss Federal Institute of Technology Lausanne;NAVER Corporation", "aff_unique_dep": ";;", "aff_unique_url": "https://www.yonsei.ac.kr;https://www.epfl.ch;https://www.naver.com", "aff_unique_abbr": "Yonsei;EPFL;NAVER", "aff_campus_unique_index": "1", "aff_campus_unique": ";Lausanne", "aff_country_unique_index": "0;0;1;0;0", "aff_country_unique": "South Korea;Switzerland" }, { "id": "GwjkaD3g-V1", "title": "Semi-Supervised Learning of Multi-Object 3D Scene Representations", "track": "main", "status": "Reject", "tldr": "", "abstract": "Representing scenes at the granularity of objects is a prerequisite for scene understanding and decision making. We propose a novel approach for learning multi-object 3D scene representations from images. A recurrent encoder regresses a latent representation of 3D shapes, poses and texture of each object from an input RGB image. The 3D shapes are represented continuously in function-space as signed distance functions (SDF) which we efficiently pre-train from example shapes. By differentiable rendering, we train our model to decompose scenes self-supervised from RGB-D images. Our approach learns to decompose images into the constituent objects of the scene and to infer their shape, pose and texture properties from a single view. In experiments, we evaluate the accuracy of our model in inferring the 3D scene layout and demonstrate the capabilities of the generative 3D scene model.", "keywords": "scene understanding;representation learning;multi-object scene decomposition;pose estimation;shape and appearance estimation", "primary_area": "", "supplementary_material": "/attachment/559f341715c333e13b4293b0617b410230c5bb58.zip", "author": "Cathrin Elich;Martin R. Oswald;Marc Pollefeys;Joerg Stueckler", "authorids": "~Cathrin_Elich1;~Martin_R._Oswald1;~Marc_Pollefeys2;~Joerg_Stueckler2", "gender": "F;;M;M", "homepage": "https://www.is.mpg.de/person/celich;;;https://is.mpg.de/employees/jstueckler", "dblp": "239/4275;37/7272;p/MarcPollefeys;99/3327", "google_scholar": ";https://scholar.google.ch/citations?user=biytQP8AAAAJ;YYH0BjEAAAAJ;https://scholar.google.de/citations?user=xrOzfucAAAAJ", "orcid": "0000-0002-3269-6976;0000-0002-1183-9958;;", "linkedin": ";martin-r-oswald-167461122/;marc-pollefeys-30a7075/;", "or_profile": "~Cathrin_Elich1;~Martin_R._Oswald1;~Marc_Pollefeys2;~Joerg_Stueckler1", "aff": "Max-Planck Institute;ETH Zurich;Swiss Federal Institute of Technology;Max Planck Institute for Intelligent Systems, Max-Planck Institute", "aff_domain": "mpg.de;ethz.ch;ethz.ch;tuebingen.mpg.de", "position": "PhD student;Post Doc;Full Professor;Group Leader", "bibtex": "@misc{\nelich2021semisupervised,\ntitle={Semi-Supervised Learning of Multi-Object 3D Scene Representations},\nauthor={Cathrin Elich and Martin R. 
Oswald and Marc Pollefeys and Joerg Stueckler},\nyear={2021},\nurl={https://openreview.net/forum?id=GwjkaD3g-V1}\n}", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer2;AnonReviewer4", "site": "https://openreview.net/forum?id=GwjkaD3g-V1", "pdf_size": 0, "rating": "6;6;6", "confidence": "3;4;4", "wc_review": "699;216;418", "wc_reply_reviewers": "0;0;0", "wc_reply_authors": "774;533;384", "reply_reviewers": "0;0;0", "reply_authors": "1;1;1", "rating_avg": [ 6.0, 0.0 ], "confidence_avg": [ 3.6666666666666665, 0.4714045207910317 ], "wc_review_avg": [ 444.3333333333333, 198.06115778269654 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 563.6666666666666, 160.68672074014773 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 8, 0 ], "authors#_avg": [ 4, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 6, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=17984083450350518254&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 0, "aff_unique_index": "0;1;2;3", "aff_unique_norm": "Max-Planck-Gesellschaft zur F\u00f6rderung der Wissenschaften e.V.;ETH Zurich;Swiss Federal Institute of Technology;Max Planck Institute for Intelligent Systems", "aff_unique_dep": ";;;Intelligent Systems", "aff_unique_url": "https://www.mpg.de;https://www.ethz.ch;https://www.ethz.ch;https://www.mpi-is.mpg.de", "aff_unique_abbr": "MPG;ETHZ;ETH Zurich;MPI-IS", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;1;1;0", "aff_country_unique": "Germany;Switzerland" }, { "id": "GzHjhdpk-YH", "title": "The Unbalanced Gromov Wasserstein Distance: Conic Formulation and Relaxation", "track": "main", "status": "Reject", "tldr": "", "abstract": "Comparing metric measure spaces (i.e. a metric space endowed with a probability distribution) is at the heart of many machine learning problems. This includes for instance predicting properties of molecules in quantum chemistry or generating graphs with varying connectivity. The most popular distance between such metric measure spaces is the Gromov-Wasserstein (GW) distance, which is the solution of a quadratic assignment problem. This distance has been successfully applied to supervised learning and generative modeling, for applications as diverse as quantum chemistry or natural language processing. The GW distance is however limited to the comparison of metric measure spaces endowed with a probability distribution. This strong limitation is problematic for many applications in ML where there is no a priori natural normalization on the total mass of the data. Furthermore, imposing an exact conservation of mass across spaces is not robust to outliers and often leads to irregular matching. To alleviate these issues, we introduce two Unbalanced Gromov-Wasserstein formulations: a distance and a more tractable upper-bounding relaxation. They both allow the comparison of metric spaces equipped with arbitrary positive measures up to isometries. The first formulation is a positive and definite divergence based on a relaxation of the mass conservation constraint using a novel type of quadratically-homogeneous divergence.This divergence works hand in hand with the entropic regularization approach which is popular to solve large scale optimal transport problems. We show that the underlying non-convex optimization problem can be efficiently tackled using a highly parallelizable and GPU-friendly iterative scheme. 
The second formulation is a distance between mm-spaces up to isometries based on a conic lifting. Lastly, we provide numerical simulations to highlight the salient features of the unbalanced divergence and its potential applications in ML.", "keywords": "Gromov-Wasserstein;Non-convex optimization;Optimal Transport;Partial matching", "primary_area": "", "supplementary_material": "", "author": "Thibault Sejourne;Fran\u00e7ois-Xavier Vialard;Gabriel Peyr\u00e9", "authorids": "~Thibault_Sejourne2;~Fran\u00e7ois-Xavier_Vialard2;~Gabriel_Peyr\u00e92", "gender": "M;M;M", "homepage": "https://thibsej.github.io/;http://angkor.univ-mlv.fr/~vialard/#about;http://gpeyre.com/", "dblp": ";09/8280;65/1759", "google_scholar": "ng54Q0wAAAAJ;https://scholar.google.fr/citations?user=_BrmEz8AAAAJ;https://scholar.google.fr/citations?user=KqA1dYcAAAAJ", "orcid": ";;", "linkedin": ";;", "or_profile": "~Thibault_Sejourne2;~Fran\u00e7ois-Xavier_Vialard2;~Gabriel_Peyr\u00e92", "aff": "Ecole Normale Superieure;Universit\u00e9 Gustave Eiffel;CNRS", "aff_domain": "ens.fr;u-pem.fr;cnrs.fr", "position": "PhD student;Professor;Researcher", "bibtex": "@misc{\nsejourne2021the,\ntitle={The Unbalanced Gromov Wasserstein Distance: Conic Formulation and Relaxation},\nauthor={Thibault Sejourne and Fran{\\c{c}}ois-Xavier Vialard and Gabriel Peyr{\\'e}},\nyear={2021},\nurl={https://openreview.net/forum?id=GzHjhdpk-YH}\n}", "github": "", "project": "", "reviewers": "AnonReviewer3;AnonReviewer2;AnonReviewer4;AnonReviewer1", "site": "https://openreview.net/forum?id=GzHjhdpk-YH", "pdf_size": 0, "rating": "5;6;6;7", "confidence": "5;4;4;4", "wc_review": "589;711;624;403", "wc_reply_reviewers": "125;0;46;15", "wc_reply_authors": "557;573;802;257", "reply_reviewers": "1;0;1;1", "reply_authors": "1;1;1;1", "rating_avg": [ 6.0, 0.7071067811865476 ], "confidence_avg": [ 4.25, 0.4330127018922193 ], "wc_review_avg": [ 581.75, 112.35518457107354 ], "wc_reply_reviewers_avg": [ 46.5, 48.26230413065667 ], "wc_reply_authors_avg": [ 547.25, 193.58509110982695 ], "reply_reviewers_avg": [ 0.75, 0.4330127018922193 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 13, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": -0.816496580927726, "gs_citation": 95, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=4621301821355236560&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 9, "aff_unique_index": "0;1;2", "aff_unique_norm": "Ecole Normale Superieure;Universit\u00e9 Gustave Eiffel;Centre National de la Recherche Scientifique", "aff_unique_dep": ";;", "aff_unique_url": "https://www.ens.fr;https://www.univ-gustave-eiffel.fr;https://www.cnrs.fr", "aff_unique_abbr": "ENS;UGE;CNRS", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0;0", "aff_country_unique": "France" }, { "id": "GzMUD_GGvJN", "title": "On the Importance of Distraction-Robust Representations for Robot Learning", "track": "main", "status": "Reject", "tldr": "", "abstract": "Representation Learning methods can allow the application of Reinforcement Learning algorithms when a high dimensionality in a robot's perceptions would otherwise prove prohibitive. Consequently, unsupervised Representation Learning components often feature in robot control algorithms that assume high-dimensional camera images as the principal source of information.\nIn their design and performance, these algorithms often benefit from the controlled nature of the simulation or laboratory conditions they are evaluated in. 
However, these settings fail to acknowledge the stochasticity of most real-world environments.\nIn this work, we introduce the concept of Distraction-Robust Representation Learning. We argue that environment noise and other distractions require learned representations to encode the robot's expected perceptions rather than the observed ones. Our experimental evaluations demonstrate that representations learned with a traditional dimensionality reduction algorithm are strongly susceptible to distractions in a robot's environment.\nWe propose an Encoder-Decoder architecture that produces representations that allow the learning outcomes of robot control tasks to remain unaffected by these distractions.", "keywords": "Unsupervised Representation Learning;Robot Control;Quality-Diversity", "primary_area": "", "supplementary_material": "", "author": "Andy Wang;Antoine Cully", "authorids": "~Andy_Wang1;~Antoine_Cully1", "gender": "M;M", "homepage": "https://github.com/andwang1;", "dblp": ";https://dblp.org/pers/c/Cully:Antoine.html", "google_scholar": ";rZtJlPQAAAAJ", "orcid": ";", "linkedin": ";", "or_profile": "~Andy_Wang1;~Antoine_Cully1", "aff": ";Imperial College London", "aff_domain": ";imperial.ac.uk", "position": ";Assistant Professor", "bibtex": "@misc{\nwang2021on,\ntitle={On the Importance of Distraction-Robust Representations for Robot Learning},\nauthor={Andy Wang and Antoine Cully},\nyear={2021},\nurl={https://openreview.net/forum?id=GzMUD_GGvJN}\n}", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer2;AnonReviewer4;AnonReviewer3", "site": "https://openreview.net/forum?id=GzMUD_GGvJN", "pdf_size": 0, "rating": "3;3;4;4", "confidence": "3;4;4;4", "wc_review": "673;384;782;268", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "0;0;0;0", "reply_reviewers": "0;0;0;0", "reply_authors": "0;0;0;0", "rating_avg": [ 3.5, 0.5 ], "confidence_avg": [ 3.75, 0.4330127018922193 ], "wc_review_avg": [ 526.75, 208.48905846590608 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 0, 0 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 0, 0 ], "replies_avg": [ 6, 0 ], "authors#_avg": [ 2, 0 ], "corr_rating_confidence": 0.5773502691896257, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:72OYmcpVNj0J:scholar.google.com/&scioq=On+the+Importance+of+Distraction-Robust+Representations+for+Robot+Learning&hl=en&as_sdt=0,5", "gs_version_total": 0, "aff_unique_index": "0", "aff_unique_norm": "Imperial College London", "aff_unique_dep": "", "aff_unique_url": "https://www.imperial.ac.uk", "aff_unique_abbr": "ICL", "aff_country_unique_index": "0", "aff_country_unique": "United Kingdom" }, { "id": "H-AAaJ9v_lE", "title": "Legendre Deep Neural Network (LDNN) and its application for approximation of nonlinear Volterra\u2013Fredholm\u2013Hammerstein integral equations", "track": "main", "status": "Reject", "tldr": "", "abstract": "Various phenomena in biology, physics, and engineering are modeled by differential equations. These differential equations including partial differential equations and ordinary differential equations can be converted and represented as integral equations. In particular, Volterra\u2013Fredholm\u2013Hammerstein integral equations are the main type of these integral equations and researchers are interested in investigating and solving these equations. In this paper, we propose Legendre Deep Neural Network (LDNN) for solving nonlinear Volterra\u2013Fredholm\u2013Hammerstein integral equations (V-F-H-IEs). 
LDNN utilizes Legendre orthogonal polynomials as activation functions of the Deep structure. We present how LDNN can be used to solve nonlinear V-F-H-IEs. We show using the Gaussian quadrature collocation method in combination with LDNN results in a novel numerical solution for nonlinear V-F-H-IEs. Several examples are given to verify the performance and accuracy of LDNN.", "keywords": "Deep neural network;Volterra\u2013Fredholm\u2013Hammerstein integral equations;Legendre orthogonal polynomials;Gaussian quadrature method;Collocation method", "primary_area": "", "supplementary_material": "", "author": "Kourosh Parand;Zeinab Hajimohammadi;Ali Ghodsi", "authorids": "kparand@sbu.ac.ir;~Zeinab_Hajimohammadi1;~Ali_Ghodsi1", "gender": ";F;M", "homepage": ";;https://uwaterloo.ca/data-analytics/", "dblp": ";;71/4226-1", "google_scholar": ";;WXbhp_4AAAAJ", "orcid": ";0000-0001-5908-6046;", "linkedin": ";;ali-ghodsi-525b0a61/", "or_profile": "kparand@sbu.ac.ir;~Zeinab_Hajimohammadi1;~Ali_Ghodsi1", "aff": ";Shahid Beheshti University;University of Waterloo", "aff_domain": ";sbu.ac.ir;uwaterloo.ca", "position": ";PhD student;Full Professor", "bibtex": "@misc{\nparand2021legendre,\ntitle={Legendre Deep Neural Network ({\\{}LDNN{\\}}) and its application for approximation of nonlinear Volterra{\\textendash}Fredholm{\\textendash}Hammerstein integral equations},\nauthor={Kourosh Parand and Zeinab Hajimohammadi and Ali Ghodsi},\nyear={2021},\nurl={https://openreview.net/forum?id=H-AAaJ9v_lE}\n}", "github": "", "project": "", "reviewers": "AnonReviewer4;AnonReviewer3;AnonReviewer5", "site": "https://openreview.net/forum?id=H-AAaJ9v_lE", "pdf_size": 0, "rating": "3;4;5", "confidence": "5;4;2", "wc_review": "226;381;345", "wc_reply_reviewers": "0;0;0", "wc_reply_authors": "0;0;0", "reply_reviewers": "0;0;0", "reply_authors": "0;0;0", "rating_avg": [ 4.0, 0.816496580927726 ], "confidence_avg": [ 3.6666666666666665, 1.247219128924647 ], "wc_review_avg": [ 317.3333333333333, 66.23359335630892 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 0, 0 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 0, 0 ], "replies_avg": [ 4, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": -0.9819805060619659, "gs_citation": 3, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=13968211482880723455&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 5, "aff_unique_index": "0;1", "aff_unique_norm": "Shahid Beheshti University;University of Waterloo", "aff_unique_dep": ";", "aff_unique_url": "https://www.sbu.ac.ir;https://uwaterloo.ca", "aff_unique_abbr": "SBU;UW", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;1", "aff_country_unique": "Iran;Canada" }, { "id": "H-BVtEaipej", "title": "Global Attention Improves Graph Networks Generalization", "track": "main", "status": "Reject", "tldr": "", "abstract": "This paper advocates incorporating a Low-Rank Global Attention (LRGA) module, a Computation and memory efficient variant of the dot-product attention (Vaswani et al., 2017), to Graph Neural Networks (GNNs) for improving their generalization power. \nTo theoretically quantify the generalization properties granted by adding the LRGA module to GNNs, we focus on a specific family of expressive GNNs and show that augmenting it with LRGA provides algorithmic alignment to a powerful graph isomorphism test, namely the 2-Folklore Weisfeiler-Lehman (2-FWL) algorithm. 
In more detail we: (i) consider the recent Random Graph Neural Network (RGNN) (Sato et al., 2020) framework and prove that it is universal in probability; (ii) show that RGNN augmented with LRGA aligns with 2-FWL update step via polynomial kernels; and (iii) bound the sample complexity of the kernel's feature map when learned with a randomly initialized two-layer MLP.\nFrom a practical point of view, augmenting existing GNN layers with LRGA produces state of the art results in current GNN benchmarks. Lastly, we observe that augmenting various GNN architectures with LRGA often closes the performance gap across different models.", "keywords": "Graph Neural Network;Self-Attention;Generalization of GNNs;Weisfeiler-Lehman", "primary_area": "", "supplementary_material": "", "author": "Omri Puny;Heli Ben-Hamu;Yaron Lipman", "authorids": "~Omri_Puny1;~Heli_Ben-Hamu1;~Yaron_Lipman1", "gender": "M;;", "homepage": "https://omri1348.github.io/;;", "dblp": "267/5465;;", "google_scholar": "https://scholar.google.com/citations?view_op=list_works;;", "orcid": ";;", "linkedin": "omri-puny-0917771b2/;;", "or_profile": "~Omri_Puny1;~Heli_Ben-Hamu1;~Yaron_Lipman1", "aff": "Weizmann Institute of Science;;", "aff_domain": "weizmann.ac.il;;", "position": "PhD student;;", "bibtex": "@misc{\npuny2021global,\ntitle={Global Attention Improves Graph Networks Generalization},\nauthor={Omri Puny and Heli Ben-Hamu and Yaron Lipman},\nyear={2021},\nurl={https://openreview.net/forum?id=H-BVtEaipej}\n}", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer3;AnonReviewer4;AnonReviewer2", "site": "https://openreview.net/forum?id=H-BVtEaipej", "pdf_size": 0, "rating": "5;6;6;7", "confidence": "5;4;3;4", "wc_review": "498;304;308;651", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "807;315;213;351", "reply_reviewers": "0;0;0;0", "reply_authors": "1;1;1;1", "rating_avg": [ 6.0, 0.7071067811865476 ], "confidence_avg": [ 4.0, 0.7071067811865476 ], "wc_review_avg": [ 440.25, 144.74525035385443 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 421.5, 228.25150601912793 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 10, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": -0.5, "gs_citation": 33, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=5444121415195617136&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 3, "aff_unique_index": "0", "aff_unique_norm": "Weizmann Institute of Science", "aff_unique_dep": "", "aff_unique_url": "https://www.weizmann.org.il", "aff_unique_abbr": "Weizmann", "aff_country_unique_index": "0", "aff_country_unique": "Israel" }, { "id": "H-SPvQtMwm", "title": "Synthesizer: Rethinking Self-Attention for Transformer Models", "track": "main", "status": "Reject", "tldr": "", "abstract": "The dot product self-attention is known to be central and indispensable to state-of-the-art Transformer models. But is it really required? This paper investigates the true importance and contribution of the dot product-based self-attention mechanism on the performance of Transformer models. Via extensive experiments, we find that (1) random alignment matrices surprisingly perform quite competitively and (2) learning attention weights from token-token (query-key) interactions is useful but not that important after all. To this end, we propose \\textsc{Synthesizer}, a model that learns synthetic attention weights without token-token interactions. 
In our experiments, we first show that simple Synthesizers achieve highly competitive performance when compared against vanilla Transformer models across a range of tasks, including machine translation, language modeling, text generation and GLUE/SuperGLUE benchmarks. When composed with dot product attention, we find that Synthesizers consistently outperform Transformers. Moreover, we conduct additional comparisons of Synthesizers against Dynamic Convolutions, showing that simple Random Synthesizer is not only $60\\%$ faster but also improves perplexity by a relative $3.5\\%$. Finally, we show that simple factorized Synthesizers can outperform Linformers on encoding only tasks. ", "keywords": "Transformers;Deep Learning;Attention", "primary_area": "", "supplementary_material": "", "author": "Yi Tay;Dara Bahri;Donald Metzler;Da-Cheng Juan;Zhe Zhao;Che Zheng", "authorids": "~Yi_Tay1;~Dara_Bahri1;metzler@google.com;~Da-Cheng_Juan1;~Zhe_Zhao3;chezheng@google.com", "gender": "M;M;;;M;", "homepage": "http://yitay.net;http://www.dara.run;;;https://sites.google.com/view/zhezhao;", "dblp": ";231/7656;;47/1564;28/6429-1.html;", "google_scholar": "VBclY_cAAAAJ;j5PpTOwAAAAJ;;https://scholar.google.com/citations?hl=en;TRZB0J4AAAAJ;", "orcid": ";;;;;", "linkedin": ";;;;;", "or_profile": "~Yi_Tay1;~Dara_Bahri1;metzler@google.com;~Da-Cheng_Juan1;~Zhe_Zhao3;chezheng@google.com", "aff": "Google;Google Research;;Google Research;Google;", "aff_domain": "google.com;google.com;;google.com;google.com;", "position": "Research Scientist;Research Scientist;;Senior Software Engineer;Research Scientist;", "bibtex": "@misc{\ntay2021synthesizer,\ntitle={Synthesizer: Rethinking Self-Attention for Transformer Models},\nauthor={Yi Tay and Dara Bahri and Donald Metzler and Da-Cheng Juan and Zhe Zhao and Che Zheng},\nyear={2021},\nurl={https://openreview.net/forum?id=H-SPvQtMwm}\n}", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer3;AnonReviewer2;AnonReviewer4", "site": "https://openreview.net/forum?id=H-SPvQtMwm", "pdf_size": 0, "rating": "4;5;7;7", "confidence": "4;4;4;4", "wc_review": "784;403;428;199", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "231;690;380;125", "reply_reviewers": "0;0;0;0", "reply_authors": "1;1;1;1", "rating_avg": [ 5.75, 1.299038105676658 ], "confidence_avg": [ 4.0, 0.0 ], "wc_review_avg": [ 453.5, 210.4762456905767 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 356.5, 212.78921495226209 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 10, 0 ], "authors#_avg": [ 6, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 424, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=13387830876140432247&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 6, "aff_unique_index": "0;0;0;0", "aff_unique_norm": "Google", "aff_unique_dep": "Google", "aff_unique_url": "https://www.google.com", "aff_unique_abbr": "Google", "aff_campus_unique_index": "0;0;0;0", "aff_campus_unique": "Mountain View", "aff_country_unique_index": "0;0;0;0", "aff_country_unique": "United States" }, { "title": "Average-case Acceleration for Bilinear Games and Normal Matrices", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/3051", "id": "H0syOoy3Ash", "poster": "", "openreview": "https://openreview.net/forum?id=H0syOoy3Ash", "slides": "https://iclr.cc/virtual/2021/poster/3051", "video": "https://iclr.cc/virtual/2021/poster/3051", "author_site": "Carles Domingo i Enrich, Fabian Pedregosa, Damien Scieur", "tldr": 
"", "abstract": "Advances in generative modeling and adversarial learning have given rise to renewed interest in smooth games. However, the absence of symmetry in the matrix of second derivatives poses challenges that are not present in the classical minimization framework. While a rich theory of average-case analysis has been developed for minimization problems, little is known in the context of smooth games. In this work we take a first step towards closing this gap by developing average-case optimal first-order methods for a subset of smooth games. \nWe make the following three main contributions. First, we show that for zero-sum bilinear games the average-case optimal method is the optimal method for the minimization of the Hamiltonian. Second, we provide an explicit expression for the optimal method corresponding to normal matrices, potentially non-symmetric. Finally, we specialize it to matrices with eigenvalues located in a disk and show a provable speed-up compared to worst-case optimal algorithms. We illustrate our findings through benchmarks with a varying degree of mismatch with our assumptions.", "keywords": "Smooth games;First-order Methods;Acceleration;Bilinear games;Average-case Analysis;Orthogonal Polynomials", "primary_area": "", "supplementary_material": "/attachment/f01a3bef2d2ffddb195ae04ffcbce35ba6c20f5b.zip", "author": "Carles Domingo-Enrich;Fabian Pedregosa;Damien Scieur", "authorids": "~Carles_Domingo-Enrich1;~Fabian_Pedregosa1;~Damien_Scieur3", "gender": "M;M;M", "homepage": "https://cdenrich.github.io;http://fa.bianp.net;https://damienscieur.com/", "dblp": "216/7444.html;11/9764;191/6712", "google_scholar": "1ZHcGwIAAAAJ;https://scholar.google.fr/citations?hl=en;https://scholar.google.fr/citations?user=hNscQzgAAAAJ", "orcid": ";0000-0003-4025-3953;", "linkedin": ";http://www.linkedin.com/in/fabianpedregosa;damien-scieur-6873ba82/", "or_profile": "~Carles_Domingo-Enrich1;~Fabian_Pedregosa1;~Damien_Scieur3", "aff": "New York University;Google AI;Samsung", "aff_domain": "nyu.edu;google.com;samsung.com", "position": "PhD student;Research Scientist;Researcher", "bibtex": "@inproceedings{\ndomingo-enrich2021averagecase,\ntitle={Average-case Acceleration for Bilinear Games and Normal Matrices},\nauthor={Carles Domingo-Enrich and Fabian Pedregosa and Damien Scieur},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=H0syOoy3Ash}\n}", "github": "", "project": "", "reviewers": "AnonReviewer4;AnonReviewer3;AnonReviewer1", "pdf_size": 0, "rating": "6;7;7", "confidence": "3;2;4", "wc_review": "197;193;559", "wc_reply_reviewers": "0;0;0", "wc_reply_authors": "444;141;767", "reply_reviewers": "0;0;0", "reply_authors": "1;1;1", "rating_avg": [ 6.666666666666667, 0.4714045207910317 ], "confidence_avg": [ 3.0, 0.816496580927726 ], "wc_review_avg": [ 316.3333333333333, 171.59901579619336 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 450.6666666666667, 255.60690305406246 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 7, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 10, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=7973079921158046715&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 9, "pdf": "https://openreview.net/pdf?id=H0syOoy3Ash", "email": "nyu.edu;google.com;samsung.com", "author_num": 3, "aff_unique_index": "0;1;2", "aff_unique_norm": "New York University;Google;Samsung", "aff_unique_dep": ";Google 
AI;Samsung", "aff_unique_url": "https://www.nyu.edu;https://ai.google;https://www.samsung.com", "aff_unique_abbr": "NYU;Google AI;Samsung", "aff_campus_unique_index": "1", "aff_campus_unique": ";Mountain View", "aff_country_unique_index": "0;0;1", "aff_country_unique": "United States;South Korea" }, { "id": "H38f_9b90BO", "title": "Towards Robust Graph Neural Networks against Label Noise", "track": "main", "status": "Reject", "tldr": "", "abstract": "Massive labeled data have been used in training deep neural networks, thus label noise has become an important issue therein. Although learning with noisy labels has made great progress on image datasets in recent years, it has not yet been studied in connection with utilizing GNNs to classify graph nodes. In this paper, we proposed a method, named LPM, to address the problem using Label Propagation (LP) and Meta learning. Different from previous methods designed for image datasets, our method is based on a special attribute (label smoothness) of graph-structured data, i.e., neighboring nodes in a graph tend to have the same label. A pseudo label is computed from the neighboring labels for each node in the training set using LP; meta learning is utilized to learn a proper aggregation of the original and pseudo label as the final label. Experimental results demonstrate that LPM outperforms state-of-the-art methods in graph node classification task with both synthetic and real-world label noise. Source code to reproduce all results will be released.", "keywords": "Graph Neural Networks;Graph Node Classification;Label Noise", "primary_area": "", "supplementary_material": "/attachment/a5c59bad1a3c6aef2da8f19b4e7498938c51714a.zip", "author": "Jun Xia;Haitao Lin;Yongjie Xu;Lirong Wu;Zhangyang Gao;Siyuan Li;Stan Z. Li", "authorids": "~Jun_Xia1;~Haitao_Lin2;~Yongjie_Xu1;~Lirong_Wu1;~Zhangyang_Gao1;~Siyuan_Li6;~Stan_Z._Li2", "gender": "M;M;;;M;M;M", "homepage": "http://junxia97.github.io/;;;;;https://lupin1998.github.io/;https://en.westlake.edu.cn/academics/School_of_Engineering/About/Our_People/Faculty/201912/t20191206_2497.shtml", "dblp": ";34/1040;;15/10330;275/3266;63/9705-2;l/StanZLi", "google_scholar": "aPKKpSYAAAAJ;o5A23qIAAAAJ;;Tk7TrCoAAAAJ;4SclT-QAAAAJ;https://scholar.google.com/citations?hl=zh-CN;https://scholar.google.com/citations?hl=zh-CN", "orcid": ";;;;0000-0003-1026-6083;0000-0001-6806-2468;", "linkedin": ";;;;;https://www.linkedin.cn/incareer/in/siyuan-li-lupin1998/;stan-z-li-%E6%9D%8E%E5%AD%90%E9%9D%92-55753224/", "or_profile": "~Jun_Xia1;~Haitao_Lin2;~Yongjie_Xu1;~Lirong_Wu1;~Zhangyang_Gao1;~Siyuan_Li6;~Stan_Z._Li1", "aff": "Westlake University, China;Westlake University;;Westlake University;Westlake University, China;Nanjing University;Westlake University", "aff_domain": "westlake.edu.cn;westlake.edu.cn;;westlake.edu.cn;westlake.edu.cn;nju.edu.cn;westlake.edu.cn", "position": "PhD student;PhD student;;PhD student;PhD student;Undergrad student;Chair Professor", "bibtex": "@misc{\nxia2021towards,\ntitle={Towards Robust Graph Neural Networks against Label Noise},\nauthor={Jun Xia and Haitao Lin and Yongjie Xu and Lirong Wu and Zhangyang Gao and Siyuan Li and Stan Z. 
Li},\nyear={2021},\nurl={https://openreview.net/forum?id=H38f_9b90BO}\n}", "github": "", "project": "", "reviewers": "AnonReviewer3;AnonReviewer4;AnonReviewer1;AnonReviewer2", "site": "https://openreview.net/forum?id=H38f_9b90BO", "pdf_size": 0, "rating": "4;5;6;7", "confidence": "5;4;4;4", "wc_review": "433;244;493;135", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "789;620;579;262", "reply_reviewers": "0;0;0;0", "reply_authors": "1;1;1;1", "rating_avg": [ 5.5, 1.118033988749895 ], "confidence_avg": [ 4.25, 0.4330127018922193 ], "wc_review_avg": [ 326.25, 143.6512704433901 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 562.5, 190.51312290758347 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 14, 0 ], "authors#_avg": [ 7, 0 ], "corr_rating_confidence": -0.7745966692414834, "gs_citation": 16, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=5432735244692393734&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 0, "aff_unique_index": "0;0;0;0;1;0", "aff_unique_norm": "Westlake University;Nanjing University", "aff_unique_dep": ";", "aff_unique_url": "https://www.westlake.edu.cn;https://www.nju.edu.cn", "aff_unique_abbr": "WU;Nanjing U", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0;0;0;0;0", "aff_country_unique": "China" }, { "id": "H5B3lmpO1g", "title": "Goal-Auxiliary Actor-Critic for 6D Robotic Grasping with Point Clouds", "track": "main", "status": "Reject", "tldr": "", "abstract": "6D robotic grasping beyond top-down bin-picking scenarios is a challenging task. Previous solutions based on 6D grasp synthesis with robot motion planning usually operate in an open-loop setting without considering perception feedback and dynamics and contacts of objects, which makes them sensitive to grasp synthesis errors. In this work, we propose a novel method for learning closed-loop control policies for 6D robotic grasping using point clouds from an egocentric camera. We combine imitation learning and reinforcement learning in order to grasp unseen objects and handle the continuous 6D action space, where expert demonstrations are obtained from a joint motion and grasp planner. We introduce a goal-auxiliary actor-critic algorithm, which uses grasping goal prediction as an auxiliary task to facilitate policy learning. The supervision on grasping goals can be obtained from the expert planner for known objects or from hindsight goals for unknown objects. Overall, our learned closed-loop policy achieves over $90\\%$ success rates on grasping various ShapeNet objects and YCB objects in simulation. 
The policy also transfers well to the real world with only one failure among grasping of ten different unseen objects in the presence of perception noises.", "keywords": "Robotics;Reinforcement Learning;Learning from Demonstration", "primary_area": "", "supplementary_material": "/attachment/7c9443166a6cf707edbf67808fa4bbd3781dfa65.zip", "author": "Lirui Wang;Yu Xiang;Dieter Fox", "authorids": "~Lirui_Wang1;~Yu_Xiang3;~Dieter_Fox1", "gender": "M;M;M", "homepage": "https://liruiw.github.io/;https://homes.cs.washington.edu/~fox/;https://yuxng.github.io/", "dblp": "221/9612;f/DieterFox;00/6716-1", "google_scholar": "EM9YhH0AAAAJ;DqXsbPAAAAAJ;", "orcid": ";;0000-0001-9431-5131", "linkedin": ";;", "or_profile": "~Lirui_Wang1;~Dieter_Fox1;~Yu_Xiang1", "aff": "University of Washington, Seattle;Department of Computer Science;NVIDIA", "aff_domain": "uw.edu;cs.washington.edu;nvidia.com", "position": "MS student;Full Professor;Research Scientist", "bibtex": "@misc{\nwang2021goalauxiliary,\ntitle={Goal-Auxiliary Actor-Critic for 6D Robotic Grasping with Point Clouds},\nauthor={Lirui Wang and Yu Xiang and Dieter Fox},\nyear={2021},\nurl={https://openreview.net/forum?id=H5B3lmpO1g}\n}", "github": "", "project": "", "reviewers": "AnonReviewer3;AnonReviewer2;AnonReviewer1", "site": "https://openreview.net/forum?id=H5B3lmpO1g", "pdf_size": 0, "rating": "5;6;7", "confidence": "4;4;4", "wc_review": "530;406;719", "wc_reply_reviewers": "0;0;58", "wc_reply_authors": "829;344;681", "reply_reviewers": "0;0;1", "reply_authors": "1;1;2", "rating_avg": [ 6.0, 0.816496580927726 ], "confidence_avg": [ 4.0, 0.0 ], "wc_review_avg": [ 551.6666666666666, 128.69688764258788 ], "wc_reply_reviewers_avg": [ 19.333333333333332, 27.34146220587984 ], "wc_reply_authors_avg": [ 618.0, 202.94991171879497 ], "reply_reviewers_avg": [ 0.3333333333333333, 0.4714045207910317 ], "reply_authors_avg": [ 1.3333333333333333, 0.4714045207910317 ], "replies_avg": [ 11, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 55, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=6010967143703049350&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 5, "aff_unique_index": "0;1;2", "aff_unique_norm": "University of Washington;Unknown Institution;NVIDIA", "aff_unique_dep": ";Department of Computer Science;NVIDIA Corporation", "aff_unique_url": "https://www.washington.edu;;https://www.nvidia.com", "aff_unique_abbr": "UW;;NVIDIA", "aff_campus_unique_index": "0", "aff_campus_unique": "Seattle;", "aff_country_unique_index": "0;0", "aff_country_unique": "United States;" }, { "title": "Layer-adaptive Sparsity for the Magnitude-based Pruning", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/3108", "id": "H6ATjJ0TKdf", "poster": "", "openreview": "https://openreview.net/forum?id=H6ATjJ0TKdf", "slides": "https://iclr.cc/virtual/2021/poster/3108", "video": "https://iclr.cc/virtual/2021/poster/3108", "author_site": "Jaeho Lee, Sejun Park, Sangwoo Mo, Sungsoo Ahn, Jinwoo Shin", "tldr": "", "abstract": "Recent discoveries on neural network pruning reveal that, with a carefully chosen layerwise sparsity, a simple magnitude-based pruning achieves state-of-the-art tradeoff between sparsity and performance. However, without a clear consensus on ``how to choose,'' the layerwise sparsities are mostly selected algorithm-by-algorithm, often resorting to handcrafted heuristics or an extensive hyperparameter search. 
To fill this gap, we propose a novel importance score for global pruning, coined layer-adaptive magnitude-based pruning (LAMP) score; the score is a rescaled version of weight magnitude that incorporates the model-level $\\ell_2$ distortion incurred by pruning, and does not require any hyperparameter tuning or heavy computation.\nUnder various image classification setups, LAMP consistently outperforms popular existing schemes for layerwise sparsity selection.\nFurthermore, we observe that LAMP continues to outperform baselines even in weight-rewinding setups, while the connectivity-oriented layerwise sparsity (the strongest baseline overall) performs worse than a simple global magnitude-based pruning in this case. Code: https://github.com/jaeho-lee/layer-adaptive-sparsity", "keywords": "network pruning;layerwise sparsity;magnitude-based pruning", "primary_area": "", "supplementary_material": "/attachment/d58ab526ea48bf9d41dc018a7eed88dcd369e06b.zip", "author": "Jaeho Lee;Sejun Park;Sangwoo Mo;Sungsoo Ahn;Jinwoo Shin", "authorids": "~Jaeho_Lee3;~Sejun_Park1;~Sangwoo_Mo1;~Sungsoo_Ahn1;~Jinwoo_Shin1", "gender": "M;;M;M;M", "homepage": "https://jaeho-lee.github.io;;https://sites.google.com/view/sangwoomo;https://sungsooahn.super.site/;https://sites.google.com/site/mijirim/", "dblp": "78/6080-1;155/9882;198/0432;90/5164;31/7062", "google_scholar": "t91zoQMAAAAJ;;https://scholar.google.co.kr/citations?user=Sq9y3NMAAAAJ;XTenHs0AAAAJ;https://scholar.google.com.tw/citations?user=m3eDp7kAAAAJ", "orcid": ";;;;", "linkedin": ";;;;", "or_profile": "~Jaeho_Lee3;~Sejun_Park1;~Sangwoo_Mo1;~Sungsoo_Ahn1;~Jinwoo_Shin1", "aff": "Korea Advanced Institute of Science & Technology;Korea University;KAIST;Korea Advanced Institute of Science & Technology;Korea Advanced Institute of Science & Technology", "aff_domain": "kaist.ac.kr;korea.ac.kr;kaist.ac.kr;kaist.ac.kr;kaist.ac.kr", "position": "Postdoc;Assistant Professor;PhD student;PhD student;Associate Professor", "bibtex": "@inproceedings{\nlee2021layeradaptive,\ntitle={Layer-adaptive Sparsity for the Magnitude-based Pruning},\nauthor={Jaeho Lee and Sejun Park and Sangwoo Mo and Sungsoo Ahn and Jinwoo Shin},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=H6ATjJ0TKdf}\n}", "github": "[![github](/images/github_icon.svg) jaeho-lee/layer-adaptive-sparsity](https://github.com/jaeho-lee/layer-adaptive-sparsity)", "project": "", "reviewers": "AnonReviewer4;AnonReviewer1;AnonReviewer2;AnonReviewer3", "pdf_size": 0, "rating": "5;6;7;8", "confidence": "4;4;5;3", "wc_review": "172;1270;942;226", "wc_reply_reviewers": "0;0;0;49", "wc_reply_authors": "477;615;908;217", "reply_reviewers": "0;0;0;1", "reply_authors": "2;2;2;2", "rating_avg": [ 6.5, 1.118033988749895 ], "confidence_avg": [ 4.0, 0.7071067811865476 ], "wc_review_avg": [ 652.5, 468.4813230001811 ], "wc_reply_reviewers_avg": [ 12.25, 21.21762239271875 ], "wc_reply_authors_avg": [ 554.25, 249.26629836381812 ], "reply_reviewers_avg": [ 0.25, 0.4330127018922193 ], "reply_authors_avg": [ 2.0, 0.0 ], "replies_avg": [ 17, 0 ], "authors#_avg": [ 5, 0 ], "corr_rating_confidence": -0.3162277660168379, "gs_citation": 280, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=16870181998029600993&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 4, "pdf": "https://openreview.net/pdf?id=H6ATjJ0TKdf", "email": "kaist.ac.kr;korea.ac.kr;kaist.ac.kr;kaist.ac.kr;kaist.ac.kr", "author_num": 5, "aff_unique_index": "0;1;0;0;0", "aff_unique_norm": "Korea 
Advanced Institute of Science and Technology;Korea University", "aff_unique_dep": ";", "aff_unique_url": "https://www.kaist.ac.kr;https://www.korea.ac.kr", "aff_unique_abbr": "KAIST;KU", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0;0;0;0", "aff_country_unique": "South Korea" }, { "id": "H6ZWlQrPGS2", "title": "Fast Binarized Neural Network Training with Partial Pre-training", "track": "main", "status": "Reject", "tldr": "", "abstract": "Binarized neural networks, networks with weights and activations constrained to lie in a 2-element set, allow for more time- and resource-efficient inference than standard floating-point networks. However, binarized neural networks typically take more training to plateau in accuracy than their floating-point counterparts, in terms of both iteration count and wall clock time. We demonstrate a technique, partial pre-training, that allows for faster from-scratch training of binarized neural networks by first training the network as a standard floating-point network for a short amount of time, then converting the network to a binarized neural network and continuing to train from there. Without tuning any hyperparameters across four networks on three different datasets, partial pre-training is able to train binarized neural networks between $1.26\\times$ and $1.61\\times$ faster than when training a binarized network from scratch using standard low-precision training.\n", "keywords": "binarized neural network;binary;quantized;1-bit;low precision", "primary_area": "", "supplementary_material": "", "author": "Alex Renda;Joshua Wolff Fromm", "authorids": "~Alex_Renda2;~Joshua_Wolff_Fromm1", "gender": "M;M", "homepage": "https://alexrenda.com;", "dblp": "206/6568;", "google_scholar": "4BCuJ2AAAAAJ;WExIZfoAAAAJ", "orcid": ";", "linkedin": ";josh-fromm-2a4a2258/", "or_profile": "~Alex_Renda2;~Joshua_Wolff_Fromm1", "aff": "Massachusetts Institute of Technology;OctoML", "aff_domain": "mit.edu;octoml.ai", "position": "PhD student;Architect", "bibtex": "@misc{\nrenda2021fast,\ntitle={Fast Binarized Neural Network Training with Partial Pre-training},\nauthor={Alex Renda and Joshua Wolff Fromm},\nyear={2021},\nurl={https://openreview.net/forum?id=H6ZWlQrPGS2}\n}", "github": "", "project": "", "reviewers": "AnonReviewer2;AnonReviewer4;AnonReviewer1;AnonReviewer3", "site": "https://openreview.net/forum?id=H6ZWlQrPGS2", "pdf_size": 0, "rating": "4;4;4;5", "confidence": "5;5;3;4", "wc_review": "211;424;335;220", "wc_reply_reviewers": "0;61;22;0", "wc_reply_authors": "237;451;415;247", "reply_reviewers": "0;1;1;0", "reply_authors": "1;2;1;1", "rating_avg": [ 4.25, 0.4330127018922193 ], "confidence_avg": [ 4.25, 0.82915619758885 ], "wc_review_avg": [ 297.5, 87.8877124517415 ], "wc_reply_reviewers_avg": [ 20.75, 24.913600703230355 ], "wc_reply_authors_avg": [ 337.5, 96.40928378532847 ], "reply_reviewers_avg": [ 0.5, 0.5 ], "reply_authors_avg": [ 1.25, 0.4330127018922193 ], "replies_avg": [ 22, 0 ], "authors#_avg": [ 2, 0 ], "corr_rating_confidence": -0.17407765595569782, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:Lbkn7eA34ngJ:scholar.google.com/&scioq=Fast+Binarized+Neural+Network+Training+with+Partial+Pre-training&hl=en&as_sdt=0,33", "gs_version_total": 2, "aff_unique_index": "0;1", "aff_unique_norm": "Massachusetts Institute of Technology;OctoML", "aff_unique_dep": ";", "aff_unique_url": "https://web.mit.edu;https://www.octoml.ai", "aff_unique_abbr": "MIT;OctoML", "aff_campus_unique_index": "", 
"aff_campus_unique": "", "aff_country_unique_index": "0;0", "aff_country_unique": "United States" }, { "title": "Distributed Momentum for Byzantine-resilient Stochastic Gradient Descent", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/2636", "id": "H8UHdhWG6A3", "poster": "", "openreview": "https://openreview.net/forum?id=H8UHdhWG6A3", "slides": "https://iclr.cc/virtual/2021/poster/2636", "video": "https://iclr.cc/virtual/2021/poster/2636", "author_site": "El Mahdi El Mhamdi, Rachid Guerraoui, S\u00e9bastien Rouault", "tldr": "", "abstract": "Byzantine-resilient Stochastic Gradient Descent (SGD) aims at shielding model training from Byzantine faults, be they ill-labeled training datapoints, exploited software/hardware vulnerabilities, or malicious worker nodes in a distributed setting.\nTwo recent attacks have been challenging state-of-the-art defenses though, often successfully precluding the model from even fitting the training set.\nThe main identified weakness in current defenses is their requirement of a sufficiently low variance-norm ratio for the stochastic gradients.\nWe propose a practical method which, despite increasing the variance, reduces the variance-norm ratio, mitigating the identified weakness.\nWe assess the effectiveness of our method over 736 different training configurations, comprising the 2 state-of-the-art attacks and 6 defenses.\nFor confidence and reproducibility purposes, each configuration is run 5 times with specified seeds (1 to 5), totalling 3680 runs.\nIn our experiments, when the attack is effective enough to decrease the highest observed top-1 cross-accuracy by at least 20% compared to the unattacked run, our technique systematically increases back the highest observed accuracy, and is able to recover at least 20% in more than 60% of the cases.", "keywords": "Byzantine SGD;Distributed ML;Momentum", "primary_area": "", "supplementary_material": "/attachment/51311b5dba1279ce111b2bdf5b7b3a6071f3ac41.zip", "author": "El Mahdi El Mhamdi;Rachid Guerraoui;S\u00e9bastien Rouault", "authorids": "el-mahdi.el-mhamdi@polytechnique.edu;~Rachid_Guerraoui1;~S\u00e9bastien_Rouault1", "gender": ";M;M", "homepage": ";https://lpdwww.epfl.ch/rachid/;https://sebastien.rouau.lt", "dblp": ";g/RachidGuerraoui;203/8639", "google_scholar": ";;5pSk6VAAAAAJ", "orcid": ";;", "linkedin": ";;", "or_profile": "el-mahdi.el-mhamdi@polytechnique.edu;~Rachid_Guerraoui1;~S\u00e9bastien_Rouault1", "aff": ";Swiss Federal Institute of Technology Lausanne;Swiss Federal Institute of Technology Lausanne", "aff_domain": ";epfl.ch;epfl.ch", "position": ";Full Professor;PhD student", "bibtex": "@inproceedings{\nmhamdi2021distributed,\ntitle={Distributed Momentum for Byzantine-resilient Stochastic Gradient Descent},\nauthor={El Mahdi El Mhamdi and Rachid Guerraoui and S{\\'e}bastien Rouault},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=H8UHdhWG6A3}\n}", "github": "", "project": "", "reviewers": "AnonReviewer2;AnonReviewer1;AnonReviewer3;AnonReviewer4", "pdf_size": 0, "rating": "4;4;6;7", "confidence": "3;3;2;4", "wc_review": "251;163;439;371", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "722;338;679;550", "reply_reviewers": "0;0;0;0", "reply_authors": "1;1;1;1", "rating_avg": [ 5.25, 1.299038105676658 ], "confidence_avg": [ 3.0, 0.7071067811865476 ], "wc_review_avg": [ 306.0, 106.52229813517918 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 572.25, 149.322427987225 
], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 9, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": 0.2721655269759087, "gs_citation": 67, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=6798314182032862837&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 5, "pdf": "https://openreview.net/pdf?id=H8UHdhWG6A3", "email": ";epfl.ch;epfl.ch", "author_num": 3, "aff_unique_index": "0;0", "aff_unique_norm": "Swiss Federal Institute of Technology Lausanne", "aff_unique_dep": "", "aff_unique_url": "https://www.epfl.ch", "aff_unique_abbr": "EPFL", "aff_campus_unique_index": "0;0", "aff_campus_unique": "Lausanne", "aff_country_unique_index": "0;0", "aff_country_unique": "Switzerland" }, { "id": "H8VDvtm1ij8", "title": "Normalizing Flows for Calibration and Recalibration", "track": "main", "status": "Reject", "tldr": "", "abstract": "In machine learning, due to model misspecification and overfitting, estimates of the aleatoric uncertainty are often inaccurate.\nOne approach to fix this is isotonic regression, in which a monotonic function is fit on a validation set to map the model's CDF to an optimally calibrated CDF. However, this makes it infeasible to compute additional statistics of interest on the model distribution (such as the mean). In this paper, through a reframing of recalibration as MLE, we replace isotonic regression with normalizing flows. This allows us to retain the ability to compute the statistical properties of the model (such as closed-form likelihoods, mean, correlation, etc.) and provides an opportunity for additional capacity at the cost of possible overfitting. Most importantly, the fundamental properties of normalizing flows allow us to generalize recalibration to conditional and multivariate distributions. 
To aid in detecting miscalibration and measuring our success at fixing it, we use a simple extension of the calibration Q-Q plot.", "keywords": "recalibration;normalizing flows;uncertainty", "primary_area": "", "supplementary_material": "", "author": "Achintya Gopal;Aaron Key", "authorids": "~Achintya_Gopal1;~Aaron_Key1", "gender": ";M", "homepage": ";", "dblp": "274/7111;", "google_scholar": "hjsx568AAAAJ;", "orcid": ";", "linkedin": "achintya-gopal-278b2b76;aaron-key-b982a080", "or_profile": "~Achintya_Gopal1;~Aaron_Key1", "aff": "Bloomberg LP;", "aff_domain": "bloomberg.net;", "position": "Researcher;", "bibtex": "@misc{\ngopal2021normalizing,\ntitle={Normalizing Flows for Calibration and Recalibration},\nauthor={Achintya Gopal and Aaron Key},\nyear={2021},\nurl={https://openreview.net/forum?id=H8VDvtm1ij8}\n}", "github": "", "project": "", "reviewers": "AnonReviewer3;AnonReviewer4;AnonReviewer1;AnonReviewer2", "site": "https://openreview.net/forum?id=H8VDvtm1ij8", "pdf_size": 0, "rating": "3;4;5;7", "confidence": "5;4;3;1", "wc_review": "330;267;702;228", "wc_reply_reviewers": "0;0;91;0", "wc_reply_authors": "542;345;502;200", "reply_reviewers": "0;0;1;0", "reply_authors": "1;1;1;1", "rating_avg": [ 4.75, 1.479019945774904 ], "confidence_avg": [ 3.25, 1.479019945774904 ], "wc_review_avg": [ 381.75, 188.44412301793867 ], "wc_reply_reviewers_avg": [ 22.75, 39.40415587219196 ], "wc_reply_authors_avg": [ 397.25, 135.61226898772838 ], "reply_reviewers_avg": [ 0.25, 0.4330127018922193 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 11, 0 ], "authors#_avg": [ 2, 0 ], "corr_rating_confidence": -1.0, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:C677p6Q77MMJ:scholar.google.com/&scioq=Normalizing+Flows+for+Calibration+and+Recalibration&hl=en&as_sdt=0,5", "gs_version_total": 0, "aff_unique_index": "0", "aff_unique_norm": "Bloomberg", "aff_unique_dep": "", "aff_unique_url": "https://www.bloomberg.com", "aff_unique_abbr": "Bloomberg", "aff_country_unique_index": "0", "aff_country_unique": "United States" }, { "id": "H8hgu4XsTXi", "title": "Estimating Treatment Effects via Orthogonal Regularization", "track": "main", "status": "Reject", "tldr": "", "abstract": "Decision-making often requires accurate estimation of causal effects from observational data. This is challenging as outcomes of alternative decisions are not observed and have to be estimated. Previous methods estimate outcomes based on unconfoundedness but neglect any constraints that unconfoundedness imposes on the outcomes. In this paper, we propose a novel regularization framework in which we formalize unconfoundedness as an orthogonality constraint. We provide theoretical guarantees that this yields an asymptotically normal estimator for the average causal effect. Compared to other estimators, its asymptotic variance is strictly smaller. Based on our regularization framework, we develop deep orthogonal networks for unconfounded treatments (DONUT) which learn outcomes that are orthogonal to the treatment assignment. 
Using a variety of benchmark datasets for causal inference, we demonstrate that DONUT outperforms the state-of-the-art substantially.", "keywords": "Treatment Effects;Regularization;Neural Networks", "primary_area": "", "supplementary_material": "/attachment/e5db5e0f85861665edb735d7168874c32b3622d2.zip", "author": "Tobias Hatt;Stefan Feuerriegel", "authorids": "~Tobias_Hatt1;sfeuerriegel@ethz.ch", "gender": "M;", "homepage": ";", "dblp": ";", "google_scholar": "71rMURwAAAAJ;", "orcid": ";", "linkedin": ";", "or_profile": "~Tobias_Hatt1;sfeuerriegel@ethz.ch", "aff": "Swiss Federal Institute of Technology;", "aff_domain": "ethz.ch;", "position": "PhD student;", "bibtex": "@misc{\nhatt2021estimating,\ntitle={Estimating Treatment Effects via Orthogonal Regularization},\nauthor={Tobias Hatt and Stefan Feuerriegel},\nyear={2021},\nurl={https://openreview.net/forum?id=H8hgu4XsTXi}\n}", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer2;AnonReviewer3;AnonReviewer4", "site": "https://openreview.net/forum?id=H8hgu4XsTXi", "pdf_size": 0, "rating": "3;5;5;7", "confidence": "4;4;3;4", "wc_review": "662;296;634;517", "wc_reply_reviewers": "315;0;0;0", "wc_reply_authors": "733;224;766;416", "reply_reviewers": "1;0;0;0", "reply_authors": "2;1;2;1", "rating_avg": [ 5.0, 1.4142135623730951 ], "confidence_avg": [ 3.75, 0.4330127018922193 ], "wc_review_avg": [ 527.25, 144.16548650769366 ], "wc_reply_reviewers_avg": [ 78.75, 136.3990010960491 ], "wc_reply_authors_avg": [ 534.75, 225.52535888453875 ], "reply_reviewers_avg": [ 0.25, 0.4330127018922193 ], "reply_authors_avg": [ 1.5, 0.5 ], "replies_avg": [ 13, 0 ], "authors#_avg": [ 2, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:p2aQkyyreCMJ:scholar.google.com/&scioq=Estimating+Treatment+Effects+via+Orthogonal+Regularization&hl=en&as_sdt=0,33", "gs_version_total": 0, "aff_unique_index": "0", "aff_unique_norm": "Swiss Federal Institute of Technology", "aff_unique_dep": "", "aff_unique_url": "https://www.ethz.ch", "aff_unique_abbr": "ETH Zurich", "aff_country_unique_index": "0", "aff_country_unique": "Switzerland" }, { "id": "H92-E4kFwbR", "title": "Composite Adversarial Training for Multiple Adversarial Perturbations and Beyond", "track": "main", "status": "Reject", "tldr": "", "abstract": "One intriguing property of deep neural networks (DNNs) is their vulnerability to adversarial perturbations. Despite the plethora of work on defending against individual perturbation models, improving DNN robustness against the combinations of multiple perturbations is still fairly under-studied. In this paper, we propose \\underline{c}omposite \\underline{a}dversarial \\underline{t}raining (CAT), a novel training method that flexibly integrates and optimizes multiple adversarial losses, leading to significant robustness improvement with respect to individual perturbations as well as their ``compositions''. 
Through empirical evaluation on benchmark datasets and models, we show that CAT outperforms existing adversarial training methods by large margins in defending against the compositions of pixel perturbations and spatial transformations, two major classes of adversarial perturbation models, while incurring limited impact on clean inputs.", "keywords": "adversarial examples;deep learning;robustness", "primary_area": "", "supplementary_material": "", "author": "Xinyang Zhang;Zheng Zhang;Ting Wang", "authorids": "~Xinyang_Zhang5;zxz147@psu.edu;~Ting_Wang1", "gender": "M;;M", "homepage": ";;https://alps-lab.github.io/", "dblp": "29/2669-1;;12/2633-6.html", "google_scholar": "EvnQfLsAAAAJ;;cwcBTegAAAAJ", "orcid": ";;", "linkedin": ";;", "or_profile": "~Xinyang_Zhang5;zxz147@psu.edu;~Ting_Wang1", "aff": "Pennsylvania State University;;Pennsylvania State University", "aff_domain": "psu.edu;;psu.edu", "position": "PhD student;;Assistant Professor", "bibtex": "@misc{\nzhang2021composite,\ntitle={Composite Adversarial Training for Multiple Adversarial Perturbations and Beyond},\nauthor={Xinyang Zhang and Zheng Zhang and Ting Wang},\nyear={2021},\nurl={https://openreview.net/forum?id=H92-E4kFwbR}\n}", "github": "", "project": "", "reviewers": "AnonReviewer3;AnonReviewer2;AnonReviewer1;AnonReviewer4", "site": "https://openreview.net/forum?id=H92-E4kFwbR", "pdf_size": 0, "rating": "5;5;5;6", "confidence": "3;3;4;3", "wc_review": "568;429;429;367", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "236;0;296;174", "reply_reviewers": "0;0;0;0", "reply_authors": "1;0;1;1", "rating_avg": [ 5.25, 0.4330127018922193 ], "confidence_avg": [ 3.25, 0.4330127018922193 ], "wc_review_avg": [ 448.25, 73.62531833547479 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 176.5, 110.65599848178137 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 0.75, 0.4330127018922193 ], "replies_avg": [ 9, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": -0.3333333333333333, "gs_citation": 1, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=13419262170637225814&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 0, "aff_unique_index": "0;0", "aff_unique_norm": "Pennsylvania State University", "aff_unique_dep": "", "aff_unique_url": "https://www.psu.edu", "aff_unique_abbr": "PSU", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0", "aff_country_unique": "United States" }, { "id": "HC5VgCHtU10", "title": "Disentangling style and content for low resource video domain adaptation: a case study on keystroke inference attacks", "track": "main", "status": "Reject", "tldr": "", "abstract": "Keystroke inference attacks are a form of side-channels attacks in which an attacker leverages various techniques to recover a user\u2019s keystrokes as she inputs information into some display (for example, while sending a text message or entering her pin). Typically, these attacks leverage machine learning approaches, but assessing the realism of the threat space has lagged behind the pace of machine learning advancements, due in-part, to the challenges in curating large real-life datasets. This paper aims to overcome the challenge of having limited number of real data by introducing a video domain adaptation technique that is able to leverage synthetic data through supervised disentangled learning. Specifically, for a given domain, we decompose the observed data into two factors of variation: Style and Content. 
Doing so provides four learned representations: real-life style, synthetic style, real-life content and synthetic content. Then, we combine them into feature representations from all combinations of style-content pairings across domains, and train a model on these combined representations to classify the content (i.e., labels) of a given datapoint in the style of another domain. We evaluate our method on real-life data using a variety of metrics to quantify the amount of information an attacker is able to recover. We show that our method prevents our model from overfitting to a small real-life training set, indicating that our method is an effective form of data augmentation. Code and data will be released after reviewal. ", "keywords": "Applications;side channel attacks;supervised disentangled learning;video domain adaptation", "primary_area": "", "supplementary_material": "", "author": "John Lim;Fabian Monrose;Jan-Michael Frahm", "authorids": "~John_Lim1;~Fabian_Monrose1;~Jan-Michael_Frahm1", "gender": ";M;M", "homepage": ";https://www.cs.unc.edu/~fabian/;http://frahm.web.unc.edu/", "dblp": "92/7022;;19/6011", "google_scholar": ";https://scholar.google.com.tw/citations?user=a6u7NTgAAAAJ;https://scholar.google.com.tw/citations?user=3YOe4NMAAAAJ", "orcid": ";;", "linkedin": ";;", "or_profile": "~John_Lim1;~Fabian_Monrose1;~Jan-Michael_Frahm1", "aff": "Department of Computer Science, University of North Carolina, Chapel Hill;University of North Carolina at Chapel Hill;Department of Computer Science, University of North Carolina, Chapel Hill", "aff_domain": "cs.unc.edu;;cs.unc.edu", "position": "PhD student;;Full Professor", "bibtex": "@misc{\nlim2021disentangling,\ntitle={Disentangling style and content for low resource video domain adaptation: a case study on keystroke inference attacks},\nauthor={John Lim and Fabian Monrose and Jan-Michael Frahm},\nyear={2021},\nurl={https://openreview.net/forum?id=HC5VgCHtU10}\n}", "github": "", "project": "", "reviewers": "AnonReviewer4;AnonReviewer3;AnonReviewer2;AnonReviewer1", "site": "https://openreview.net/forum?id=HC5VgCHtU10", "pdf_size": 0, "rating": "5;5;7;7", "confidence": "2;4;3;2", "wc_review": "407;221;322;299", "wc_reply_reviewers": "188;0;64;0", "wc_reply_authors": "975;1513;313;509", "reply_reviewers": "2;0;1;0", "reply_authors": "4;3;2;1", "rating_avg": [ 6.0, 1.0 ], "confidence_avg": [ 2.75, 0.82915619758885 ], "wc_review_avg": [ 312.25, 66.2848964697087 ], "wc_reply_reviewers_avg": [ 63.0, 76.75285010994706 ], "wc_reply_authors_avg": [ 827.5, 463.092593333126 ], "reply_reviewers_avg": [ 0.75, 0.82915619758885 ], "reply_authors_avg": [ 2.5, 1.118033988749895 ], "replies_avg": [ 19, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": -0.30151134457776363, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:BNHJg9iqzDEJ:scholar.google.com/&scioq=Disentangling+style+and+content+for+low+resource+video+domain+adaptation:+a+case+study+on+keystroke+inference+attacks&hl=en&as_sdt=0,33", "gs_version_total": 0, "aff_unique_index": "0;0;0", "aff_unique_norm": "University of North Carolina", "aff_unique_dep": "Department of Computer Science", "aff_unique_url": "https://www.unc.edu", "aff_unique_abbr": "UNC", "aff_campus_unique_index": "0;0;0", "aff_campus_unique": "Chapel Hill", "aff_country_unique_index": "0;0;0", "aff_country_unique": "United States" }, { "title": "Learning and Evaluating Representations for Deep One-Class Classification", "status": "Poster", "track": "main", "site": 
"https://iclr.cc/virtual/2021/poster/3244", "id": "HCSgyPUfeDj", "poster": "", "openreview": "https://openreview.net/forum?id=HCSgyPUfeDj", "slides": "https://iclr.cc/virtual/2021/poster/3244", "video": "https://iclr.cc/virtual/2021/poster/3244", "author_site": "Kihyuk Sohn, Chun-Liang Li, Jinsung Yoon, Minho Jin, Tomas Pfister", "tldr": "", "abstract": "We present a two-stage framework for deep one-class classification. We first learn self-supervised representations from one-class data, and then build one-class classifiers on learned representations. The framework not only allows to learn better representations, but also permits building one-class classifiers that are faithful to the target task. We argue that classifiers inspired by the statistical perspective in generative or discriminative models are more effective than existing approaches, such as a normality score from a surrogate classifier. We thoroughly evaluate different self-supervised representation learning algorithms under the proposed framework for one-class classification. Moreover, we present a novel distribution-augmented contrastive learning that extends training distributions via data augmentation to obstruct the uniformity of contrastive representations. In experiments, we demonstrate state-of-the-art performance on visual domain one-class classification benchmarks, including novelty and anomaly detection. Finally, we present visual explanations, confirming that the decision-making process of deep one-class classifiers is intuitive to humans. The code is available at https://github.com/google-research/deep_representation_one_class.\n", "keywords": "deep one-class classification;self-supervised learning", "primary_area": "", "supplementary_material": "", "author": "Kihyuk Sohn;Chun-Liang Li;Jinsung Yoon;Minho Jin;Tomas Pfister", "authorids": "~Kihyuk_Sohn1;~Chun-Liang_Li1;~Jinsung_Yoon1;minhojin@google.com;~Tomas_Pfister1", "gender": "M;M;M;;M", "homepage": "https://sites.google.com/site/kihyuksml/;http://chunliangli.github.io;https://sites.google.com/corp/view/jinsungyoon;;http://tomas.pfister.fi", "dblp": "53/10771;;173/5409.html;;14/8360", "google_scholar": "VxpypngAAAAJ;https://scholar.google.com.tw/citations?user=vqHIt_sAAAAJ;kiFd6A8AAAAJ;;ahSpJOAAAAAJ", "orcid": ";;;;0009-0004-4088-8718", "linkedin": ";;jinsung-yoon-bb7751b8;;", "or_profile": "~Kihyuk_Sohn1;~Chun-Liang_Li1;~Jinsung_Yoon1;minhojin@google.com;~Tomas_Pfister1", "aff": "Google;Google;Google;;Google", "aff_domain": "google.com;google.com;google.com;;google.com", "position": "Research Scientist;Researcher;Research Scientist;;Head of Research @ Cloud AI", "bibtex": "@inproceedings{\nsohn2021learning,\ntitle={Learning and Evaluating Representations for Deep One-Class Classification},\nauthor={Kihyuk Sohn and Chun-Liang Li and Jinsung Yoon and Minho Jin and Tomas Pfister},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=HCSgyPUfeDj}\n}", "github": "[![github](/images/github_icon.svg) google-research/deep_representation_one_class](https://github.com/google-research/deep_representation_one_class)", "project": "", "reviewers": "AnonReviewer2;AnonReviewer3;AnonReviewer1;AnonReviewer4", "pdf_size": 0, "rating": "5;6;7;7", "confidence": "5;4;4;3", "wc_review": "531;420;312;404", "wc_reply_reviewers": "141;85;0;0", "wc_reply_authors": "1292;1206;112;16", "reply_reviewers": "1;1;0;0", "reply_authors": "3;2;1;1", "rating_avg": [ 6.25, 0.82915619758885 ], "confidence_avg": [ 4.0, 
0.7071067811865476 ], "wc_review_avg": [ 416.75, 77.77973707849623 ], "wc_reply_reviewers_avg": [ 56.5, 59.86860613042532 ], "wc_reply_authors_avg": [ 656.5, 594.2497370634673 ], "reply_reviewers_avg": [ 0.5, 0.5 ], "reply_authors_avg": [ 1.75, 0.82915619758885 ], "replies_avg": [ 16, 0 ], "authors#_avg": [ 5, 0 ], "corr_rating_confidence": -0.8528028654224417, "gs_citation": 271, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=6458276904017990971&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 8, "pdf": "https://openreview.net/pdf?id=HCSgyPUfeDj", "email": "google.com;google.com;google.com;;google.com", "author_num": 5, "aff_unique_index": "0;0;0;0", "aff_unique_norm": "Google", "aff_unique_dep": "Google", "aff_unique_url": "https://www.google.com", "aff_unique_abbr": "Google", "aff_campus_unique_index": "0;0;0;0", "aff_campus_unique": "Mountain View", "aff_country_unique_index": "0;0;0;0", "aff_country_unique": "United States" }, { "id": "HCa8gC_COVk", "title": "Mutual Calibration between Explicit and Implicit Deep Generative Models", "track": "main", "status": "Reject", "tldr": "", "abstract": "Deep generative models are generally categorized into explicit models and implicit models. The former defines an explicit density form that allows likelihood inference, while the latter targets a flexible transformation from random noise to generated samples. To take full advantage of both models, we propose Stein Bridging, a novel joint training framework that connects an explicit (unnormalized) density estimator and an implicit sample generator via Stein discrepancy. We show that the Stein bridge 1) induces novel mutual regularization via kernel Sobolev norm penalization and Moreau-Yosida regularization, and 2) stabilizes the training dynamics.
Empirically, we demonstrate that Stein Bridging can help the density estimator to accurately identify data modes and guide the sample generator to output more high-quality samples, especially when the training samples are contaminated or limited.", "keywords": "deep generative models;generative adversarial networks;density estimation", "primary_area": "", "supplementary_material": "", "author": "Qitian Wu;Rui Gao;Hongyuan Zha", "authorids": "~Qitian_Wu1;~Rui_Gao3;~Hongyuan_Zha1", "gender": ";;", "homepage": ";https://faculty.mccombs.utexas.edu/rui.gao/index.html;", "dblp": ";43/2694-1;z/HongyuanZha", "google_scholar": ";LWJj85wAAAAJ;n1DQMIsAAAAJ", "orcid": ";;", "linkedin": ";;", "or_profile": "~Qitian_Wu1;~Rui_Gao3;~Hongyuan_Zha1", "aff": ";University of Texas, Austin;The Chinese University of Hong Kong, Shenzhen", "aff_domain": ";utexas.edu;cuhk.edu.cn", "position": ";Assistant Professor;Full Professor", "bibtex": "@misc{\nwu2021mutual,\ntitle={Mutual Calibration between Explicit and Implicit Deep Generative Models},\nauthor={Qitian Wu and Rui Gao and Hongyuan Zha},\nyear={2021},\nurl={https://openreview.net/forum?id=HCa8gC_COVk}\n}", "github": "", "project": "", "reviewers": "AnonReviewer3;AnonReviewer2;AnonReviewer4;AnonReviewer1", "site": "https://openreview.net/forum?id=HCa8gC_COVk", "pdf_size": 0, "rating": "3;5;5;6", "confidence": "5;3;3;4", "wc_review": "335;759;277;465", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "0;0;0;0", "reply_reviewers": "0;0;0;0", "reply_authors": "0;0;0;0", "rating_avg": [ 4.75, 1.0897247358851685 ], "confidence_avg": [ 3.75, 0.82915619758885 ], "wc_review_avg": [ 459.0, 186.10212250267324 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 0, 0 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 0, 0 ], "replies_avg": [ 6, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": -0.6225430174794673, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:v-8SiUmnbQQJ:scholar.google.com/&scioq=Mutual+Calibration+between+Explicit+and+Implicit+Deep+Generative+Models&hl=en&as_sdt=0,5", "gs_version_total": 0, "aff_unique_index": "0;1", "aff_unique_norm": "University of Texas at Austin;Chinese University of Hong Kong", "aff_unique_dep": ";", "aff_unique_url": "https://www.utexas.edu;https://www.cuhk.edu.cn", "aff_unique_abbr": "UT Austin;CUHK", "aff_campus_unique_index": "0;1", "aff_campus_unique": "Austin;Shenzhen", "aff_country_unique_index": "0;1", "aff_country_unique": "United States;China" }, { "id": "HFJWWQP3ado", "title": "Max-Affine Spline Insights Into Deep Network Pruning", "track": "main", "status": "Withdraw", "tldr": "", "abstract": "In this paper, we study the importance of pruning in Deep Networks (DNs) and motivate it based on the current absence of data-aware weight initialization. Current DN initializations, focusing primarily on maintaining first order statistics of the feature maps through depth, force practitioners to overparametrize a model in order to reach high performance. This overparametrization can then be pruned a posteriori, leading to a phenomenon known as \"winning tickets\". However, the pruning literature still relies on empirical investigations, lacking a theoretical understanding of (1) how pruning affects the decision boundary, (2) how to interpret pruning, (3) how to design principled pruning techniques, and (4) how to theoretically study pruning.
To tackle those questions, we propose to employ recent advances in theoretical analysis of Continuous Piecewise Affine (CPA) DNs. From this viewpoint, we can study the DNs' input space partitioning and detect the early-bird (EB) phenomenon, guide practitioners by identifying when to stop the first training step, provide interpretability into current pruning techniques, and develop a principled pruning criterion towards efficient DN training. Finally, we conduct extensive experiments to show the effectiveness of the proposed spline pruning criteria in terms of both layerwise and global pruning over state-of-the-art pruning methods.\nAll the code will be released publicly upon acceptance.", "keywords": "Network pruning;Spline theory", "primary_area": "", "supplementary_material": "", "author": "Randall Balestriero;Haoran You;Zhihan Lu;Yutong Kou;Yingyan Lin;Richard Baraniuk", "authorids": "~Randall_Balestriero1;~Haoran_You1;~Zhihan_Lu1;~Yutong_Kou1;~Yingyan_Lin1;~Richard_Baraniuk1", "gender": "M;M;M;M;F;", "homepage": "https://randallbalestriero.github.io/;http://haoranyou.com/;;https://kou-99.github.io/;https://eiclab.scs.gatech.edu/;http://richb.rice.edu/", "dblp": "175/5364;230/4247;;247/4139;120/6981;32/2804", "google_scholar": "S1x_xqcAAAAJ;z5Eku1sAAAAJ;;;dio8IesAAAAJ;https://scholar.google.com.tw/citations?user=N-BBA20AAAAJ", "orcid": ";0000-0002-2873-2153;;;;", "linkedin": "randallbalestriero/;haoran-you-b4b958165/;zhihan-lu/;;yingyan-celine-lin-a281211a/;richard-baraniuk", "or_profile": "~Randall_Balestriero1;~Haoran_You1;~Zhihan_Lu1;~Yutong_Kou1;~Yingyan_Lin1;~Richard_Baraniuk1", "aff": "Rice University;Rice University;Rice University;Huazhong University of Science and Technology;Rice University;William Marsh Rice University", "aff_domain": "rice.edu;rice.edu;rice.edu;hust.edu;rice.edu;rice.edu", "position": "PhD student;PhD student;Undergrad student;Undergrad student;Assistant Professor;C.
Sidney Burrus Professor", "bibtex": "", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer2;AnonReviewer4;AnonReviewer3", "site": "https://openreview.net/forum?id=HFJWWQP3ado", "pdf_size": 0, "rating": "2;4;4;5", "confidence": "4;4;5;5", "wc_review": "341;1281;447;191", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "0;0;0;0", "reply_reviewers": "0;0;0;0", "reply_authors": "0;0;0;0", "rating_avg": [ 3.75, 1.0897247358851685 ], "confidence_avg": [ 4.5, 0.5 ], "wc_review_avg": [ 565.0, 423.27059902620215 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 0, 0 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 0, 0 ], "replies_avg": [ 5, 0 ], "authors#_avg": [ 6, 0 ], "corr_rating_confidence": 0.6882472016116854, "gs_citation": 8, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=12609309598852783522&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 4, "aff_unique_index": "0;0;0;1;0;0", "aff_unique_norm": "Rice University;Huazhong University of Science and Technology", "aff_unique_dep": ";", "aff_unique_url": "https://www.rice.edu;http://www.hust.edu.cn", "aff_unique_abbr": "Rice;HUST", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0;0;1;0;0", "aff_country_unique": "United States;China" }, { "title": "Graph-Based Continual Learning", "status": "Spotlight", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/2524", "id": "HHSEKOnPvaO", "poster": "", "openreview": "https://openreview.net/forum?id=HHSEKOnPvaO", "slides": "https://iclr.cc/virtual/2021/poster/2524", "video": "https://iclr.cc/virtual/2021/poster/2524", "author_site": "Binh Tang, David S Matteson", "tldr": "", "abstract": "Despite significant advances, continual learning models still suffer from catastrophic forgetting when exposed to incrementally available data from non-stationary distributions. Rehearsal approaches alleviate the problem by maintaining and replaying a small episodic memory of previous samples, often implemented as an array of independent memory slots. In this work, we propose to augment such an array with a learnable random graph that captures pairwise similarities between its samples, and use it not only to learn new tasks but also to guard against forgetting. Empirical results on several benchmark datasets show that our model consistently outperforms recently proposed baselines for task-free continual learning.", "keywords": "", "primary_area": "", "supplementary_material": "/attachment/7e4494ef4e463d64bf626f99de8c1fcaf4816d2c.zip", "author": "Binh Tang;David S. Matteson", "authorids": "~Binh_Tang1;~David_S._Matteson1", "gender": ";M", "homepage": ";https://davidsmatteson.com", "dblp": ";", "google_scholar": ";", "orcid": ";0000-0002-2674-0387", "linkedin": ";", "or_profile": "~Binh_Tang1;~David_S._Matteson1", "aff": "Cornell University;Cornell University", "aff_domain": "cornell.edu;cornell.edu", "position": "PhD student;Associate Professor", "bibtex": "@inproceedings{\ntang2021graphbased,\ntitle={Graph-Based Continual Learning},\nauthor={Binh Tang and David S. 
Matteson},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=HHSEKOnPvaO}\n}", "github": "", "project": "", "reviewers": "AnonReviewer3;AnonReviewer1;AnonReviewer4;AnonReviewer2", "pdf_size": 0, "rating": "6;7;7;8", "confidence": "4;4;3;4", "wc_review": "417;221;201;108", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "597;250;275;49", "reply_reviewers": "0;0;0;0", "reply_authors": "1;1;1;1", "rating_avg": [ 7.0, 0.7071067811865476 ], "confidence_avg": [ 3.75, 0.4330127018922193 ], "wc_review_avg": [ 236.75, 112.4641609580581 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 292.75, 196.29362572432146 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 9, 0 ], "authors#_avg": [ 2, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 49, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=15299318922925074538&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 3, "pdf": "https://openreview.net/pdf?id=HHSEKOnPvaO", "email": "cornell.edu;cornell.edu", "author_num": 2, "aff_unique_index": "0;0", "aff_unique_norm": "Cornell University", "aff_unique_dep": "", "aff_unique_url": "https://www.cornell.edu", "aff_unique_abbr": "Cornell", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0", "aff_country_unique": "United States" }, { "title": "Explaining the Efficacy of Counterfactually Augmented Data", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/3111", "id": "HHiiQKWsOcV", "poster": "", "openreview": "https://openreview.net/forum?id=HHiiQKWsOcV", "slides": "https://iclr.cc/virtual/2021/poster/3111", "video": "https://iclr.cc/virtual/2021/poster/3111", "author_site": "Divyansh Kaushik, Amrith Setlur, Eduard H Hovy, Zachary Lipton", "tldr": "", "abstract": "In attempts to produce machine learning models less reliant on spurious patterns in NLP datasets, researchers have recently proposed curating counterfactually augmented data (CAD) via a human-in-the-loop process in which given some documents and their (initial) labels, humans must revise the text to make a counterfactual label applicable. Importantly, edits that are not necessary to flip the applicable label are prohibited. Models trained on the augmented (original and revised) data appear, empirically, to rely less on semantically irrelevant words and to generalize better out of domain. While this work draws loosely on causal thinking, the underlying causal model (even at an abstract level) and the principles underlying the observed out-of-domain improvements remain unclear. In this paper, we introduce a toy analog based on linear Gaussian models, observing interesting relationships between causal models, measurement noise, out-of-domain generalization, and reliance on spurious signals. Our analysis provides some insights that help to explain the efficacy of CAD. Moreover, we develop the hypothesis that while adding noise to causal features should degrade both in-domain and out-of-domain performance, adding noise to non-causal features should lead to relative improvements in out-of-domain performance. This idea inspires a speculative test for determining whether a feature attribution technique has identified the causal spans. 
If adding noise (e.g., by random word flips) to the highlighted spans degrades both in-domain and out-of-domain performance on a battery of challenge datasets, but adding noise to the complement gives improvements out-of-domain, this suggests we have identified causal spans. Thus, we present a large scale empirical study comparing spans edited to create CAD to those selected by attention and saliency maps. Across numerous challenge domains and models, we find that the hypothesized phenomenon is pronounced for CAD.", "keywords": "humans in the loop;annotation artifacts;text classification;sentiment analysis;natural language inference", "primary_area": "", "supplementary_material": "", "author": "Divyansh Kaushik;Amrith Setlur;Eduard H Hovy;Zachary Chase Lipton", "authorids": "~Divyansh_Kaushik1;~Amrith_Setlur1;~Eduard_H_Hovy1;~Zachary_Chase_Lipton1", "gender": "M;M;M;Unspecified", "homepage": "https://www.cs.cmu.edu/~dkaushik/;http://ars22.github.io;http://www.cs.cmu.edu/~hovy;http://zacklipton.com", "dblp": "212/1751;https://dblp.uni-trier.de/pers/hd/s/Setlur:Amrith;47/2454;", "google_scholar": "Sg3jtCgAAAAJ;https://scholar.google.ru/citations?user=i7V1kJgAAAAJ;https://scholar.google.com.tw/citations?user=PUFxrroAAAAJ;MN9Kfg8AAAAJ", "orcid": ";0000-0002-7061-3094;;", "linkedin": ";;;", "or_profile": "~Divyansh_Kaushik1;~Amrith_Setlur1;~Eduard_H_Hovy1;~Zachary_Chase_Lipton1", "aff": "Carnegie Mellon University;Carnegie Mellon University;;Carnegie Mellon University", "aff_domain": "cmu.edu;cmu.edu;;cmu.edu", "position": "PhD student;PhD student;;Assistant Professor", "bibtex": "@inproceedings{\nkaushik2021explaining,\ntitle={Explaining the Efficacy of Counterfactually Augmented Data},\nauthor={Divyansh Kaushik and Amrith Setlur and Eduard H Hovy and Zachary Chase Lipton},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=HHiiQKWsOcV}\n}", "github": "", "project": "", "reviewers": "AnonReviewer3;AnonReviewer4;AnonReviewer1;AnonReviewer2", "pdf_size": 0, "rating": "6;7;7;8", "confidence": "3;3;4;4", "wc_review": "1294;674;1135;223", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "572;87;1095;78", "reply_reviewers": "0;0;0;0", "reply_authors": "1;1;2;1", "rating_avg": [ 7.0, 0.7071067811865476 ], "confidence_avg": [ 3.5, 0.5 ], "wc_review_avg": [ 831.5, 418.6576763896728 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 458.0, 418.5707825446014 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.25, 0.4330127018922193 ], "replies_avg": [ 11, 0 ], "authors#_avg": [ 4, 0 ], "corr_rating_confidence": 0.7071067811865476, "gs_citation": 84, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=17133444890112419337&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 4, "pdf": "https://openreview.net/pdf?id=HHiiQKWsOcV", "email": "cmu.edu;cmu.edu;;cmu.edu", "author_num": 4, "aff_unique_index": "0;0;0", "aff_unique_norm": "Carnegie Mellon University", "aff_unique_dep": "", "aff_unique_url": "https://www.cmu.edu", "aff_unique_abbr": "CMU", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0;0", "aff_country_unique": "United States" }, { "id": "HI0j7omXTaG", "title": "Keep the Gradients Flowing: Using Gradient Flow to study Sparse Network Optimization", "track": "main", "status": "Reject", "tldr": "", "abstract": "Training sparse networks to converge to the same performance as dense neural architectures has proven to be elusive. 
Recent work suggests that initialization is the key. However, while this direction of research has had some success, focusing on initialization alone appears to be inadequate. In this paper, we take a broader view of training sparse networks and consider various choices made during training that might disadvantage sparse networks. We measure the gradient flow across different networks and datasets, and show that the default choices of optimizers, activation functions and regularizers used for dense networks can disadvantage sparse networks. Based upon these findings, we show that gradient flow in sparse networks can be improved by reconsidering aspects of the architecture design and the training regime. Our work suggests that initialization is only one piece of the puzzle and a wider view of tailoring optimization to sparse networks yields promising results. ", "keywords": "neural networks;sparsity;gradient flow;sparse network optimization", "primary_area": "", "supplementary_material": "", "author": "Kale-ab Tessera;Sara Hooker;Benjamin Rosman", "authorids": "~Kale-ab_Tessera1;~Sara_Hooker1;~Benjamin_Rosman1", "gender": ";M;M", "homepage": "https://www.sarahooker.me/;http://www.raillab.org;https://www.kaleabtessera.com/", "dblp": "210/2611;45/4591;284/8544", "google_scholar": "2xy6h3sAAAAJ;https://scholar.google.co.za/citations?user=pWJ0SocAAAAJ;EB5CtIYAAAAJ", "orcid": ";;", "linkedin": ";;kale-ab-tessera-013976101/", "or_profile": "~Sara_Hooker1;~Benjamin_Rosman1;~Kale-ab_Abebe_Tessera1", "aff": "Google Brain;University of the Witwatersrand;InstaDeep", "aff_domain": "google.com;wits.ac.za;instadeep.com", "position": "Research Scientist;Full Professor;Research Engineer", "bibtex": "@misc{\ntessera2021keep,\ntitle={Keep the Gradients Flowing: Using Gradient Flow to study Sparse Network Optimization},\nauthor={Kale-ab Tessera and Sara Hooker and Benjamin Rosman},\nyear={2021},\nurl={https://openreview.net/forum?id=HI0j7omXTaG}\n}", "github": "", "project": "", "reviewers": "AnonReviewer3;AnonReviewer4;AnonReviewer2;AnonReviewer1", "site": "https://openreview.net/forum?id=HI0j7omXTaG", "pdf_size": 0, "rating": "3;5;5;5", "confidence": "3;3;4;4", "wc_review": "763;647;567;581", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "1388;719;548;1598", "reply_reviewers": "0;0;0;0", "reply_authors": "3;2;1;3", "rating_avg": [ 4.5, 0.8660254037844386 ], "confidence_avg": [ 3.5, 0.5 ], "wc_review_avg": [ 639.5, 77.43868542272654 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 1063.25, 440.28705125179414 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 2.25, 0.82915619758885 ], "replies_avg": [ 14, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": 0.5773502691896257, "gs_citation": 13, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=10025261322677333183&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 2, "aff_unique_index": "0;1;2", "aff_unique_norm": "Google;University of the Witwatersrand;InstaDeep", "aff_unique_dep": "Google Brain;;", "aff_unique_url": "https://brain.google.com;https://www.wits.ac.za;https://www.instadeep.com", "aff_unique_abbr": "Google Brain;Wits;InstaDeep", "aff_campus_unique_index": "0", "aff_campus_unique": "Mountain View;", "aff_country_unique_index": "0;1;2", "aff_country_unique": "United States;South Africa;United Kingdom" }, { "title": "Reset-Free Lifelong Learning with Skill-Space Planning", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/3326", "id": "HIGSa_3kOx3", "poster": "", 
"openreview": "https://openreview.net/forum?id=HIGSa_3kOx3", "slides": "https://iclr.cc/virtual/2021/poster/3326", "video": "https://iclr.cc/virtual/2021/poster/3326", "author_site": "Kevin Lu, Aditya Grover, Pieter Abbeel, Igor Mordatch", "tldr": "", "abstract": "The objective of \\textit{lifelong} reinforcement learning (RL) is to optimize agents which can continuously adapt and interact in changing environments. However, current RL approaches fail drastically when environments are non-stationary and interactions are non-episodic. We propose \\textit{Lifelong Skill Planning} (LiSP), an algorithmic framework for lifelong RL based on planning in an abstract space of higher-order skills. We learn the skills in an unsupervised manner using intrinsic rewards and plan over the learned skills using a learned dynamics model. Moreover, our framework permits skill discovery even from offline data, thereby reducing the need for excessive real-world interactions. We demonstrate empirically that LiSP successfully enables long-horizon planning and learns agents that can avoid catastrophic failures even in challenging non-stationary and non-episodic environments derived from gridworld and MuJoCo benchmarks.", "keywords": "reset-free;lifelong;reinforcement learning", "primary_area": "", "supplementary_material": "/attachment/776cd2ce9090eae7ac082070cf8a2c536a396055.zip", "author": "Kevin Lu;Aditya Grover;Pieter Abbeel;Igor Mordatch", "authorids": "~Kevin_Lu2;~Aditya_Grover1;~Pieter_Abbeel2;~Igor_Mordatch4", "gender": ";M;M;M", "homepage": "http://kevinlu.ai/;https://aditya-grover.github.io;https://people.eecs.berkeley.edu/~pabbeel/;", "dblp": "17/8813;162/5052;;21/17", "google_scholar": "E8s73dYAAAAJ;oOhnPUgAAAAJ;https://scholar.google.com.tw/citations?user=vtwH6GkAAAAJ;", "orcid": ";;;", "linkedin": ";;;", "or_profile": "~Kevin_Lu2;~Aditya_Grover1;~Pieter_Abbeel2;~Igor_Mordatch1", "aff": "University of California, Berkeley;University of California, Berkeley;Covariant;OpenAI", "aff_domain": "berkeley.edu;berkeley.edu;covariant.ai;openai.com", "position": "Undergrad student;Postdoc;Founder;Research Scientist", "bibtex": "@inproceedings{\nlu2021resetfree,\ntitle={Reset-Free Lifelong Learning with Skill-Space Planning},\nauthor={Kevin Lu and Aditya Grover and Pieter Abbeel and Igor Mordatch},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=HIGSa_3kOx3}\n}", "github": "[![github](/images/github_icon.svg) kzl/lifelong_rl](https://github.com/kzl/lifelong_rl)", "project": "", "reviewers": "AnonReviewer4;AnonReviewer2;AnonReviewer1;AnonReviewer3", "pdf_size": 0, "rating": "5;6;6;7", "confidence": "3;3;3;3", "wc_review": "318;580;394;982", "wc_reply_reviewers": "0;0;49;85", "wc_reply_authors": "710;852;472;1001", "reply_reviewers": "0;0;1;1", "reply_authors": "1;1;2;2", "rating_avg": [ 6.0, 0.7071067811865476 ], "confidence_avg": [ 3.0, 0.0 ], "wc_review_avg": [ 568.5, 257.05787286134614 ], "wc_reply_reviewers_avg": [ 33.5, 35.83643397437864 ], "wc_reply_authors_avg": [ 758.75, 194.9248252532242 ], "reply_reviewers_avg": [ 0.5, 0.5 ], "reply_authors_avg": [ 1.5, 0.5 ], "replies_avg": [ 14, 0 ], "authors#_avg": [ 4, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 53, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=9940357312981411546&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 5, "pdf": "https://openreview.net/pdf?id=HIGSa_3kOx3", "email": "berkeley.edu;berkeley.edu;covariant.ai;openai.com", "author_num": 4, 
"aff_unique_index": "0;0;1;2", "aff_unique_norm": "University of California, Berkeley;Covariant;OpenAI", "aff_unique_dep": ";;", "aff_unique_url": "https://www.berkeley.edu;;https://openai.com", "aff_unique_abbr": "UC Berkeley;;OpenAI", "aff_campus_unique_index": "0;0", "aff_campus_unique": "Berkeley;", "aff_country_unique_index": "0;0;0", "aff_country_unique": "United States;" }, { "id": "HK_B2K0026", "title": "Attention Based Joint Learning for Supervised Electrocardiogram Arrhythmia Differentiation with Unsupervised Abnormal Beat Segmentation", "track": "main", "status": "Reject", "tldr": "", "abstract": "Deep learning has shown great promise in arrhythmia classification in electrocar- diogram (ECG). Existing works, when classifying an ECG segment with multiple beats, do not identify the locations of the anomalies, which reduces clinical inter- pretability. On the other hand, segmenting abnormal beats by deep learning usu- ally requires annotation for a large number of regular and irregular beats, which can be laborious, sometimes even challenging, with strong inter-observer variabil- ity between experts. In this work, we propose a method capable of not only dif- ferentiating arrhythmia but also segmenting the associated abnormal beats in the ECG segment. The only annotation used in the training is the type of abnormal beats and no segmentation labels are needed. Imitating human\u2019s perception of an ECG signal, the framework consists of a segmenter and classifier. The segmenter outputs an attention map, which aims to highlight the abnormal sections in the ECG by element-wise modulation. Afterwards, the signals are sent to a classifier for arrhythmia differentiation. Though the training data is only labeled to super- vise the classifier, the segmenter and the classifier are trained in an end-to-end manner so that optimizing classification performance also adjusts how the abnor- mal beats are segmented. Validation of our method is conducted on two dataset. We observe that involving the unsupervised segmentation in fact boosts the clas- sification performance. 
Meanwhile, a grading study performed by experts suggests that the segmenter also achieves satisfactory quality in identifying abnormal beats, which significantly enhances the interpretability of the classification results.", "keywords": "interpretability;multitask learning;attention mechanism;electrocardiography", "primary_area": "", "supplementary_material": "", "author": "Xinrong Hu;long wen;shushui wang;Dongpo Liang;Jian Zhuang;Yiyu Shi", "authorids": "~Xinrong_Hu1;wl960201@163.com;635386607@qq.com;41421891@qq.com;~Jian_Zhuang1;~Yiyu_Shi1", "gender": "M;;;;;M", "homepage": "https://www3.nd.edu/~scl/people.html;;;;;", "dblp": "02/1690;;;;30/6224;94/5536", "google_scholar": "LgvSqWcAAAAJ;;;;;", "orcid": ";;;;;", "linkedin": ";;;;;", "or_profile": "~Xinrong_Hu1;wl960201@163.com;635386607@qq.com;41421891@qq.com;~Jian_Zhuang1;~Yiyu_Shi1", "aff": "University of Notre Dame;;;;Guangdong Provincial People's Hospital;University of Notre Dame", "aff_domain": "nd.edu;;;;ggh.com.cn;nd.edu", "position": "PhD student;;;;Chief physician;Full Professor", "bibtex": "@misc{\nhu2021attention,\ntitle={Attention Based Joint Learning for Supervised Electrocardiogram Arrhythmia Differentiation with Unsupervised Abnormal Beat Segmentation},\nauthor={Xinrong Hu and long wen and shushui wang and Dongpo Liang and Jian Zhuang and Yiyu Shi},\nyear={2021},\nurl={https://openreview.net/forum?id=HK_B2K0026}\n}", "github": "", "project": "", "reviewers": "AnonReviewer2;AnonReviewer4;AnonReviewer1;AnonReviewer3", "site": "https://openreview.net/forum?id=HK_B2K0026", "pdf_size": 0, "rating": "4;5;5;6", "confidence": "4;4;4;4", "wc_review": "244;703;242;354", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "351;362;433;612", "reply_reviewers": "0;0;0;0", "reply_authors": "1;1;1;1", "rating_avg": [ 5.0, 0.7071067811865476 ], "confidence_avg": [ 4.0, 0.0 ], "wc_review_avg": [ 385.75, 188.68806931017127 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 439.5, 104.4473551603869 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 9, 0 ], "authors#_avg": [ 6, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:SVdoZBM24OkJ:scholar.google.com/&scioq=Attention+Based+Joint+Learning+for+Supervised+Electrocardiogram+Arrhythmia+Differentiation+with+Unsupervised+Abnormal+Beat+Segmentation&hl=en&as_sdt=0,5", "gs_version_total": 0, "aff_unique_index": "0;1;0", "aff_unique_norm": "University of Notre Dame;Guangdong Provincial People's Hospital", "aff_unique_dep": ";", "aff_unique_url": "https://www.nd.edu;", "aff_unique_abbr": "Notre Dame;", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;1;0", "aff_country_unique": "United States;China" }, { "id": "HMEiDPTOTmY", "title": "Later Span Adaptation for Language Understanding", "track": "main", "status": "Reject", "tldr": "", "abstract": "Pre-trained contextualized language models (PrLMs) broadly use fine-grained tokens (words or sub-words) as the minimal linguistic unit in the pre-training phase. Introducing span-level information in pre-training has been shown to be capable of further enhancing PrLMs. However, such methods require enormous resources and lack adaptivity due to the huge computational requirements of pre-training. 
Instead of fixing the linguistic unit of the input too early, as nearly all previous work did, we propose a novel method that incorporates span-level information into the representations generated by PrLMs during the fine-tuning phase for better flexibility. In this way, the modeling procedure of span-level texts can be more adaptive to different downstream tasks. In detail, we divide the sentence into several spans according to the segmentation generated by a pre-sampled dictionary. Based on the sub-token-level representation provided by PrLMs, we enhance the connection between the tokens in each span and gain a representation with enhanced span-level information. Experiments conducted on the GLUE benchmark show that our approach can remarkably enhance the performance of PrLMs in various natural language understanding tasks.", "keywords": "", "primary_area": "", "supplementary_material": "", "author": "Rongzhou Bao;Zhuosheng Zhang;hai zhao", "authorids": "~Rongzhou_Bao1;~Zhuosheng_Zhang1;~hai_zhao1", "gender": "M;M;M", "homepage": "http://bcmi.sjtu.edu.cn/home/baorongzhou;https://bcmi.sjtu.edu.cn/~zhangzs/;http://bcmi.sjtu.edu.cn/~zhaohai/", "dblp": ";06/9708;25/1145-1.html", "google_scholar": ";https://scholar.google.co.jp/citations?user=63LTQhgAAAAJ;https://scholar.google.com.tw/citations?user=4dU5KS0AAAAJ", "orcid": ";0000-0002-4183-3645;", "linkedin": ";;", "or_profile": "~Rongzhou_Bao1;~Zhuosheng_Zhang1;~hai_zhao1", "aff": "Shanghai Jiaotong University;Shanghai Jiaotong University;Shanghai Jiaotong University", "aff_domain": "sjtu.edu.cn;sjtu.edu.cn;sjtu.edu.cn", "position": "MS student;PhD student;Full Professor", "bibtex": "@misc{\nbao2021later,\ntitle={Later Span Adaptation for Language Understanding},\nauthor={Rongzhou Bao and Zhuosheng Zhang and hai zhao},\nyear={2021},\nurl={https://openreview.net/forum?id=HMEiDPTOTmY}\n}", "github": "", "project": "", "reviewers": "AnonReviewer3;AnonReviewer4;AnonReviewer1;AnonReviewer2", "site": "https://openreview.net/forum?id=HMEiDPTOTmY", "pdf_size": 0, "rating": "4;4;6;6", "confidence": "4;4;3;4", "wc_review": "227;300;136;294", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "361;187;190;309", "reply_reviewers": "0;0;0;0", "reply_authors": "1;1;1;1", "rating_avg": [ 5.0, 1.0 ], "confidence_avg": [ 3.75, 0.4330127018922193 ], "wc_review_avg": [ 239.25, 66.14142045647341 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 261.75, 75.52938170010397 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 11, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": -0.5773502691896257, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:R1e9gmRXXzgJ:scholar.google.com/&scioq=Later+Span+Adaptation+for+Language+Understanding&hl=en&as_sdt=0,5", "gs_version_total": 0, "aff_unique_index": "0;0;0", "aff_unique_norm": "Shanghai Jiao Tong University", "aff_unique_dep": "", "aff_unique_url": "https://www.sjtu.edu.cn", "aff_unique_abbr": "SJTU", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0;0", "aff_country_unique": "China" }, { "id": "HMqNjkBEqP4", "title": "Bayesian Meta-Learning for Few-Shot 3D Shape Completion", "track": "main", "status": "Reject", "tldr": "", "abstract": "Estimating the 3D shape of real-world objects is a key perceptual challenge. 
It requires going from partial observations, which are often too sparse and incomprehensible for the human eye, to detailed shape representations that vary significantly across categories and instances. We propose to cast shape completion as a Bayesian meta-learning problem to facilitate the transfer of knowledge learned from observing one object into estimating the shape of another object. To combine the Bayesian framework with an approach that uses implicit 3D object representation, we introduce an encoder that describes the posterior distribution of a latent representation conditioned on sparse point clouds. With its ability to isolate object-specific properties from object-agnostic properties, our meta-learning algorithm enables accurate shape completion of newly-encountered objects from sparse observations. We demonstrate the efficacy of our proposed method with experimental results on the standard ShapeNet and ICL-NUIM benchmarks. ", "keywords": "shape completion;Meta-learning;Few-shot;3D reconstruction", "primary_area": "", "supplementary_material": "/attachment/5a4ca33104aa066238ef9cd6c260c12b9e58ef49.zip", "author": "Masanori Koyama;Toshiki Nakanishi;Shin-ichi Maeda;Vitor Campagnolo Guizilini;Adrien Gaidon", "authorids": "~Masanori_Koyama1;~Toshiki_Nakanishi1;~Shin-ichi_Maeda2;~Vitor_Campagnolo_Guizilini2;~Adrien_Gaidon1", "gender": ";;M;M;", "homepage": ";https://www.preferred.jp/ja/;https://maeyon.github.io/publication/index.html;;https://adriengaidon.com/", "dblp": "151/6113;249/8077;90/4637;;06/7548.html", "google_scholar": ";;https://scholar.google.ca/citations?user=Fv-ifUQAAAAJ;UH9tP6QAAAAJ;https://scholar.google.fr/citations?user=2StUgf4AAAAJ", "orcid": ";;0000-0002-3254-9722;;", "linkedin": ";;;vitorguizilini/;adrien-gaidon-63ab2358/", "or_profile": "~Masanori_Koyama1;~Toshiki_Nakanishi1;~Shin-ichi_Maeda2;~Vitor_Campagnolo_Guizilini2;~Adrien_Gaidon1", "aff": "Preferred Networks, Inc.;Preferred Networks, Inc.;Preferred Networks, Inc.;Toyota Research Institute;Toyota Research Institute (TRI)", "aff_domain": "preferred.jp;preferred.jp;preferred.jp;tri.global;tri.global", "position": "Researcher;Research Engineer;Senior Researcher;Staff Research Scientist;Head of ML", "bibtex": "@misc{\nkoyama2021bayesian,\ntitle={Bayesian Meta-Learning for Few-Shot 3D Shape Completion },\nauthor={Masanori Koyama and Toshiki Nakanishi and Shin-ichi Maeda and Vitor Campagnolo Guizilini and Adrien Gaidon},\nyear={2021},\nurl={https://openreview.net/forum?id=HMqNjkBEqP4}\n}", "github": "", "project": "", "reviewers": "AnonReviewer4;AnonReviewer3;AnonReviewer2", "site": "https://openreview.net/forum?id=HMqNjkBEqP4", "pdf_size": 0, "rating": "4;5;7", "confidence": "4;4;4", "wc_review": "861;631;456", "wc_reply_reviewers": "92;0;0", "wc_reply_authors": "346;223;143", "reply_reviewers": "1;0;0", "reply_authors": "3;1;1", "rating_avg": [ 5.333333333333333, 1.247219128924647 ], "confidence_avg": [ 4.0, 0.0 ], "wc_review_avg": [ 649.3333333333334, 165.84798930211832 ], "wc_reply_reviewers_avg": [ 30.666666666666668, 43.36921591277491 ], "wc_reply_authors_avg": [ 237.33333333333334, 83.49184923625113 ], "reply_reviewers_avg": [ 0.3333333333333333, 0.4714045207910317 ], "reply_authors_avg": [ 1.6666666666666667, 0.9428090415820634 ], "replies_avg": [ 10, 0 ], "authors#_avg": [ 5, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 0, "gs_cited_by_link": 
"https://scholar.google.com/scholar?q=related:eXhB_IxkUiMJ:scholar.google.com/&scioq=Bayesian+Meta-Learning+for+Few-Shot+3D+Shape+Completion&hl=en&as_sdt=0,5", "gs_version_total": 0, "aff_unique_index": "0;0;0;1;1", "aff_unique_norm": "Preferred Networks, Inc.;Toyota Research Institute", "aff_unique_dep": ";", "aff_unique_url": "https://www.preferred-networks.com;https://www.tri.global", "aff_unique_abbr": "PFN;TRI", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0;0;1;1", "aff_country_unique": "Japan;United States" }, { "id": "HN77M0Sdnp2", "title": "Smooth Adversarial Training", "track": "main", "status": "Withdraw", "tldr": "", "abstract": "It is commonly believed that networks cannot be both accurate and robust, that gaining robustness means losing accuracy. It is also generally believed that, unless making networks larger, network architectural elements would otherwise matter little in improving adversarial robustness. Here we present evidence to challenge these common beliefs by a careful study about adversarial training. Our key observation is that the widely-used ReLU activation function significantly weakens adversarial training due to its non-smooth nature. Hence we propose smooth adversarial training (SAT), in which we replace ReLU with its smooth approximations to strengthen adversarial training. The purpose of smooth activation functions in SAT is to allow it to find harder adversarial examples and compute better gradient updates during adversarial training. Compared to standard adversarial training, SAT improves adversarial robustness for \"free\", i.e., no drop in accuracy and no increase in computational cost. For example, without introducing additional computations, SAT significantly enhances ResNet-50's robustness from 33.0% to 42.3%, while also improving accuracy by 0.9% on ImageNet. 
SAT also works well with larger networks: it helps EfficientNet-L1 to achieve 82.2% accuracy and 58.6% robustness on ImageNet, outperforming the previous state-of-the-art defense by 9.5% for accuracy and 11.6% for robustness.", "keywords": "adversarial defense;adversarial machine learning;activation function;adversarial training;neural network architecture", "primary_area": "", "supplementary_material": "/attachment/3f41e7b37d7b2a02745a602f0017d6e490b72d5d.zip", "author": "cihang xie;Mingxing Tan;Boqing Gong;Alan Yuille;Quoc V Le", "authorids": "~cihang_xie1;~Mingxing_Tan3;~Boqing_Gong1;~Alan_Yuille1;~Quoc_V_Le1", "gender": "M;M;M;M;M", "homepage": "https://cihangxie.github.io/;;http://boqinggong.info;;", "dblp": "175/3366;11/7863;29/7457;y/AlanLYuille;29/6166", "google_scholar": "X3vVZPcAAAAJ;6POeyBoAAAAJ;lv9ZeVUAAAAJ;;", "orcid": ";;;;", "linkedin": ";mingxing-tan-2724551b/;boqing-gong-46aa5821/;;", "or_profile": "~cihang_xie1;~Mingxing_Tan3;~Boqing_Gong1;~Alan_Yuille1;~Quoc_V_Le1", "aff": "University of California, Santa Cruz;Google/Waymo;Google;Johns Hopkins University;Google", "aff_domain": "ucsc.edu;google.com;google.com;johnshopkins.edu;google.com", "position": "Assistant Professor;Researcher;Research Scientist;Full Professor;Scientist", "bibtex": "", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer4;AnonReviewer3;AnonReviewer2", "site": "https://openreview.net/forum?id=HN77M0Sdnp2", "pdf_size": 0, "rating": "4;4;6;7", "confidence": "5;4;4;4", "wc_review": "705;648;465;568", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "0;0;0;0", "reply_reviewers": "0;0;0;0", "reply_authors": "0;0;0;0", "rating_avg": [ 5.25, 1.299038105676658 ], "confidence_avg": [ 4.25, 0.4330127018922193 ], "wc_review_avg": [ 596.5, 90.17898868361743 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 0, 0 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 0, 0 ], "replies_avg": [ 13, 0 ], "authors#_avg": [ 5, 0 ], "corr_rating_confidence": -0.5555555555555555, "gs_citation": 197, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=13440643532270481246&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 3, "aff_unique_index": "0;1;1;2;1", "aff_unique_norm": "University of California, Santa Cruz;Google;Johns Hopkins University", "aff_unique_dep": ";Waymo;", "aff_unique_url": "https://www.ucsc.edu;https://www.google.com;https://www.jhu.edu", "aff_unique_abbr": "UCSC;Google;JHU", "aff_campus_unique_index": "0;2;2", "aff_campus_unique": "Santa Cruz;;Mountain View", "aff_country_unique_index": "0;0;0;0;0", "aff_country_unique": "United States" }, { "id": "HNA0kUAFdbv", "title": "CANVASEMB: Learning Layout Representation with Large-scale Pre-training for Graphic Design", "track": "main", "status": "Reject", "tldr": "", "abstract": "Layout representation, which models visual elements in a canvas and their inter-relations, plays a crucial role in graphic design intelligence.\nWith a large variety of layout designs and the unique characteristic of layouts that visual elements are defined as a list of categorical (e.g. shape type) and numerical (e.g. position and size) properties, it is challenging to learn a general and compact representation with limited data. 
Inspired by the recent success of self-supervised pre-training techniques in various natural language processing tasks, in this paper, we propose CanvasEmb (Canvas Embedding), which pre-trains deep representation from unlabeled graphic designs by jointly conditioning on all the context elements in the same canvas, with a multi-dimensional feature encoder and a multi-task learning objective. The pre-trained CanvasEmb model can be fine-tuned with just one additional output layer and with a small size of training data to create models for a wide range of downstream tasks. We verify our approach with presentation slides data. We construct a large-scale dataset with more than one million slides, and propose two novel layout understanding tasks with human labeling sets, namely element role labeling and image captioning. Evaluation results on these two tasks show that our model with fine-tuning achieves state-of-the-art performances. Furthermore, we conduct a deep analysis aiming to understand the modeling mechanism of CanvasEmb, and demonstrate its great potential use on more applications such as layout auto completion and layout retrieval.", "keywords": "Layout Representation;Pre-training", "primary_area": "", "supplementary_material": "/attachment/00438d5f5691655464acd47ad1f4d0c9fbdf05c0.zip", "author": "Yuxi Xie;Danqing Huang;Jinpeng Wang;Chin-Yew Lin", "authorids": "~Yuxi_Xie1;~Danqing_Huang1;wjp.pku@gmail.com;~Chin-Yew_Lin1", "gender": "F;F;;M", "homepage": "https://yuxixie.github.io/;;;https://www.microsoft.com/en-us/research/people/cyl/", "dblp": ";56/10136;;64/6843", "google_scholar": "LNLECx0AAAAJ;P55WbwYAAAAJ;;cDF07aYAAAAJ", "orcid": ";;;", "linkedin": "yuxi-xie-494265181;;;chin-yew-lin-32585a4", "or_profile": "~Yuxi_Xie1;~Danqing_Huang1;wjp.pku@gmail.com;~Chin-Yew_Lin1", "aff": "Microsoft Research Asia;Microsoft Research;;Microsoft", "aff_domain": "microsoft.com;research.microsoft.com;;microsoft.com", "position": "Intern;Researcher at Microsoft;;Senior Principal Research Manager", "bibtex": "@misc{\nxie2021canvasemb,\ntitle={{\\{}CANVASEMB{\\}}: Learning Layout Representation with Large-scale Pre-training for Graphic Design},\nauthor={Yuxi Xie and Danqing Huang and Jinpeng Wang and Chin-Yew Lin},\nyear={2021},\nurl={https://openreview.net/forum?id=HNA0kUAFdbv}\n}", "github": "", "project": "", "reviewers": "AnonReviewer3;AnonReviewer2;AnonReviewer4", "site": "https://openreview.net/forum?id=HNA0kUAFdbv", "pdf_size": 0, "rating": "4;5;5", "confidence": "3;3;4", "wc_review": "603;1155;351", "wc_reply_reviewers": "0;0;0", "wc_reply_authors": "204;310;166", "reply_reviewers": "0;0;0", "reply_authors": "1;1;1", "rating_avg": [ 4.666666666666667, 0.4714045207910317 ], "confidence_avg": [ 3.3333333333333335, 0.4714045207910317 ], "wc_review_avg": [ 703.0, 335.761820342933 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 226.66666666666666, 60.933479212079206 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 7, 0 ], "authors#_avg": [ 4, 0 ], "corr_rating_confidence": 0.4999999999999999, "gs_citation": 16, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=6542980410165221667&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 2, "aff_unique_index": "0;0;0", "aff_unique_norm": "Microsoft", "aff_unique_dep": "Research", "aff_unique_url": "https://www.microsoft.com/en-us/research/group/asia", "aff_unique_abbr": "MSR Asia", "aff_campus_unique_index": "0", "aff_campus_unique": "Asia;", "aff_country_unique_index": "0;1;1", 
"aff_country_unique": "China;United States" }, { "id": "HNytlGv1VjG", "title": "What are effective labels for augmented data? Improving robustness with AutoLabel", "track": "main", "status": "Reject", "tldr": "", "abstract": "A wide breadth of research has devised data augmentation approaches that can improve both accuracy and generalization performance for neural networks. However, augmented data can end up being far from the clean data and what is the appropriate label is less clear. Despite this, most existing work simply reuses the original label from the clean data, and the choice of label accompanying the augmented data is relatively less explored. In this paper, we propose AutoLabel to automatically learn the labels for augmented data, based on the distance between the clean distribution and augmented distribution. AutoLabel is built on label smoothing and is guided by the calibration-performance over a hold-out validation set. We show that AutoLabel is a generic framework that can be easily applied to existing data augmentation methods, including AugMix, mixup, and adversarial training. Experiments on CIFAR-10, CIFAR-100 and ImageNet show that AutoLabel can improve models' accuracy and calibration performance, especially under distributional shift. Additionally, we demonstrate that AutoLabel can help adversarial training by bridging the gap between clean accuracy and adversarial robustness.", "keywords": "data augmentation;image classification;calibration;distributional shifts;adversarial robustness", "primary_area": "", "supplementary_material": "/attachment/21529e017e6a4c1accbb1b4dfb19e5cafc7220c0.zip", "author": "Yao Qin;Xuezhi Wang;Balaji Lakshminarayanan;Ed Chi;Alex Beutel", "authorids": "~Yao_Qin1;~Xuezhi_Wang3;~Balaji_Lakshminarayanan1;~Ed_Chi1;~Alex_Beutel1", "gender": ";;M;M;", "homepage": "https://yaoqin1.github.io;https://research.google/people/105995/;http://www.gatsby.ucl.ac.uk/~balaji/;http://edchi.net;", "dblp": "66/10420-1;70/4090-2;71/8324;13/310;", "google_scholar": "https://scholar.google.com/citations?view_op=list_works;ScLUQ-YAAAAJ;QYn8RbgAAAAJ;VuWl-KUAAAAJ;", "orcid": ";;;0000-0003-3230-5338;", "linkedin": ";;;edchi/;", "or_profile": "~Yao_Qin1;~Xuezhi_Wang3;~Balaji_Lakshminarayanan1;~Ed_Chi1;~Alex_Beutel1", "aff": "Google;Google DeepMind;Google Brain;Google;", "aff_domain": "google.com;google.com;google.com;google.com;", "position": "Researcher;Research Scientist;Research Scientist;Researcher;", "bibtex": "@misc{\nqin2021what,\ntitle={What are effective labels for augmented data? 
Improving robustness with AutoLabel},\nauthor={Yao Qin and Xuezhi Wang and Balaji Lakshminarayanan and Ed Chi and Alex Beutel},\nyear={2021},\nurl={https://openreview.net/forum?id=HNytlGv1VjG}\n}", "github": "", "project": "", "reviewers": "AnonReviewer2;AnonReviewer1;AnonReviewer4;AnonReviewer3", "site": "https://openreview.net/forum?id=HNytlGv1VjG", "pdf_size": 0, "rating": "4;4;4;5", "confidence": "4;3;4;4", "wc_review": "539;308;547;421", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "83;453;113;0", "reply_reviewers": "0;0;0;0", "reply_authors": "1;1;1;0", "rating_avg": [ 4.25, 0.4330127018922193 ], "confidence_avg": [ 3.75, 0.4330127018922193 ], "wc_review_avg": [ 453.75, 97.8247795806359 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 162.25, 172.89212677273653 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 0.75, 0.4330127018922193 ], "replies_avg": [ 9, 0 ], "authors#_avg": [ 5, 0 ], "corr_rating_confidence": 0.3333333333333333, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:ugp43_kQVdgJ:scholar.google.com/&scioq=What+are+effective+labels+for+augmented+data%3F+Improving+robustness+with+AutoLabel&hl=en&as_sdt=0,33", "gs_version_total": 0, "aff_unique_index": "0;0;0;0", "aff_unique_norm": "Google", "aff_unique_dep": "Google", "aff_unique_url": "https://www.google.com", "aff_unique_abbr": "Google", "aff_campus_unique_index": "0;0;0", "aff_campus_unique": "Mountain View;", "aff_country_unique_index": "0;1;0;0", "aff_country_unique": "United States;United Kingdom" }, { "id": "HO80-Z4l0M", "title": "Alpha Net: Adaptation with Composition in Classifier Space", "track": "main", "status": "Reject", "tldr": "", "abstract": "Deep learning classification models typically train poorly on classes with small numbers of examples. Motivated by the human ability to solve this task, models have been developed that transfer knowledge from classes with many examples to learn classes with few examples. Critically, the majority of these models transfer knowledge within model feature space. In this work, we demonstrate that transferring knowledge within classifier space is more effective and efficient. Specifically, by linearly combining strong nearest neighbor classifiers along with a weak classifier, we are able to compose a stronger classifier. Uniquely, our model can be implemented on top of any existing classification model that includes a classifier layer. We showcase the success of our approach in the task of long-tailed recognition, whereby the classes with few examples, otherwise known as the tail classes, suffer the most in performance and are the most challenging classes to learn. 
Using classifier-level knowledge transfer, we are able to drastically improve - by a margin as high as 10.5% - the state-of-the-art performance on the tail categories.", "keywords": "long-tail recognition;classifier composition", "primary_area": "", "supplementary_material": "", "author": "Nadine Chang;Jayanth Koushik;Michael Tarr;Martial Hebert;Yu-Xiong Wang", "authorids": "~Nadine_Chang1;~Jayanth_Koushik1;~Michael_Tarr1;~Martial_Hebert1;~Yu-Xiong_Wang1", "gender": "F;M;M;M;", "homepage": "https://nadinechang.com/;https://jayanthkoushik.github.io;https://tarrlab.org;http://www.cs.cmu.edu/~hebert/;https://yxw.cs.illinois.edu/", "dblp": "227/2758;https://dblp.uni-trier.de/pers/hd/k/Koushik:Jayanth.html;36/1880;h/MartialHebert;35/10700", "google_scholar": "https://scholar.google.com/citations?hl=en;XTqgW-EAAAAJ;O8ALPlkAAAAJ;https://scholar.google.com.tw/citations?user=0ytii2EAAAAJ;T_Q-xDkAAAAJ", "orcid": "0000-0003-4765-8478;;0000-0003-4724-1744;;", "linkedin": ";jayanth-koushik-18213b63;michael-tarr-ab078046/;;", "or_profile": "~Nadine_Chang1;~Jayanth_Koushik1;~Michael_Tarr1;~Martial_Hebert1;~Yu-Xiong_Wang1", "aff": "School of Computer Science, Carnegie Mellon University;Carnegie Mellon University;Carnegie Mellon University;Carnegie Mellon University;Department of Computer Science, University of Illinois Urbana-Champaign", "aff_domain": "cs.cmu.edu;cmu.edu;cmu.edu;cmu.edu;cs.illinois.edu", "position": "PhD student;PhD student;Full Professor;Professor;Assistant Professor", "bibtex": "@misc{\nchang2021alpha,\ntitle={Alpha Net: Adaptation with Composition in Classifier Space},\nauthor={Nadine Chang and Jayanth Koushik and Michael Tarr and Martial Hebert and Yu-Xiong Wang},\nyear={2021},\nurl={https://openreview.net/forum?id=HO80-Z4l0M}\n}", "github": "", "project": "", "reviewers": "AnonReviewer2;AnonReviewer1;AnonReviewer5;AnonReviewer3", "site": "https://openreview.net/forum?id=HO80-Z4l0M", "pdf_size": 0, "rating": "3;4;4;8", "confidence": "5;5;4;4", "wc_review": "459;903;658;213", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "534;826;615;81", "reply_reviewers": "0;0;0;0", "reply_authors": "1;1;1;1", "rating_avg": [ 4.75, 1.920286436967152 ], "confidence_avg": [ 4.5, 0.5 ], "wc_review_avg": [ 558.25, 253.8950324445124 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 514.0, 271.7692035533092 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 9, 0 ], "authors#_avg": [ 5, 0 ], "corr_rating_confidence": -0.6509445549041193, "gs_citation": 3, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=7362237371708766450&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 2, "aff_unique_index": "0;0;0;0;1", "aff_unique_norm": "Carnegie Mellon University;University of Illinois Urbana-Champaign", "aff_unique_dep": "School of Computer Science;Department of Computer Science", "aff_unique_url": "https://www.cmu.edu;https://illinois.edu", "aff_unique_abbr": "CMU;UIUC", "aff_campus_unique_index": "0;2", "aff_campus_unique": "Pittsburgh;;Urbana-Champaign", "aff_country_unique_index": "0;0;0;0;0", "aff_country_unique": "United States" }, { "title": "Enjoy Your Editing: Controllable GANs for Image Editing via Latent Space Navigation", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/2858", "id": "HOFxeCutxZR", "poster": "", "openreview": "https://openreview.net/forum?id=HOFxeCutxZR", "slides": "https://iclr.cc/virtual/2021/poster/2858", "video": "https://iclr.cc/virtual/2021/poster/2858", "author_site": "Peiye 
Zhuang, Sanmi Koyejo, Alex Schwing", "tldr": "", "abstract": "Controllable semantic image editing enables a user to change entire image attributes with a few clicks, e.g., gradually making a summer scene look like it was taken in winter. Classic approaches for this task use a Generative Adversarial Net (GAN) to learn a latent space and suitable latent-space transformations. However, current approaches often suffer from attribute edits that are entangled, global image identity changes, and diminished photo-realism. To address these concerns, we learn multiple attribute transformations simultaneously, integrate attribute regression into the training of transformation functions, and apply a content loss and an adversarial loss that encourages the maintenance of image identity and photo-realism. We propose quantitative evaluation strategies for measuring controllable editing performance, unlike prior work, which primarily focuses on qualitative evaluation. Our model permits better control for both single- and multiple-attribute editing while preserving image identity and realism during transformation. We provide empirical results for both natural and synthetic images, highlighting that our model achieves state-of-the-art performance for targeted image manipulation. ", "keywords": "Image manipulation;GANs;latent space of GANs", "primary_area": "", "supplementary_material": "/attachment/6971bd2313573b601b46062551f6ebe3c838a4ca.zip", "author": "Peiye Zhuang;Oluwasanmi O Koyejo;Alex Schwing", "authorids": "~Peiye_Zhuang2;~Oluwasanmi_O_Koyejo1;~Alex_Schwing1", "gender": "F;M;Unspecified", "homepage": "https://payeah.net;https://cs.stanford.edu/~sanmi/;https://ece.illinois.edu/directory/profile/aschwing", "dblp": "244/7937;14/8885;79/9775", "google_scholar": "gsPILWoAAAAJ;EaaOeJwAAAAJ;3B2c31wAAAAJ", "orcid": ";0000-0002-4023-419X;", "linkedin": ";sanmi-koyejo-984754/;", "or_profile": "~Peiye_Zhuang2;~Oluwasanmi_O_Koyejo1;~Alex_Schwing1", "aff": "University of Illinois, Urbana Champaign;University of Illinois, Urbana Champaign;University of Illinois, Urbana Champaign", "aff_domain": "illinois.edu;illinois.edu;illinois.edu", "position": "PhD student;Assistant Professor;Assistant Professor", "bibtex": "@inproceedings{\nzhuang2021enjoy,\ntitle={Enjoy Your Editing: Controllable {\\{}GAN{\\}}s for Image Editing via Latent Space Navigation},\nauthor={Peiye Zhuang and Oluwasanmi O Koyejo and Alex Schwing},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=HOFxeCutxZR}\n}", "github": "[![github](/images/github_icon.svg) KelestZ/Latent2im](https://github.com/KelestZ/Latent2im) + [![Papers with Code](/images/pwc_icon.svg) 1 community implementation](https://paperswithcode.com/paper/?openreview=HOFxeCutxZR)", "project": "", "reviewers": "AnonReviewer1;AnonReviewer2;AnonReviewer4;AnonReviewer3", "pdf_size": 0, "rating": "6;6;6;8", "confidence": "3;4;4;4", "wc_review": "250;334;107;265", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "421;782;365;329", "reply_reviewers": "0;0;0;0", "reply_authors": "1;1;1;1", "rating_avg": [ 6.5, 0.8660254037844386 ], "confidence_avg": [ 3.75, 0.4330127018922193 ], "wc_review_avg": [ 239.0, 82.5318120484459 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 474.25, 180.67840905874723 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 9, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": 0.3333333333333333, "gs_citation": 92, "gs_cited_by_link": 
"https://scholar.google.com/scholar?cites=15259069119096220128&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 4, "pdf": "https://openreview.net/pdf?id=HOFxeCutxZR", "email": "illinois.edu;illinois.edu;illinois.edu", "author_num": 3, "aff_unique_index": "0;0;0", "aff_unique_norm": "University of Illinois Urbana-Champaign", "aff_unique_dep": "", "aff_unique_url": "https://illinois.edu", "aff_unique_abbr": "UIUC", "aff_campus_unique_index": "0;0;0", "aff_campus_unique": "Urbana-Champaign", "aff_country_unique_index": "0;0;0", "aff_country_unique": "United States" }, { "id": "HP-tcf48fT", "title": "Learning to Search for Fast Maximum Common Subgraph Detection", "track": "main", "status": "Reject", "tldr": "", "abstract": "Detecting the Maximum Common Subgraph (MCS) between two input graphs is fundamental for applications in biomedical analysis, malware detection, cloud computing, etc. This is especially important in the task of drug design, where the successful extraction of common substructures in compounds can reduce the number of experiments needed to be conducted by humans. However, MCS computation is NP-hard, and state-of-the-art MCS solvers rely on heuristics in search which in practice cannot find good solution for large graph pairs under a limited search budget. Here we propose GLSearch, a Graph Neural Network based model for MCS detection, which learns to search. Our model uses a state-of-the-art branch and bound algorithm as the backbone search algorithm to extract subgraphs by selecting one node pair at a time. In order to make better node selection decision at each step, we replace the node selection heuristics with a novel task-specific Deep Q-Network (DQN), allowing the search process to find larger common subgraphs faster. To enhance the training of DQN, we leverage the search process to provide supervision in a pre-training stage and guide our agent during an imitation learning stage. Therefore, our framework allows search and reinforcement learning to mutually benefit each other. 
Experiments on synthetic and real-world large graph pairs demonstrate that our model outperforms state-of-the-art MCS solvers and neural graph matching network models.", "keywords": "graph matching;maximum common subgraph;graph neural network;reinforcement learning;search", "primary_area": "", "supplementary_material": "", "author": "Yunsheng Bai;Derek Qiang Xu;Yizhou Sun;Wei Wang", "authorids": "~Yunsheng_Bai1;~Derek_Qiang_Xu2;~Yizhou_Sun1;~Wei_Wang13", "gender": "M;M;F;F", "homepage": "https://yunshengb.com/;https://derekqxu.github.io;http://web.cs.ucla.edu/~yzsun/;http://www.cs.ucla.edu/~weiwang", "dblp": "225/5377.html;155/0712;37/3868;w/WeiWang.html", "google_scholar": ";07nfvIgAAAAJ;https://scholar.google.com.tw/citations?user=TQgOjK0AAAAJ;UedS9LQAAAAJ", "orcid": ";0009-0008-2992-9768;;0000-0002-8180-2886", "linkedin": ";derekqxu/;;wei-wang-8800845/", "or_profile": "~Yunsheng_Bai1;~Derek_Qiang_Xu2;~Yizhou_Sun1;~Wei_Wang13", "aff": "University of California, Los Angeles;University of California, Los Angeles;University of California, Los Angeles;University of California, Los Angeles", "aff_domain": "cs.ucla.edu;ucla.edu;ucla.edu;ucla.edu", "position": "PhD student;PhD student;Associate Professor;Full Professor", "bibtex": "@misc{\nbai2021learning,\ntitle={Learning to Search for Fast Maximum Common Subgraph Detection},\nauthor={Yunsheng Bai and Derek Qiang Xu and Yizhou Sun and Wei Wang},\nyear={2021},\nurl={https://openreview.net/forum?id=HP-tcf48fT}\n}", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer2;AnonReviewer5", "site": "https://openreview.net/forum?id=HP-tcf48fT", "pdf_size": 0, "rating": "5;5;7", "confidence": "3;4;4", "wc_review": "503;296;270", "wc_reply_reviewers": "0;0;0", "wc_reply_authors": "2050;805;419", "reply_reviewers": "0;0;0", "reply_authors": "6;3;2", "rating_avg": [ 5.666666666666667, 0.9428090415820634 ], "confidence_avg": [ 3.6666666666666665, 0.4714045207910317 ], "wc_review_avg": [ 356.3333333333333, 104.25076605100905 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 1091.3333333333333, 695.9551390395472 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 3.6666666666666665, 1.699673171197595 ], "replies_avg": [ 17, 0 ], "authors#_avg": [ 4, 0 ], "corr_rating_confidence": 0.49999999999999983, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:kXYfPpcgZrcJ:scholar.google.com/&scioq=Learning+to+Search+for+Fast+Maximum+Common+Subgraph+Detection&hl=en&as_sdt=0,5", "gs_version_total": 0, "aff_unique_index": "0;0;0;0", "aff_unique_norm": "University of California, Los Angeles", "aff_unique_dep": "", "aff_unique_url": "https://www.ucla.edu", "aff_unique_abbr": "UCLA", "aff_campus_unique_index": "0;0;0;0", "aff_campus_unique": "Los Angeles", "aff_country_unique_index": "0;0;0;0", "aff_country_unique": "United States" }, { "id": "HPGtPvFNROh", "title": "DROPS: Deep Retrieval of Physiological Signals via Attribute-specific Clinical Prototypes", "track": "main", "status": "Reject", "tldr": "", "abstract": "The ongoing digitization of health records within the healthcare industry results in large-scale datasets. Manually extracting clinically-useful insight from such datasets is non-trivial. However, doing so at scale while simultaneously leveraging patient-specific attributes such as sex and age can assist with clinical-trial enrollment, medical school educational endeavours, and the evaluation of the fairness of neural networks. 
To facilitate the reliable extraction of clinical information, we propose to learn embeddings, known as clinical prototypes (CPs), via supervised contrastive learning. We show that CPs can be efficiently used for large-scale retrieval and clustering of physiological signals based on multiple patient attributes. We also show that CPs capture attribute-specific semantic relationships.", "keywords": "Contrastive learning;information retrieval;clustering;physiological signals;healthcare", "primary_area": "", "supplementary_material": "/attachment/dd01a671e170ac0bc391a56483a84dd0100d747c.zip", "author": "Dani Kiyasseh;Tingting Zhu;David A. Clifton", "authorids": "~Dani_Kiyasseh1;tingting.zhu@eng.ox.ac.uk;~David_A._Clifton1", "gender": ";;M", "homepage": "https://danikiyasseh.github.io/;;http://www.eng.ox.ac.uk/chi", "dblp": ";;89/6424", "google_scholar": "UD1oO4MAAAAJ;;", "orcid": ";;", "linkedin": ";;", "or_profile": "~Dani_Kiyasseh1;tingting.zhu@eng.ox.ac.uk;~David_A._Clifton1", "aff": "University of Oxford;;University of Oxford", "aff_domain": "oxford.ac.uk;;ox.ac.uk", "position": "PhD student;;Full Professor", "bibtex": "@misc{\nkiyasseh2021drops,\ntitle={{\\{}DROPS{\\}}: Deep Retrieval of Physiological Signals via Attribute-specific Clinical Prototypes},\nauthor={Dani Kiyasseh and Tingting Zhu and David A. Clifton},\nyear={2021},\nurl={https://openreview.net/forum?id=HPGtPvFNROh}\n}", "github": "", "project": "", "reviewers": "AnonReviewer3;AnonReviewer5;AnonReviewer6", "site": "https://openreview.net/forum?id=HPGtPvFNROh", "pdf_size": 0, "rating": "2;4;4", "confidence": "4;4;3", "wc_review": "455;1402;344", "wc_reply_reviewers": "0;0;0", "wc_reply_authors": "327;1379;860", "reply_reviewers": "0;0;0", "reply_authors": "1;2;2", "rating_avg": [ 3.3333333333333335, 0.9428090415820634 ], "confidence_avg": [ 3.6666666666666665, 0.4714045207910317 ], "wc_review_avg": [ 733.6666666666666, 474.7506948096256 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 855.3333333333334, 429.4898782923243 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.6666666666666667, 0.4714045207910317 ], "replies_avg": [ 9, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": -0.5, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:bn3iTwaxTk4J:scholar.google.com/&scioq=DROPS:+Deep+Retrieval+of+Physiological+Signals+via+Attribute-specific+Clinical+Prototypes&hl=en&as_sdt=0,5", "gs_version_total": 0, "aff_unique_index": "0;0", "aff_unique_norm": "University of Oxford", "aff_unique_dep": "", "aff_unique_url": "https://www.ox.ac.uk", "aff_unique_abbr": "Oxford", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0", "aff_country_unique": "United Kingdom" }, { "id": "HQoCa9WODc0", "title": "Suppressing Outlier Reconstruction in Autoencoders for Out-of-Distribution Detection", "track": "main", "status": "Reject", "tldr": "", "abstract": "While only trained to reconstruct training data, autoencoders may produce high-quality reconstructions of inputs that are well outside the training data distribution. This phenomenon, which we refer to as outlier reconstruction, has a detrimental effect on the use of autoencoders for outlier detection, as an autoencoder will misclassify a clear outlier as being in-distribution. In this paper, we introduce the Energy-Based Autoencoder (EBAE), an autoencoder that is considerably less susceptible to outlier reconstruction. 
\nThe core idea of EBAE is to treat the reconstruction error as an energy function of a normalized density and to strictly enforce the normalization constraint. We show that the reconstruction of non-training inputs can be suppressed, and the reconstruction error made highly discriminative to outliers, by enforcing this constraint. We empirically show that EBAE significantly outperforms both existing autoencoders and other generative models for several out-of-distribution detection tasks.", "keywords": "autoencoder;outlier detection;novelty detection;energy-based model", "primary_area": "", "supplementary_material": "/attachment/8210ef766c7f8164e2c6680e7b8c91945d91b610.zip", "author": "Sangwoong Yoon;Yung-Kyun Noh;Frank C. Park", "authorids": "~Sangwoong_Yoon1;~Yung-Kyun_Noh1;~Frank_C._Park1", "gender": "M;M;M", "homepage": "https://swyoon.github.io/;http://aais.hanyang.ac.kr;http://robotics.snu.ac.kr", "dblp": "237/1318;54/6443;p/FrankChongwooPark", "google_scholar": "https://scholar.google.co.kr/citations?user=cH2rjfIAAAAJ;https://scholar.google.com/citations?hl=en;u-h3PJIAAAAJ", "orcid": "0000-0002-7251-3230;;0000-0002-0293-6975", "linkedin": ";;", "or_profile": "~Sangwoong_Yoon1;~Yung-Kyun_Noh1;~Frank_C._Park1", "aff": "Seoul National University;Korea Institute for Advanced Study;Seoul National University", "aff_domain": "snu.ac.kr;kias.re.kr;snu.ac.kr", "position": "PhD student;Affiliate Professor;Full Professor", "bibtex": "@misc{\nyoon2021suppressing,\ntitle={Suppressing Outlier Reconstruction in Autoencoders for Out-of-Distribution Detection},\nauthor={Sangwoong Yoon and Yung-Kyun Noh and Frank C. Park},\nyear={2021},\nurl={https://openreview.net/forum?id=HQoCa9WODc0}\n}", "github": "", "project": "", "reviewers": "AnonReviewer2;AnonReviewer1;AnonReviewer3;AnonReviewer4", "site": "https://openreview.net/forum?id=HQoCa9WODc0", "pdf_size": 0, "rating": "4;4;5;5", "confidence": "5;4;4;4", "wc_review": "360;213;483;1036", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "0;0;0;0", "reply_reviewers": "0;0;0;0", "reply_authors": "0;0;0;0", "rating_avg": [ 4.5, 0.5 ], "confidence_avg": [ 4.25, 0.4330127018922193 ], "wc_review_avg": [ 523.0, 311.2225891544507 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 0, 0 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 0, 0 ], "replies_avg": [ 6, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": -0.5773502691896257, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:R3X8TUtIHrUJ:scholar.google.com/&scioq=Suppressing+Outlier+Reconstruction+in+Autoencoders+for+Out-of-Distribution+Detection&hl=en&as_sdt=0,5", "gs_version_total": 0, "aff_unique_index": "0;1;0", "aff_unique_norm": "Seoul National University;Korea Institute for Advanced Study", "aff_unique_dep": ";", "aff_unique_url": "https://www.snu.ac.kr;http://www.kaist.edu", "aff_unique_abbr": "SNU;KIAS", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0;0", "aff_country_unique": "South Korea" }, { "id": "HUd2wQ0j200", "title": "TransNAS-Bench-101: Improving Transferrability and Generalizability of Cross-Task Neural Architecture Search", "track": "main", "status": "Withdraw", "tldr": "", "abstract": "Recent breakthroughs of Neural Architecture Search (NAS) are extending the field's research scope towards a broader range of vision tasks and more diversified search spaces. 
While existing NAS methods mostly design architectures on one single task, algorithms that look beyond single-task search are surging to pursue a more efficient and universal solution across various tasks. Many of them leverage transfer learning and seek to preserve, reuse, and refine network design knowledge to achieve higher efficiency in future tasks. However, the huge computational cost and experiment complexity of cross-task NAS are imposing barriers for valuable research in this direction. Existing transferrable NAS algorithms are also based on different settings, e.g., datasets and search spaces, which raises concerns on performance comparability. Although existing NAS benchmarks provided some solutions, they all focus on one single type of vision task, i.e., classification. In this work, we propose TransNAS-Bench-101, a benchmark containing network performance across 7 tasks, covering classification, regression, pixel-level prediction, and self-supervised tasks. This diversity provides opportunities to transfer NAS methods among the tasks, and allows for more complex transfer schemes to evolve. We explore two fundamentally different types of search spaces: cell-level search space and macro-level search space. With 7,352 backbones evaluated on 7 tasks, 51,464 trained models with detailed training information are provided. Generating this benchmark takes about 193,760 GPU hours, which is equivalent to 22.12 years of computation on a single Nvidia V100 GPU. Analysis of 4 benchmark transfer schemes highlights that: (1) Direct deployment of both architectures and policies can easily lead to negative transfer unless guided by carefully designed mechanisms. (2) Evolutionary methods' role in transferrable NAS might be overlooked in the past. (3) It is a valid challenge for NAS algorithms to perform well across tasks and search spaces consistently. We also provide our suggestions for future research along with the analysis. 
With TransNAS-Bench-101, we hope to encourage the advent of exceptional NAS algorithms that raise cross-task search efficiency and generalizability to the next level.", "keywords": "", "primary_area": "", "supplementary_material": "", "author": "Yawen Duan;Xin Chen;Hang Xu;Zewei Chen;Xiaodan Liang;Tong Zhang;Zhenguo Li", "authorids": "~Yawen_Duan1;~Xin_Chen15;~Hang_Xu1;~Zewei_Chen1;~Xiaodan_Liang2;~Tong_Zhang2;~Zhenguo_Li1", "gender": "M;F;M;;F;M;M", "homepage": ";https://www.xccyn.com;;;https://www.sysu-hcp.net/;http://tongzhang-ml.org;http://www.ee.columbia.edu/~zgli/", "dblp": ";;;;;07/4227-1;23/6479", "google_scholar": "IJQlPvYAAAAJ;KVMYX5QAAAAJ;https://scholar.google.com.hk/citations?user=J_8TX6sAAAAJ;https://scholar.google.com.hk/citations?hl=en;voxznZAAAAAJ;LurWtuYAAAAJ;XboZC1AAAAAJ", "orcid": ";;0000-0003-3645-8972;;;0000-0002-5511-2558;", "linkedin": "yawen-duan/;;;;;;", "or_profile": "~Yawen_Duan1;~Xin_Chen15;~Hang_Xu1;~Zewei_Chen1;~Xiaodan_Liang2;~Tong_Zhang2;~Zhenguo_Li1", "aff": "The University of Hong Kong;;Huawei Noah\u2018s Ark Lab;;SUN YAT-SEN UNIVERSITY;Hong Kong University of Science and Technology;Huawei Noah's Ark Lab", "aff_domain": "connect.hku.hk;;huawei.com;;sysu.edu.cn;ust.hk;huawei.com", "position": "Undergrad student;;Researcher;;Associate Professor;Full Professor;Principal Researcher", "bibtex": "", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer4;AnonReviewer2;AnonReviewer3", "site": "https://openreview.net/forum?id=HUd2wQ0j200", "pdf_size": 0, "rating": "5;5;5;6", "confidence": "4;5;3;5", "wc_review": "554;584;416;547", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "0;0;0;0", "reply_reviewers": "0;0;0;0", "reply_authors": "0;0;0;0", "rating_avg": [ 5.25, 0.4330127018922193 ], "confidence_avg": [ 4.25, 0.82915619758885 ], "wc_review_avg": [ 525.25, 64.58860193563567 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 0, 0 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 0, 0 ], "replies_avg": [ 5, 0 ], "authors#_avg": [ 7, 0 ], "corr_rating_confidence": 0.5222329678670935, "gs_citation": 84, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=3825333699437094048&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 8, "aff_unique_index": "0;1;2;3;1", "aff_unique_norm": "University of Hong Kong;Huawei;Sun Yat-sen University;Hong Kong University of Science and Technology", "aff_unique_dep": ";Noah's Ark Lab;;", "aff_unique_url": "https://www.hku.hk;https://www.huawei.com;http://www.sysu.edu.cn;https://www.ust.hk", "aff_unique_abbr": "HKU;Huawei;SYSU;HKUST", "aff_campus_unique_index": "0;0", "aff_campus_unique": "Hong Kong SAR;", "aff_country_unique_index": "0;0;0;0;0", "aff_country_unique": "China" }, { "id": "HW4aTJHx0X", "title": "What's new? Summarizing Contributions in Scientific Literature", "track": "main", "status": "Reject", "tldr": "", "abstract": "With thousands of academic articles shared on a daily basis, it has become increasingly difficult to keep up with the latest scientific findings. To overcome this problem, we introduce a new task of $\\textit{disentangled paper summarization}$, which seeks to generate separate summaries for the paper contributions and the context of the work, making it easier to identify the key findings shared in articles. For this purpose, we extend the S2ORC corpus of academic articles, which spans a diverse set of domains ranging from economics to psychology, by adding disentangled \"contribution\" and \"context\" reference labels. 
Together with the dataset, we introduce and analyze three baseline approaches: 1) a unified model controlled by input code prefixes, 2) a model with separate generation heads specialized in generating the disentangled outputs, and 3) a training strategy that guides the model using additional supervision coming from inbound and outbound citations. We also propose a comprehensive automatic evaluation protocol which reports the $\textit{relevance}$, $\textit{novelty}$, and $\textit{disentanglement}$ of generated outputs. Through a human study involving expert annotators, we show that in 79% of cases our new task is considered more helpful than traditional scientific paper summarization.\n", "keywords": "abstractive summarization;scientific papers", "primary_area": "", "supplementary_material": "/attachment/782b3f9b79910ca5e8a63c93394b67c7b9ee6555.zip", "author": "Hiroaki Hayashi;Wojciech Maciej Kryscinski;Bryan McCann;Nazneen Rajani;Caiming Xiong", "authorids": "~Hiroaki_Hayashi1;~Wojciech_Maciej_Kryscinski1;~Bryan_McCann1;~Nazneen_Rajani1;~Caiming_Xiong1", "gender": "M;M;;M;F", "homepage": "https://hiroakih.me;;https://bmccann.github.io/;http://cmxiong.com/;https://www.nazneenrajani.com/", "dblp": ";;205/2296;80/7282;", "google_scholar": "dhdP1fIAAAAJ;;QVj22CwAAAAJ;vaSdahkAAAAJ;eIRG81YAAAAJ", "orcid": ";;;;", "linkedin": ";;bmarcusmccann/;caiming-xiong-150a1417;", "or_profile": "~Hiroaki_Hayashi1;~Wojciech_Maciej_Kryscinski1;~Bryan_McCann1;~Caiming_Xiong1;~Nazneen_Fatema_Fatema_Rajani1", "aff": "Carnegie Mellon University;;;Salesforce Research;SalesForce.com", "aff_domain": "cs.cmu.edu;;;salesforce.com;salesforce.com", "position": "PhD student;;;Research Scientist;Researcher", "bibtex": "@misc{\nhayashi2021whats,\ntitle={What's new? Summarizing Contributions in Scientific Literature},\nauthor={Hiroaki Hayashi and Wojciech Maciej Kryscinski and Bryan McCann and Nazneen Rajani and Caiming Xiong},\nyear={2021},\nurl={https://openreview.net/forum?id=HW4aTJHx0X}\n}", "github": "", "project": "", "reviewers": "AnonReviewer3;AnonReviewer1;AnonReviewer4;AnonReviewer2", "site": "https://openreview.net/forum?id=HW4aTJHx0X", "pdf_size": 0, "rating": "4;4;5;5", "confidence": "4;5;4;4", "wc_review": "348;286;648;464", "wc_reply_reviewers": "61;0;0;50", "wc_reply_authors": "717;328;609;710", "reply_reviewers": "1;0;0;1", "reply_authors": "2;1;1;1", "rating_avg": [ 4.5, 0.5 ], "confidence_avg": [ 4.25, 0.4330127018922193 ], "wc_review_avg": [ 436.5, 137.81418649761716 ], "wc_reply_reviewers_avg": [ 27.75, 28.021197333447407 ], "wc_reply_authors_avg": [ 591.0, 157.74187776237483 ], "reply_reviewers_avg": [ 0.5, 0.5 ], "reply_authors_avg": [ 1.25, 0.4330127018922193 ], "replies_avg": [ 12, 0 ], "authors#_avg": [ 5, 0 ], "corr_rating_confidence": -0.5773502691896257, "gs_citation": 16, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=18380532491326366466&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 4, "aff_unique_index": "0;1;1", "aff_unique_norm": "Carnegie Mellon University;Salesforce", "aff_unique_dep": ";Salesforce Research", "aff_unique_url": "https://www.cmu.edu;https://research.salesforce.com", "aff_unique_abbr": "CMU;Salesforce", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0;0", "aff_country_unique": "United States" }, { "id": "HWX5j6Bv_ih", "title": "Cross-Node Federated Graph Neural Network for Spatio-Temporal Data Modeling", "track": "main", "status": "Reject", "tldr": "", "abstract": "The vast amount of data generated from networks of 
sensors, wearables, and the Internet of Things (IoT) devices underscores the need for advanced modeling techniques that leverage the spatio-temporal structure of decentralized data due to the need for edge computation and licensing (data access) issues. While federated learning (FL) has emerged as a framework for model training without requiring direct data sharing and exchange, effectively modeling the complex spatio-temporal dependencies to improve forecasting capabilities still remains an open problem. On the other hand, state-of-the-art spatio-temporal forecasting models assume unfettered access to the data, neglecting constraints on data sharing. To bridge this gap, we propose a federated spatio-temporal model -- Cross-Node Federated Graph Neural Network (CNFGNN) -- which explicitly encodes the underlying graph structure using graph neural network (GNN)-based architecture under the constraint of cross-node federated learning, which requires that data in a network of nodes is generated locally on each node and remains decentralized. CNFGNN operates by disentangling the temporal dynamics modeling on devices and spatial dynamics on the server, utilizing alternating optimization to reduce the communication cost, facilitating computations on the edge devices. Experiments on the traffic flow forecasting task show that CNFGNN achieves the best forecasting performance in both transductive and inductive learning settings with no extra computation cost on edge devices, while incurring modest communication cost.", "keywords": "Federated Learning;Graph Neural Network;Spatio-Temporal Data Modeling", "primary_area": "", "supplementary_material": "/attachment/a02870a7dbe49abd4a1dbf06e8fc1377cb3ebfc2.zip", "author": "Chuizheng Meng;Sirisha Rambhatla;Yan Liu", "authorids": "~Chuizheng_Meng1;~Sirisha_Rambhatla1;~Yan_Liu1", "gender": "M;F;F", "homepage": ";;http://www-bcf.usc.edu/~liu32/", "dblp": "207/8096.html;123/4808.html;150/4295", "google_scholar": "nzkOdekAAAAJ;EOSZeBMAAAAJ;UUKLPMYAAAAJ", "orcid": ";;0000-0002-7055-9518", "linkedin": ";;", "or_profile": "~Chuizheng_Meng1;~Sirisha_Rambhatla1;~Yan_Liu1", "aff": "University of Southern California;University of Southern California;University of Southern California", "aff_domain": "usc.edu;usc.edu;usc.edu", "position": "PhD student;Postdoc;Professor", "bibtex": "@misc{\nmeng2021crossnode,\ntitle={Cross-Node Federated Graph Neural Network for Spatio-Temporal Data Modeling},\nauthor={Chuizheng Meng and Sirisha Rambhatla and Yan Liu},\nyear={2021},\nurl={https://openreview.net/forum?id=HWX5j6Bv_ih}\n}", "github": "", "project": "", "reviewers": "AnonReviewer3;AnonReviewer2;AnonReviewer4;AnonReviewer1", "site": "https://openreview.net/forum?id=HWX5j6Bv_ih", "pdf_size": 0, "rating": "3;5;6;6", "confidence": "5;3;2;1", "wc_review": "1886;367;157;254", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "2063;821;356;426", "reply_reviewers": "0;0;0;0", "reply_authors": "5;2;1;1", "rating_avg": [ 5.0, 1.224744871391589 ], "confidence_avg": [ 2.75, 1.479019945774904 ], "wc_review_avg": [ 666.0, 708.2771350255491 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 916.5, 685.261446456752 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 2.25, 1.6393596310755 ], "replies_avg": [ 15, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": -0.9660917830792958, "gs_citation": 147, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=14577849625404840730&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 4, "aff_unique_index": 
"0;0;0", "aff_unique_norm": "University of Southern California", "aff_unique_dep": "", "aff_unique_url": "https://www.usc.edu", "aff_unique_abbr": "USC", "aff_campus_unique_index": "0;0;0", "aff_campus_unique": "Los Angeles", "aff_country_unique_index": "0;0;0", "aff_country_unique": "United States" }, { "id": "HWqv5Pm3E3", "title": "Source-free Domain Adaptation via Distributional Alignment by Matching Batch Normalization Statistics", "track": "main", "status": "Reject", "tldr": "", "abstract": "In this paper, we propose a novel domain adaptation method for the source-free setting. In this setting, we cannot access source data during adaptation, while unlabeled target data and a model pretrained with source data are given. Due to lack of source data, we cannot directly match the data distributions between domains unlike typical domain adaptation algorithms. To cope with this problem, we propose utilizing batch normalization statistics stored in the pretrained model to approximate the distribution of unobserved source data. Specifically, we fix the classifier part of the model during adaptation and only fine-tune the remaining feature encoder part so that batch normalization statistics of the features extracted by the encoder match those stored in the fixed classifier. Additionally, we also maximize the mutual information between the features and the classifier's outputs to further boost the classification performance. Experimental results with several benchmark datasets show that our method achieves competitive performance with state-of-the-art domain adaptation methods even though it does not require access to source data. ", "keywords": "domain adaptation;transfer learning", "primary_area": "", "supplementary_material": "", "author": "Masato Ishii;Masashi Sugiyama", "authorids": "~Masato_Ishii1;~Masashi_Sugiyama1", "gender": "M;M", "homepage": ";http://www.ms.k.u-tokyo.ac.jp/sugi/", "dblp": "27/2060;35/1228", "google_scholar": "https://scholar.google.co.jp/citations?user=RRIO1CcAAAAJ;https://scholar.google.co.jp/citations?user=GkYIrlIAAAAJ", "orcid": ";0000-0001-6658-6743", "linkedin": ";", "or_profile": "~Masato_Ishii1;~Masashi_Sugiyama1", "aff": "The University of Tokyo;The University of Tokyo", "aff_domain": "u-tokyo.ac.jp;u-tokyo.ac.jp", "position": "PhD student;Full Professor", "bibtex": "@misc{\nishii2021sourcefree,\ntitle={Source-free Domain Adaptation via Distributional Alignment by Matching Batch Normalization Statistics},\nauthor={Masato Ishii and Masashi Sugiyama},\nyear={2021},\nurl={https://openreview.net/forum?id=HWqv5Pm3E3}\n}", "github": "", "project": "", "reviewers": "AnonReviewer3;AnonReviewer2;AnonReviewer5", "site": "https://openreview.net/forum?id=HWqv5Pm3E3", "pdf_size": 0, "rating": "4;6;6", "confidence": "4;3;3", "wc_review": "507;192;955", "wc_reply_reviewers": "0;0;217", "wc_reply_authors": "508;281;689", "reply_reviewers": "0;0;1", "reply_authors": "1;1;2", "rating_avg": [ 5.333333333333333, 0.9428090415820634 ], "confidence_avg": [ 3.3333333333333335, 0.4714045207910317 ], "wc_review_avg": [ 551.3333333333334, 313.0669080067213 ], "wc_reply_reviewers_avg": [ 72.33333333333333, 102.29478101165388 ], "wc_reply_authors_avg": [ 492.6666666666667, 166.91781077990316 ], "reply_reviewers_avg": [ 0.3333333333333333, 0.4714045207910317 ], "reply_authors_avg": [ 1.3333333333333333, 0.4714045207910317 ], "replies_avg": [ 9, 0 ], "authors#_avg": [ 2, 0 ], "corr_rating_confidence": -0.9999999999999998, "gs_citation": 52, "gs_cited_by_link": 
"https://scholar.google.com/scholar?cites=14393126740179968632&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 4, "aff_unique_index": "0;0", "aff_unique_norm": "University of Tokyo", "aff_unique_dep": "", "aff_unique_url": "https://www.u-tokyo.ac.jp", "aff_unique_abbr": "UTokyo", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0", "aff_country_unique": "Japan" }, { "id": "HZcDljfUljt", "title": "Filter pre-pruning for improved fine-tuning of quantized deep neural networks", "track": "main", "status": "Reject", "tldr": "", "abstract": "Deep Neural Networks(DNNs) have many parameters and activation data, and these both are expensive to implement. One method to reduce the size of the DNN is to quantize the pre-trained model by using a low-bit expression for weights and activations, using fine-tuning to recover the drop in accuracy. However, it is generally difficult to train neural networks which use low-bit expressions. One reason is that the weights in the middle layer of the DNN have a wide dynamic range and so when quantizing the wide dynamic range into a few bits, the step size becomes large, which leads to a large quantization error and finally a large degradation in accuracy. To solve this problem, this paper makes the following three contributions without using any additional learning parameters and hyper-parameters. First, we analyze how batch normalization, which causes the aforementioned problem, disturbs the fine-tuning of the quantized DNN. Second, based on these results, we propose a new pruning method called Pruning for Quantization (PfQ) which removes the filters that disturb the fine-tuning of the DNN while not affecting the inferred result as far as possible. Third, we propose a workflow of fine-tuning for quantized DNNs using the proposed pruning method(PfQ). 
Experiments using well-known models and datasets confirmed that the proposed method achieves higher performance with a similar model size than conventional quantization methods including fine-tuning.", "keywords": "Deep Neural Networks;Quantization;Quantize;Pruning;MobileNet;compression", "primary_area": "", "supplementary_material": "", "author": "Jun Nishikawa;Ryoji Ikegaya", "authorids": "~Jun_Nishikawa2;~Ryoji_Ikegaya1", "gender": ";M", "homepage": "https://twitter.com/__NJ__000;", "dblp": ";87/3430", "google_scholar": ";", "orcid": ";0009-0009-1589-0847", "linkedin": ";", "or_profile": "~Jun_Nishikawa2;~Ryoji_Ikegaya1", "aff": ";Sony Group Corporation", "aff_domain": ";sony.com", "position": ";Researcher", "bibtex": "@misc{\nnishikawa2021filter,\ntitle={Filter pre-pruning for improved fine-tuning of quantized deep neural networks},\nauthor={Jun Nishikawa and Ryoji Ikegaya},\nyear={2021},\nurl={https://openreview.net/forum?id=HZcDljfUljt}\n}", "github": "", "project": "", "reviewers": "AnonReviewer4;AnonReviewer3;AnonReviewer2;AnonReviewer1", "site": "https://openreview.net/forum?id=HZcDljfUljt", "pdf_size": 0, "rating": "5;5;6;6", "confidence": "3;3;3;4", "wc_review": "396;368;458;498", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "316;170;622;453", "reply_reviewers": "0;0;0;0", "reply_authors": "2;2;2;2", "rating_avg": [ 5.5, 0.5 ], "confidence_avg": [ 3.25, 0.4330127018922193 ], "wc_review_avg": [ 430.0, 51.0098029794274 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 390.25, 167.08437239909662 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 2.0, 0.0 ], "replies_avg": [ 14, 0 ], "authors#_avg": [ 2, 0 ], "corr_rating_confidence": 0.5773502691896257, "gs_citation": 1, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=6898884655346065042&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 3, "aff_unique_index": "0", "aff_unique_norm": "Sony Group Corporation", "aff_unique_dep": "", "aff_unique_url": "https://www.sony.com", "aff_unique_abbr": "Sony", "aff_country_unique_index": "0", "aff_country_unique": "Japan" }, { "title": "Scalable Learning and MAP Inference for Nonsymmetric Determinantal Point Processes", "status": "Oral", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/2661", "id": "HajQFbx_yB", "poster": "", "openreview": "https://openreview.net/forum?id=HajQFbx_yB", "slides": "https://iclr.cc/virtual/2021/poster/2661", "video": "https://iclr.cc/virtual/2021/poster/2661", "author_site": "Mike Gartrell, Insu Han, Elvis Dohmatob, Jennifer Gillenwater, Victor-Emmanuel Brunel", "tldr": "", "abstract": "Determinantal point processes (DPPs) have attracted significant attention in machine learning for their ability to model subsets drawn from a large item collection. Recent work shows that nonsymmetric DPP (NDPP) kernels have significant advantages over symmetric kernels in terms of modeling power and predictive performance. However, for an item collection of size $M$, existing NDPP learning and inference algorithms require memory quadratic in $M$ and runtime cubic (for learning) or quadratic (for inference) in $M$, making them impractical for many typical subset selection tasks. In this work, we develop a learning algorithm with space and time requirements linear in $M$ by introducing a new NDPP kernel decomposition. We also derive a linear-complexity NDPP maximum a posteriori (MAP) inference algorithm that applies not only to our new kernel but also to that of prior work. 
Through evaluation on real-world datasets, we show that our algorithms scale significantly better, and can match the predictive performance of prior work.", "keywords": "determinantal point processes;unsupervised learning;representation learning;submodular optimization", "primary_area": "", "supplementary_material": "/attachment/e5f70f16cdbc4eaf59f11c4509a7f1653bdf75bd.zip", "author": "Mike Gartrell;Insu Han;Elvis Dohmatob;Jennifer Gillenwater;Victor-Emmanuel Brunel", "authorids": "~Mike_Gartrell1;~Insu_Han1;~Elvis_Dohmatob1;~Jennifer_Gillenwater1;~Victor-Emmanuel_Brunel1", "gender": "M;M;M;F;M", "homepage": "https://cgartrel.github.io;https://insuhan.github.io/;http://dohmatob.github.io/;http://jgillenw.com;https://vebrunel.com/", "dblp": "75/3021;160/8272;134/9794;73/3828;203/4175", "google_scholar": "NX6eiWYAAAAJ;0w39xsoAAAAJ;https://scholar.google.fr/citations?user=FDWgJY8AAAAJ;5lUnZgsAAAAJ;", "orcid": ";;;;", "linkedin": "mikegartrell/;;;;", "or_profile": "~Mike_Gartrell1;~Insu_Han1;~Elvis_Dohmatob1;~Jennifer_Gillenwater1;~Victor-Emmanuel_Brunel1", "aff": "Criteo AI Lab;;Criteo;Google;Ensae ParisTech", "aff_domain": "criteo.com;;criteo.com;google.com;ensae.fr", "position": "Senior Researcher;;Researcher;Research Scientist;Assistant Professor", "bibtex": "@inproceedings{\ngartrell2021scalable,\ntitle={Scalable Learning and {\\{}MAP{\\}} Inference for Nonsymmetric Determinantal Point Processes},\nauthor={Mike Gartrell and Insu Han and Elvis Dohmatob and Jennifer Gillenwater and Victor-Emmanuel Brunel},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=HajQFbx_yB}\n}", "github": "[![github](/images/github_icon.svg) cgartrel/nonsymmetric-DPP-learning](https://github.com/cgartrel/nonsymmetric-DPP-learning) + [![Papers with Code](/images/pwc_icon.svg) 1 community implementation](https://paperswithcode.com/paper/?openreview=HajQFbx_yB)", "project": "", "reviewers": "AnonReviewer4;AnonReviewer3;AnonReviewer2", "pdf_size": 0, "rating": "7;8;9", "confidence": "4;4;5", "wc_review": "324;273;220", "wc_reply_reviewers": "21;0;88", "wc_reply_authors": "388;118;260", "reply_reviewers": "1;0;1", "reply_authors": "1;1;1", "rating_avg": [ 8.0, 0.816496580927726 ], "confidence_avg": [ 4.333333333333333, 0.4714045207910317 ], "wc_review_avg": [ 272.3333333333333, 42.460439103816256 ], "wc_reply_reviewers_avg": [ 36.333333333333336, 37.52628708281999 ], "wc_reply_authors_avg": [ 255.33333333333334, 110.2764203666808 ], "reply_reviewers_avg": [ 0.6666666666666666, 0.4714045207910317 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 10, 0 ], "authors#_avg": [ 5, 0 ], "corr_rating_confidence": 0.8660254037844385, "gs_citation": 19, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=14875734078245489785&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 3, "pdf": "https://openreview.net/pdf?id=HajQFbx_yB", "email": "criteo.com;;criteo.com;google.com;ensae.fr", "author_num": 5, "aff_unique_index": "0;0;1;2", "aff_unique_norm": "Criteo;Google;ENSAE ParisTech", "aff_unique_dep": "Criteo AI Lab;Google;", "aff_unique_url": "https://www.criteo.com;https://www.google.com;https://www.ensae.fr", "aff_unique_abbr": "Criteo;Google;Ensae", "aff_campus_unique_index": "1", "aff_campus_unique": ";Mountain View", "aff_country_unique_index": "0;0;1;0", "aff_country_unique": "France;United States" }, { "id": "HbZTcIuiMAG", "title": "Fusion 360 Gallery: A Dataset and Environment for Programmatic CAD Reconstruction", "track": "main", "status": 
"Reject", "tldr": "", "abstract": "Parametric computer-aided design (CAD) is a standard paradigm used for the design of manufactured objects. CAD designers perform modeling operations, such as sketch and extrude, to form a construction sequence that makes up a final design. Despite the pervasiveness of parametric CAD and growing interest from the research community, a dataset of human designed 3D CAD construction sequences has not been available to-date. In this paper we present the Fusion 360 Gallery reconstruction dataset and environment for learning CAD reconstruction. We provide a dataset of 8,625 designs, comprising sequential sketch and extrude modeling operations, together with a complementary environment called the Fusion 360 Gym, to assist with performing CAD reconstruction. We outline a standard CAD reconstruction task, together with evaluation metrics, and present results from a novel method using neurally guided search to recover a construction sequence from a target geometry.", "keywords": "CAD;dataset;3D;reconstruction;environment;design;sequence", "primary_area": "", "supplementary_material": "", "author": "Karl Willis;Yewen Pu;Jieliang Luo;Hang Chu;Tao Du;Joseph Lambourne;Armando Solar-Lezama;Wojciech Matusik", "authorids": "~Karl_Willis1;~Yewen_Pu1;~Jieliang_Luo1;~Hang_Chu4;~Tao_Du1;joseph.lambourne@autodesk.com;~Armando_Solar-Lezama1;~Wojciech_Matusik2", "gender": ";M;M;;;;M;M", "homepage": ";http://www.mit.edu/~yewenpu;;;https://people.iiis.tsinghua.edu.cn/~taodu/;;https://people.csail.mit.edu/asolar/;https://cdfg.mit.edu/wojciech", "dblp": "82/121;53/10322;;;51/3026-1;;95/6919;", "google_scholar": "yMoEQSMAAAAJ;LJnNKXMAAAAJ;;https://scholar.google.ca/citations?user=awvsNQYAAAAJ;https://scholar.google.com/citations?hl=en;;https://scholar.google.com.tw/citations?user=8BX3BokAAAAJ;https://scholar.google.com/citations?hl=en", "orcid": ";;;;0000-0001-7337-7667;;;0000-0003-0212-5643", "linkedin": ";;rodger-luo/;;;;;wojciech-matusik-67238126/", "or_profile": "~Karl_Willis1;~Yewen_Pu1;~Jieliang_Luo1;~Hang_Chu4;~Tao_Du1;joseph.lambourne@autodesk.com;~Armando_Solar-Lezama1;~Wojciech_Matusik2", "aff": "Autodesk;Autodesk;Autodesk;Autodesk;Massachusetts Institute of Technology;;Massachusetts Institute of Technology;Massachusetts Institute of Technology", "aff_domain": "autodesk.com;autodesk.com;autodesk.com;autodesk.com;mit.edu;;mit.edu;mit.edu", "position": "Senior Research Manager;Principal Researcher;Principal Research Scientist;Researcher;PhD student;;Full Professor;Full Professor", "bibtex": "@misc{\nwillis2021fusion,\ntitle={Fusion 360 Gallery: A Dataset and Environment for Programmatic {\\{}CAD{\\}} Reconstruction},\nauthor={Karl Willis and Yewen Pu and Jieliang Luo and Hang Chu and Tao Du and Joseph Lambourne and Armando Solar-Lezama and Wojciech Matusik},\nyear={2021},\nurl={https://openreview.net/forum?id=HbZTcIuiMAG}\n}", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer2;AnonReviewer3;AnonReviewer4", "site": "https://openreview.net/forum?id=HbZTcIuiMAG", "pdf_size": 0, "rating": "4;5;7;8", "confidence": "4;3;4;1", "wc_review": "674;514;344;258", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "128;403;661;76", "reply_reviewers": "0;0;0;0", "reply_authors": "1;1;1;1", "rating_avg": [ 6.0, 1.5811388300841898 ], "confidence_avg": [ 3.0, 1.224744871391589 ], "wc_review_avg": [ 447.5, 159.95858839087072 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 317.0, 234.2722774892497 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 
1.0, 0.0 ], "replies_avg": [ 10, 0 ], "authors#_avg": [ 8, 0 ], "corr_rating_confidence": -0.6454972243679027, "gs_citation": 23, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=16244089990203369093&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 0, "aff_unique_index": "0;0;0;0;1;1;1", "aff_unique_norm": "Autodesk;Massachusetts Institute of Technology", "aff_unique_dep": ";", "aff_unique_url": "https://www.autodesk.com;https://web.mit.edu", "aff_unique_abbr": "Autodesk;MIT", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0;0;0;0;0;0", "aff_country_unique": "United States" }, { "id": "HclUGWwAVE", "title": "Sample Balancing for Improving Generalization under Distribution Shifts", "track": "main", "status": "Withdraw", "tldr": "", "abstract": "Deep neural networks have achieved striking performance when evaluated on testing data which share the same distribution with training ones, but can significantly fail otherwise. Therefore, eliminating the impact of distribution shifts between training and testing data is of paramount importance for building performance-promising deep models. Conventional methods (e.g. domain adaptation/generalization) assume either the availability of testing data or the known heterogeneity of training data (e.g. domain labels). In this paper, we consider a more challenging case where neither of the above information is available during the training phase. We propose to address this problem by removing the dependencies between features via reweighting training samples, which results in a more balanced distribution and helps deep models get rid of spurious correlations and, in turn, concentrate more on the true connection between features and labels. We conduct extensive experiments on object recognition benchmarks including PACS, VLCS, MNIST-M, and NICO which support the evaluation of generalization ability. 
The experimental results clearly demonstrate the effectiveness of the proposed method compared with state-of-the-art counterparts.", "keywords": "Image classification;distribution shift", "primary_area": "", "supplementary_material": "/attachment/569460e8ff39a67a537d330fe942bcf2f525233b.zip", "author": "Xingxuan Zhang;Peng Cui;Renzhe Xu;Yue He;Linjun Zhou;Zheyan Shen", "authorids": "~Xingxuan_Zhang1;~Peng_Cui1;~Renzhe_Xu1;~Yue_He2;~Linjun_Zhou1;~Zheyan_Shen1", "gender": "M;M;M;M;M;", "homepage": "https://xingxuanzhang.cn;http://pengcui.thumedialab.com/;https://windxrz.github.io;https://heyuethu.github.io;https://scholar.google.com/citations?user=M0EnLh8AAAAJ&hl=zh-CN;", "dblp": "226/2478;31/891-1;245/5972;51/6071-1;207/7466;205/2423", "google_scholar": ";https://scholar.google.com.tw/citations?user=G8x97ZgAAAAJ;NnppITIAAAAJ;;M0EnLh8AAAAJ;", "orcid": "0009-0002-4788-1127;0000-0003-2957-8511;0000-0001-8418-0034;0009-0009-1536-1179;;", "linkedin": ";;;;;", "or_profile": "~Xingxuan_Zhang1;~Peng_Cui1;~Renzhe_Xu1;~Yue_He2;~Linjun_Zhou1;~Zheyan_Shen1", "aff": "Tsinghua University;Tsinghua University;Tsinghua University;Tsinghua University;Tsinghua University;Tsinghua University", "aff_domain": "mails.tsinghua.edu.cn;tsinghua.edu.cn;mails.tsinghua.edu.cn;tsinghua.edu.cn;mails.tsinghua.edu.cn;tsinghua.edu.cn", "position": "PhD student;Associate Professor;PhD student;PhD student;PhD student;PhD student", "bibtex": "", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer2;AnonReviewer4;AnonReviewer3", "site": "https://openreview.net/forum?id=HclUGWwAVE", "pdf_size": 0, "rating": "3;3;4;6", "confidence": "5;5;5;3", "wc_review": "589;417;805;260", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "0;0;0;24", "reply_reviewers": "0;0;0;0", "reply_authors": "0;0;0;1", "rating_avg": [ 4.0, 1.224744871391589 ], "confidence_avg": [ 4.5, 0.8660254037844386 ], "wc_review_avg": [ 517.75, 202.59241718287484 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 6.0, 10.392304845413264 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 0.25, 0.4330127018922193 ], "replies_avg": [ 6, 0 ], "authors#_avg": [ 6, 0 ], "corr_rating_confidence": -0.9428090415820632, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:MEAEiAaF65kJ:scholar.google.com/&scioq=Sample+Balancing+for+Improving+Generalization+under+Distribution+Shifts&hl=en&as_sdt=0,33", "gs_version_total": 0, "aff_unique_index": "0;0;0;0;0;0", "aff_unique_norm": "Tsinghua University", "aff_unique_dep": "", "aff_unique_url": "https://www.tsinghua.edu.cn", "aff_unique_abbr": "THU", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0;0;0;0;0", "aff_country_unique": "China" }, { "id": "HdX654Yn81", "title": "Improving the Unsupervised Disentangled Representation Learning with VAE Ensemble", "track": "main", "status": "Reject", "tldr": "", "abstract": "Variational Autoencoder (VAE) based frameworks have achieved the state-of-the-art performance on the unsupervised disentangled representation learning. A recent theoretical analysis shows that such success is mainly due to the VAE implementation choices that encourage a PCA-like behavior locally on data samples. Despite this implied model identifiability, the VAE based disentanglement frameworks still face the trade-off between the local orthogonality and data reconstruction. As a result, models with the same architecture and hyperparameter setting can sometime learn entangled representations. 
To address this challenge, we propose a simple yet effective VAE ensemble framework consisting of multiple VAEs. It is based on the assumption that entangled representations are unique in their own ways, and the disentangled representations are \"alike\" (similar up to a signed permutation transformation). In the proposed VAE ensemble, each model not only maintains its original objective, but also encodes to and decodes from other models through pair-wise linear transformations between the latent representations. We show both theoretically and experimentally, the VAE ensemble objective encourages the linear transformations connecting the VAEs to be trivial transformations, aligning the latent representations of different models to be \"alike\". We compare our approach with the state-of-the-art unsupervised disentangled representation learning approaches and show the improved performance.", "keywords": "Unsupervised disentangled representation learning;network ensemble;variational auto encoder", "primary_area": "", "supplementary_material": "/attachment/90fe09c02f145fe5d4e33f225be450b37bac32b3.zip", "author": "Nanxiang Li;Shabnam Ghaffarzadegan;Liu Ren", "authorids": "~Nanxiang_Li1;~Shabnam_Ghaffarzadegan1;~Liu_Ren1", "gender": "M;F;M", "homepage": ";;https://sites.google.com/site/liurenshomepage/", "dblp": ";117/8486;65/4250", "google_scholar": ";;", "orcid": ";;", "linkedin": "nanxiang-li-80948915/;;", "or_profile": "~Nanxiang_Li1;~Shabnam_Ghaffarzadegan1;~Liu_Ren1", "aff": "Bosch;;Bosch Research", "aff_domain": "us.bosch.com;;us.bosch.com", "position": "Senior Scientist;;Principal Researcher", "bibtex": "@misc{\nli2021improving,\ntitle={Improving the Unsupervised Disentangled Representation Learning with {\\{}VAE{\\}} Ensemble},\nauthor={Nanxiang Li and Shabnam Ghaffarzadegan and Liu Ren},\nyear={2021},\nurl={https://openreview.net/forum?id=HdX654Yn81}\n}", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer3;AnonReviewer4", "site": "https://openreview.net/forum?id=HdX654Yn81", "pdf_size": 0, "rating": "3;5;7", "confidence": "4;3;4", "wc_review": "786;366;1054", "wc_reply_reviewers": "0;0;0", "wc_reply_authors": "1044;856;767", "reply_reviewers": "0;0;0", "reply_authors": "2;2;2", "rating_avg": [ 5.0, 1.632993161855452 ], "confidence_avg": [ 3.6666666666666665, 0.4714045207910317 ], "wc_review_avg": [ 735.3333333333334, 283.15052926353894 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 889.0, 115.46716705049391 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 2.0, 0.0 ], "replies_avg": [ 11, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:9hMrg3o_7YwJ:scholar.google.com/&scioq=Improving+the+Unsupervised+Disentangled+Representation+Learning+with+VAE+Ensemble&hl=en&as_sdt=0,5", "gs_version_total": 0, "aff_unique_index": "0;1", "aff_unique_norm": "Robert Bosch GmbH;Bosch Research", "aff_unique_dep": ";", "aff_unique_url": "https://www.bosch.com;https://research.bosch.com", "aff_unique_abbr": "Bosch;Bosch", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0", "aff_country_unique": "Germany" }, { "id": "HeEzgm-f4g1", "title": "On Batch-size Selection for Stochastic Training for Graph Neural Networks", "track": "main", "status": "Reject", "tldr": "", "abstract": "Batch size is an important hyper-parameter for training deep learning models with stochastic gradient decent (SGD) method, and it has great influence on 
the training time and model performance. We study the batch size selection problem for training graph neural network (GNN) with SGD method. \nTo reduce the training time while keeping a decent model performance, we propose a metric that combining both the variance of gradients and compute time for each mini-batch. We theoretically analyze how batch-size influence such a metric and propose the formula to evaluate some rough range of optimal batch size. \nIn GNN, gradients evaluated on samples in a mini-batch are not independent and it is challenging to evaluate the exact variance of gradients. To address the dependency, we analyze an estimator for gradients that considers the randomness arising from two consecutive layers in GNN, and suggest a guideline for picking the appropriate scale of the batch size. \nWe complement our theoretical results with extensive empirical experiments for ClusterGCN, FastGCN and GraphSAINT on 4 datasets: Ogbn-products, Ogbn-arxiv, Reddit and Pubmed. We demonstrate that in contrast to conventional deep learning models, GNNs benefit from large batch sizes.", "keywords": "", "primary_area": "", "supplementary_material": "/attachment/c3c8c6c5f1207dc80cbf994ffaf5e32a61ae48b3.zip", "author": "Yaochen Hu;Amit Levi;Ishaan Kumar;Yingxue Zhang;Mark Coates", "authorids": "~Yaochen_Hu1;amit.levi@huawei.com;ishaan.kumar@huawei.com;yingxue.zhang@huawei.com;~Mark_Coates1", "gender": "M;;;;M", "homepage": "https://hyclex.github.io/;;;;http://www.ece.mcgill.ca/~mcoate/", "dblp": "143/4817-1;;;;c/MarkCoates", "google_scholar": "VMwM-ZwAAAAJ;;;;https://scholar.google.ca/citations?user=qxWORNoAAAAJ", "orcid": ";;;;0000-0001-5030-1379", "linkedin": ";;;;", "or_profile": "~Yaochen_Hu1;amit.levi@huawei.com;ishaan.kumar@huawei.com;yingxue.zhang@huawei.com;~Mark_Coates1", "aff": "Huawei Technologies Ltd.;;;;McGill University", "aff_domain": "huawei.com;;;;mcgill.ca", "position": "Principal Researcher;;;;Full Professor", "bibtex": "@misc{\nhu2021on,\ntitle={On Batch-size Selection for Stochastic Training for Graph Neural Networks},\nauthor={Yaochen Hu and Amit Levi and Ishaan Kumar and Yingxue Zhang and Mark Coates},\nyear={2021},\nurl={https://openreview.net/forum?id=HeEzgm-f4g1}\n}", "github": "", "project": "", "reviewers": "AnonReviewer2;AnonReviewer1;AnonReviewer3;AnonReviewer4", "site": "https://openreview.net/forum?id=HeEzgm-f4g1", "pdf_size": 0, "rating": "4;4;4;5", "confidence": "4;4;4;3", "wc_review": "471;658;1102;808", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "587;590;458;388", "reply_reviewers": "0;0;0;0", "reply_authors": "1;1;1;1", "rating_avg": [ 4.25, 0.4330127018922193 ], "confidence_avg": [ 3.75, 0.4330127018922193 ], "wc_review_avg": [ 759.75, 230.86400217444034 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 505.75, 86.3781656438709 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 9, 0 ], "authors#_avg": [ 5, 0 ], "corr_rating_confidence": -1.0, "gs_citation": 5, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=2383107612332846059&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 0, "aff_unique_index": "0;1", "aff_unique_norm": "Huawei;McGill University", "aff_unique_dep": "Huawei Technologies;", "aff_unique_url": "https://www.huawei.com;https://www.mcgill.ca", "aff_unique_abbr": "Huawei;McGill", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;1", "aff_country_unique": "China;Canada" }, { "title": "Learning from others' mistakes: Avoiding dataset 
biases without modeling them", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/3089", "id": "Hf3qXoiNkR", "poster": "", "openreview": "https://openreview.net/forum?id=Hf3qXoiNkR", "slides": "https://iclr.cc/virtual/2021/poster/3089", "video": "https://iclr.cc/virtual/2021/poster/3089", "author_site": "Victor Sanh, Thomas Wolf, Yonatan Belinkov, Alexander M Rush", "tldr": "", "abstract": "State-of-the-art natural language processing (NLP) models often learn to model dataset biases and surface form correlations instead of features that target the intended underlying task. Previous work has demonstrated effective methods to circumvent these issues when knowledge of the bias is available. We consider cases where the bias issues may not be explicitly identified, and show a method for training models that learn to ignore these problematic correlations. Our approach relies on the observation that models with limited capacity primarily learn to exploit biases in the dataset. We can leverage the errors of such limited capacity models to train a more robust model in a product of experts, thus bypassing the need to hand-craft a biased model. We show the effectiveness of this method to retain improvements in out-of-distribution settings even if no particular bias is targeted by the biased model.", "keywords": "dataset bias;product of experts;natural language processing", "primary_area": "", "supplementary_material": "", "author": "Victor Sanh;Thomas Wolf;Yonatan Belinkov;Alexander M Rush", "authorids": "~Victor_Sanh1;~Thomas_Wolf1;~Yonatan_Belinkov1;~Alexander_M_Rush1", "gender": ";M;M;M", "homepage": ";https://thomwolf.io;https://www.belinkov.com;http://rush.seas.harvard.edu/", "dblp": "230/4101;;136/8705;http://dblp.uni-trier.de/pers/hd/r/Rush:Alexander_M=", "google_scholar": "6STg_7IAAAAJ;D2H5EFEAAAAJ;https://scholar.google.com/citations?authorid=K-6ujU4AAAAJ;LIjnUGgAAAAJ", "orcid": ";;;0000-0002-9900-1606", "linkedin": "victor-sanh/;;;sasha-rush-a69b6917/", "or_profile": "~Victor_Sanh1;~Thomas_Wolf1;~Yonatan_Belinkov1;~Alexander_M_Rush1", "aff": "Hugging Face;Hugging Face;Technion, Technion;School of Engineering and Applied Sciences, Harvard University", "aff_domain": "huggingface.co;huggingface.co;technion.ac.il;seas.harvard.edu", "position": "Researcher;Researcher;Assistant Professor;Assistant Professor", "bibtex": "@inproceedings{\nsanh2021learning,\ntitle={Learning from others' mistakes: Avoiding dataset biases without modeling them},\nauthor={Victor Sanh and Thomas Wolf and Yonatan Belinkov and Alexander M Rush},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=Hf3qXoiNkR}\n}", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer5;AnonReviewer4;AnonReviewer3", "pdf_size": 0, "rating": "2;6;7;7", "confidence": "5;4;4;4", "wc_review": "243;560;740;575", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "684;942;902;613", "reply_reviewers": "0;0;0;0", "reply_authors": "1;2;2;1", "rating_avg": [ 5.5, 2.0615528128088303 ], "confidence_avg": [ 4.25, 0.4330127018922193 ], "wc_review_avg": [ 529.5, 179.85619255394016 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 785.25, 139.75223611806717 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.5, 0.5 ], "replies_avg": [ 12, 0 ], "authors#_avg": [ 4, 0 ], "corr_rating_confidence": -0.9801960588196067, "gs_citation": 118, "gs_cited_by_link": 
"https://scholar.google.com/scholar?cites=10773495630588602743&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 3, "pdf": "https://openreview.net/pdf?id=Hf3qXoiNkR", "email": "huggingface.co;huggingface.co;technion.ac.il;seas.harvard.edu", "author_num": 4, "aff_unique_index": "0;0;1;2", "aff_unique_norm": "Hugging Face;Technion - Israel Institute of Technology;Harvard University", "aff_unique_dep": ";;School of Engineering and Applied Sciences", "aff_unique_url": "https://huggingface.co;https://www.technion.ac.il/en/;https://www.harvard.edu", "aff_unique_abbr": "Hugging Face;Technion;Harvard", "aff_campus_unique_index": "1", "aff_campus_unique": ";Cambridge", "aff_country_unique_index": "0;0;1;0", "aff_country_unique": "United States;Israel" }, { "id": "HfnQjEN_ZC", "title": "Ballroom Dance Movement Recognition Using a Smart Watch and Representation Learning", "track": "main", "status": "Reject", "tldr": "", "abstract": "Smart watches are being increasingly used to detect human gestures and movements. Using a single smart watch, whole body movement recognition remains a hard problem because movements may not be adequately captured by the sensors in the watch. In this paper, we present a whole body movement detection study using a single smart watch in the context of ballroom dancing. Deep learning representations are used to classify well-defined sequences of movements, called \\emph{figures}. Those representations are found to outperform ensembles of random forests and hidden Markov models. The classification accuracy of 85.95\\% was improved to 92.31\\% by modeling a dance as a first-order Markov chain of figures.", "keywords": "ballroom;sequence;deep;learning;machine;markov;prior", "primary_area": "", "supplementary_material": "", "author": "Varun Badrinath Krishna", "authorids": "~Varun_Badrinath_Krishna1", "gender": "", "homepage": "https://www.siebelscholars.com/scholar-profile/1382/", "dblp": "", "google_scholar": "SMuNexUAAAAJ", "orcid": "", "linkedin": "", "or_profile": "~Varun_Badrinath_Krishna1", "aff": "", "aff_domain": "", "position": "", "bibtex": "@misc{\nkrishna2021ballroom,\ntitle={Ballroom Dance Movement Recognition Using a Smart Watch and Representation Learning},\nauthor={Varun Badrinath Krishna},\nyear={2021},\nurl={https://openreview.net/forum?id=HfnQjEN_ZC}\n}", "github": "", "project": "", "reviewers": "AnonReviewer2;AnonReviewer4;AnonReviewer3", "site": "https://openreview.net/forum?id=HfnQjEN_ZC", "pdf_size": 0, "rating": "4;4;4", "confidence": "3;5;5", "wc_review": "252;791;436", "wc_reply_reviewers": "0;0;0", "wc_reply_authors": "0;0;0", "reply_reviewers": "0;0;0", "reply_authors": "0;0;0", "rating_avg": [ 4.0, 0.0 ], "confidence_avg": [ 4.333333333333333, 0.9428090415820634 ], "wc_review_avg": [ 493.0, 223.70665315691141 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 0, 0 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 0, 0 ], "replies_avg": [ 4, 0 ], "authors#_avg": [ 1, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:TdILQ46_oGMJ:scholar.google.com/&scioq=Ballroom+Dance+Movement+Recognition+Using+a+Smart+Watch+and+Representation+Learning&hl=en&as_sdt=0,5", "gs_version_total": 0 }, { "title": "Regularized Inverse Reinforcement Learning", "status": "Spotlight", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/2850", "id": "HgLO8yalfwc", "poster": "", "openreview": "https://openreview.net/forum?id=HgLO8yalfwc", "slides": 
"https://iclr.cc/virtual/2021/poster/2850", "video": "https://iclr.cc/virtual/2021/poster/2850", "author_site": "Wonseok Jeon, Chen-Yang Su, Paul Barde, Thang Doan, Derek Nowrouzezahrai, Joelle Pineau", "tldr": "", "abstract": "Inverse Reinforcement Learning (IRL) aims to facilitate a learner\u2019s ability to imitate expert behavior by acquiring reward functions that explain the expert\u2019s decisions. Regularized IRLapplies strongly convex regularizers to the learner\u2019s policy in order to avoid the expert\u2019s behavior being rationalized by arbitrary constant rewards, also known as degenerate solutions. We propose tractable solutions, and practical methods to obtain them, for regularized IRL. Current methods are restricted to the maximum-entropy IRL framework, limiting them to Shannon-entropy regularizers, as well as proposing solutions that are intractable in practice. We present theoretical backing for our proposed IRL method\u2019s applicability to both discrete and continuous controls, empirically validating our performance on a variety of tasks.", "keywords": "inverse reinforcement learning;reward learning;regularized markov decision processes;reinforcement learning", "primary_area": "", "supplementary_material": "", "author": "Wonseok Jeon;Chen-Yang Su;Paul Barde;Thang Doan;Derek Nowrouzezahrai;Joelle Pineau", "authorids": "~Wonseok_Jeon1;~Chen-Yang_Su1;~Paul_Barde1;~Thang_Doan1;~Derek_Nowrouzezahrai1;~Joelle_Pineau1", "gender": "M;M;M;;Not Specified;F", "homepage": ";https://chenyangsu.github.io/;https://pbarde.github.io/;;https://www.cim.mcgill.ca/~derek/;http://www.cs.mcgill.ca/~jpineau", "dblp": ";;246/4858;;30/4225;p/JoellePineau", "google_scholar": "https://scholar.google.com/citations?hl=en;;FoxktlkAAAAJ;;https://scholar.google.ca/citations?user=nCZ2PMcAAAAJ;https://scholar.google.ca/citations?user=CEt6_mMAAAAJ", "orcid": ";;;;;", "linkedin": ";chen-yang-su/;;;;", "or_profile": "~Wonseok_Jeon1;~Chen-Yang_Su1;~Paul_Barde1;~Thang_Doan1;~Derek_Nowrouzezahrai1;~Joelle_Pineau1", "aff": "MILA/McGill University;MILA/McGill University;INRIA;;McGill University;Meta Facebook", "aff_domain": "mail.mcgill.ca;mail.mcgill.ca;inria.fr;;mcgill.ca;fb.com", "position": "Postdoc;MS student;PhD student;;Full Professor;Researcher Manager", "bibtex": "@inproceedings{\njeon2021regularized,\ntitle={Regularized Inverse Reinforcement Learning},\nauthor={Wonseok Jeon and Chen-Yang Su and Paul Barde and Thang Doan and Derek Nowrouzezahrai and Joelle Pineau},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=HgLO8yalfwc}\n}", "github": "", "project": "", "reviewers": "AnonReviewer4;AnonReviewer2;AnonReviewer1;AnonReviewer3;AnonReviewer5", "pdf_size": 0, "rating": "6;6;7;7;8", "confidence": "4;3;3;3;4", "wc_review": "422;173;353;371;361", "wc_reply_reviewers": "25;0;0;0;0", "wc_reply_authors": "587;249;751;699;1012", "reply_reviewers": "1;0;0;0;0", "reply_authors": "1;1;1;1;2", "rating_avg": [ 6.8, 0.7483314773547882 ], "confidence_avg": [ 3.4, 0.4898979485566356 ], "wc_review_avg": [ 336.0, 84.97529052612883 ], "wc_reply_reviewers_avg": [ 5.0, 10.0 ], "wc_reply_authors_avg": [ 659.6, 248.17542182899578 ], "reply_reviewers_avg": [ 0.2, 0.4000000000000001 ], "reply_authors_avg": [ 1.2, 0.4 ], "replies_avg": [ 13, 0 ], "authors#_avg": [ 6, 0 ], "corr_rating_confidence": 0.21821789023599233, "gs_citation": 15, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=310682822248963614&as_sdt=2005&sciodt=0,5&hl=en", 
"gs_version_total": 9, "pdf": "https://openreview.net/pdf?id=HgLO8yalfwc", "email": "mail.mcgill.ca;mail.mcgill.ca;inria.fr;;mcgill.ca;fb.com", "author_num": 6, "aff_unique_index": "0;0;1;0;2", "aff_unique_norm": "McGill University;INRIA;Meta", "aff_unique_dep": "MILA;;Meta Platforms, Inc.", "aff_unique_url": "https://www.mcgill.ca;https://www.inria.fr;https://meta.com", "aff_unique_abbr": "McGill;INRIA;Meta", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0;1;0;2", "aff_country_unique": "Canada;France;United States" }, { "id": "HjD70ArLTQt", "title": "Generating unseen complex scenes: are we there yet?", "track": "main", "status": "Reject", "tldr": "", "abstract": "Although recent complex scene conditional generation models generate increasingly appealing scenes, it is very hard to assess which models perform better and why. This is often due to models being trained to fit different data splits, and defining their own experimental setups. In this paper, we propose a methodology to compare complex scene conditional generation models, and provide an in-depth analysis that assesses the ability of each model to (1) fit the training distribution and hence perform well on seen conditionings, (2) to generalize to unseen conditionings composed of seen object combinations, and (3) generalize to unseen conditionings composed of unseen object combinations. As a result, we observe that recent methods are able to generate recognizable scenes given seen conditionings, and exploit compositionality to generalize to unseen conditionings with seen object combinations. However, all methods suffer from noticeable image quality degradation when asked to generate images from conditionings composed of unseen object combinations. Moreover, through our analysis, we identify the advantages of different pipeline components, and find that (1) encouraging compositionality through instance-wise spatial conditioning normalizations increases robustness to both types of unseen conditionings, (2) using semantically aware losses such as the scene-graph perceptual similarity helps improve some dimensions of the generation process, and (3) enhancing the quality of generated masks and the quality of the individual objects are crucial steps to improve robustness to both types of unseen conditionings.", "keywords": "generative adversarial networks;conditional scene generation;zero-shot generalization;out of distribution", "primary_area": "", "supplementary_material": "", "author": "Arantxa Casanova;Michal Drozdzal;Adriana Romero", "authorids": "~Arantxa_Casanova1;~Michal_Drozdzal1;~Adriana_Romero1", "gender": "F;M;F", "homepage": ";;https://sites.google.com/site/adriromsor/home", "dblp": "193/6415.html;24/9794;54/10771", "google_scholar": "iFhSTbAAAAAJ;https://scholar.google.ca/citations?user=XK_ktwQAAAAJ;https://scholar.google.ca/citations?user=Sm15FXIAAAAJ", "orcid": ";;", "linkedin": ";;https://ca.linkedin.com/in/adriana-romero-a6415123", "or_profile": "~Arantxa_Casanova1;~Michal_Drozdzal1;~Adriana_Romero1", "aff": "Polytechnique Montreal;Meta;Meta", "aff_domain": "polymtl.ca;fb.com;meta.com", "position": "PhD student;Research Scientst;Research Scientist", "bibtex": "@misc{\ncasanova2021generating,\ntitle={Generating unseen complex scenes: are we there yet?},\nauthor={Arantxa Casanova and Michal Drozdzal and Adriana Romero},\nyear={2021},\nurl={https://openreview.net/forum?id=HjD70ArLTQt}\n}", "github": "", "project": "", "reviewers": 
"AnonReviewer2;AnonReviewer1;AnonReviewer4;AnonReviewer3", "site": "https://openreview.net/forum?id=HjD70ArLTQt", "pdf_size": 0, "rating": "4;4;5;6", "confidence": "5;4;3;5", "wc_review": "380;543;292;569", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "442;533;714;1126", "reply_reviewers": "0;0;0;0", "reply_authors": "1;1;1;2", "rating_avg": [ 4.75, 0.82915619758885 ], "confidence_avg": [ 4.25, 0.82915619758885 ], "wc_review_avg": [ 446.0, 114.68434941176587 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 703.75, 262.7112245413203 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.25, 0.4330127018922193 ], "replies_avg": [ 11, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": 0.0909090909090909, "gs_citation": 16, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=15142701628509011003&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 4, "aff_unique_index": "0;1;1", "aff_unique_norm": "Polytechnique Montreal;Meta", "aff_unique_dep": ";Meta Platforms, Inc.", "aff_unique_url": "https://www.polymtl.ca;https://meta.com", "aff_unique_abbr": "PolyMTL;Meta", "aff_campus_unique_index": "0", "aff_campus_unique": "Montreal;", "aff_country_unique_index": "0;1;1", "aff_country_unique": "Canada;United States" }, { "id": "HkUfnZFt1Rw", "title": "Dissecting graph measures performance for node clustering in LFR parameter space", "track": "main", "status": "Reject", "tldr": "", "abstract": "Graph measures can be used for graph node clustering using metric clustering algorithms. There are multiple measures applicable to this task, and which one performs better is an open question. We study the performance of 25 graph measures on generated graphs with different parameters. While usually measure comparisons are limited to general measure ranking on a particular dataset, we aim to explore the performance of various measures depending on graph features. Using an LFR generator, we create a dataset of ~7500 graphs covering the whole LFR parameter space. For each graph, we assess the quality of clustering with k-means algorithm for every considered measure. We determine the best measure for every area of the parameter space. We find that the parameter space consists of distinct zones where one particular measure is the best. We analyze the geometry of the resulting zones and describe it with simple criteria. 
Given particular graph parameters, this allows us to choose the best measure to use for clustering.", "keywords": "graph theory;graph measures;kernel k-means;clustering", "primary_area": "", "supplementary_material": "", "author": "Vladimir Ivashkin;Pavel Chebotarev", "authorids": "~Vladimir_Ivashkin1;pavel4e@gmail.com", "gender": "M;", "homepage": ";", "dblp": ";", "google_scholar": "WQMvNsUAAAAJ;", "orcid": "0000-0002-8695-4192;", "linkedin": "ivashkin/;", "or_profile": "~Vladimir_Ivashkin1;pavel4e@gmail.com", "aff": "Moscow Institute of Physics and Technology;", "aff_domain": "phystech.edu;", "position": "PhD student;", "bibtex": "@misc{\nivashkin2021dissecting,\ntitle={Dissecting graph measures performance for node clustering in {\\{}LFR{\\}} parameter space},\nauthor={Vladimir Ivashkin and Pavel Chebotarev},\nyear={2021},\nurl={https://openreview.net/forum?id=HkUfnZFt1Rw}\n}", "github": "", "project": "", "reviewers": "AnonReviewer3;AnonReviewer2;AnonReviewer4;AnonReviewer1", "site": "https://openreview.net/forum?id=HkUfnZFt1Rw", "pdf_size": 0, "rating": "3;4;5;6", "confidence": "5;5;2;4", "wc_review": "479;358;160;144", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "1166;1038;837;569", "reply_reviewers": "0;0;0;0", "reply_authors": "2;2;2;1", "rating_avg": [ 4.5, 1.118033988749895 ], "confidence_avg": [ 4.0, 1.224744871391589 ], "wc_review_avg": [ 285.25, 140.06315539784188 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 902.5, 225.44677864187813 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.75, 0.4330127018922193 ], "replies_avg": [ 12, 0 ], "authors#_avg": [ 2, 0 ], "corr_rating_confidence": -0.5477225575051661, "gs_citation": 2, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=15782602203144276166&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 6, "aff_unique_index": "0", "aff_unique_norm": "Moscow Institute of Physics and Technology", "aff_unique_dep": "", "aff_unique_url": "https://www.mipt.ru/en", "aff_unique_abbr": "MIPT", "aff_country_unique_index": "0", "aff_country_unique": "Russian Federation" }, { "id": "HmAhqnu3qu", "title": "Graph Representation Learning for Multi-Task Settings: a Meta-Learning Approach", "track": "main", "status": "Reject", "tldr": "", "abstract": "Graph Neural Networks (GNNs) have become the state-of-the-art method for many applications on graph structured data. GNNs are a framework for graph representation learning, where a model learns to generate low dimensional node embeddings that encapsulate structural and feature-related information. GNNs are usually trained in an end-to-end fashion, leading to highly specialized node embeddings. While this approach achieves great results in the single-task setting, generating node embeddings that can be used to perform multiple tasks (with performance comparable to single-task models) is an open problem. We propose a novel representation learning strategy, based on meta-learning, capable of producing multi-task node embeddings. Our method avoids the difficulties arising when learning to perform multiple tasks concurrently by, instead, learning to quickly (i.e. with a few steps of gradient descent) adapt to multiple tasks singularly. We show that the embeddings produced by our method can be used to perform multiple tasks with comparable or higher performance than both single-task and multi-task end-to-end models. 
Our method is model-agnostic and task-agnostic and can hence be applied to a wide variety of multi-task domains.", "keywords": "Graph Representation Learning;Multi-Task Learning;Meta-Learning;Graph Neural Networks", "primary_area": "", "supplementary_material": "", "author": "Davide Buffelli;Fabio Vandin", "authorids": "~Davide_Buffelli1;~Fabio_Vandin2", "gender": "M;", "homepage": "https://davidebuffelli.github.io;", "dblp": "267/1651;62/5172", "google_scholar": "v28My7wAAAAJ;", "orcid": "0000-0001-5565-1634;", "linkedin": "davide-buffelli/;", "or_profile": "~Davide_Buffelli1;~Fabio_Vandin2", "aff": "Samsung;", "aff_domain": "samsung.com;", "position": "Research Intern;", "bibtex": "@misc{\nbuffelli2021graph,\ntitle={Graph Representation Learning for Multi-Task Settings: a Meta-Learning Approach},\nauthor={Davide Buffelli and Fabio Vandin},\nyear={2021},\nurl={https://openreview.net/forum?id=HmAhqnu3qu}\n}", "github": "", "project": "", "reviewers": "AnonReviewer3;AnonReviewer1;AnonReviewer2", "site": "https://openreview.net/forum?id=HmAhqnu3qu", "pdf_size": 0, "rating": "5;6;7", "confidence": "4;5;3", "wc_review": "118;240;136", "wc_reply_reviewers": "0;0;0", "wc_reply_authors": "612;874;283", "reply_reviewers": "0;0;0", "reply_authors": "1;2;1", "rating_avg": [ 6.0, 0.816496580927726 ], "confidence_avg": [ 4.0, 0.816496580927726 ], "wc_review_avg": [ 164.66666666666666, 53.77318621353542 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 589.6666666666666, 241.7910024977954 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.3333333333333333, 0.4714045207910317 ], "replies_avg": [ 9, 0 ], "authors#_avg": [ 2, 0 ], "corr_rating_confidence": -0.5, "gs_citation": 9, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=16166456546158451795&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 6, "aff_unique_index": "0", "aff_unique_norm": "Samsung", "aff_unique_dep": "Samsung", "aff_unique_url": "https://www.samsung.com", "aff_unique_abbr": "Samsung", "aff_country_unique_index": "0", "aff_country_unique": "South Korea" }, { "id": "HowQIZwD_42", "title": "Measuring and Harnessing Transference in Multi-Task Learning", "track": "main", "status": "Reject", "tldr": "", "abstract": "Multi-task learning can leverage information learned by one task to benefit the training of other tasks. Despite this capacity, naive formulations often degrade performance and in particular, identifying the tasks that would benefit from co-training remains a challenging design question. In this paper, we analyze the dynamics of information transfer, or transference, across tasks throughout training. Specifically, we develop a similarity measure that can quantify transference among tasks and use this quantity to both better understand the optimization dynamics of multi-task learning as well as improve overall learning performance. In the latter case, we propose two methods to leverage our transference metric. The first operates at a macro-level by selecting which tasks should train together while the second functions at a micro-level by determining how to combine task gradients at each training step. 
We find these methods can lead to significant improvement over prior work on three supervised multi-task learning benchmarks and one multi-task reinforcement learning paradigm.", "keywords": "multitask learning", "primary_area": "", "supplementary_material": "/attachment/c8a911dfb937a4ba40c439b44b902b883dff3253.zip", "author": "Chris Fifty;Ehsan Amid;Zhe Zhao;Tianhe Yu;Rohan Anil;Chelsea Finn", "authorids": "~Chris_Fifty1;~Ehsan_Amid1;~Zhe_Zhao3;~Tianhe_Yu1;~Rohan_Anil1;~Chelsea_Finn1", "gender": ";M;M;M;M;F", "homepage": "https://cfifty.github.io;https://sites.google.com/corp/view/eamid/;https://sites.google.com/view/zhezhao;https://cs.stanford.edu/~tianheyu/;;https://ai.stanford.edu/~cbfinn/", "dblp": "236/4965;142/5754;28/6429-1.html;192/1797;182/1833;131/1783", "google_scholar": "lg2M2RYAAAAJ;https://scholar.google.fi/citations?user=F6omR3gAAAAJ;TRZB0J4AAAAJ;;;vfPE6hgAAAAJ", "orcid": ";;;;;", "linkedin": ";ehsan-amid-63aba754;;;;", "or_profile": "~Chris_Fifty1;~Ehsan_Amid1;~Zhe_Zhao3;~Tianhe_Yu1;~Rohan_Anil1;~Chelsea_Finn1", "aff": "Google Brain;Google DeepMind;Google;Stanford University;Google Brain ;Google", "aff_domain": "google.com;google.com;google.com;stanford.edu;google.com;google.com", "position": "Researcher;Research Scientist;Research Scientist;PhD student;Principal Engineer;Research Scientist", "bibtex": "@misc{\nfifty2021measuring,\ntitle={Measuring and Harnessing Transference in Multi-Task Learning},\nauthor={Chris Fifty and Ehsan Amid and Zhe Zhao and Tianhe Yu and Rohan Anil and Chelsea Finn},\nyear={2021},\nurl={https://openreview.net/forum?id=HowQIZwD_42}\n}", "github": "", "project": "", "reviewers": "AnonReviewer2;AnonReviewer1;AnonReviewer3;AnonReviewer4", "site": "https://openreview.net/forum?id=HowQIZwD_42", "pdf_size": 0, "rating": "4;4;5;6", "confidence": "4;5;4;5", "wc_review": "243;207;392;329", "wc_reply_reviewers": "124;0;100;27", "wc_reply_authors": "1064;294;746;399", "reply_reviewers": "1;0;1;1", "reply_authors": "2;1;2;1", "rating_avg": [ 4.75, 0.82915619758885 ], "confidence_avg": [ 4.5, 0.5 ], "wc_review_avg": [ 292.75, 72.44437521298669 ], "wc_reply_reviewers_avg": [ 62.75, 50.87914602270757 ], "wc_reply_authors_avg": [ 625.75, 303.3136783925183 ], "reply_reviewers_avg": [ 0.75, 0.4330127018922193 ], "reply_authors_avg": [ 1.5, 0.5 ], "replies_avg": [ 20, 0 ], "authors#_avg": [ 6, 0 ], "corr_rating_confidence": 0.30151134457776363, "gs_citation": 14, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=14266708292780865076&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 3, "aff_unique_index": "0;0;0;1;0;0", "aff_unique_norm": "Google;Stanford University", "aff_unique_dep": "Google Brain;", "aff_unique_url": "https://brain.google.com;https://www.stanford.edu", "aff_unique_abbr": "Google Brain;Stanford", "aff_campus_unique_index": "0;0;2;0;0", "aff_campus_unique": "Mountain View;;Stanford", "aff_country_unique_index": "0;1;0;0;0;0", "aff_country_unique": "United States;United Kingdom" }, { "id": "Hpxrls8yAn", "title": "Robust Memory Augmentation by Constrained Latent Imagination", "track": "main", "status": "Withdraw", "tldr": "", "abstract": "The latent dynamics model summarizes an agent\u2019s high dimensional experiences in a compact way. While learning from imagined trajectories by the latent model is confirmed to has great potential to facilitate behavior learning, the lack of memory diversity limits generalization capability. 
Inspired by a neuroscience experiment of \u201cforming artificial memories during sleep\u201d, we propose a robust memory augmentation method with Constrained Latent ImaginatiON (CLION) under a novel actor-critic framework, which aims to speed up the learning of the optimal policy with virtual episodic. Various experiments on high-dimensional visual control tasks with arbitrary image uncertainty demonstrate that CLION outperforms existing approaches in terms of data-efficiency, robustness to uncertainty, and final performance.", "keywords": "Memory Augmentation;Model-based reinforcement learning;Latent imagination", "primary_area": "", "supplementary_material": "", "author": "Yao Mu;Yuzheng Zhuang;Bin Wang;Wulong Liu;Shengbo Eben Li;Jianye HAO", "authorids": "~Yao_Mu1;~Yuzheng_Zhuang1;~Bin_Wang12;~Wulong_Liu1;~Shengbo_Eben_Li1;~Jianye_HAO1", "gender": "M;F;M;M;M;M", "homepage": "https://yaomarkmu.github.io/;;http://binwang.top;;https://www.researchgate.net/profile/Shengbo_Li;http://www.icdai.org/jianye.html", "dblp": "260/0674;;13/1898-34;36/9257.html;;21/7664.html", "google_scholar": ";https://scholar.google.com/citations?hl=en;KWZG_YsAAAAJ;https://scholar.google.ca/citations?user=od00FfIAAAAJ;;", "orcid": ";;0000-0002-0267-3749;;;0000-0002-0422-8235", "linkedin": ";;;wulong-liu-28006155/;;", "or_profile": "~Yao_Mu1;~Yuzheng_Zhuang1;~Bin_Wang12;~Wulong_Liu1;~Shengbo_Eben_Li1;~Jianye_HAO1", "aff": "Tsinghua University;Huawei Technologies Ltd.;Huawei Noah's Ark Lab;Huawei Noah's Ark Lab;Tsinghua University;Tianjin University", "aff_domain": "tsinghua.edu.cn;huawei.com;huawei.com;huawei.com;tsinghua.edu.cn;tju.edu.cn", "position": "MS student;Research Engineer;Senior Researcher;Researcher;Full Professor;Associate Professor", "bibtex": "", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer2;AnonReviewer3;AnonReviewer4", "site": "https://openreview.net/forum?id=Hpxrls8yAn", "pdf_size": 0, "rating": "3;4;5;7", "confidence": "4;3;5;3", "wc_review": "858;348;379;481", "wc_reply_reviewers": "170;0;0;0", "wc_reply_authors": "999;829;688;134", "reply_reviewers": "1;0;0;0", "reply_authors": "3;2;2;1", "rating_avg": [ 4.75, 1.479019945774904 ], "confidence_avg": [ 3.75, 0.82915619758885 ], "wc_review_avg": [ 516.5, 203.2123273819775 ], "wc_reply_reviewers_avg": [ 42.5, 73.61215932167728 ], "wc_reply_authors_avg": [ 662.5, 324.39058247735863 ], "reply_reviewers_avg": [ 0.25, 0.4330127018922193 ], "reply_authors_avg": [ 2.0, 0.7071067811865476 ], "replies_avg": [ 15, 0 ], "authors#_avg": [ 6, 0 ], "corr_rating_confidence": -0.25482359571881275, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:qYDtIF7dduUJ:scholar.google.com/&scioq=Robust+Memory+Augmentation+by+Constrained+Latent+Imagination&hl=en&as_sdt=0,5", "gs_version_total": 0, "aff_unique_index": "0;1;1;1;0;2", "aff_unique_norm": "Tsinghua University;Huawei;Tianjin University", "aff_unique_dep": ";Huawei Technologies;", "aff_unique_url": "https://www.tsinghua.edu.cn;https://www.huawei.com;http://www.tju.edu.cn", "aff_unique_abbr": "THU;Huawei;TJU", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0;0;0;0;0", "aff_country_unique": "China" }, { "id": "Hr-cI3LMKb8", "title": "Leveraging affinity cycle consistency to isolate factors of variation in learned representations", "track": "main", "status": "Reject", "tldr": "", "abstract": "Identifying the dominant factors of variation across a dataset is a central goal of representation learning. 
Generative approaches lead to descriptions that are rich enough to recreate the data, but often only a partial description is needed to complete downstream tasks or to gain insights about the dataset. In this work, we operate in the setting where limited information is known about the data in the form of groupings, or set membership, and the task is to learn representations which isolate the factors of variation that are common across the groupings. Our key insight is the use of affinity cycle consistency (ACC) between the learned embeddings of images belonging to different sets. In contrast to prior work, we demonstrate that ACC can be applied with significantly fewer constraints on the factors of variation, across a remarkably broad range of settings, and without any supervision for half of the data. By curating datasets from Shapes3D, we quantify the effectiveness of ACC through mutual information between the learned representations and the known generative factors. In addition, we demonstrate the applicability of ACC to the tasks of digit style isolation and synthetic-to-real object pose transfer and compare to generative approaches utilizing the same supervision.", "keywords": "", "primary_area": "", "supplementary_material": "/attachment/646c22db9a849c4fae7571d69ce69c22c436b945.zip", "author": "Kieran A Murphy;Varun Jampani;Srikumar Ramalingam;Ameesh Makadia", "authorids": "~Kieran_A_Murphy1;~Varun_Jampani2;~Srikumar_Ramalingam2;~Ameesh_Makadia1", "gender": "M;M;;M", "homepage": "https://kieranamurphy.com;https://www.cs.utah.edu/~srikumar/;http://www.ameeshmakadia.com/index.html;https://varunjampani.github.io/", "dblp": "287/4780;17/4216;59/6004;124/2785", "google_scholar": "VC653zEAAAAJ;6m1ptOgAAAAJ;OT1uf7kAAAAJ;1Cv6Sf4AAAAJ", "orcid": "0000-0003-0960-6685;;;", "linkedin": ";srikumar-ramalingam-17728b22/;;", "or_profile": "~Kieran_A_Murphy1;~Srikumar_Ramalingam2;~Ameesh_Makadia1;~Varun_Jampani1", "aff": "Google;Google;Google;Google Research", "aff_domain": "google.com;google.com;google.com;google.com", "position": "AI Resident;Research Scientist;Research Scientist;Researcher", "bibtex": "@misc{\nmurphy2021leveraging,\ntitle={Leveraging affinity cycle consistency to isolate factors of variation in learned representations},\nauthor={Kieran A Murphy and Varun Jampani and Srikumar Ramalingam and Ameesh Makadia},\nyear={2021},\nurl={https://openreview.net/forum?id=Hr-cI3LMKb8}\n}", "github": "", "project": "", "reviewers": "AnonReviewer4;AnonReviewer3;AnonReviewer2;AnonReviewer1", "site": "https://openreview.net/forum?id=Hr-cI3LMKb8", "pdf_size": 0, "rating": "3;4;4;6", "confidence": "4;4;3;3", "wc_review": "248;561;577;391", "wc_reply_reviewers": "0;0;0;118", "wc_reply_authors": "316;672;945;756", "reply_reviewers": "0;0;0;1", "reply_authors": "1;1;2;2", "rating_avg": [ 4.25, 1.0897247358851685 ], "confidence_avg": [ 3.5, 0.5 ], "wc_review_avg": [ 444.25, 134.72448738072822 ], "wc_reply_reviewers_avg": [ 29.5, 51.09549882328188 ], "wc_reply_authors_avg": [ 672.25, 228.21084001422895 ], "reply_reviewers_avg": [ 0.25, 0.4330127018922193 ], "reply_authors_avg": [ 1.5, 0.5 ], "replies_avg": [ 14, 0 ], "authors#_avg": [ 4, 0 ], "corr_rating_confidence": -0.6882472016116854, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:qVyQljYxNp8J:scholar.google.com/&scioq=Leveraging+affinity+cycle+consistency+to+isolate+factors+of+variation+in+learned+representations&hl=en&as_sdt=0,5", "gs_version_total": 0, "aff_unique_index": "0;0;0;0", "aff_unique_norm": "Google", 
"aff_unique_dep": "Google", "aff_unique_url": "https://www.google.com", "aff_unique_abbr": "Google", "aff_campus_unique_index": "0;0;0;0", "aff_campus_unique": "Mountain View", "aff_country_unique_index": "0;0;0;0", "aff_country_unique": "United States" }, { "id": "Hrtbm8u0RXu", "title": "Provable Memorization via Deep Neural Networks using Sub-linear Parameters", "track": "main", "status": "Reject", "tldr": "", "abstract": "It is known that $\\Theta(N)$ parameters are sufficient for neural networks to memorize arbitrary $N$ input-label pairs. By exploiting depth, we show that $\\Theta(N^{2/3})$ parameters suffice to memorize $N$ pairs, under a mild condition on the separation of input points. In particular, deeper networks (even with width $3$) are shown to memorize more pairs than shallow networks, which also agrees with the recent line of works on the benefits of depth for function approximation. We also provide empirical results that support our theoretical findings.", "keywords": "memorization", "primary_area": "", "supplementary_material": "", "author": "Sejun Park;Jaeho Lee;Chulhee Yun;Jinwoo Shin", "authorids": "~Sejun_Park1;~Jaeho_Lee3;~Chulhee_Yun1;~Jinwoo_Shin1", "gender": ";M;M;M", "homepage": ";https://jaeho-lee.github.io;https://chulheeyun.github.io/;https://sites.google.com/site/mijirim/", "dblp": "155/9882;78/6080-1;138/0148.html;31/7062", "google_scholar": ";t91zoQMAAAAJ;Ukl64ggAAAAJ;https://scholar.google.com.tw/citations?user=m3eDp7kAAAAJ", "orcid": ";;;", "linkedin": ";;;", "or_profile": "~Sejun_Park1;~Jaeho_Lee3;~Chulhee_Yun1;~Jinwoo_Shin1", "aff": "Korea University;Korea Advanced Institute of Science & Technology;Massachusetts Institute of Technology;Korea Advanced Institute of Science & Technology", "aff_domain": "korea.ac.kr;kaist.ac.kr;mit.edu;kaist.ac.kr", "position": "Assistant Professor;Postdoc;PhD student;Associate Professor", "bibtex": "@misc{\npark2021provable,\ntitle={Provable Memorization via Deep Neural Networks using Sub-linear Parameters},\nauthor={Sejun Park and Jaeho Lee and Chulhee Yun and Jinwoo Shin},\nyear={2021},\nurl={https://openreview.net/forum?id=Hrtbm8u0RXu}\n}", "github": "", "project": "", "reviewers": "AnonReviewer3;AnonReviewer2;AnonReviewer1", "site": "https://openreview.net/forum?id=Hrtbm8u0RXu", "pdf_size": 0, "rating": "5;6;7", "confidence": "4;3;4", "wc_review": "235;698;651", "wc_reply_reviewers": "0;0;0", "wc_reply_authors": "781;458;240", "reply_reviewers": "0;0;0", "reply_authors": "2;1;1", "rating_avg": [ 6.0, 0.816496580927726 ], "confidence_avg": [ 3.6666666666666665, 0.4714045207910317 ], "wc_review_avg": [ 528.0, 208.06889884523025 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 493.0, 222.24460998338444 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.3333333333333333, 0.4714045207910317 ], "replies_avg": [ 9, 0 ], "authors#_avg": [ 4, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 43, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=5221793781863007565&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 7, "aff_unique_index": "0;1;2;1", "aff_unique_norm": "Korea University;Korea Advanced Institute of Science and Technology;Massachusetts Institute of Technology", "aff_unique_dep": ";;", "aff_unique_url": "https://www.korea.ac.kr;https://www.kaist.ac.kr;https://web.mit.edu", "aff_unique_abbr": "KU;KAIST;MIT", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0;1;0", "aff_country_unique": "South Korea;United States" }, { "id": "Hw2Za4N5hy0", 
"title": "Federated Learning with Decoupled Probabilistic-Weighted Gradient Aggregation", "track": "main", "status": "Reject", "tldr": "", "abstract": " In the federated learning paradigm, multiple mobile clients train local models independently based on datasets generated by edge devices, and the server aggregates parameters/gradients from local models to form a global model. However, existing model aggregation approaches suffer from high bias on both data distribution and parameter distribution for non-IID datasets, which result in severe accuracy drop for increasing number of heterogeneous clients. In this paper, we proposed a novel decoupled probabilistic-weighted gradient aggregation approach called FeDEC for federated learning. The key idea is to optimize gradient parameters and statistical parameters in a decoupled way, and aggregate the parameters from local models with probabilistic weights to deal with the heterogeneity of clients. Since the overall dataset is unaccessible by the central server, we introduce a variational inference method to derive the optimal probabilistic weights to minimize statistical bias. We further prove the convergence bound of the proposed approach. Extensive experiments using mainstream convolutional neural network models based on three federated datasets show that FeDEC significantly outperforms the state-of-the-arts in terms of model accuracy and training efficiency.", "keywords": "Federated Learning;Gradient Aggregation;Variational Inference", "primary_area": "", "supplementary_material": "", "author": "Jian-hui Duan;Wenzhong Li;Sanglu Lu", "authorids": "~Jian-hui_Duan1;~Wenzhong_Li1;~Sanglu_Lu1", "gender": "M;M;F", "homepage": "https://enzoduan.github.io/;https://cs.nju.edu.cn/lwz/;https://cs.nju.edu.cn/58/1e/c2639a153630/page.htm", "dblp": ";98/3150;24/3318", "google_scholar": ";;", "orcid": ";;0000-0003-1467-4519", "linkedin": ";;", "or_profile": "~Jian-hui_Duan1;~Wenzhong_Li1;~Sanglu_Lu1", "aff": "Nanjing University;Nanjing University;Nanjing University", "aff_domain": "nju.edu.cn;nju.edu.cn;nju.edu.cn", "position": "MS student;Full Professor;Full Professor", "bibtex": "@misc{\nduan2021federated,\ntitle={Federated Learning with Decoupled Probabilistic-Weighted Gradient Aggregation},\nauthor={Jian-hui Duan and Wenzhong Li and Sanglu Lu},\nyear={2021},\nurl={https://openreview.net/forum?id=Hw2Za4N5hy0}\n}", "github": "", "project": "", "reviewers": "AnonReviewer4;AnonReviewer1;AnonReviewer3;AnonReviewer2", "site": "https://openreview.net/forum?id=Hw2Za4N5hy0", "pdf_size": 0, "rating": "3;3;4;6", "confidence": "3;3;4;4", "wc_review": "1625;792;753;572", "wc_reply_reviewers": "227;0;0;0", "wc_reply_authors": "948;664;422;316", "reply_reviewers": "1;0;0;0", "reply_authors": "2;1;1;1", "rating_avg": [ 4.0, 1.224744871391589 ], "confidence_avg": [ 3.5, 0.5 ], "wc_review_avg": [ 935.5, 406.64511554917266 ], "wc_reply_reviewers_avg": [ 56.75, 98.29388332953378 ], "wc_reply_authors_avg": [ 587.5, 243.36957492669455 ], "reply_reviewers_avg": [ 0.25, 0.4330127018922193 ], "reply_authors_avg": [ 1.25, 0.4330127018922193 ], "replies_avg": [ 12, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": 0.8164965809277259, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:jO6NaCzBOtwJ:scholar.google.com/&scioq=Federated+Learning+with+Decoupled+Probabilistic-Weighted+Gradient+Aggregation&hl=en&as_sdt=0,5", "gs_version_total": 0, "aff_unique_index": "0;0;0", "aff_unique_norm": "Nanjing University", "aff_unique_dep": "", 
"aff_unique_url": "https://www.nju.edu.cn", "aff_unique_abbr": "Nanjing U", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0;0", "aff_country_unique": "China" }, { "title": "ResNet After All: Neural ODEs and Their Numerical Solution", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/2936", "id": "HxzSxSxLOJZ", "poster": "", "openreview": "https://openreview.net/forum?id=HxzSxSxLOJZ", "slides": "https://iclr.cc/virtual/2021/poster/2936", "video": "https://iclr.cc/virtual/2021/poster/2936", "author_site": "Katharina Ott, Prateek Katiyar, Philipp Hennig, Michael Tiemann", "tldr": "", "abstract": "A key appeal of the recently proposed Neural Ordinary Differential Equation (ODE) framework is that it seems to provide a continuous-time extension of discrete residual neural networks. \nAs we show herein, though, trained Neural ODE models actually depend on the specific numerical method used during training.\nIf the trained model is supposed to be a flow generated from an ODE, it should be possible to choose another numerical solver with equal or smaller numerical error without loss of performance.\nWe observe that if training relies on a solver with overly coarse discretization, then testing with another solver of equal or smaller numerical error results in a sharp drop in accuracy. \nIn such cases, the combination of vector field and numerical method cannot be interpreted as a flow generated from an ODE, which arguably poses a fatal breakdown of the Neural ODE concept.\nWe observe, however, that there exists a critical step size beyond which the training yields a valid ODE vector field. \nWe propose a method that monitors the behavior of the ODE solver during training to adapt its step size, aiming to ensure a valid ODE without unnecessarily increasing computational cost.\nWe verify this adaption algorithm on a common bench mark dataset as well as a synthetic dataset. 
\n", "keywords": "", "primary_area": "", "supplementary_material": "", "author": "Katharina Ott;Prateek Katiyar;Philipp Hennig;Michael Tiemann", "authorids": "~Katharina_Ott1;prateek.katiyar@de.bosch.com;~Philipp_Hennig1;~Michael_Tiemann1", "gender": "F;;M;M", "homepage": ";;http://mml.inf.uni-tuebingen.de;", "dblp": ";;08/9077;147/4977", "google_scholar": ";;https://scholar.google.de/citations?user=UeG5w08AAAAJ;https://scholar.google.de/citations?", "orcid": ";;0000-0001-7293-6092;", "linkedin": "katharina-ott-b91634171/;;;", "or_profile": "~Katharina_Ott1;prateek.katiyar@de.bosch.com;~Philipp_Hennig1;~Michael_Tiemann1", "aff": "University of Tuebingen;;Max Planck Institute for Intelligent Systems, Max-Planck Institute;Bosch Center for Artificial Intelligence", "aff_domain": "uni-tuebingen.de;;tuebingen.mpg.de;de.bosch.com", "position": "PhD student;;Adjunct Professor;Research scientist", "bibtex": "@inproceedings{\nott2021resnet,\ntitle={ResNet After All: Neural {\\{}ODE{\\}}s and Their Numerical Solution},\nauthor={Katharina Ott and Prateek Katiyar and Philipp Hennig and Michael Tiemann},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=HxzSxSxLOJZ}\n}", "github": "", "project": "", "reviewers": "AnonReviewer2;AnonReviewer1;AnonReviewer3;AnonReviewer4", "pdf_size": 0, "rating": "5;6;7;7", "confidence": "4;4;4;3", "wc_review": "926;206;189;742", "wc_reply_reviewers": "610;0;0;103", "wc_reply_authors": "1231;225;291;603", "reply_reviewers": "2;0;0;1", "reply_authors": "4;1;1;1", "rating_avg": [ 6.25, 0.82915619758885 ], "confidence_avg": [ 3.75, 0.4330127018922193 ], "wc_review_avg": [ 515.75, 324.8864224617582 ], "wc_reply_reviewers_avg": [ 178.25, 252.79277580658828 ], "wc_reply_authors_avg": [ 587.5, 398.010992310514 ], "reply_reviewers_avg": [ 0.75, 0.82915619758885 ], "reply_authors_avg": [ 1.75, 1.299038105676658 ], "replies_avg": [ 23, 0 ], "authors#_avg": [ 4, 0 ], "corr_rating_confidence": -0.5222329678670935, "gs_citation": 45, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=3301466238062394018&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 7, "pdf": "https://openreview.net/pdf?id=HxzSxSxLOJZ", "email": "uni-tuebingen.de;;tuebingen.mpg.de;de.bosch.com", "author_num": 4, "aff_unique_index": "0;1;2", "aff_unique_norm": "University of Tuebingen;Max Planck Institute for Intelligent Systems;Bosch Center for Artificial Intelligence", "aff_unique_dep": ";Intelligent Systems;Center for Artificial Intelligence", "aff_unique_url": "https://www.uni-tuebingen.de/;https://www.mpi-is.mpg.de;https://www.bosch-ai.com", "aff_unique_abbr": "Uni T\u00fcbingen;MPI-IS;BCAI", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0;0", "aff_country_unique": "Germany" }, { "id": "Hy8JM_Fvt5N", "title": "Less bits is more: How pruning deep binary networks increases weight capacity", "track": "main", "status": "Withdraw", "tldr": "", "abstract": "Binary networks are extremely efficient as multiplications and additions are replaced by bit shifts. Yet, binarizing network models reduces their representational power: a binary weight with a value of -1 or +1 cannot represent as much information as a real weight. We make the observation that pruning weights adds the value 0 as an additional symbol and thus increases the information capacity of the network. This increases the solution space of our network -- more network configurations are possible. 
Thus far, all hypotheses are considered equally likely. Yet, given that the network is binary, by assuming a Bernoulli prior over the weights, we restrict the hypothesis space to only the ones that can be effectively encoded in a binary network. We show that this view leads to maximizing the information capacity over the binary weights. In this work we propose to jointly prune binary weights and maximize the information capacity, thus finding a subnetwork that performs better than the original network. On 3 datasets and 11 architectures we show compact models with good accuracy comparing favorably to state-of-the-art. \n", "keywords": "", "primary_area": "", "supplementary_material": "", "author": "Yunqiang Li;Silvia Laura Pintea;Jan van Gemert", "authorids": "~Yunqiang_Li1;~Silvia_Laura_Pintea1;~Jan_van_Gemert1", "gender": "M;Not Specified;M", "homepage": "https://www.tudelft.nl/ewi/over-de-faculteit/afdelingen/intelligent-systems/pattern-recognition-bioinformatics/computer-vision-lab/people/yunqiang-li/;https://silvialaurapintea.github.io/;https://jvgemert.github.io/", "dblp": "205/3383;150/4232;25/3153", "google_scholar": "qrGix2AAAAAJ;shTkx9EAAAAJ;JUdMRGcAAAAJ", "orcid": ";;0000-0002-3913-2786", "linkedin": ";;jan-van-gemert-1628b94/", "or_profile": "~Yunqiang_Li1;~Silvia_Laura_Pintea1;~Jan_C_van_Gemert1", "aff": "Delft University of Technology;Delft University of Technology;Delft University of Technology", "aff_domain": "tudelft.nl;tudelft.nl;tudelft.nl", "position": "PhD student;Researcher;Associate Professor", "bibtex": "", "github": "", "project": "", "reviewers": "", "site": "https://openreview.net/forum?id=Hy8JM_Fvt5N", "pdf_size": 0, "rating": "", "confidence": "", "wc_review": "", "wc_reply_reviewers": "", "wc_reply_authors": "", "reply_reviewers": "", "reply_authors": "", "rating_avg": [ 0, 0 ], "confidence_avg": [ 0, 0 ], "wc_review_avg": [ 0, 0 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 0, 0 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 0, 0 ], "replies_avg": [ 1, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": 0, "gs_citation": 1, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=14702034311206465846&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 0, "aff_unique_index": "0;0;0", "aff_unique_norm": "Delft University of Technology", "aff_unique_dep": "", "aff_unique_url": "https://www.tudelft.nl", "aff_unique_abbr": "TU Delft", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0;0", "aff_country_unique": "Netherlands" }, { "id": "I-VfjSBzi36", "title": "EarlyBERT: Efficient BERT Training via Early-bird Lottery Tickets", "track": "main", "status": "Reject", "tldr": "", "abstract": "Deep, heavily overparameterized language models such as BERT, XLNet and T5 have achieved impressive success in many NLP tasks. However, their high model complexity requires enormous computation resources and extremely long training time for both pre-training and fine-tuning. Many works have studied model compression on large NLP models, but only focus on reducing inference cost/time, while still requiring an expensive training process. Other works use extremely large batch sizes to shorten the pre-training time at the expense of high demand for computation resources. 
In this paper, inspired by the Early-Bird Lottery Tickets studied for computer vision tasks, we propose EarlyBERT, a general computationally-efficient training algorithm applicable to both pre-training and fine-tuning of large-scale language models. We are the first to identify structured winning tickets in the early stage of BERT training, and use them for efficient training. Comprehensive pre-training and fine-tuning experiments on GLUE and SQuAD downstream tasks show that EarlyBERT easily achieves comparable performance to standard BERT with 35~45% less training time.", "keywords": "Natural Language Processing;Lottery Tickets Hypothesis;Efficient Training", "primary_area": "", "supplementary_material": "", "author": "Xiaohan Chen;Yu Cheng;Shuohang Wang;Zhe Gan;Zhangyang Wang;Jingjing Liu", "authorids": "~Xiaohan_Chen1;~Yu_Cheng1;~Shuohang_Wang1;~Zhe_Gan1;~Zhangyang_Wang1;~Jingjing_Liu2", "gender": "M;M;M;M;M;", "homepage": "http://xiaohanchen.com;https://ych133.github.io;;http://zhegan27.github.io/;https://vita-group.github.io;https://air.tsinghua.edu.cn/en/info/1046/1194.htm#:~:text=Jingjing%20Liu%20is%20Professor%2C%20Principal,CVPR%2C%20ACL%2C%20etc.)", "dblp": "94/3802;96/3060-1.html;173/5469.html;41/7845;119/4026;30/3008-1", "google_scholar": "https://scholar.google.com/citations?authuser=1;https://scholar.google.com/citations?hl=en;mN-IO6wAAAAJ;E64XWyMAAAAJ;pxFyKAIAAAAJ;BzJ_GboAAAAJ", "orcid": "0000-0002-0360-0402;;;;;", "linkedin": "xiaohan-chen-400b00147/;chengyu05/;;zhe-gan-a2229a78/;;jingjing-liu-65703431/", "or_profile": "~Xiaohan_Chen1;~Yu_Cheng1;~Shuohang_Wang1;~Zhe_Gan1;~Zhangyang_Wang1;~Jingjing_Liu2", "aff": "University of Texas, Austin;Microsoft Research;Microsoft;Microsoft;University of Texas, Austin;Microsoft", "aff_domain": "utexas.edu;microsoft.com;microsoft.com;microsoft.com;utexas.edu;microsoft.com", "position": "PhD student;Principal Researcher;Researcher;Principal Researcher;Assistant Professor;Sr Principal Research Manager", "bibtex": "@misc{\nchen2021earlybert,\ntitle={Early{\\{}BERT{\\}}: Efficient {\\{}BERT{\\}} Training via Early-bird Lottery Tickets},\nauthor={Xiaohan Chen and Yu Cheng and Shuohang Wang and Zhe Gan and Zhangyang Wang and Jingjing Liu},\nyear={2021},\nurl={https://openreview.net/forum?id=I-VfjSBzi36}\n}", "github": "", "project": "", "reviewers": "AnonReviewer5;AnonReviewer1;AnonReviewer3;AnonReviewer4;AnonReviewer2", "site": "https://openreview.net/forum?id=I-VfjSBzi36", "pdf_size": 0, "rating": "3;5;5;6;7", "confidence": "4;4;3;4;3", "wc_review": "390;765;409;387;458", "wc_reply_reviewers": "255;0;0;0;0", "wc_reply_authors": "1352;1135;1083;823;598", "reply_reviewers": "1;0;0;0;0", "reply_authors": "3;2;3;1;2", "rating_avg": [ 5.2, 1.32664991614216 ], "confidence_avg": [ 3.6, 0.4898979485566356 ], "wc_review_avg": [ 481.8, 143.86299037626043 ], "wc_reply_reviewers_avg": [ 51.0, 102.0 ], "wc_reply_authors_avg": [ 998.2, 261.5090055810698 ], "reply_reviewers_avg": [ 0.2, 0.4000000000000001 ], "reply_authors_avg": [ 2.2, 0.7483314773547882 ], "replies_avg": [ 18, 0 ], "authors#_avg": [ 6, 0 ], "corr_rating_confidence": -0.492365963917331, "gs_citation": 101, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=7718345183696994473&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 11, "aff_unique_index": "0;1;1;1;0;1", "aff_unique_norm": "University of Texas at Austin;Microsoft", "aff_unique_dep": ";Microsoft Research", "aff_unique_url": "https://www.utexas.edu;https://www.microsoft.com/en-us/research", "aff_unique_abbr": "UT 
Austin;MSR", "aff_campus_unique_index": "0;0", "aff_campus_unique": "Austin;", "aff_country_unique_index": "0;0;0;0;0;0", "aff_country_unique": "United States" }, { "id": "I3xhgVtNC5t", "title": "Wasserstein Distributionally Robust Optimization: A Three-Player Game Framework", "track": "main", "status": "Reject", "tldr": "", "abstract": "Wasserstein distributionally robust optimization (DRO) has recently received significant attention in machine learning due to its connection to generalization, robustness and regularization. Existing methods only consider a limited class of loss functions or apply to small values of robustness. In this paper, we present a three-player game framework for solving Wasserstein DRO problem with arbitrary level of robustness, which can handle general loss functions. Specifically, we formulate a min-max game between three players who optimize over probability measures, model parameters and Lagrange multipliers. We also propose new algorithms for finding an equilibrium of the game in convex and non-convex settings which both enjoy provable convergence guarantees. Furthermore, we prove an excess risk bound for the proposed algorithms which shows that the solution returned by the algorithms closely achieves the optimal minimax risk. ", "keywords": "", "primary_area": "", "supplementary_material": "", "author": "Zhuozhuo Tu;Shan You;Tao Huang;Dacheng Tao", "authorids": "~Zhuozhuo_Tu1;~Shan_You3;~Tao_Huang5;~Dacheng_Tao1", "gender": "M;M;M;", "homepage": ";https://shanyou92.github.io/;https://taohuang.info;", "dblp": "230/4649;179/2548;34/808-20;", "google_scholar": ";https://scholar.google.com/citations?hl=en;jkcRdBgAAAAJ;", "orcid": ";0000-0003-1964-0430;;", "linkedin": ";;;", "or_profile": "~Zhuozhuo_Tu1;~Shan_You3;~Tao_Huang5;~Dacheng_Tao1", "aff": ";Tsinghua University;SenseTime Research;", "aff_domain": ";tsinghua.edu.cn;sensetime.com;", "position": ";Postdoc;Researcher;", "bibtex": "", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer3;AnonReviewer4;AnonReviewer5;AnonReviewer2", "site": "https://openreview.net/forum?id=I3xhgVtNC5t", "pdf_size": 0, "rating": "4;5;5;5;6", "confidence": "3;3;4;3;3", "wc_review": "263;410;196;491;106", "wc_reply_reviewers": "0;0;0;0;0", "wc_reply_authors": "455;622;145;590;228", "reply_reviewers": "0;0;0;0;0", "reply_authors": "1;1;1;1;1", "rating_avg": [ 5.0, 0.6324555320336759 ], "confidence_avg": [ 3.2, 0.39999999999999997 ], "wc_review_avg": [ 293.2, 140.1219468891294 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 408.0, 191.15334158732355 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 13, 0 ], "authors#_avg": [ 4, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:f54TewUJ-A8J:scholar.google.com/&scioq=Wasserstein+Distributionally+Robust+Optimization:+A+Three-Player+Game+Framework&hl=en&as_sdt=0,5", "gs_version_total": 0, "aff_unique_index": "0;1", "aff_unique_norm": "Tsinghua University;SenseTime", "aff_unique_dep": ";SenseTime Research", "aff_unique_url": "https://www.tsinghua.edu.cn;https://www.sensetime.com", "aff_unique_abbr": "THU;SenseTime", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0", "aff_country_unique": "China" }, { "id": "I3zV6igAT9", "title": "Quantile Regularization : Towards Implicit Calibration of Regression Models", "track": "main", "status": "Reject", "tldr": "", "abstract": "Recent works have shown that most deep 
learning models are often poorly calibrated, i.e., they may produce overconfident\npredictions that are wrong, implying that their uncertainty estimates are unreliable. While a number of approaches have been proposed recently to calibrate classification models, relatively little work exists on calibrating regression models. Isotonic Regression has recently been advocated for regression calibration. We provide a detailed formal analysis of the \\emph{side-effects} of Isotonic Regression when used for regression calibration. To address this, we recast quantile calibration as entropy estimation, and leverage this idea to construct a novel quantile regularizer, which can be used in any optimization based probabilistic regression models. Unlike most of the existing approaches for calibrating regression models, which are based on \\emph{post-hoc} processing of the model's output, and require an additional dataset, our method is trainable in an end-to-end fashion, without requiring an additional dataset. We provide empirical results demonstrating that our approach improves calibration for regression models trained on diverse architectures that provide uncertainty estimates, such as Dropout VI and Deep Ensembles.", "keywords": "Calibration;Reliable Uncertainty Quantification;Probabilistic Deep Learning", "primary_area": "", "supplementary_material": "/attachment/d2e022b8bc98a07045665b395de6fce07fadde5c.zip", "author": "Saiteja Utpala;Piyush Rai", "authorids": "~Saiteja_Utpala1;~Piyush_Rai1", "gender": "M;M", "homepage": ";http://cse.iitk.ac.in/users/piyush/", "dblp": ";02/525", "google_scholar": ";https://scholar.google.com.tw/citations?user=D50grEgAAAAJ", "orcid": ";", "linkedin": "saiteja-utpala/;", "or_profile": "~Saiteja_Utpala1;~Piyush_Rai1", "aff": ";IIT Kanpur, IIT Kanpur", "aff_domain": ";cse.iitk.ac.in", "position": ";Associate Professor", "bibtex": "@misc{\nutpala2021quantile,\ntitle={Quantile Regularization : Towards Implicit Calibration of Regression Models},\nauthor={Saiteja Utpala and Piyush Rai},\nyear={2021},\nurl={https://openreview.net/forum?id=I3zV6igAT9}\n}", "github": "", "project": "", "reviewers": "AnonReviewer4;AnonReviewer2;AnonReviewer1;AnonReviewer5", "site": "https://openreview.net/forum?id=I3zV6igAT9", "pdf_size": 0, "rating": "5;6;6;6", "confidence": "3;3;4;4", "wc_review": "496;467;488;890", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "590;1209;473;801", "reply_reviewers": "0;0;0;0", "reply_authors": "1;2;1;1", "rating_avg": [ 5.75, 0.4330127018922193 ], "confidence_avg": [ 3.5, 0.5 ], "wc_review_avg": [ 585.25, 176.26595672449062 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 768.25, 280.3028496109164 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.25, 0.4330127018922193 ], "replies_avg": [ 11, 0 ], "authors#_avg": [ 2, 0 ], "corr_rating_confidence": 0.5773502691896257, "gs_citation": 11, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=11098413964431478324&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 3, "aff_unique_index": "0", "aff_unique_norm": "Indian Institute of Technology Kanpur", "aff_unique_dep": "", "aff_unique_url": "https://www.iitk.ac.in", "aff_unique_abbr": "IITK", "aff_campus_unique_index": "0", "aff_campus_unique": "Kanpur", "aff_country_unique_index": "0", "aff_country_unique": "India" }, { "title": "Spatial Dependency Networks: Neural Layers for Improved Generative Image Modeling", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/3133", "id": "I4c4K9vBNny", 
"poster": "", "openreview": "https://openreview.net/forum?id=I4c4K9vBNny", "slides": "https://iclr.cc/virtual/2021/poster/3133", "video": "https://iclr.cc/virtual/2021/poster/3133", "author_site": "\u0110or\u0111e Miladinovi\u0107, Aleksandar Stani\u0107, Stefan Bauer, J\u00fcrgen Schmidhuber, Joachim M Buhmann", "tldr": "", "abstract": "How to improve generative modeling by better exploiting spatial regularities and coherence in images? We introduce a novel neural network for building image generators (decoders) and apply it to variational autoencoders (VAEs). In our spatial dependency networks (SDNs), feature maps at each level of a deep neural net are computed in a spatially coherent way, using a sequential gating-based mechanism that distributes contextual information across 2-D space. We show that augmenting the decoder of a hierarchical VAE by spatial dependency layers considerably improves density estimation over baseline convolutional architectures and the state-of-the-art among the models within the same class. Furthermore, we demonstrate that SDN can be applied to large images by synthesizing samples of high quality and coherence. In a vanilla VAE setting, we find that a powerful SDN decoder also improves learning disentangled representations, indicating that neural architectures play an important role in this task. Our results suggest favoring spatial dependency over convolutional layers in various VAE settings. The accompanying source code is given at https://github.com/djordjemila/sdn.", "keywords": "Neural networks;Deep generative models;Image Modeling;Variational Autoencoders", "primary_area": "", "supplementary_material": "/attachment/dee0e44dce5ac919aa218b7c2b42650a2b281a5b.zip", "author": "\u0110or\u0111e Miladinovi\u0107;Aleksandar Stani\u0107;Stefan Bauer;J\u00fcrgen Schmidhuber;Joachim M. Buhmann", "authorids": "~\u0110or\u0111e_Miladinovi\u01071;~Aleksandar_Stani\u01071;~Stefan_Bauer1;~J\u00fcrgen_Schmidhuber1;~Joachim_M._Buhmann1", "gender": ";M;M;M;M", "homepage": "https://cifar.ca/bios/stefan-bauer/;http://people.idsia.ch/~juergen/;https://ise.ethz.ch;http://astanic.github.io/;", "dblp": ";s/JurgenSchmidhuber;b/JMBuhmann;180/5949;209/4947", "google_scholar": "O-oICE8AAAAJ;https://scholar.google.ch/citations?user=gLnCTgIAAAAJ;https://scholar.google.ch/citations?user=zQWbCzYAAAAJ;tx0opKcAAAAJ;", "orcid": ";;;;", "linkedin": ";;;;", "or_profile": "~Stefan_Bauer1;~J\u00fcrgen_Schmidhuber1;~Joachim_M._Buhmann1;~Aleksandar_Stanic1;~Djordje_Miladinovic1", "aff": "Max Planck Institute for Intelligent Systems, Max-Planck Institute;IDSIA;Department of Computer Science, ETHZ - ETH Zurich;The Swiss AI Lab - IDSIA;Swiss Federal Institute of Technology", "aff_domain": "tuebingen.mpg.de;idsia.ch;inf.ethz.ch;idsia.ch;ethz.ch", "position": "Research Group Leader;Scientific Director;Professor;PhD student;PhD student", "bibtex": "@inproceedings{\nmiladinovi{\\'c}2021spatial,\ntitle={Spatial Dependency Networks: Neural Layers for Improved Generative Image Modeling},\nauthor={{\\DJ}or{\\dj}e Miladinovi{\\'c} and Aleksandar Stani{\\'c} and Stefan Bauer and J{\\\"u}rgen Schmidhuber and Joachim M. 
Buhmann},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=I4c4K9vBNny}\n}", "github": "[![github](/images/github_icon.svg) djordjemila/sdn](https://github.com/djordjemila/sdn)", "project": "", "reviewers": "AnonReviewer4;AnonReviewer1;AnonReviewer3;AnonReviewer2", "pdf_size": 0, "rating": "6;6;7;7", "confidence": "3;5;3;3", "wc_review": "428;1462;241;524", "wc_reply_reviewers": "0;0;0;40", "wc_reply_authors": "577;760;472;753", "reply_reviewers": "0;0;0;1", "reply_authors": "1;1;1;1", "rating_avg": [ 6.5, 0.5 ], "confidence_avg": [ 3.5, 0.8660254037844386 ], "wc_review_avg": [ 663.75, 471.97159607332304 ], "wc_reply_reviewers_avg": [ 10.0, 17.320508075688775 ], "wc_reply_authors_avg": [ 640.5, 121.82056476638088 ], "reply_reviewers_avg": [ 0.25, 0.4330127018922193 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 12, 0 ], "authors#_avg": [ 5, 0 ], "corr_rating_confidence": -0.5773502691896257, "gs_citation": 10, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=4211572628480421542&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 6, "pdf": "https://openreview.net/pdf?id=I4c4K9vBNny", "email": "tuebingen.mpg.de;idsia.ch;inf.ethz.ch;idsia.ch;ethz.ch", "author_num": 5, "aff_unique_index": "0;1;2;3;4", "aff_unique_norm": "Max Planck Institute for Intelligent Systems;Institute of Digital Technologies;ETH Zurich;IDSIA;Swiss Federal Institute of Technology", "aff_unique_dep": "Intelligent Systems;;Department of Computer Science;Swiss AI Lab;", "aff_unique_url": "https://www.mpi-is.mpg.de;https://www.idsia.ch;https://www.ethz.ch;https://www.idsia.ch/;https://www.ethz.ch", "aff_unique_abbr": "MPI-IS;IDSIA;ETHZ;IDSIA;ETH Zurich", "aff_campus_unique_index": "1", "aff_campus_unique": ";Zurich", "aff_country_unique_index": "0;1;1;1;1", "aff_country_unique": "Germany;Switzerland" }, { "id": "I4pQCAhSu62", "title": "Balancing Robustness and Sensitivity using Feature Contrastive Learning", "track": "main", "status": "Reject", "tldr": "", "abstract": "It is generally believed that robust training of extremely large networks is critical to their success in real-world applications. However, when taken to the extreme, methods that promote robustness can hurt the model\u2019s sensitivity to rare or underrepresented patterns. In this paper, we discuss this trade-off between robustness and sensitivity by introducing two notions: contextual feature utility and contextual feature sensitivity. We propose Feature Contrastive Learning (FCL) that encourages the model to be more sensitive to the features that have higher contextual utility. 
Empirical results demonstrate that models trained with FCL achieve a better balance of robustness and sensitivity, leading to improved generalization in the presence of noise.", "keywords": "deep learning;non-adversarial robustness;sensitivity;input perturbation;contextual feature utility;contextual feature sensitivity.", "primary_area": "", "supplementary_material": "", "author": "Seungyeon Kim;Daniel Glasner;Srikumar Ramalingam;Cho-Jui Hsieh;Kishore Papineni;Sanjiv Kumar", "authorids": "~Seungyeon_Kim1;~Daniel_Glasner2;~Srikumar_Ramalingam2;~Cho-Jui_Hsieh1;papineni@google.com;~Sanjiv_Kumar1", "gender": ";M;M;M;;", "homepage": "https://www.seungyeon.ai;https://sites.google.com/site/dglasner/;https://www.cs.utah.edu/~srikumar/;http://web.cs.ucla.edu/~chohsieh/index.html;;http://www.sanjivk.com/", "dblp": "74/7997-1.html;28/1971;17/4216;14/2770;;", "google_scholar": "zbcN_QIAAAAJ;w0OodaEAAAAJ;6m1ptOgAAAAJ;Wy89g4IAAAAJ;;https://scholar.google.com/citations?hl=en", "orcid": ";;;;;", "linkedin": ";;srikumar-ramalingam-17728b22/;;;", "or_profile": "~Seungyeon_Kim1;~Daniel_Glasner2;~Srikumar_Ramalingam2;~Cho-Jui_Hsieh1;papineni@google.com;~Sanjiv_Kumar1", "aff": "Google;Google;Google;University of California, Los Angeles;;Google", "aff_domain": "google.com;google.com;google.com;ucla.edu;;google.com", "position": "Researcher;Research Software Engineer;Research Scientist;Assistant Professor;;Research Scientist", "bibtex": "@misc{\nkim2021balancing,\ntitle={Balancing Robustness and Sensitivity using Feature Contrastive Learning},\nauthor={Seungyeon Kim and Daniel Glasner and Srikumar Ramalingam and Cho-Jui Hsieh and Kishore Papineni and Sanjiv Kumar},\nyear={2021},\nurl={https://openreview.net/forum?id=I4pQCAhSu62}\n}", "github": "", "project": "", "reviewers": "AnonReviewer2;AnonReviewer3;AnonReviewer1;AnonReviewer4", "site": "https://openreview.net/forum?id=I4pQCAhSu62", "pdf_size": 0, "rating": "5;5;6;7", "confidence": "4;2;3;4", "wc_review": "433;110;197;369", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "753;354;124;464", "reply_reviewers": "0;0;0;0", "reply_authors": "2;1;1;2", "rating_avg": [ 5.75, 0.82915619758885 ], "confidence_avg": [ 3.25, 0.82915619758885 ], "wc_review_avg": [ 277.25, 129.50748047892833 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 423.75, 226.24143630201786 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.5, 0.5 ], "replies_avg": [ 13, 0 ], "authors#_avg": [ 6, 0 ], "corr_rating_confidence": 0.4545454545454545, "gs_citation": 1, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=13968424612421450812&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 6, "aff_unique_index": "0;0;0;1;0", "aff_unique_norm": "Google;University of California, Los Angeles", "aff_unique_dep": "Google;", "aff_unique_url": "https://www.google.com;https://www.ucla.edu", "aff_unique_abbr": "Google;UCLA", "aff_campus_unique_index": "0;0;0;1;0", "aff_campus_unique": "Mountain View;Los Angeles", "aff_country_unique_index": "0;0;0;0;0", "aff_country_unique": "United States" }, { "id": "I6-3mg29P6y", "title": "Flatness is a False Friend", "track": "main", "status": "Reject", "tldr": "", "abstract": "Hessian based measures of flatness, such as the trace, Frobenius and spectral norms, have been argued, used and shown to relate to generalisation. In this paper we demonstrate that, for feed-forward neural networks under the cross-entropy loss, low-loss solutions with large neural network weights have small Hessian based measures of flatness. 
This implies that solutions obtained without L2 regularisation should be less sharp than those with it, despite generalising worse. We show this to be true for logistic regression, multi-layer perceptrons, simple convolutional, pre-activated and wide residual networks on the MNIST and CIFAR-$100$ datasets. Furthermore, we show that adaptive optimisation algorithms using iterate averaging, on the VGG-$16$ network and CIFAR-$100$ dataset, achieve superior generalisation to SGD but are $30 \\times$ sharper. These theoretical and experimental results further advocate the need to use flatness in conjunction with the weights scale to measure generalisation \\citep{neyshabur2017exploring,dziugaite2017computing}. ", "keywords": "", "primary_area": "", "supplementary_material": "", "author": "Diego Granziol", "authorids": "~Diego_Granziol1", "gender": "M", "homepage": "", "dblp": "", "google_scholar": "https://scholar.google.co.uk/citations?user=-MuqKlIAAAAJ", "orcid": "0000-0003-3169-2081", "linkedin": "", "or_profile": "~Diego_Marco_Granziol1", "aff": "", "aff_domain": "", "position": "", "bibtex": "@misc{\ngranziol2021flatness,\ntitle={Flatness is a False Friend},\nauthor={Diego Granziol},\nyear={2021},\nurl={https://openreview.net/forum?id=I6-3mg29P6y}\n}", "github": "", "project": "", "reviewers": "AnonReviewer2;AnonReviewer4;AnonReviewer3", "site": "https://openreview.net/forum?id=I6-3mg29P6y", "pdf_size": 0, "rating": "3;4;6", "confidence": "5;4;3", "wc_review": "334;763;801", "wc_reply_reviewers": "0;0;0", "wc_reply_authors": "0;0;0", "reply_reviewers": "0;0;0", "reply_authors": "0;0;0", "rating_avg": [ 4.333333333333333, 1.247219128924647 ], "confidence_avg": [ 4.0, 0.816496580927726 ], "wc_review_avg": [ 632.6666666666666, 211.7582479044336 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 0, 0 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 0, 0 ], "replies_avg": [ 4, 0 ], "authors#_avg": [ 1, 0 ], "corr_rating_confidence": -0.9819805060619659, "gs_citation": 15, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=7930754213147891024&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 3 }, { "id": "I6NRcao1w-X", "title": "Robust Reinforcement Learning using Adversarial Populations", "track": "main", "status": "Reject", "tldr": "", "abstract": "Reinforcement Learning (RL) is an effective tool for controller design but can struggle with issues of robustness, failing catastrophically when the underlying system dynamics are perturbed. The Robust RL formulation tackles this by adding worst-case adversarial noise to the dynamics and constructing the noise distribution as the solution to a zero-sum minimax game. However, existing work on learning solutions to the Robust RL formulation has primarily focused on training a single RL agent against a single adversary. In this work, we demonstrate that using a single adversary does not consistently yield robustness to dynamics variations under standard parametrizations of the adversary; the resulting policy is highly exploitable by new adversaries. We propose a population-based augmentation to the Robust RL formulation in which we randomly initialize a population of adversaries and sample from the population uniformly during training. We empirically validate across a variety of benchmarks that the use of an adversarial population results in a less exploitable, more robust policy. 
Finally, we demonstrate that this approach provides comparable robustness and generalization as domain randomization on these benchmarks while avoiding a ubiquitous domain randomization failure mode.", "keywords": "Robust Control;Reinforcement Learning;Multiagent Systems", "primary_area": "", "supplementary_material": "", "author": "Eugene Vinitsky;Yuqing du;Kanaad V Parvate;Kathy Jang;Pieter Abbeel;Alexandre Bayen", "authorids": "~Eugene_Vinitsky1;yuqing_du@berkeley.edu;~Kanaad_V_Parvate1;~Kathy_Jang1;~Pieter_Abbeel2;~Alexandre_Bayen2", "gender": "M;;;;M;M", "homepage": "https://eugenevinitsky.github.io;;http://kanaad.me;;https://people.eecs.berkeley.edu/~pabbeel/;https://bayen.berkeley.edu/", "dblp": "207/7772;;;;;", "google_scholar": "6dr5fLEAAAAJ;;;;https://scholar.google.com.tw/citations?user=vtwH6GkAAAAJ;a5nY-pYAAAAJ", "orcid": ";;;;;", "linkedin": ";;http://linkedin.com/in/kanaad;;;", "or_profile": "~Eugene_Vinitsky1;yuqing_du@berkeley.edu;~Kanaad_V_Parvate1;~Kathy_Jang1;~Pieter_Abbeel2;~Alexandre_Bayen2", "aff": ";;;;Covariant;", "aff_domain": ";;;;covariant.ai;", "position": ";;;;Founder;", "bibtex": "@misc{\nvinitsky2021robust,\ntitle={Robust Reinforcement Learning using Adversarial Populations},\nauthor={Eugene Vinitsky and Yuqing du and Kanaad V Parvate and Kathy Jang and Pieter Abbeel and Alexandre Bayen},\nyear={2021},\nurl={https://openreview.net/forum?id=I6NRcao1w-X}\n}", "github": "", "project": "", "reviewers": "AnonReviewer2;AnonReviewer1;AnonReviewer3;AnonReviewer4", "site": "https://openreview.net/forum?id=I6NRcao1w-X", "pdf_size": 0, "rating": "4;5;5;7", "confidence": "4;4;3;4", "wc_review": "677;343;191;635", "wc_reply_reviewers": "289;0;0;181", "wc_reply_authors": "2240;1409;989;1021", "reply_reviewers": "2;0;0;1", "reply_authors": "4;2;2;2", "rating_avg": [ 5.25, 1.0897247358851685 ], "confidence_avg": [ 3.75, 0.4330127018922193 ], "wc_review_avg": [ 461.5, 202.33326468971927 ], "wc_reply_reviewers_avg": [ 117.5, 123.54857344380792 ], "wc_reply_authors_avg": [ 1414.75, 504.3244863180847 ], "reply_reviewers_avg": [ 0.75, 0.82915619758885 ], "reply_authors_avg": [ 2.5, 0.8660254037844386 ], "replies_avg": [ 19, 0 ], "authors#_avg": [ 6, 0 ], "corr_rating_confidence": 0.13245323570650439, "gs_citation": 103, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=2106426805159897208&as_sdt=400005&sciodt=0,14&hl=en", "gs_version_total": 7, "aff_unique_index": "0", "aff_unique_norm": "Covariant", "aff_unique_dep": "", "aff_unique_url": "", "aff_unique_abbr": "" }, { "id": "I6QHpMdZD5k", "title": "Learning to Solve Nonlinear Partial Differential Equation Systems To Accelerate MOSFET Simulation", "track": "main", "status": "Reject", "tldr": "", "abstract": "Semiconductor device simulation uses numerical analysis, where a set of coupled nonlinear partial differential equations is solved with the iterative Newton-Raphson method. Since an appropriate initial guess to start the Newton-Raphson method is not available, a solution of practical importance with desired boundary conditions cannot be trivially achieved. Instead, several solutions with intermediate boundary conditions should be calculated to address the nonlinearity and introducing intermediate boundary conditions significantly increases the computation time. In order to accelerate the semiconductor device simulation, we propose to use a neural network to learn an approximate solution for desired boundary conditions. 
With an initial solution sufficiently close to the final one by a trained neural network, computational cost to calculate several unnecessary solutions is significantly reduced. Specifically, a convolutional neural network for MOSFET (Metal-Oxide-Semiconductor Field-Effect Transistor), the most widely used semiconductor device, is trained in a supervised manner to compute the initial solution. Particularly, we propose to consider device grids with varying size and spacing and derive a compact expression of the solution based upon the electrostatic potential. We empirically show that the proposed method accelerates the simulation by more than 12 times. Results from the local linear regression and a fully-connected network are compared and extension to a complex two-dimensional domain is sketched.", "keywords": "Partial differential equation;nonlinear equation;Newton-Raphson method;convolutional neural network", "primary_area": "", "supplementary_material": "/attachment/96d1dfa79ff9c8b9237948693f8ecb122822fbdb.zip", "author": "Seungcheol Han;Jonghyun Choi;Sung-Min Hong", "authorids": "~Seungcheol_Han1;~Jonghyun_Choi1;~Sung-Min_Hong1", "gender": "M;M;", "homepage": "https://sites.google.com/view/gist-sdsl/members/students;https://ppolon.github.io/;https://sites.google.com/view/gist-sdsl/", "dblp": ";21/11103;", "google_scholar": ";uiGWnm4AAAAJ;", "orcid": ";0000-0002-7934-8434;", "linkedin": ";jonghyun-choi-459bb615/;", "or_profile": "~Seungcheol_Han1;~Jonghyun_Choi1;~Sung-Min_Hong1", "aff": "Gwangju Institute of Science and Technology;NAVER;Gwangju Institute of Science and Technology", "aff_domain": "gist.ac.kr;navercorp.com;gist.ac.kr", "position": "MS student;AI Advisor Committee;Associate Professor", "bibtex": "@misc{\nhan2021learning,\ntitle={Learning to Solve Nonlinear Partial Differential Equation Systems To Accelerate {\\{}MOSFET{\\}} Simulation},\nauthor={Seungcheol Han and Jonghyun Choi and Sung-Min Hong},\nyear={2021},\nurl={https://openreview.net/forum?id=I6QHpMdZD5k}\n}", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer3;AnonReviewer5;AnonReviewer2;AnonReviewer4", "site": "https://openreview.net/forum?id=I6QHpMdZD5k", "pdf_size": 0, "rating": "4;5;5;6;7", "confidence": "4;2;5;4;4", "wc_review": "425;155;231;359;90", "wc_reply_reviewers": "154;0;38;113;0", "wc_reply_authors": "717;335;535;646;90", "reply_reviewers": "1;0;1;1;0", "reply_authors": "3;3;3;3;1", "rating_avg": [ 5.4, 1.0198039027185568 ], "confidence_avg": [ 3.8, 0.9797958971132712 ], "wc_review_avg": [ 252.0, 124.47650380694341 ], "wc_reply_reviewers_avg": [ 61.0, 62.167515633166495 ], "wc_reply_authors_avg": [ 464.6, 227.45953486279708 ], "reply_reviewers_avg": [ 0.6, 0.48989794855663565 ], "reply_authors_avg": [ 2.6, 0.8 ], "replies_avg": [ 24, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": 0.08006407690254358, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:q1K8XYqehp0J:scholar.google.com/&scioq=Learning+to+Solve+Nonlinear+Partial+Differential+Equation+Systems+To+Accelerate+MOSFET+Simulation&hl=en&as_sdt=0,5", "gs_version_total": 0, "aff_unique_index": "0;1;0", "aff_unique_norm": "Gwangju Institute of Science and Technology;NAVER Corporation", "aff_unique_dep": ";", "aff_unique_url": "https://www.gist.ac.kr;https://www.naver.com", "aff_unique_abbr": "GIST;NAVER", "aff_campus_unique_index": "0;0", "aff_campus_unique": "Gwangju;", "aff_country_unique_index": "0;0;0", "aff_country_unique": "South Korea" }, { "id": "I8nahMfPixC", "title": "ADD-Defense: 
Towards Defending Widespread Adversarial Examples via Perturbation-Invariant Representation", "track": "main", "status": "Withdraw", "tldr": "", "abstract": "Due to vulnerability of machine learning algorithms under adversarial examples, it is challenging to defend against them. Recently, various defenses have been proposed to mitigate negative effects of adversarial examples generated from known attacks. However, these methods have obvious limitations against unknown attacks. Cognitive science investigates that the brain can recognize the same person with any expression by extracting invariant information on the face. Similarly, different adversarial examples share the invariant information retained from original examples. Motivated by this observation, we propose a defense framework ADD-Defense, which extracts the invariant information called \\textit{perturbation-invariant representation} (PIR) to defend against widespread adversarial examples. Specifically, realized by adversarial training with additional ability to utilize perturbation-specific information, the PIR is invariant to known attacks and has no perturbation-specific information. Facing the imbalance between widespread unknown attacks and limited known attacks, the PIR is expected to generalize well on unknown attacks via being matched to a Gaussian prior distribution. In this way, the PIR is invariant to both known and unknown attacks. Once the PIR is learned, we can generate an example without malicious perturbations as the output. We evaluate our ADD-Defense using various pixel-constrained and spatially-constrained attacks, especially BPDA and AutoAttack. The empirical results illustrate that our ADD-Defense is robust to widespread adversarial examples.", "keywords": "defense framework;widespread adversarial examples;perturbation-invariant representation;adversarial learning", "primary_area": "", "supplementary_material": "", "author": "Dawei Zhou;Tongliang Liu;Bo Han;Nannan Wang;Xinbo Gao", "authorids": "~Dawei_Zhou3;~Tongliang_Liu1;~Bo_Han1;~Nannan_Wang1;~Xinbo_Gao3", "gender": "M;M;M;M;M", "homepage": "https://tongliang-liu.github.io/;;http://see.xidian.edu.cn/faculty/xbgao/;https://bhanml.github.io/;", "dblp": "150/6667;10/8359-1;;241/0472-3;39/3130-4", "google_scholar": "https://scholar.google.com.au/citations?user=EiLdZ_YAAAAJ;SRBn7oUAAAAJ;;nTNjqHwAAAAJ;https://scholar.google.com.hk/citations?user=7H-LIigAAAAJ", "orcid": ";;0000-0003-1443-0776;;0000-0002-0694-3603", "linkedin": ";;;;", "or_profile": "~Tongliang_Liu1;~Nannan_Wang1;~Xinbo_Gao3;~bo_han2;~Zhou_Dawei1", "aff": "University of Sydney;Xidian University;Xidian University;RIKEN;", "aff_domain": "sydney.edu.au;xidian.edu.cn;xidian.edu.cn;riken.jp;", "position": "Lecturer;Full Professor;Full Professor;Adjunct Scientist;", "bibtex": "", "github": "", "project": "", "reviewers": "AnonReviewer4;AnonReviewer1;AnonReviewer2;AnonReviewer3", "site": "https://openreview.net/forum?id=I8nahMfPixC", "pdf_size": 0, "rating": "2;3;6;7", "confidence": "5;5;4;5", "wc_review": "519;334;353;360", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "0;0;0;0", "reply_reviewers": "0;0;0;0", "reply_authors": "0;0;0;0", "rating_avg": [ 4.5, 2.0615528128088303 ], "confidence_avg": [ 4.75, 0.4330127018922193 ], "wc_review_avg": [ 391.5, 74.22432215924913 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 0, 0 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 0, 0 ], "replies_avg": [ 6, 0 ], "authors#_avg": [ 5, 0 ], "corr_rating_confidence": 
-0.42008402520840293, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:rRL_SyHveK8J:scholar.google.com/&scioq=ADD-Defense:+Towards+Defending+Widespread+Adversarial+Examples+via+Perturbation-Invariant+Representation&hl=en&as_sdt=0,5", "gs_version_total": 0, "aff_unique_index": "0;1;1;2", "aff_unique_norm": "University of Sydney;Xidian University;RIKEN", "aff_unique_dep": ";;", "aff_unique_url": "https://www.sydney.edu.au;http://www.xidian.edu.cn/;https://www.riken.jp", "aff_unique_abbr": "USYD;Xidian;RIKEN", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;1;1;2", "aff_country_unique": "Australia;China;Japan" }, { "title": "Interactive Weak Supervision: Learning Useful Heuristics for Data Labeling", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/3093", "id": "IDFQI9OY6K", "poster": "", "openreview": "https://openreview.net/forum?id=IDFQI9OY6K", "slides": "https://iclr.cc/virtual/2021/poster/3093", "video": "https://iclr.cc/virtual/2021/poster/3093", "author_site": "Benedikt Boecking, Willie Neiswanger, Eric P Xing, Artur Dubrawski", "tldr": "", "abstract": "Obtaining large annotated datasets is critical for training successful machine learning models and it is often a bottleneck in practice. Weak supervision offers a promising alternative for producing labeled datasets without ground truth annotations by generating probabilistic labels using multiple noisy heuristics. This process can scale to large datasets and has demonstrated state of the art performance in diverse domains such as healthcare and e-commerce. One practical issue with learning from user-generated heuristics is that their creation requires creativity, foresight, and domain expertise from those who hand-craft them, a process which can be tedious and subjective. We develop the first framework for interactive weak supervision in which a method proposes heuristics and learns from user feedback given on each proposed heuristic. Our experiments demonstrate that only a small number of feedback iterations are needed to train models that achieve highly competitive test set performance without access to ground truth training labels. 
We conduct user studies, which show that users are able to effectively provide feedback on heuristics and that test set results track the performance of simulated oracles.", "keywords": "weak supervision;data programming;data labeling;active learning", "primary_area": "", "supplementary_material": "", "author": "Benedikt Boecking;Willie Neiswanger;Eric Xing;Artur Dubrawski", "authorids": "~Benedikt_Boecking1;~Willie_Neiswanger2;~Eric_Xing1;~Artur_Dubrawski2", "gender": "M;M;M;M", "homepage": "http://www.cs.cmu.edu/~boecking/;https://willieneis.github.io/;http://www.cs.cmu.edu/~epxing/;https://www.autonlab.org", "dblp": "146/0168;120/7593.html;36/3855;76/48", "google_scholar": "wNtfa1wAAAAJ;QwKHApEAAAAJ;https://scholar.google.com.tw/citations?user=5pKTRxEAAAAJ;O3gezzcAAAAJ", "orcid": ";;;0000-0002-2372-0831", "linkedin": ";;;artur-dubrawski-33a2a87/", "or_profile": "~Benedikt_Boecking1;~Willie_Neiswanger2;~Eric_Xing1;~Artur_Dubrawski2", "aff": "Carnegie Mellon University;Stanford University;School of Computer Science, Carnegie Mellon University;Carnegie Mellon University", "aff_domain": "cmu.edu;stanford.edu;cs.cmu.edu;cmu.edu", "position": "PhD student;Postdoc;Full Professor;Research Professor", "bibtex": "@inproceedings{\nboecking2021interactive,\ntitle={Interactive Weak Supervision: Learning Useful Heuristics for Data Labeling},\nauthor={Benedikt Boecking and Willie Neiswanger and Eric Xing and Artur Dubrawski},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=IDFQI9OY6K}\n}", "github": "[![github](/images/github_icon.svg) benbo/interactive-weak-supervision](https://github.com/benbo/interactive-weak-supervision)", "project": "", "reviewers": "AnonReviewer4;AnonReviewer1;AnonReviewer2;AnonReviewer3", "pdf_size": 0, "rating": "6;6;6;8", "confidence": "3;4;4;4", "wc_review": "366;322;569;281", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "1027;741;1125;451", "reply_reviewers": "0;0;0;0", "reply_authors": "3;2;3;2", "rating_avg": [ 6.5, 0.8660254037844386 ], "confidence_avg": [ 3.75, 0.4330127018922193 ], "wc_review_avg": [ 384.5, 110.68084748500979 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 836.0, 263.2736219221364 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 2.5, 0.5 ], "replies_avg": [ 15, 0 ], "authors#_avg": [ 4, 0 ], "corr_rating_confidence": 0.3333333333333333, "gs_citation": 97, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=15628651718896902730&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 5, "pdf": "https://openreview.net/pdf?id=IDFQI9OY6K", "email": "cmu.edu;stanford.edu;cs.cmu.edu;cmu.edu", "author_num": 4, "aff_unique_index": "0;1;0;0", "aff_unique_norm": "Carnegie Mellon University;Stanford University", "aff_unique_dep": ";", "aff_unique_url": "https://www.cmu.edu;https://www.stanford.edu", "aff_unique_abbr": "CMU;Stanford", "aff_campus_unique_index": "1;2", "aff_campus_unique": ";Stanford;Pittsburgh", "aff_country_unique_index": "0;0;0;0", "aff_country_unique": "United States" }, { "title": "Distance-Based Regularisation of Deep Networks for Fine-Tuning", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/3044", "id": "IFqrg1p5Bc", "poster": "", "openreview": "https://openreview.net/forum?id=IFqrg1p5Bc", "slides": "https://iclr.cc/virtual/2021/poster/3044", "video": "https://iclr.cc/virtual/2021/poster/3044", "author_site": "Henry Gouk, Timothy Hospedales, massimiliano pontil", "tldr": "", "abstract": "We 
investigate approaches to regularisation during fine-tuning of deep neural networks. First we provide a neural network generalisation bound based on Rademacher complexity that uses the distance the weights have moved from their initial values. This bound has no direct dependence on the number of weights and compares favourably to other bounds when applied to convolutional networks. Our bound is highly relevant for fine-tuning, because providing a network with a good initialisation based on transfer learning means that learning can modify the weights less, and hence achieve tighter generalisation. Inspired by this, we develop a simple yet effective fine-tuning algorithm that constrains the hypothesis class to a small sphere centred on the initial pre-trained weights, thus obtaining provably better generalisation performance than conventional transfer learning. Empirical evaluation shows that our algorithm works well, corroborating our theoretical results. It outperforms both state of the art fine-tuning competitors, and penalty-based alternatives that we show do not directly constrain the radius of the search space.", "keywords": "Deep Learning;Transfer Learning;Statistical Learning Theory", "primary_area": "", "supplementary_material": "/attachment/4de6dea60f6ec55f5332c9b35db80adc84753e13.zip", "author": "Henry Gouk;Timothy Hospedales;Massimiliano Pontil", "authorids": "~Henry_Gouk1;~Timothy_Hospedales1;~Massimiliano_Pontil4", "gender": "M;M;Not Specified", "homepage": "https://www.henrygouk.com;http://homepages.inf.ed.ac.uk/thospeda/;https://www.iit.it/web/computational-statistics-and-machine-learning", "dblp": "172/0943;32/3545;", "google_scholar": "https://scholar.google.co.nz/citations?user=i1bzlyAAAAAJ;https://scholar.google.fr/citations?user=nHhtvqkAAAAJ;lcOacs8AAAAJ", "orcid": ";0000-0003-4867-7486;0000-0001-9415-098X", "linkedin": ";timothyhospedales/;", "or_profile": "~Henry_Gouk1;~Timothy_Hospedales1;~Massimiliano_Pontil4", "aff": "University of Edinburgh;Samsung AI Research Centre;University College London, University of London", "aff_domain": "ed.ac.uk;samsung.com;ucl.ac.uk", "position": "Postdoc;Principal Researcher;Full Professor", "bibtex": "@inproceedings{\ngouk2021distancebased,\ntitle={Distance-Based Regularisation of Deep Networks for Fine-Tuning},\nauthor={Henry Gouk and Timothy Hospedales and Massimiliano Pontil},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=IFqrg1p5Bc}\n}", "github": "[![github](/images/github_icon.svg) henrygouk/mars-finetuning](https://github.com/henrygouk/mars-finetuning)", "project": "", "reviewers": "AnonReviewer3;AnonReviewer1;AnonReviewer4;AnonReviewer2", "pdf_size": 0, "rating": "5;6;7;7", "confidence": "4;4;4;4", "wc_review": "880;289;384;294", "wc_reply_reviewers": "25;411;0;13", "wc_reply_authors": "456;494;184;226", "reply_reviewers": "1;3;0;1", "reply_authors": "2;3;1;2", "rating_avg": [ 6.25, 0.82915619758885 ], "confidence_avg": [ 4.0, 0.0 ], "wc_review_avg": [ 461.75, 244.41805886636118 ], "wc_reply_reviewers_avg": [ 112.25, 172.70983614143117 ], "wc_reply_authors_avg": [ 340.0, 136.47710430691296 ], "reply_reviewers_avg": [ 1.25, 1.0897247358851685 ], "reply_authors_avg": [ 2.0, 0.7071067811865476 ], "replies_avg": [ 23, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 64, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=16025867309498919322&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 7, "pdf": 
"https://openreview.net/pdf?id=IFqrg1p5Bc", "email": "ed.ac.uk;samsung.com;ucl.ac.uk", "author_num": 3, "aff_unique_index": "0;1;2", "aff_unique_norm": "University of Edinburgh;Samsung;University College London", "aff_unique_dep": ";AI Research;", "aff_unique_url": "https://www.ed.ac.uk;https://www.samsung.com/global/researchers/samsung-ai-research-centre/;https://www.ucl.ac.uk", "aff_unique_abbr": "Edinburgh;SARC;UCL", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;1;0", "aff_country_unique": "United Kingdom;South Korea" }, { "id": "IG3jEGLN0jd", "title": "Contrastive estimation reveals topic posterior information to linear models", "track": "main", "status": "Reject", "tldr": "", "abstract": "Contrastive learning is an approach to representation learning that utilizes naturally occurring similar and dissimilar pairs of data points to find useful embeddings of data. In the context of document classification under topic modeling assumptions, we prove that contrastive learning is capable of recovering a representation of documents that reveals their underlying topic posterior information to linear models. We apply this procedure in a semi-supervised setup and demonstrate empirically that linear classifiers with these representations perform well in document classification tasks with very few training examples.", "keywords": "contrastive learning;self-supervised learning;representation learning;theory", "primary_area": "", "supplementary_material": "/attachment/45fcfe0e1fd58d443d378cf68deb424e3256ac39.zip", "author": "Christopher Tosh;Akshay Krishnamurthy;Daniel Hsu", "authorids": "~Christopher_Tosh1;~Akshay_Krishnamurthy1;~Daniel_Hsu1", "gender": "M;M;M", "homepage": "https://cjtosh.github.io/;https://www.cics.umass.edu/~akshay/;https://www.cs.columbia.edu/~djhsu/", "dblp": "153/5451;85/8024;h/DanielHsu.html", "google_scholar": ";https://scholar.google.com.tw/citations?user=K0kaNvkAAAAJ;Bp6tvy0AAAAJ", "orcid": ";;0000-0002-3495-7113", "linkedin": ";;", "or_profile": "~Christopher_Tosh1;~Akshay_Krishnamurthy1;~Daniel_Hsu1", "aff": "Columbia University;Microsoft Research;Columbia University", "aff_domain": "columbia.edu;research.microsoft.com;columbia.edu", "position": "Postdoc;Principal Researcher;Associate Professor", "bibtex": "@misc{\ntosh2021contrastive,\ntitle={Contrastive estimation reveals topic posterior information to linear models},\nauthor={Christopher Tosh and Akshay Krishnamurthy and Daniel Hsu},\nyear={2021},\nurl={https://openreview.net/forum?id=IG3jEGLN0jd}\n}", "github": "", "project": "", "reviewers": "AnonReviewer4;AnonReviewer1;AnonReviewer3;AnonReviewer2", "site": "https://openreview.net/forum?id=IG3jEGLN0jd", "pdf_size": 0, "rating": "5;6;6;7", "confidence": "4;2;3;4", "wc_review": "706;225;470;361", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "0;0;0;5", "reply_reviewers": "0;0;0;0", "reply_authors": "0;0;0;1", "rating_avg": [ 6.0, 0.7071067811865476 ], "confidence_avg": [ 3.25, 0.82915619758885 ], "wc_review_avg": [ 440.5, 176.15405189776362 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 1.25, 2.165063509461097 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 0.25, 0.4330127018922193 ], "replies_avg": [ 6, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 84, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=14333621839810532047&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 10, "aff_unique_index": "0;1;0", "aff_unique_norm": "Columbia 
University;Microsoft", "aff_unique_dep": ";Microsoft Research", "aff_unique_url": "https://www.columbia.edu;https://www.microsoft.com/en-us/research", "aff_unique_abbr": "Columbia;MSR", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0;0", "aff_country_unique": "United States" }, { "id": "IJxaSrLIbkx", "title": "On Relating \"Why?\" and \"Why Not?\" Explanations", "track": "main", "status": "Reject", "tldr": "", "abstract": "Explanations of Machine Learning (ML) models often address a \u2018Why?\u2019 question. Such explanations can be related with selecting feature-value pairs which are sufficient for the prediction. Recent work has investigated explanations that address\na \u2018Why Not?\u2019 question, i.e. finding a change of feature values that guarantee a change of prediction. Given their goals, these two forms of explaining predictions of ML models appear to be mostly unrelated. However, this paper demonstrates otherwise, and establishes a rigorous formal relationship between \u2018Why?\u2019 and \u2018Why Not?\u2019 explanations. Concretely, the paper proves that, for any given instance, \u2018Why?\u2019 explanations are minimal hitting sets of \u2018Why Not?\u2019 explanations and vice-versa. Furthermore, the paper devises novel algorithms for extracting and enumerating both forms of explanations.\n", "keywords": "Explanability;contrastive explanations;duality", "primary_area": "", "supplementary_material": "", "author": "Alexey Ignatiev;Nina Narodytska;Nicholas Asher;Joao Marques-Silva", "authorids": "~Alexey_Ignatiev1;~Nina_Narodytska1;nicholas.asher@irit.fr;~Joao_Marques-Silva1", "gender": "M;F;;M", "homepage": "https://alexeyignatiev.github.io/;;;http://jpmarquessilva.github.io", "dblp": "26/9729;87/3366;;s/JoaoPMarquesSilva", "google_scholar": "https://scholar.google.pt/citations?user=CkHZ6fMAAAAJ;;;1b9hppwAAAAJ", "orcid": "0000-0002-4535-2902;;;0000-0002-6632-3086", "linkedin": ";;;jpmarquessilva/", "or_profile": "~Alexey_Ignatiev1;~Nina_Narodytska1;nicholas.asher@irit.fr;~Joao_Marques-Silva1", "aff": "Monash University;VMware;;CNRS", "aff_domain": "monash.edu;vmware.com;;cnrs.fr", "position": "Assistant Professor;Researcher;;Senior Researcher (Directeur de Recherche)", "bibtex": "@misc{\nignatiev2021on,\ntitle={On Relating ''Why?'' and ''Why Not?'' Explanations},\nauthor={Alexey Ignatiev and Nina Narodytska and Nicholas Asher and Joao Marques-Silva},\nyear={2021},\nurl={https://openreview.net/forum?id=IJxaSrLIbkx}\n}", "github": "", "project": "", "reviewers": "AnonReviewer3;AnonReviewer2;AnonReviewer1;AnonReviewer4", "site": "https://openreview.net/forum?id=IJxaSrLIbkx", "pdf_size": 0, "rating": "5;5;6;8", "confidence": "2;5;2;3", "wc_review": "274;328;311;139", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "301;347;98;54", "reply_reviewers": "0;0;0;0", "reply_authors": "1;1;1;1", "rating_avg": [ 6.0, 1.224744871391589 ], "confidence_avg": [ 3.0, 1.224744871391589 ], "wc_review_avg": [ 263.0, 74.2057949219601 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 200.0, 126.02579101120533 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 10, 0 ], "authors#_avg": [ 4, 0 ], "corr_rating_confidence": -0.16666666666666663, "gs_citation": -1, "gs_cited_by_link": "", "gs_version_total": -1, "aff_unique_index": "0;1;2", "aff_unique_norm": "Monash University;VMware, Inc.;Centre National de la Recherche Scientifique", "aff_unique_dep": ";;", "aff_unique_url": 
"https://www.monash.edu;https://www.vmware.com;https://www.cnrs.fr", "aff_unique_abbr": "Monash;VMware;CNRS", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;1;2", "aff_country_unique": "Australia;United States;France" }, { "id": "IKqCy8i1XL3", "title": "Optimizing Information Bottleneck in Reinforcement Learning: A Stein Variational Approach", "track": "main", "status": "Withdraw", "tldr": "", "abstract": "The information bottleneck (IB) principle is an elegant and useful learning framework for extracting relevant information that an input feature contains about the target. The principle has been widely used in supervised and unsupervised learning. In this paper, we investigate the effectiveness of the IB framework in reinforcement learning (RL). We first derive the objective based on IB in reinforcement learning, then we analytically derive the optimal conditional distribution of the optimization problem. Following the variational information bottleneck (VIB), we provide a variational lower bound using a prior distribution. Unlike VIB, we propose to utilize the amortized Stein variational gradient method to optimize the lower bound. We incorporate this framework in two popular RL algorithms: the advantageous actor critic algorithm (A2C) and the proximal policy optimization algorithm (PPO). Our experimental results show that our framework can improve the sample efficiency of vanilla A2C and PPO. We also show that our method achieves better performance than VIB and mutual information neural estimation (MINE), two other popular approaches to optimize the information bottleneck framework in supervised learning.", "keywords": "Information Bottleneck;Reinforcement Learning;Stein Variational Gradient", "primary_area": "", "supplementary_material": "/attachment/bad1702a7df7f8c3d991b0dc4fcbe6858e2c3608.zip", "author": "Pei Yingjun;Hou Xinwen;Li Jian;Lei Wang", "authorids": "~Pei_Yingjun1;xwhou@nlpr.ia.ac.cn;~Li_Jian1;~Lei_Wang22", "gender": "M;;;M", "homepage": ";;;", "dblp": ";;;", "google_scholar": ";;;", "orcid": ";;;", "linkedin": ";;;wanglei-IIIS", "or_profile": "~Pei_Yingjun1;xwhou@nlpr.ia.ac.cn;~Li_Jian1;~Lei_Wang22", "aff": ";;;Tsinghua University", "aff_domain": ";;;mails.tsinghua.edu.cn", "position": ";;;PhD student", "bibtex": "", "github": "", "project": "", "reviewers": "AnonReviewer2;AnonReviewer4;AnonReviewer3;AnonReviewer1", "site": "https://openreview.net/forum?id=IKqCy8i1XL3", "pdf_size": 0, "rating": "4;5;5;6", "confidence": "4;4;3;5", "wc_review": "443;429;972;243", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "0;0;0;0", "reply_reviewers": "0;0;0;0", "reply_authors": "0;0;0;0", "rating_avg": [ 5.0, 0.7071067811865476 ], "confidence_avg": [ 4.0, 0.7071067811865476 ], "wc_review_avg": [ 521.75, 271.67570281495546 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 0, 0 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 0, 0 ], "replies_avg": [ 5, 0 ], "authors#_avg": [ 4, 0 ], "corr_rating_confidence": 0.5, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:cYcGHPv3OygJ:scholar.google.com/&scioq=Optimizing+Information+Bottleneck+in+Reinforcement+Learning:+A+Stein+Variational+Approach&hl=en&as_sdt=0,33", "gs_version_total": 0, "aff_unique_index": "0", "aff_unique_norm": "Tsinghua University", "aff_unique_dep": "", "aff_unique_url": "https://www.tsinghua.edu.cn", "aff_unique_abbr": "THU", "aff_country_unique_index": "0", "aff_country_unique": "China" }, { "title": "Integrating 
Categorical Semantics into Unsupervised Domain Translation", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/2614", "id": "IMPA6MndSXU", "poster": "", "openreview": "https://openreview.net/forum?id=IMPA6MndSXU", "slides": "https://iclr.cc/virtual/2021/poster/2614", "video": "https://iclr.cc/virtual/2021/poster/2614", "author_site": "Samuel Lavoie, Faruk Ahmed, Aaron Courville", "tldr": "", "abstract": "While unsupervised domain translation (UDT) has seen a lot of success recently, we argue that mediating its translation via categorical semantic features could broaden its applicability. In particular, we demonstrate that categorical semantics improves the translation between perceptually different domains sharing multiple object categories. We propose a method to learn, in an unsupervised manner, categorical semantic features (such as object labels) that are invariant of the source and target domains. We show that conditioning the style encoder of unsupervised domain translation methods on the learned categorical semantics leads to a translation preserving the digits on MNIST$\\leftrightarrow$SVHN and to a more realistic stylization on Sketches$\\to$Reals.", "keywords": "Unsupervised Domain Translation;Unsupervised Learning;Image-to-Image Translation;Deep Learning;Representation Learning", "primary_area": "", "supplementary_material": "/attachment/ed2d603e82501fb492cde2c0f1e44204ca7afee6.zip", "author": "Samuel Lavoie-Marchildon;Faruk Ahmed;Aaron Courville", "authorids": "~Samuel_Lavoie-Marchildon1;~Faruk_Ahmed1;~Aaron_Courville3", "gender": "M;M;", "homepage": "http://example.com;;", "dblp": "225/6508;;56/1688", "google_scholar": ";https://scholar.google.ca/citations?user=eo9JtywAAAAJ;https://scholar.google.ca/citations?user=km6CP8cAAAAJ", "orcid": ";;", "linkedin": ";;", "or_profile": "~Samuel_Lavoie-Marchildon1;~Faruk_Ahmed1;~Aaron_Courville3", "aff": "University of Montreal;University of Montreal;Universit\u00e9 de Montr\u00e9al", "aff_domain": "umontreal.ca;umontreal.ca; ", "position": "PhD student;PhD student;Assistant Professor", "bibtex": "@inproceedings{\nlavoie-marchildon2021integrating,\ntitle={Integrating Categorical Semantics into Unsupervised Domain Translation},\nauthor={Samuel Lavoie-Marchildon and Faruk Ahmed and Aaron Courville},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=IMPA6MndSXU}\n}", "github": "[![github](/images/github_icon.svg) lavoiems/Cats-UDT](https://github.com/lavoiems/Cats-UDT)", "project": "", "reviewers": "AnonReviewer3;AnonReviewer2;AnonReviewer1;AnonReviewer4", "pdf_size": 0, "rating": "4;7;7;7", "confidence": "4;2;4;3", "wc_review": "165;505;203;122", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "832;184;116;104", "reply_reviewers": "0;0;0;0", "reply_authors": "2;1;1;1", "rating_avg": [ 6.25, 1.299038105676658 ], "confidence_avg": [ 3.25, 0.82915619758885 ], "wc_review_avg": [ 248.75, 150.69567843836796 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 309.0, 303.4913507828518 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.25, 0.4330127018922193 ], "replies_avg": [ 10, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": -0.5222329678670935, "gs_citation": 5, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=16605089044349710257&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 8, "pdf": "https://openreview.net/pdf?id=IMPA6MndSXU", "email": "umontreal.ca;umontreal.ca; ", "author_num": 3, 
"aff_unique_index": "0;0;1", "aff_unique_norm": "University of Montreal;Universit\u00e9 de Montr\u00e9al", "aff_unique_dep": ";", "aff_unique_url": "https://wwwumontreal.ca;https://www.umontreal.ca", "aff_unique_abbr": "UM;UdeM", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0;0", "aff_country_unique": "Canada" }, { "title": "Towards Impartial Multi-task Learning", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/2894", "id": "IMPnRXEWpvr", "poster": "", "openreview": "https://openreview.net/forum?id=IMPnRXEWpvr", "slides": "https://iclr.cc/virtual/2021/poster/2894", "video": "https://iclr.cc/virtual/2021/poster/2894", "author_site": "Liyang Liu, Yi Li, Zhanghui Kuang, Jing-Hao Xue, Yimin Chen, Wenming Yang, Qingmin Liao, Wei Zhang", "tldr": "", "abstract": "Multi-task learning (MTL) has been widely used in representation learning. However, naively training all tasks simultaneously may lead to the partial training issue, where specific tasks are trained more adequately than others. In this paper, we propose to learn multiple tasks impartially. Specifically, for the task-shared parameters, we optimize the scaling factors via a closed-form solution, such that the aggregated gradient (sum of raw gradients weighted by the scaling factors) has equal projections onto individual tasks. For the task-specific parameters, we dynamically weigh the task losses so that all of them are kept at a comparable scale. Further, we find the above gradient balance and loss balance are complementary and thus propose a hybrid balance method to further improve the performance. Our impartial multi-task learning (IMTL) can be end-to-end trained without any heuristic hyper-parameter tuning, and is general to be applied on all kinds of losses without any distribution assumption. Moreover, our IMTL can converge to similar results even when the task losses are designed to have different scales, and thus it is scale-invariant. We extensively evaluate our IMTL on the standard MTL benchmarks including Cityscapes, NYUv2 and CelebA. 
It outperforms existing loss weighting methods under the same experimental settings.", "keywords": "Multi-task Learning;Impartial Learning;Scene Understanding", "primary_area": "", "supplementary_material": "", "author": "Liyang Liu;Yi Li;Zhanghui Kuang;Jing-Hao Xue;Yimin Chen;Wenming Yang;Qingmin Liao;Wayne Zhang", "authorids": "~Liyang_Liu1;~Yi_Li15;~Zhanghui_Kuang4;~Jing-Hao_Xue1;~Yimin_Chen1;~Wenming_Yang1;~Qingmin_Liao1;~Wayne_Zhang2", "gender": "M;M;;M;M;M;M;M", "homepage": ";https://none.com;http://jeffreykuang.github.io/;http://www.homepages.ucl.ac.uk/~ucakjxu/;;https://www.sigs.tsinghua.edu.cn/ywm_en/main.htm;https://www.sigs.tsinghua.edu.cn/lqm_en/main.htm;http://www.statfe.com", "dblp": "92/9944;;53/1707;72/1980;38/1020;75/2339.html;13/322;239/6045", "google_scholar": "z2KTE6UAAAAJ;qGsK180AAAAJ;z4wkHDgAAAAJ;https://scholar.google.co.uk/citations?user=a6Pul3UAAAAJ;;https://scholar.google.com/citations?hl=zh-CN;;5GtyVooAAAAJ", "orcid": ";;;0000-0003-1174-610X;;0000-0002-2506-1286;0000-0002-7509-3964;0000-0002-8415-1062", "linkedin": ";;;;;;;", "or_profile": "~Liyang_Liu1;~Yi_Li15;~Zhanghui_Kuang4;~Jing-Hao_Xue1;~Yimin_Chen1;~Wenming_Yang1;~Qingmin_Liao1;~Wei_Zhang5", "aff": "Tsinghua University;sensetime;;University College London;;Tsinghua University,;Tsinghua University;SenseTime Research", "aff_domain": "tsinghua.edu.cn;sensetime.com;;ucl.ac.uk;;tsinghua.edu.cn;tsinghua.edu.cn;sensetime.com", "position": "PhD student;Researcher;;Full Professor;;Associate Professor;Full Professor;Research Director", "bibtex": "@inproceedings{\nliu2021towards,\ntitle={Towards Impartial Multi-task Learning},\nauthor={Liyang Liu and Yi Li and Zhanghui Kuang and Jing-Hao Xue and Yimin Chen and Wenming Yang and Qingmin Liao and Wayne Zhang},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=IMPnRXEWpvr}\n}", "github": "[![Papers with Code](/images/pwc_icon.svg) 2 community implementations](https://paperswithcode.com/paper/?openreview=IMPnRXEWpvr)", "project": "", "reviewers": "AnonReviewer3;AnonReviewer4;AnonReviewer2", "pdf_size": 0, "rating": "4;5;7", "confidence": "5;5;4", "wc_review": "443;196;620", "wc_reply_reviewers": "0;0;0", "wc_reply_authors": "645;525;499", "reply_reviewers": "0;0;0", "reply_authors": "1;1;1", "rating_avg": [ 5.333333333333333, 1.247219128924647 ], "confidence_avg": [ 4.666666666666667, 0.4714045207910317 ], "wc_review_avg": [ 419.6666666666667, 173.88182449263894 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 556.3333333333334, 63.58895781152224 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 8, 0 ], "authors#_avg": [ 8, 0 ], "corr_rating_confidence": -0.944911182523068, "gs_citation": 198, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=13360912159321036513&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 2, "pdf": "https://openreview.net/pdf?id=IMPnRXEWpvr", "email": "tsinghua.edu.cn;sensetime.com;;ucl.ac.uk;;tsinghua.edu.cn;tsinghua.edu.cn;sensetime.com", "author_num": 8, "aff_unique_index": "0;1;2;0;0;1", "aff_unique_norm": "Tsinghua University;SenseTime;University College London", "aff_unique_dep": ";;", "aff_unique_url": "https://www.tsinghua.edu.cn;https://www.sensetime.com;https://www.ucl.ac.uk", "aff_unique_abbr": "THU;SenseTime;UCL", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0;1;0;0;0", "aff_country_unique": "China;United Kingdom" }, { "id": "INXUNEmgbnx", "title": "Neural 
Bayes: A Generic Parameterization Method for Unsupervised Learning", "track": "main", "status": "Reject", "tldr": "", "abstract": "We introduce a parameterization method called Neural Bayes which allows computing statistical quantities that are in general difficult to compute and opens avenues for formulating new objectives for unsupervised representation learning. Specifically, given an observed random variable $\\mathbf{x}$ and a latent discrete variable $z$, we can express $p(\\mathbf{x}|z)$, $p(z|\\mathbf{x})$ and $p(z)$ in closed form in terms of a sufficiently expressive function (Eg. neural network) using our parameterization without restricting the class of these distributions. To demonstrate its usefulness, we develop two independent use cases for this parameterization: \n\n1. Disjoint Manifold Separation: Neural Bayes allows us to formulate an objective which can optimally label samples from disjoint manifolds present in the support of a continuous distribution. This can be seen as a specific form of clustering where each disjoint manifold in the support is a separate cluster. We design clustering tasks that obey this formulation and empirically show that the model optimally labels the disjoint manifolds.\n\n2. Mutual Information Maximization (MIM): MIM has become a popular means for self-supervised representation learning. Neural Bayes allows us to compute mutual information between observed random variables $\\mathbf{x}$ and latent discrete random variables $z$ in closed form. We use this for learning image representations and show its usefulness on downstream classification tasks. ", "keywords": "unsupervised learning;clustering;manifold separation;representation learning;Bayes rule", "primary_area": "", "supplementary_material": "", "author": "Devansh Arpit;Huan Wang;Caiming Xiong;richard socher;Yoshua Bengio", "authorids": "~Devansh_Arpit2;~Huan_Wang1;~Caiming_Xiong1;~richard_socher1;~Yoshua_Bengio1", "gender": "M;M;M;;M", "homepage": ";http://www.cs.yale.edu/homes/wang-huan/;http://cmxiong.com/;http://www.socher.org;http://yoshuabengio.org", "dblp": "120/8494;70/6155-16.html;80/7282;79/128;56/953", "google_scholar": "https://scholar.google.ca/citations?hl=en;7NpTttkAAAAJ;vaSdahkAAAAJ;http://scholar.google.com/citations?user=FaOcyfMAAAAJ;kukA0LcAAAAJ", "orcid": ";;;;", "linkedin": ";huanwangyale/;caiming-xiong-150a1417;;yoshuabengio/?originalSubdomain=ca", "or_profile": "~Devansh_Arpit2;~Huan_Wang1;~Caiming_Xiong1;~richard_socher1;~Yoshua_Bengio1", "aff": "Salesforce Research;Salesforce.com;Salesforce Research;SalesForce.com;University of Montreal", "aff_domain": "salesforce.com;salesforce.com;salesforce.com;salesforce.com;umontreal.ca", "position": "Senior Research Scientist;Researcher;Research Scientist;Chief Scientist;Full Professor", "bibtex": "@misc{\narpit2021neural,\ntitle={Neural Bayes: A Generic Parameterization Method for Unsupervised Learning},\nauthor={Devansh Arpit and Huan Wang and Caiming Xiong and richard socher and Yoshua Bengio},\nyear={2021},\nurl={https://openreview.net/forum?id=INXUNEmgbnx}\n}", "github": "", "project": "", "reviewers": "AnonReviewer4;AnonReviewer3;AnonReviewer2;AnonReviewer1", "site": "https://openreview.net/forum?id=INXUNEmgbnx", "pdf_size": 0, "rating": "4;4;5;5", "confidence": "3;4;4;5", "wc_review": "839;471;504;233", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "0;0;0;0", "reply_reviewers": "0;0;0;0", "reply_authors": "0;0;0;0", "rating_avg": [ 4.5, 0.5 ], "confidence_avg": [ 4.0, 0.7071067811865476 ], "wc_review_avg": [ 
511.75, 215.9367673648932 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 0, 0 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 0, 0 ], "replies_avg": [ 5, 0 ], "authors#_avg": [ 5, 0 ], "corr_rating_confidence": 0.7071067811865475, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:uWeN_XOIXCoJ:scholar.google.com/&scioq=Neural+Bayes:+A+Generic+Parameterization+Method+for+Unsupervised+Learning&hl=en&as_sdt=0,5", "gs_version_total": 0, "aff_unique_index": "0;0;0;0;1", "aff_unique_norm": "Salesforce;University of Montreal", "aff_unique_dep": "Salesforce Research;", "aff_unique_url": "https://research.salesforce.com;https://www.umontreal.ca", "aff_unique_abbr": "Salesforce;UM", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0;0;0;1", "aff_country_unique": "United States;Canada" }, { "id": "INhwJdJtxn6", "title": "Coverage as a Principle for Discovering Transferable Behavior in Reinforcement Learning", "track": "main", "status": "Reject", "tldr": "", "abstract": "Designing agents that acquire knowledge autonomously and use it to solve new tasks efficiently is an important challenge in reinforcement learning. Unsupervised learning provides a useful paradigm for autonomous acquisition of task-agnostic knowledge. In supervised settings, representations discovered through unsupervised pre-training offer important benefits when transferred to downstream tasks. Given the nature of the reinforcement learning problem, we explore how to transfer knowledge through behavior instead of representations. The behavior of pre-trained policies may be used for solving the task at hand (exploitation), as well as for collecting useful data to solve the problem (exploration). We argue that pre-training policies to maximize coverage will result in behavior that is useful for both strategies. When using these policies for both exploitation and exploration, our agents discover solutions that lead to larger returns. 
The largest gains are generally observed in domains requiring structured exploration, including settings where the behavior of the pre-trained policies is misaligned with the downstream task.", "keywords": "deep reinforcement learning;transfer learning;unsupervised learning;exploration", "primary_area": "", "supplementary_material": "", "author": "V\u00edctor Campos;Pablo Sprechmann;Steven Stenberg Hansen;Andre Barreto;Charles Blundell;Alex Vitvitskyi;Steven Kapturowski;Adria Puigdomenech Badia", "authorids": "~V\u00edctor_Campos1;~Pablo_Sprechmann1;~Steven_Stenberg_Hansen1;~Andre_Barreto1;~Charles_Blundell1;avlife@google.com;~Steven_Kapturowski1;~Adria_Puigdomenech_Badia2", "gender": "M;;M;M;;;;", "homepage": "https://imatge.upc.edu/web/people/victor-campos;;;https://sites.google.com/corp/view/andrebarreto/about;http://www.gatsby.ucl.ac.uk/~ucgtcbl/;;;", "dblp": "98/8044;https://dblp.org/pers/s/Sprechmann:Pablo.html;61/3521;72/953;35/8396;;;", "google_scholar": "8fzVqSkAAAAJ;YCPycGAAAAAJ;hIOEWsEAAAAJ;https://scholar.google.co.uk/citations?user=H-xtdV4AAAAJ;https://scholar.google.co.uk/citations?user=f31mvPsAAAAJ;;;", "orcid": "http://orcid.org/0000-0001-5260-869X;;;;;;;", "linkedin": ";;;;;;stevenkapturowski/;", "or_profile": "~V\u00edctor_Campos1;~Pablo_Sprechmann1;~Steven_Stenberg_Hansen1;~Andre_Barreto1;~Charles_Blundell1;avlife@google.com;~Steven_Kapturowski1;~Adria_Puigdomenech_Badia2", "aff": "Google DeepMind;Google DeepMind;Google DeepMind;Google DeepMind;Google DeepMind;;Google DeepMind;", "aff_domain": "deepmind.com;google.com;google.com;google.com;google.com;;deepmind.com;", "position": "Researcher;Research Scientist;Research Scientist;Research Scientist;Research Scientist;;Staff Research Engineer;", "bibtex": "@misc{\ncampos2021coverage,\ntitle={Coverage as a Principle for Discovering Transferable Behavior in Reinforcement Learning},\nauthor={V{\\'\\i}ctor Campos and Pablo Sprechmann and Steven Stenberg Hansen and Andre Barreto and Charles Blundell and Alex Vitvitskyi and Steven Kapturowski and Adria Puigdomenech Badia},\nyear={2021},\nurl={https://openreview.net/forum?id=INhwJdJtxn6}\n}", "github": "", "project": "", "reviewers": "AnonReviewer4;AnonReviewer2;AnonReviewer1;AnonReviewer3", "site": "https://openreview.net/forum?id=INhwJdJtxn6", "pdf_size": 0, "rating": "4;4;5;8", "confidence": "4;4;4;3", "wc_review": "522;485;322;448", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "1352;924;789;447", "reply_reviewers": "0;0;0;0", "reply_authors": "2;2;2;1", "rating_avg": [ 5.25, 1.6393596310755 ], "confidence_avg": [ 3.75, 0.4330127018922193 ], "wc_review_avg": [ 444.25, 75.27408252512946 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 878.0, 324.2198328295171 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.75, 0.4330127018922193 ], "replies_avg": [ 13, 0 ], "authors#_avg": [ 8, 0 ], "corr_rating_confidence": -0.9684959969581861, "gs_citation": 9, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=7073317871392174959&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 0, "aff_unique_index": "0;0;0;0;0;0", "aff_unique_norm": "Google", "aff_unique_dep": "Google DeepMind", "aff_unique_url": "https://deepmind.com", "aff_unique_abbr": "DeepMind", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0;0;0;0;0", "aff_country_unique": "United Kingdom" }, { "id": "IOqr2ZyXHz1", "title": "Continual Lifelong Causal Effect Inference with Real World Evidence", "track": "main", "status": "Reject", "tldr": "", 
"abstract": "The era of real world evidence has witnessed an increasing availability of observational data, which much facilitates the development of causal effect inference. Although significant advances have been made to overcome the challenges in causal effect estimation, such as missing counterfactual outcomes and selection bias, they only focus on source-specific and stationary observational data. In this paper, we investigate a new research problem of causal effect inference from incrementally available observational data, and present three new evaluation criteria accordingly, including extensibility, adaptability, and accessibility. We propose a Continual Causal Effect Representation Learning method for estimating causal effect with observational data, which are incrementally available from non-stationary data distributions. Instead of having access to all seen observational data, our method only stores a limited subset of feature representations learned from previous data. Combining the selective and balanced representation learning, feature representation distillation, and feature transformation, our method achieves the continual causal effect estimation for new data without compromising the estimation capability for original data. Extensive experiments demonstrate the significance of continual causal effect inference and the effectiveness of our method.", "keywords": "continual learning;incremental learning;causal effect inference;representation learning;treatment effect estimation", "primary_area": "", "supplementary_material": "/attachment/44fca51f5873b950c33c251fa81efb2e88423bc7.zip", "author": "Zhixuan Chu;Stephen Rathbun;Sheng Li", "authorids": "~Zhixuan_Chu1;rathbun@uga.edu;~Sheng_Li3", "gender": "M;;M", "homepage": ";;http://sheng-li.org", "dblp": "258/1233;;23/3439-1", "google_scholar": "a4IuTngAAAAJ;;DEncVcYAAAAJ", "orcid": ";;0000-0003-1205-8632", "linkedin": ";;sheng-li-15a70022/", "or_profile": "~Zhixuan_Chu1;rathbun@uga.edu;~Sheng_Li3", "aff": "University of Georgia;;University of Georgia", "aff_domain": "uga.edu;;uga.edu", "position": "PhD student;;Assistant Professor", "bibtex": "@misc{\nchu2021continual,\ntitle={Continual Lifelong Causal Effect Inference with Real World Evidence},\nauthor={Zhixuan Chu and Stephen Rathbun and Sheng Li},\nyear={2021},\nurl={https://openreview.net/forum?id=IOqr2ZyXHz1}\n}", "github": "", "project": "", "reviewers": "AnonReviewer4;AnonReviewer1;AnonReviewer2;AnonReviewer3", "site": "https://openreview.net/forum?id=IOqr2ZyXHz1", "pdf_size": 0, "rating": "2;3;4;4", "confidence": "4;4;4;5", "wc_review": "432;249;201;328", "wc_reply_reviewers": "0;0;0;186", "wc_reply_authors": "1468;330;1081;1234", "reply_reviewers": "0;0;0;1", "reply_authors": "3;1;2;3", "rating_avg": [ 3.25, 0.82915619758885 ], "confidence_avg": [ 4.25, 0.4330127018922193 ], "wc_review_avg": [ 302.5, 87.44283847176966 ], "wc_reply_reviewers_avg": [ 46.5, 80.5403625519528 ], "wc_reply_authors_avg": [ 1028.25, 426.04247147438247 ], "reply_reviewers_avg": [ 0.25, 0.4330127018922193 ], "reply_authors_avg": [ 2.25, 0.82915619758885 ], "replies_avg": [ 16, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": 0.5222329678670935, "gs_citation": 4, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=15655598799842347536&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 0, "aff_unique_index": "0;0", "aff_unique_norm": "University of Georgia", "aff_unique_dep": "", "aff_unique_url": "https://www.uga.edu", "aff_unique_abbr": "UGA", "aff_campus_unique_index": "", 
"aff_campus_unique": "", "aff_country_unique_index": "0;0", "aff_country_unique": "United States" }, { "id": "IPGZ6S3LDdw", "title": "Fast MNAS: Uncertainty-aware Neural Architecture Search with Lifelong Learning", "track": "main", "status": "Reject", "tldr": "", "abstract": "Sampling-based neural architecture search (NAS) always guarantees better convergence yet suffers from huge computational resources compared with gradient-based approaches, due to the rollout bottleneck -- exhaustive training for each sampled generation on proxy tasks. This work provides a general pipeline to accelerate the convergence of the rollout process as well as the RL learning process in sampling-based NAS. It is motivated by the interesting observation that both the architecture and the parameter knowledge can be transferred between different experiments and even different tasks. We first introduce an uncertainty-aware critic (value function) in PPO to utilize the architecture knowledge in previous experiments, which stabilizes the training process and reduces the searching time by 4 times. Further, a life-long knowledge pool together with a block similarity function is proposed to utilize the lifelong parameter knowledge and reduces the searching time by 2 times. It is the first to introduce block-level weight sharing in RL-based NAS. The block similarity function guarantees a 100% hitting ratio with strict fairness. Besides, we show a simply designed off-policy correction factor that enables 'replay buffer' in RL optimization and further reduces half of the searching time. Experiments on the MNAS search space show the proposed FNAS accelerates standard RL-based NAS process by $\\sim$10x (e.g. $\\sim$256 2x2 TPUv2*days / 20,000 GPU*hour $\\rightarrow$ 2,000 GPU*hour for MNAS), and guarantees better performance on various vision tasks.", "keywords": "Neural Architecture Search;AutoML;Reinforcement Learning (RL)", "primary_area": "", "supplementary_material": "", "author": "Jihao Liu;Yangting Sun;Ming Zhang;Boxiao Liu;Yu Liu", "authorids": "~Jihao_Liu3;~Yangting_Sun1;~Ming_Zhang10;~Boxiao_Liu1;~Yu_Liu2", "gender": "M;M;M;M;M", "homepage": ";https://scholar.google.com/citations?user=cPDARWYAAAAJ&hl=zh-CN;;http://liuyu.us;https://jihaonew.github.io/", "dblp": ";;188/2274;97/2274-15;167/0509", "google_scholar": ";;-zEM0ycAAAAJ;;PP1HyToAAAAJ", "orcid": ";;0000-0002-9792-1361;;", "linkedin": "https://linkedin.com/in/yangting-sun;;;;", "or_profile": "~Yangting_Sun1;~Ming_Zhang10;~Boxiao_Liu1;~Yu_Liu2;~Jihao_Liu4", "aff": "SenseTime LTD;;State Key Laboratory of Computer Architecture, Institute of Computing Technology, Chinese Academy of Sciences;SenseTime;", "aff_domain": "sensetime.com;;ict.ac.cn;sensetime.com;", "position": "Computer Vision Researcher;;PhD student;Principal Researcher;", "bibtex": "@misc{\nliu2021fast,\ntitle={Fast {\\{}MNAS{\\}}: Uncertainty-aware Neural Architecture Search with Lifelong Learning},\nauthor={Jihao Liu and Yangting Sun and Ming Zhang and Boxiao Liu and Yu Liu},\nyear={2021},\nurl={https://openreview.net/forum?id=IPGZ6S3LDdw}\n}", "github": "", "project": "", "reviewers": "AnonReviewer3;AnonReviewer4;AnonReviewer2;AnonReviewer1", "site": "https://openreview.net/forum?id=IPGZ6S3LDdw", "pdf_size": 0, "rating": "5;5;6;6", "confidence": "5;3;3;4", "wc_review": "226;194;221;572", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "442;155;518;782", "reply_reviewers": "0;0;0;0", "reply_authors": "1;1;1;1", "rating_avg": [ 5.5, 0.5 ], "confidence_avg": [ 3.75, 0.82915619758885 ], 
"wc_review_avg": [ 303.25, 155.63960774815644 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 474.25, 223.37454532690157 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 9, 0 ], "authors#_avg": [ 5, 0 ], "corr_rating_confidence": -0.30151134457776363, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:eK28sJ31FQgJ:scholar.google.com/&scioq=Fast+MNAS:+Uncertainty-aware+Neural+Architecture+Search+with+Lifelong+Learning&hl=en&as_sdt=0,5", "gs_version_total": 0, "aff_unique_index": "0;1;0", "aff_unique_norm": "SenseTime;Chinese Academy of Sciences", "aff_unique_dep": ";Institute of Computing Technology", "aff_unique_url": "https://www.sensetime.com;http://www.cas.cn", "aff_unique_abbr": "SenseTime;CAS", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0;0", "aff_country_unique": "China" }, { "id": "IT2s2Ub6skl", "title": "Towards end-to-end disease prediction from raw metagenomic data", "track": "main", "status": "Desk Reject", "tldr": "", "abstract": "Analysis of the human microbiome using metagenomic sequencing data has demonstrated high ability in discriminating various human diseases. Raw metagenomic sequencing data require multiple complex and computationally heavy bioinformatics steps prior to data analysis. Such data contain millions of short sequences read from the fragmented DNA sequences and are stored as fastq files. Conventional processing pipelines consist multiple steps including quality control, filtering, alignment of sequences against genomic catalogs (genes, species, taxonomic levels, functional pathways, etc.). These pipelines are complex to use, time consuming and rely on a large number of parameters that often provide variability and impact the estimation of the microbiome elements. \nRecent studies have demonstrated that training Deep Neural Networks directly from raw sequencing data is a promising approach to bypass some of the challenges associated with mainstream bioinformatics pipelines. Most of these methods use the concept of word and sentence embeddings that create a meaningful and numerical representation of DNA sequences, while extracting features and reducing the dimentionality of the data. \nIn this paper we present an end-to-end approach that classifies patients into disease groups directly from raw metagenomic reads: metagenome2vec. This approach is composed of four steps (i) generating a vocabulary of k-mers and learning their numerical embeddings; (ii) learning DNA sequence (read) embeddings; (iii) identifying the genome from which the sequence is most likely to come and (iv) training a multiple instance learning classifier which predicts the phenotype based on the vector representation of the raw data. An attention mechanism is applied in the network so that the model can be interpreted, assigning a weight to the influence of the prediction for each genome.\nUsing two public real-life data-sets as well a simulated one, we demonstrated that this original approach reached very high performances, comparable with the state-of-the-art methods applied directly on processed data though mainstream bioinformatics workflows. These results are encouraging for this proof of concept work. 
We believe that with further dedication, the DNN models have the potential to surpass mainstream bioinformatics workflows in disease classification tasks.", "keywords": "Metagenomics;Deep Learning;End-to-End machine learning;Multiple Instance Learning;Precision Medicine;Disease Prediction;attention mechanism", "primary_area": "", "supplementary_material": "", "author": "Anonymous", "authorids": "ICLR.cc/2021/Conference/Paper3380/Authors", "gender": "", "homepage": "", "dblp": "", "google_scholar": "", "orcid": "", "linkedin": "", "or_profile": "", "aff": "", "aff_domain": "", "position": "", "bibtex": "@inproceedings{\nanonymous2021towards,\ntitle={Towards end-to-end disease prediction from raw metagenomic data},\nauthor={Anonymous},\nbooktitle={Submitted to International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=IT2s2Ub6skl},\nnote={under review}\n}", "github": "", "project": "", "reviewers": "", "site": "https://openreview.net/forum?id=IT2s2Ub6skl", "pdf_size": 0, "rating": "", "confidence": "", "wc_review": "", "wc_reply_reviewers": "", "wc_reply_authors": "", "reply_reviewers": "", "reply_authors": "", "rating_avg": [ 0, 0 ], "confidence_avg": [ 0, 0 ], "wc_review_avg": [ 0, 0 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 0, 0 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 0, 0 ], "replies_avg": [ 1, 0 ], "authors#_avg": [ 1, 0 ], "corr_rating_confidence": 0, "gs_citation": 11, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=7251049438185782161&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 5 }, { "id": "IU8QxEiG4hR", "title": "SBEVNet: End-to-End Deep Stereo Layout Estimation", "track": "main", "status": "Reject", "tldr": "", "abstract": "Accurate layout estimation is crucial for planning and navigation, for robotics applications such as self driving. In this paper, we introduce stereo bird's eye view network SBEVNet, a novel supervised end-to-end framework for estimation of bird's eye view layout from a pair of stereo images. Although our network reuses the building blocks from the state-of-the-art deep learning networks for disparity estimation, we show that accurate depth estimation is neither sufficient nor necessary. Instead, the learning of a good internal bird's eye view feature representation is essential for layout estimation. Specifically, we first generate a disparity feature volume using the features of the stereo images and then project it to the bird's eye view coordinates. This gives us coarse grained scene structural information. We also apply inverse perspective mapping (IPM) to map the input images and their features to the bird's eye view. This gives us fine grained texture information. The concatenated IPM features with the projected feature volume creates a rich bird's eye view representation which is capable of spatial reasoning. We use this representation to estimate the BEV semantic map. Additionally, we show that using the IPM features as a supervisory signal for stereo features can give an improvement in performance. We demonstrate our approach on two datasets: KITTI dataset and synthetically generated dataset using the CARLA simulator. 
For both of the datasets, we establish state-of-the-art performance beyond other baselines.", "keywords": "Layout Estimation;Deep Stereo;Computer Vision", "primary_area": "", "supplementary_material": "/attachment/ee4538e04e40cb7371a9ecd1773da07d8a33d9a4.zip", "author": "Divam Gupta;Wei Pu;Trenton Tabor;Jeff Schneider", "authorids": "~Divam_Gupta1;wpu@nrec.ri.cmu.edu;~Trenton_Tabor1;~Jeff_Schneider1", "gender": "M;;M;", "homepage": ";;;https://www.cs.cmu.edu/~schneide", "dblp": ";;;38/247", "google_scholar": ";;TnXeyZMAAAAJ;3bSbb20AAAAJ", "orcid": ";;;0000-0002-5080-9073", "linkedin": ";;trenton-tabor-047524a4/;jeff-schneider-1593b322/", "or_profile": "~Divam_Gupta1;wpu@nrec.ri.cmu.edu;~Trenton_Tabor1;~Jeff_Schneider1", "aff": "Carnegie Mellon University;;Carnegie Mellon University;Carnegie Mellon University", "aff_domain": "cmu.edu;;cmu.edu;cs.cmu.edu", "position": "MS student;;Researcher;Researcher", "bibtex": "@misc{\ngupta2021sbevnet,\ntitle={{\\{}SBEVN{\\}}et: End-to-End Deep Stereo Layout Estimation},\nauthor={Divam Gupta and Wei Pu and Trenton Tabor and Jeff Schneider},\nyear={2021},\nurl={https://openreview.net/forum?id=IU8QxEiG4hR}\n}", "github": "", "project": "", "reviewers": "AnonReviewer4;AnonReviewer3;AnonReviewer1;AnonReviewer2", "site": "https://openreview.net/forum?id=IU8QxEiG4hR", "pdf_size": 0, "rating": "5;5;5;6", "confidence": "4;5;5;4", "wc_review": "335;426;1191;705", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "112;190;471;327", "reply_reviewers": "0;0;0;0", "reply_authors": "1;1;1;1", "rating_avg": [ 5.25, 0.4330127018922193 ], "confidence_avg": [ 4.5, 0.5 ], "wc_review_avg": [ 664.25, 333.277193189093 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 275.0, 136.85210995815885 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 9, 0 ], "authors#_avg": [ 4, 0 ], "corr_rating_confidence": -0.5773502691896257, "gs_citation": 8, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=10508857987247345469&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 6, "aff_unique_index": "0;0;0", "aff_unique_norm": "Carnegie Mellon University", "aff_unique_dep": "", "aff_unique_url": "https://www.cmu.edu", "aff_unique_abbr": "CMU", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0;0", "aff_country_unique": "United States" }, { "id": "IUYthV32lbK", "title": "On the Certified Robustness for Ensemble Models and Beyond", "track": "main", "status": "Reject", "tldr": "", "abstract": "Recent studies show that deep neural networks (DNN) are vulnerable to adversarial examples, which aim to mislead DNNs to make arbitrarily incorrect predictions. To defend against such attacks, both empirical and theoretical defense approaches have been proposed for a single ML model. In this work, we aim to explore and characterize the robustness conditions for ensemble ML models. We prove that the diversified gradient and large confidence margin are sufficient and necessary conditions for certifiably robust ensemble models under the model-smoothness assumption. We also show that an ensemble model can achieve higher certified robustness than a single base model based on these conditions. To our best knowledge, this is the first work providing tight conditions for the ensemble robustness. Inspired by our analysis, we propose the lightweight Diversity Regularized Training (DRT) for ensemble models. 
We derive the certified robustness of DRT based ensembles such as standard Weighted Ensemble and Max-Margin Ensemble following the sufficient and necessary conditions. Besides, to efficiently calculate the model-smoothness, we leverage adapted randomized model smoothing to obtain the certified robustness for different ensembles in practice. We show that the certified robustness of ensembles, on the other hand, verifies the necessity of DRT. To compare different ensembles, we prove that when the adversarial transferability among base models is high, Max-Margin Ensemble can achieve higher certified robustness than Weighted Ensemble; vice versa. Extensive experiments show that ensemble models trained with DRT can achieve the state-of-the-art certified robustness under various settings. Our work will shed light on future analysis for robust ensemble models. ", "keywords": "Adversarial Machine Learning;Model Ensemble;Certified Robustness", "primary_area": "", "supplementary_material": "", "author": "Zhuolin Yang;Linyi Li;Xiaojun Xu;Bhavya Kailkhura;Bo Li", "authorids": "~Zhuolin_Yang1;~Linyi_Li1;~Xiaojun_Xu1;~Bhavya_Kailkhura1;~Bo_Li19", "gender": "M;M;M;M;F", "homepage": "https://lucas110550.github.io/about;http://linyil.com;;https://people.llnl.gov/kailkhura1;http://boli.cs.illinois.edu/", "dblp": ";99/4340-1.html;;132/8938;50/3402-26", "google_scholar": "BvSv-C0AAAAJ;-b0sk-YAAAAJ;rdMZZQwAAAAJ;SQpJmOgAAAAJ;K8vJkTcAAAAJ", "orcid": ";;;;", "linkedin": ";;;;", "or_profile": "~Zhuolin_Yang1;~Linyi_Li1;~Xiaojun_Xu1;~Bhavya_Kailkhura1;~Bo_Li19", "aff": "University of Illinois at Urbana Champaign;Fujitsu Research of America;University of Illinois, Urbana Champaign;Lawrence Livermore National Laboratory;University of Illinois, Urbana Champaign", "aff_domain": "illinois.edu;fujitsu.com;illinois.edu;llnl.gov;illinois.edu", "position": "PhD student;Research Intern;PhD student;Research Staff;Assistant Professor", "bibtex": "@misc{\nyang2021on,\ntitle={On the Certified Robustness for Ensemble Models and Beyond},\nauthor={Zhuolin Yang and Linyi Li and Xiaojun Xu and Bhavya Kailkhura and Bo Li},\nyear={2021},\nurl={https://openreview.net/forum?id=IUYthV32lbK}\n}", "github": "", "project": "", "reviewers": "AnonReviewer2;AnonReviewer4;AnonReviewer3;AnonReviewer1", "site": "https://openreview.net/forum?id=IUYthV32lbK", "pdf_size": 0, "rating": "4;5;5;6", "confidence": "4;2;4;4", "wc_review": "837;426;826;403", "wc_reply_reviewers": "320;0;0;0", "wc_reply_authors": "1221;1412;1176;581", "reply_reviewers": "1;0;0;0", "reply_authors": "3;2;2;1", "rating_avg": [ 5.0, 0.7071067811865476 ], "confidence_avg": [ 3.5, 0.8660254037844386 ], "wc_review_avg": [ 623.0, 208.69475316835351 ], "wc_reply_reviewers_avg": [ 80.0, 138.5640646055102 ], "wc_reply_authors_avg": [ 1097.5, 311.08559915238766 ], "reply_reviewers_avg": [ 0.25, 0.4330127018922193 ], "reply_authors_avg": [ 2.0, 0.7071067811865476 ], "replies_avg": [ 15, 0 ], "authors#_avg": [ 5, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 60, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=8592223056112126154&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 6, "aff_unique_index": "0;1;0;2;0", "aff_unique_norm": "University of Illinois Urbana-Champaign;Fujitsu Research of America;Lawrence Livermore National Laboratory", "aff_unique_dep": ";;", "aff_unique_url": "https://illinois.edu;https://www.fujitsu.com/us/;https://www.llnl.gov", "aff_unique_abbr": "UIUC;FRA;LLNL", "aff_campus_unique_index": "0;0;0", "aff_campus_unique": "Urbana-Champaign;", 
"aff_country_unique_index": "0;0;0;0;0", "aff_country_unique": "United States" }, { "id": "IUaOP8jQfHn", "title": "Benchmarking Unsupervised Object Representations for Video Sequences", "track": "main", "status": "Reject", "tldr": "", "abstract": "Perceiving the world in terms of objects and tracking them through time is a crucial prerequisite for reasoning and scene understanding. Recently, several methods have been proposed for unsupervised learning of object-centric representations. However, since these models have been evaluated with respect to different downstream tasks, it remains unclear how they compare in terms of basic perceptual abilities such as detection, figure-ground segmentation and tracking of individual objects. To close this gap, we design a benchmark with three datasets of varying complexity and seven additional test sets which feature challenging tracking scenarios relevant for natural videos. Using this benchmark, we compare the perceptual abilities of four unsupervised object-centric learning approaches: ViMON, a video-extension of MONet, based on a recurrent spatial attention mechanism, OP3, which exploits clustering via spatial mixture models, as well as TBA and SCALOR, which use an explicit factorization via spatial transformers. Our results suggest that architectures with unconstrained latent representations and full-image object masks such as ViMON and OP3 are able to learn more powerful representations in terms of object detection, segmentation and tracking than the explicitly parameterized spatial transformer based architecture of TBA and SCALOR. We also observe that none of the methods are able to gracefully handle the most challenging tracking scenarios despite their synthetic nature, suggesting that our benchmark may provide fruitful guidance towards learning more robust object-centric video representations.", "keywords": "Unsupervised learning;object-centric representations;benchmark;tracking", "primary_area": "", "supplementary_material": "/attachment/b559cc50b98fec0191df5a91dea9ef17114b1a01.zip", "author": "Marissa A. Weis;Kashyap Chitta;Yash Sharma;Wieland Brendel;Matthias Bethge;Andreas Geiger;Alexander S Ecker", "authorids": "~Marissa_A._Weis1;~Kashyap_Chitta1;~Yash_Sharma1;~Wieland_Brendel1;~Matthias_Bethge2;~Andreas_Geiger3;~Alexander_S_Ecker1", "gender": ";M;;M;;M;M", "homepage": ";https://kashyap7x.github.io/;http://www.yash-sharma.com;;;http://www.cvlibs.net;http://eckerlab.org", "dblp": ";220/3765;121/9967-1;37/11107;;40/5825-1;26/7228", "google_scholar": ";vX5i2CcAAAAJ;AlGCn8wAAAAJ;v-JL-hsAAAAJ;;https://scholar.google.ca/citations?hl=en;VgYU_m8AAAAJ", "orcid": ";;;;;0000-0002-8151-3726;0000-0003-2392-5105", "linkedin": ";;yashjsharma/;;;;alexecker/", "or_profile": "~Marissa_A._Weis1;~Kashyap_Chitta1;~Yash_Sharma1;~Wieland_Brendel1;~Matthias_Bethge2;~Andreas_Geiger3;~Alexander_S_Ecker1", "aff": ";University of T\u00fcbingen;University of Tuebingen;University of Tuebingen;;University of Tuebingen;Max Planck Institute for Dynamics and Self-Organization", "aff_domain": ";uni-tuebingen.de;uni-tuebingen.de;uni-tuebingen.de;;uni-tuebingen.de;ds.mpg.de", "position": ";PhD student;PhD student;Principal Researcher;;Professor;Principal Researcher", "bibtex": "@misc{\nweis2021benchmarking,\ntitle={Benchmarking Unsupervised Object Representations for Video Sequences},\nauthor={Marissa A. 
Weis and Kashyap Chitta and Yash Sharma and Wieland Brendel and Matthias Bethge and Andreas Geiger and Alexander S Ecker},\nyear={2021},\nurl={https://openreview.net/forum?id=IUaOP8jQfHn}\n}", "github": "", "project": "", "reviewers": "AnonReviewer2;AnonReviewer4;AnonReviewer1;AnonReviewer5", "site": "https://openreview.net/forum?id=IUaOP8jQfHn", "pdf_size": 0, "rating": "4;5;5;7", "confidence": "1;2;3;3", "wc_review": "422;189;251;307", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "589;503;715;145", "reply_reviewers": "0;0;0;0", "reply_authors": "1;1;2;1", "rating_avg": [ 5.25, 1.0897247358851685 ], "confidence_avg": [ 2.25, 0.82915619758885 ], "wc_review_avg": [ 292.25, 85.75364423743169 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 488.0, 211.89856063692363 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.25, 0.4330127018922193 ], "replies_avg": [ 10, 0 ], "authors#_avg": [ 7, 0 ], "corr_rating_confidence": 0.7608859102526822, "gs_citation": 24, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=17940354200411373536&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 12, "aff_unique_index": "0;1;1;1;2", "aff_unique_norm": "University of T\u00fcbingen;University of Tuebingen;Max Planck Institute for Dynamics and Self-Organization", "aff_unique_dep": ";;", "aff_unique_url": "https://www.uni-tuebingen.de/;https://www.uni-tuebingen.de/;https://www.mpids.org", "aff_unique_abbr": "Uni T\u00fcbingen;Uni T\u00fcbingen;MPIDS", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0;0;0;0", "aff_country_unique": "Germany" }, { "id": "IVwXaHpiO0", "title": "SyncTwin: Transparent Treatment Effect Estimation under Temporal Confounding", "track": "main", "status": "Reject", "tldr": "", "abstract": "Estimating causal treatment effects using observational data is a problem with few solutions when the confounder has a temporal structure, e.g. the history of disease progression might impact both treatment decisions and clinical outcomes. For such a challenging problem, it is desirable for the method to be transparent --- the ability to pinpoint a small subset of data points that contributes most to the estimate and to clearly indicate whether the estimate is reliable or not. This paper develops a new method, SyncTwin, to overcome temporal confounding in a transparent way. SyncTwin estimates the treatment effect of a target individual by comparing the outcome with its synthetic twin, which is constructed to closely match the target in the representation of the temporal confounders. SyncTwin achieves transparency by enforcing the synthetic twin to only depend on the weighted combination of few other individuals in the dataset. Moreover, the quality of the synthetic twin can be assessed by a performance metric, which also indicates the reliability of the estimated treatment effect. 
Experiments demonstrate that SyncTwin outperforms the benchmarks in clinical observational studies while still being transparent.", "keywords": "treatment effect;interpretability;healthcare;causal inference", "primary_area": "", "supplementary_material": "", "author": "Zhaozhi Qian;Yao Zhang;Ioana Bica;Angela Wood;Mihaela van der Schaar", "authorids": "~Zhaozhi_Qian1;~Yao_Zhang3;~Ioana_Bica1;amw79@medschl.cam.ac.uk;~Mihaela_van_der_Schaar2", "gender": ";M;F;;F", "homepage": ";;https://ioanabica.github.io/;;https://www.vanderschaar-lab.com", "dblp": "194/2443;;;;", "google_scholar": "PuTDB5gAAAAJ;B3Xd8-kAAAAJ;;;DZ3S--MAAAAJ", "orcid": "0000-0002-4561-0342;;;;", "linkedin": ";;;;", "or_profile": "~Zhaozhi_Qian1;~Yao_Zhang3;~Ioana_Bica1;amw79@medschl.cam.ac.uk;~Mihaela_van_der_Schaar2", "aff": "University of Cambridge;University of Cambridge;University of Oxford;;University of California, Los Angeles", "aff_domain": "cam.ac.uk;cam.ac.uk;ox.ac.uk;;ucla.edu", "position": "PhD student;PhD student;PhD student;;Full Professor", "bibtex": "@misc{\nqian2021synctwin,\ntitle={SyncTwin: Transparent Treatment Effect Estimation under Temporal Confounding},\nauthor={Zhaozhi Qian and Yao Zhang and Ioana Bica and Angela Wood and Mihaela van der Schaar},\nyear={2021},\nurl={https://openreview.net/forum?id=IVwXaHpiO0}\n}", "github": "", "project": "", "reviewers": "AnonReviewer3;AnonReviewer4;AnonReviewer5;AnonReviewer2;AnonReviewer1", "site": "https://openreview.net/forum?id=IVwXaHpiO0", "pdf_size": 0, "rating": "3;4;4;7;9", "confidence": "4;3;5;4;4", "wc_review": "749;314;424;309;278", "wc_reply_reviewers": "0;0;132;0;0", "wc_reply_authors": "1514;899;724;411;112", "reply_reviewers": "0;0;1;0;0", "reply_authors": "3;2;1;1;1", "rating_avg": [ 5.4, 2.244994432064365 ], "confidence_avg": [ 4.0, 0.6324555320336759 ], "wc_review_avg": [ 414.8, 174.26577403494926 ], "wc_reply_reviewers_avg": [ 26.4, 52.79999999999999 ], "wc_reply_authors_avg": [ 732.0, 474.7458267325791 ], "reply_reviewers_avg": [ 0.2, 0.4000000000000001 ], "reply_authors_avg": [ 1.6, 0.8 ], "replies_avg": [ 16, 0 ], "authors#_avg": [ 5, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 1, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=6640689226422514502&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 0, "aff_unique_index": "0;0;1;2", "aff_unique_norm": "University of Cambridge;University of Oxford;University of California, Los Angeles", "aff_unique_dep": ";;", "aff_unique_url": "https://www.cam.ac.uk;https://www.ox.ac.uk;https://www.ucla.edu", "aff_unique_abbr": "Cambridge;Oxford;UCLA", "aff_campus_unique_index": "0;0;2", "aff_campus_unique": "Cambridge;;Los Angeles", "aff_country_unique_index": "0;0;0;1", "aff_country_unique": "United Kingdom;United States" }, { "id": "IW-EI6BCxy", "title": "Variable-Shot Adaptation for Online Meta-Learning", "track": "main", "status": "Reject", "tldr": "", "abstract": "Few-shot meta-learning methods consider the problem of learning new tasks from a small, fixed number of examples, by meta-learning across static data from a set of previous tasks. However, in many real world settings, it is more natural to view the problem as one of minimizing the total amount of supervision --- both the number of examples needed to learn a new task and the amount of data needed for meta-learning. Such a formulation can be studied in a sequential learning setting, where tasks are presented in sequence. 
When studying meta-learning in this online setting, a critical question arises: can meta-learning improve over the sample complexity and regret of standard empirical risk minimization methods, when considering both meta-training and adaptation together? The answer is particularly non-obvious for meta-learning algorithms with complex bi-level optimizations that may demand large amounts of meta-training data. To answer this question, we extend previous meta-learning algorithms to handle the variable-shot settings that naturally arise in sequential learning: from many-shot learning at the start, to zero-shot learning towards the end. On sequential learning problems, we find that meta-learning solves the full task set with fewer overall labels and achieves greater cumulative performance, compared to standard supervised methods. These results suggest that meta-learning is an important ingredient for building learning systems that continuously learn and improve over a sequence of problems.", "keywords": "meta-learning;deep learning", "primary_area": "", "supplementary_material": "/attachment/b6ad81901ef5c652aff08cea70add554b65916f3.zip", "author": "Tianhe Yu;Xinyang Geng;Chelsea Finn;Sergey Levine", "authorids": "~Tianhe_Yu1;~Xinyang_Geng1;~Chelsea_Finn1;~Sergey_Levine1", "gender": "M;M;F;M", "homepage": "https://cs.stanford.edu/~tianheyu/;http://young-geng.xyz/;https://ai.stanford.edu/~cbfinn/;https://people.eecs.berkeley.edu/~svlevine/", "dblp": "192/1797;186/8221;131/1783;80/7594", "google_scholar": ";vYougn0AAAAJ;vfPE6hgAAAAJ;8R35rCwAAAAJ", "orcid": ";;;", "linkedin": ";;;", "or_profile": "~Tianhe_Yu1;~Xinyang_Geng1;~Chelsea_Finn1;~Sergey_Levine1", "aff": "Stanford University;University of California, Berkeley;Google;Google", "aff_domain": "stanford.edu;berkeley.edu;google.com;google.com", "position": "PhD student;PhD student;Research Scientist;Research Scientist", "bibtex": "@misc{\nyu2021variableshot,\ntitle={Variable-Shot Adaptation for Online Meta-Learning},\nauthor={Tianhe Yu and Xinyang Geng and Chelsea Finn and Sergey Levine},\nyear={2021},\nurl={https://openreview.net/forum?id=IW-EI6BCxy}\n}", "github": "", "project": "", "reviewers": "AnonReviewer4;AnonReviewer2;AnonReviewer1;AnonReviewer3", "site": "https://openreview.net/forum?id=IW-EI6BCxy", "pdf_size": 0, "rating": "5;6;6;6", "confidence": "4;3;4;4", "wc_review": "651;551;270;229", "wc_reply_reviewers": "239;0;0;203", "wc_reply_authors": "1283;786;436;1114", "reply_reviewers": "1;0;0;2", "reply_authors": "4;2;2;4", "rating_avg": [ 5.75, 0.4330127018922193 ], "confidence_avg": [ 3.75, 0.4330127018922193 ], "wc_review_avg": [ 425.25, 179.85601880393105 ], "wc_reply_reviewers_avg": [ 110.5, 111.23061628886177 ], "wc_reply_authors_avg": [ 904.75, 324.30184627904913 ], "reply_reviewers_avg": [ 0.75, 0.82915619758885 ], "reply_authors_avg": [ 3.0, 1.0 ], "replies_avg": [ 21, 0 ], "authors#_avg": [ 4, 0 ], "corr_rating_confidence": -0.3333333333333333, "gs_citation": 8, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=16374867415505768199&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 3, "aff_unique_index": "0;1;2;2", "aff_unique_norm": "Stanford University;University of California, Berkeley;Google", "aff_unique_dep": ";;Google", "aff_unique_url": "https://www.stanford.edu;https://www.berkeley.edu;https://www.google.com", "aff_unique_abbr": "Stanford;UC Berkeley;Google", "aff_campus_unique_index": "0;1;2;2", "aff_campus_unique": "Stanford;Berkeley;Mountain View", "aff_country_unique_index": "0;0;0;0", 
"aff_country_unique": "United States" }, { "title": "Characterizing signal propagation to close the performance gap in unnormalized ResNets", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/3029", "id": "IX3Nnir2omJ", "poster": "", "openreview": "https://openreview.net/forum?id=IX3Nnir2omJ", "slides": "https://iclr.cc/virtual/2021/poster/3029", "video": "https://iclr.cc/virtual/2021/poster/3029", "author_site": "Andrew Brock, Soham De, Samuel Smith", "tldr": "", "abstract": "Batch Normalization is a key component in almost all state-of-the-art image classifiers, but it also introduces practical challenges: it breaks the independence between training examples within a batch, can incur compute and memory overhead, and often results in unexpected bugs. Building on recent theoretical analyses of deep ResNets at initialization, we propose a simple set of analysis tools to characterize signal propagation on the forward pass, and leverage these tools to design highly performant ResNets without activation normalization layers. Crucial to our success is an adapted version of the recently proposed Weight Standardization. Our analysis tools show how this technique preserves the signal in ReLU networks by ensuring that the per-channel activation means do not grow with depth. Across a range of FLOP budgets, our networks attain performance competitive with state-of-the-art EfficientNets on ImageNet.", "keywords": "normalizers;signal propagation;deep learning;neural networks;ResNets;EfficientNets;ImageNet;CNNs;ConvNets", "primary_area": "", "supplementary_material": "", "author": "Andrew Brock;Soham De;Samuel L Smith", "authorids": "~Andrew_Brock1;~Soham_De2;~Samuel_L_Smith1", "gender": ";M;M", "homepage": "https://www.github.com/ajbrock;https://sohamde.github.io;https://www.samtalksml.net/", "dblp": ";124/9197;", "google_scholar": "https://scholar.google.co.uk/citations?user=NIxD36wAAAAJ;lHf55pF3KVQC;https://scholar.google.co.uk/citations?user=fyEqU5oAAAAJ", "orcid": ";;", "linkedin": ";;", "or_profile": "~Andrew_Brock1;~Soham_De2;~Samuel_L_Smith1", "aff": "Google DeepMind;Google DeepMind;babylon health", "aff_domain": "deepmind.com;google.com;babylonhealth.com", "position": "Research Scientist;Research Scientist;Data scientist", "bibtex": "@inproceedings{\nbrock2021characterizing,\ntitle={Characterizing signal propagation to close the performance gap in unnormalized ResNets},\nauthor={Andrew Brock and Soham De and Samuel L Smith},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=IX3Nnir2omJ}\n}", "github": "[![github](/images/github_icon.svg) deepmind/deepmind-research](https://github.com/deepmind/deepmind-research/tree/master/nfnets) + [![Papers with Code](/images/pwc_icon.svg) 3 community implementations](https://paperswithcode.com/paper/?openreview=IX3Nnir2omJ)", "project": "", "reviewers": "AnonReviewer1;AnonReviewer2;AnonReviewer3", "pdf_size": 0, "rating": "5;7;7", "confidence": "5;3;4", "wc_review": "532;278;299", "wc_reply_reviewers": "0;0;0", "wc_reply_authors": "1338;720;199", "reply_reviewers": "0;0;0", "reply_authors": "2;1;1", "rating_avg": [ 6.333333333333333, 0.9428090415820634 ], "confidence_avg": [ 4.0, 0.816496580927726 ], "wc_review_avg": [ 369.6666666666667, 115.1067137727229 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 752.3333333333334, 465.55653672662453 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.3333333333333333, 0.4714045207910317 ], 
"replies_avg": [ 13, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": -0.8660254037844385, "gs_citation": 151, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=16010610845767584783&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 3, "pdf": "https://openreview.net/pdf?id=IX3Nnir2omJ", "email": "deepmind.com;google.com;babylonhealth.com", "author_num": 3, "aff_unique_index": "0;0;1", "aff_unique_norm": "Google;Babylon Health", "aff_unique_dep": "Google DeepMind;", "aff_unique_url": "https://deepmind.com;https://www.babylonhealth.com", "aff_unique_abbr": "DeepMind;Babylon", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0;0", "aff_country_unique": "United Kingdom" }, { "id": "IZIHJ-ME9c-", "title": "Hokey Pokey Causal Discovery: Using Deep Learning Model Errors to Learn Causal Structure", "track": "main", "status": "Withdraw", "tldr": "", "abstract": "While machine learning excels at learning predictive models from observational data, learning the causal mechanisms behind the observed phenomena presents the significant challenge of distinguishing true causal relationships from confounding and other potential sources of spurious correlations. Many existing algorithms for the discovery of causal structure from observational data rely on evaluating the conditional independence relationships among features to account for the effects of confounding. However, the choice of independence tests for these algorithms often rely on assumptions regarding the data distributions and type of causal relationships. To avoid these assumptions, we develop a novel deep learning approach, dubbed the Hokey Pokey model, to indirectly explore the conditional dependencies among a set of variables by rapidly comparing predictive errors given different combinations of input variables. We then use the results of this comparison as a predictive signal for causal relationships among the variables. We conduct rigorous experiments to evaluate model robustness and generalizability using generated datasets with known underlying causal relationships and analyze the capacity of model error comparisons to provide a predictive signal for causal structure. 
Our model outperforms commonly used baseline models (PC and GES) and is capable of discovering causal relationships of different complexity (graph size, density and structure) in both binary and continuous data.", "keywords": "causal discovery", "primary_area": "", "supplementary_material": "", "author": "Emily Saldanha;Dustin Arendt;Svitlana Volkova", "authorids": "~Emily_Saldanha1;dustin.arendt@pnnl.gov;~Svitlana_Volkova1", "gender": ";;F", "homepage": ";;https://www.linkedin.com/in/svitlanavolkova/", "dblp": ";;19/8609", "google_scholar": ";;DwrriFYAAAAJ", "orcid": ";;0000-0002-6131-3073", "linkedin": ";;svitlanavolkova/", "or_profile": "~Emily_Saldanha1;dustin.arendt@pnnl.gov;~Svitlana_Volkova1", "aff": ";;Pacific Northwest National Laboratory", "aff_domain": ";;pnnl.gov", "position": ";;Principal Researcher", "bibtex": "", "github": "", "project": "", "reviewers": "AnonReviewer3;AnonReviewer2;AnonReviewer1;AnonReviewer4", "site": "https://openreview.net/forum?id=IZIHJ-ME9c-", "pdf_size": 0, "rating": "4;4;4;5", "confidence": "4;4;5;3", "wc_review": "211;242;455;167", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "0;0;0;0", "reply_reviewers": "0;0;0;0", "reply_authors": "0;0;0;0", "rating_avg": [ 4.25, 0.4330127018922193 ], "confidence_avg": [ 4.0, 0.7071067811865476 ], "wc_review_avg": [ 268.75, 110.78441903083664 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 0, 0 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 0, 0 ], "replies_avg": [ 6, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": -0.816496580927726, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:XOmEjjR8_7MJ:scholar.google.com/&scioq=Hokey+Pokey+Causal+Discovery:+Using+Deep+Learning+Model+Errors+to+Learn+Causal+Structure&hl=en&as_sdt=0,5", "gs_version_total": 0, "aff_unique_index": "0", "aff_unique_norm": "Pacific Northwest National Laboratory", "aff_unique_dep": "", "aff_unique_url": "https://www.pnnl.gov", "aff_unique_abbr": "PNNL", "aff_country_unique_index": "0", "aff_country_unique": "United States" }, { "id": "IZQm8mMRVqW", "title": "Quickly Finding a Benign Region via Heavy Ball Momentum in Non-Convex Optimization", "track": "main", "status": "Reject", "tldr": "", "abstract": "The Heavy Ball Method, proposed by Polyak over five decades ago, is a first-order method for optimizing continuous functions. While its stochastic counterpart has proven extremely popular in training deep networks, there are almost no known functions where deterministic Heavy Ball is provably faster than the simple and classical gradient descent algorithm in non-convex optimization. The success of Heavy Ball has thus far eluded theoretical understanding. Our goal is to address this gap, and in the present work we identify two non-convex problems where we provably show that the Heavy Ball momentum helps the iterate to enter a benign region that contains a global optimal point faster. We show that Heavy Ball exhibits simple dynamics that clearly reveal the benefit of using a larger value of momentum parameter for the problems. The first of these optimization problems is the phase retrieval problem, which has useful applications in physical science. 
The second of these optimization problems is the cubic-regularized minimization, a critical subroutine required by Nesterov-Polyak cubic-regularized method to find second-order stationary points in general smooth non-convex problems.", "keywords": "", "primary_area": "", "supplementary_material": "", "author": "Jun-Kun Wang;Jacob Abernethy", "authorids": "~Jun-Kun_Wang1;~Jacob_Abernethy1", "gender": "M;M", "homepage": "https://jimwang123.github.io/;https://www.cc.gatech.edu/~jabernethy9/", "dblp": "153/5463;91/2520", "google_scholar": ";FDu4ciwAAAAJ", "orcid": ";", "linkedin": ";", "or_profile": "~Jun-Kun_Wang1;~Jacob_Abernethy1", "aff": "Georgia Institute of Technology;Georgia Institute of Technology", "aff_domain": "gatech.edu;cc.gatech.edu", "position": "PhD student;Associate Professor", "bibtex": "@misc{\nwang2021quickly,\ntitle={Quickly Finding a Benign Region via Heavy Ball Momentum in Non-Convex Optimization},\nauthor={Jun-Kun Wang and Jacob Abernethy},\nyear={2021},\nurl={https://openreview.net/forum?id=IZQm8mMRVqW}\n}", "github": "", "project": "", "reviewers": "AnonReviewer3;AnonReviewer1;AnonReviewer4;AnonReviewer2", "site": "https://openreview.net/forum?id=IZQm8mMRVqW", "pdf_size": 0, "rating": "4;6;6;7", "confidence": "4;3;4;3", "wc_review": "247;345;327;375", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "743;454;825;38", "reply_reviewers": "0;0;0;0", "reply_authors": "1;1;1;1", "rating_avg": [ 5.75, 1.0897247358851685 ], "confidence_avg": [ 3.5, 0.5 ], "wc_review_avg": [ 323.5, 47.37879272417143 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 515.0, 307.9504830325811 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 9, 0 ], "authors#_avg": [ 2, 0 ], "corr_rating_confidence": -0.6882472016116854, "gs_citation": 5, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=4773347063159720667&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 3, "aff_unique_index": "0;0", "aff_unique_norm": "Georgia Institute of Technology", "aff_unique_dep": "", "aff_unique_url": "https://www.gatech.edu", "aff_unique_abbr": "Georgia Tech", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0", "aff_country_unique": "United States" }, { "id": "IazZhsJK7wJ", "title": "A Simple and General Strategy for Referential Problem in Low-Resource Neural Machine Translation", "track": "main", "status": "Withdraw", "tldr": "", "abstract": "This paper aims to solve a series of referential problems in sequence decoding caused by data sparsity and corpus scarce in low-resource Neural Machine Translation (NMT), including pronoun missing, reference error, bias and so on. It is difficult to find the essential reason of these problems because they are only shown in the prediction results and involve all aspects of the model. Different from the usual solutions based on complex mathematical rule setting and adding artificial features, we expect to turn the problems in the predictions into noise as much as possible, and use adversarial training to make the model find the balance between the noise and the golden samples, instead of exploring the reason of the problem during the complex training. In this paper, only a simple noise-based preprocessing operation and a slight modification of the adversarial training can make the model generalize to a series of referential problems in low-resource NMT task. 
On Korean-Chinese, Mongolian-Chinese and Arabic-Chinese tasks, the evaluation of BLEU score and the accuracy of pronouns in sequence have been significantly improved.", "keywords": "machine translation;Referential Problem;low-resource", "primary_area": "", "supplementary_material": "", "author": "Yatu Ji;Nier Wu;Hongxu Hou", "authorids": "~Yatu_Ji1;~Nier_Wu1;~Hongxu_Hou1", "gender": "M;M;", "homepage": ";https://id.qq.com/index.html#info;https://ccs.imu.edu.cn/info/1166/3148.htm", "dblp": "245/8301;;", "google_scholar": ";;", "orcid": ";;", "linkedin": ";;", "or_profile": "~Yatu_Ji1;~Nier_Wu1;~Hongxu_Hou1", "aff": "Inner Mongolia University;inner mongolia university;Inner Mongolia University", "aff_domain": "imu.edu.cn;imu.edu.cn;imu.edu.cn", "position": "PhD student;PhD student;Full Professor", "bibtex": "", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer4;AnonReviewer2;AnonReviewer3", "site": "https://openreview.net/forum?id=IazZhsJK7wJ", "pdf_size": 0, "rating": "2;3;4;4", "confidence": "5;3;4;3", "wc_review": "189;219;1058;176", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "0;0;0;0", "reply_reviewers": "0;0;0;0", "reply_authors": "0;0;0;0", "rating_avg": [ 3.25, 0.82915619758885 ], "confidence_avg": [ 3.75, 0.82915619758885 ], "wc_review_avg": [ 410.5, 374.1593911690578 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 0, 0 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 0, 0 ], "replies_avg": [ 5, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": -0.6363636363636364, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:1FpL5ajeXXgJ:scholar.google.com/&scioq=A+Simple+and+General+Strategy+for+Referential+Problem+in+Low-Resource+Neural+Machine+Translation&hl=en&as_sdt=0,5", "gs_version_total": 0, "aff_unique_index": "0;0;0", "aff_unique_norm": "Inner Mongolia University", "aff_unique_dep": "", "aff_unique_url": "http://www.imu.edu.cn/", "aff_unique_abbr": "IMU", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0;0", "aff_country_unique": "China" }, { "id": "IbFcpYnwCvd", "title": "The Logical Options Framework", "track": "main", "status": "Reject", "tldr": "", "abstract": "Learning composable policies for environments with complex rules and tasks is a challenging problem. We introduce a hierarchical reinforcement learning framework called the Logical Options Framework (LOF) that learns policies that are satisfying, optimal, and composable. LOF efficiently learns policies that satisfy tasks by representing the task as an automaton and integrating it into learning and planning. We provide and prove conditions under which LOF will learn satisfying, optimal policies. And lastly, we show how LOF's learned policies can be composed to satisfy unseen tasks with only 10-50 retraining steps. 
We evaluate LOF on four tasks in discrete and continuous domains.", "keywords": "reinforcement learning;hierarchical methods;formal methods;formal logic", "primary_area": "", "supplementary_material": "/attachment/3d051de8ae3da610b84210e818c6c167a7baf9ce.zip", "author": "Brandon Araki;Xiao Li;Kiran Vodrahalli;Jonathan DeCastro;J Micah Fry;Daniela Rus", "authorids": "~Brandon_Araki1;~Xiao_Li1;~Kiran_Vodrahalli1;jonathan.decastro@tri.global;micah.fry@ll.mit.edu;~Daniela_Rus1", "gender": "M;;M;;;F", "homepage": ";https://xli4217.github.io/;https://kiranvodrahalli.github.io;;;https://www.csail.mit.edu/person/daniela-rus", "dblp": ";;188/5863;;;r/DanielaRus", "google_scholar": "j1OfCS8AAAAJ;;7oBE9-oAAAAJ;;;https://scholar.google.com/citations?hl=en", "orcid": ";;;;;", "linkedin": ";;;;;", "or_profile": "~Brandon_Araki1;~Xiao_Li1;~Kiran_Vodrahalli1;jonathan.decastro@tri.global;micah.fry@ll.mit.edu;~Daniela_Rus1", "aff": "Massachusetts Institute of Technology;Massachusetts Institute of Technology;Columbia University;;;Massachusetts Institute of Technology", "aff_domain": "mit.edu;mit.edu;columbia.edu;;;mit.edu", "position": "PhD student;Postdoc;PhD student;;;Full Professor", "bibtex": "@misc{\naraki2021the,\ntitle={The Logical Options Framework},\nauthor={Brandon Araki and Xiao Li and Kiran Vodrahalli and Jonathan DeCastro and J Micah Fry and Daniela Rus},\nyear={2021},\nurl={https://openreview.net/forum?id=IbFcpYnwCvd}\n}", "github": "", "project": "", "reviewers": "AnonReviewer2;AnonReviewer4;AnonReviewer3;AnonReviewer1", "site": "https://openreview.net/forum?id=IbFcpYnwCvd", "pdf_size": 0, "rating": "4;4;6;6", "confidence": "5;4;2;4", "wc_review": "1897;377;343;468", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "3350;837;1140;976", "reply_reviewers": "0;0;0;0", "reply_authors": "5;1;2;2", "rating_avg": [ 5.0, 1.0 ], "confidence_avg": [ 3.75, 1.0897247358851685 ], "wc_review_avg": [ 771.25, 651.5567415812686 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 1575.75, 1029.962711703681 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 2.5, 1.5 ], "replies_avg": [ 15, 0 ], "authors#_avg": [ 6, 0 ], "corr_rating_confidence": -0.6882472016116854, "gs_citation": 39, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=8491762780532620383&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 9, "aff_unique_index": "0;0;1;0", "aff_unique_norm": "Massachusetts Institute of Technology;Columbia University", "aff_unique_dep": ";", "aff_unique_url": "https://web.mit.edu;https://www.columbia.edu", "aff_unique_abbr": "MIT;Columbia", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0;0;0", "aff_country_unique": "United States" }, { "id": "IeuEO1TccZn", "title": "Sufficient and Disentangled Representation Learning", "track": "main", "status": "Reject", "tldr": "", "abstract": "We propose a novel approach to representation learning called sufficient and disentangled representation learning (SDRL). With SDRL, we seek a data representation that maps the input data to a lower-dimensional space with two properties: sufficiency and disentanglement. First, the representation is sufficient in the sense that the original input data is conditionally independent of the response or label given the representation. Second, the representation is maximally disentangled with mutually independent components and rotation invariant in distribution. 
We show that such a representation always exists under mild conditions on the input data distribution based on optimal transport theory. We formulate an objective function characterizing conditional independence and disentanglement. This objective function is then used to train a sufficient and disentangled representation with deep neural networks. We provide strong statistical guarantees for the learned representation by establishing an upper bound on the excess error of the objective function and show that it reaches the nonparametric minimax rate under mild conditions. We also validate the proposed method via numerical experiments and real data analysis.", "keywords": "Conditional independence;f-divergence;rotation invariant;neural network;statistical guarantee", "primary_area": "", "supplementary_material": "/attachment/b0fb6b09d79343947943a87fcb63549e74e27629.zip", "author": "Jian Huang;Yuling Jiao;Xu Liao;Jin Liu;Zhou Yu", "authorids": "~Jian_Huang5;yulingjiaomath@whu.edu.cn;liaoxu@u.duke.nus.edu;jin.liu@duke-nus.edu.sg;zyu@stat.ecnu.edu.cn", "gender": "M;;;;", "homepage": "https://www.polyu.edu.hk/ama/people/academic-staff/prof-huang-jian/;;;;", "dblp": ";;;;", "google_scholar": "https://scholar.google.com/citations?hl=en;;;;", "orcid": "0000-0002-5218-9269;;;;", "linkedin": ";;;;", "or_profile": "~Jian_Huang5;yulingjiaomath@whu.edu.cn;liaoxu@u.duke.nus.edu;jin.liu@duke-nus.edu.sg;zyu@stat.ecnu.edu.cn", "aff": ";;;;", "aff_domain": ";;;;", "position": ";;;;", "bibtex": "@misc{\nhuang2021sufficient,\ntitle={Sufficient and Disentangled Representation Learning},\nauthor={Jian Huang and Yuling Jiao and Xu Liao and Jin Liu and Zhou Yu},\nyear={2021},\nurl={https://openreview.net/forum?id=IeuEO1TccZn}\n}", "github": "", "project": "", "reviewers": "AnonReviewer3;AnonReviewer4;AnonReviewer2;AnonReviewer1", "site": "https://openreview.net/forum?id=IeuEO1TccZn", "pdf_size": 0, "rating": "4;5;6;7", "confidence": "5;4;4;4", "wc_review": "571;665;414;479", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "710;735;797;484", "reply_reviewers": "0;0;0;0", "reply_authors": "1;1;1;1", "rating_avg": [ 5.5, 1.118033988749895 ], "confidence_avg": [ 4.25, 0.4330127018922193 ], "wc_review_avg": [ 532.25, 94.79286629277543 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 681.5, 118.34377888169703 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 9, 0 ], "authors#_avg": [ 5, 0 ], "corr_rating_confidence": -0.7745966692414834, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:XZqo6vr8yBQJ:scholar.google.com/&scioq=Sufficient+and+Disentangled+Representation+Learning&hl=en&as_sdt=0,5", "gs_version_total": 0 }, { "id": "IfEkus1dpU", "title": "Cut-and-Paste Neural Rendering", "track": "main", "status": "Reject", "tldr": "", "abstract": "Cut-and-paste methods take an object from one image and insert it into another. Doing so often results in unrealistic looking images because the inserted object's shading is inconsistent with the target scene's shading. Existing reshading methods require a geometric and physical model of the inserted object, which is then rendered using environment parameters. Accurately constructing such a model only from a single image is beyond the current understanding of computer vision.\n\nWe describe an alternative procedure -- cut-and-paste neural rendering, to render the inserted fragment's shading field consistent with the target scene. 
We use a Deep Image Prior (DIP) as a neural renderer trained to render an image with consistent image decomposition inferences. The resulting rendering from DIP should have an albedo consistent with composite albedo; it should have a shading field that, outside the inserted fragment, is the same as the target scene's shading field; \nand composite surface normals are consistent with the final rendering's shading field. \nThe result is a simple procedure that produces convincing and realistic shading. Moreover, our procedure does not require rendered images or image-decomposition from real images in the training or labeled annotations. In fact, our only use of simulated ground truth is our use of a pre-trained normal estimator. Qualitative results are strong, supported by a user study comparing against state-of-the-art image harmonization baseline.", "keywords": "Neural Rendering;Reshading;Relighting;Computational Photography;Image Decomposition", "primary_area": "", "supplementary_material": "/attachment/26aad1c00a6235b2bf549b798d201d21f1addc2d.zip", "author": "Anand Bhattad;David Forsyth", "authorids": "~Anand_Bhattad1;~David_Forsyth1", "gender": ";M", "homepage": "https://anandbhattad.github.io/;https://cs.illinois.edu/directory/profile/daf", "dblp": "215/4305;f/DavidAForsyth", "google_scholar": "XUsauXIAAAAJ;https://scholar.google.com.tw/citations?user=5H0arvkAAAAJ", "orcid": ";0000-0002-2278-0752", "linkedin": ";", "or_profile": "~Anand_Bhattad1;~David_Forsyth1", "aff": "University of Illinois Urbana Champaign;University of Illinois, Urbana-Champaign", "aff_domain": "illinois.edu;uiuc.edu", "position": "PhD student;Full Professor", "bibtex": "@misc{\nbhattad2021cutandpaste,\ntitle={Cut-and-Paste Neural Rendering},\nauthor={Anand Bhattad and David Forsyth},\nyear={2021},\nurl={https://openreview.net/forum?id=IfEkus1dpU}\n}", "github": "", "project": "", "reviewers": "AnonReviewer3;AnonReviewer4;AnonReviewer2", "site": "https://openreview.net/forum?id=IfEkus1dpU", "pdf_size": 0, "rating": "5;6;6", "confidence": "4;2;4", "wc_review": "325;363;305", "wc_reply_reviewers": "0;0;0", "wc_reply_authors": "729;468;537", "reply_reviewers": "0;0;0", "reply_authors": "1;1;1", "rating_avg": [ 5.666666666666667, 0.4714045207910317 ], "confidence_avg": [ 3.3333333333333335, 0.9428090415820634 ], "wc_review_avg": [ 331.0, 24.055491403558285 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 578.0, 110.42644610780518 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 7, 0 ], "authors#_avg": [ 2, 0 ], "corr_rating_confidence": -0.4999999999999999, "gs_citation": 6, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=7337435872143080155&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 0, "aff_unique_index": "0;1", "aff_unique_norm": "University of Illinois Urbana-Champaign;University of Illinois", "aff_unique_dep": ";", "aff_unique_url": "https://illinois.edu;https://illinois.edu", "aff_unique_abbr": "UIUC;UIUC", "aff_campus_unique_index": "0;0", "aff_campus_unique": "Urbana-Champaign", "aff_country_unique_index": "0;0", "aff_country_unique": "United States" }, { "id": "Ifvv3qpPsqB", "title": "Dynamic Divide-and-Conquer Adversarial Training for Robust Semantic Segmentation", "track": "main", "status": "Withdraw", "tldr": "", "abstract": "Adversarial training is promising for improving the robustness of deep neural networks towards adversarial perturbations, especially on the classification task. 
The effect of this type of training on semantic segmentation, contrarily, just commences. We make the initial attempt to explore the defense strategy on semantic segmentation by formulating a general adversarial training procedure that can perform decently on both adversarial and clean samples. We propose a dynamic divide-and-conquer adversarial training (DDC-AT) strategy to enhance the defense effect, by setting additional branches in the target model during training, and dealing with pixels with diverse properties towards adversarial perturbation. Our dynamical division mechanism divides pixels into multiple branches automatically, achieved by unsupervised learning. Note all these additional branches can be abandoned during inference and thus leave no extra parameter and computation cost. Extensive experiments with various segmentation models are conducted on PASCAL VOC 2012 and Cityscapes datasets, in which DDC-AT yields satisfying performance under both white- and black-box attacks.", "keywords": "adversarial defense;semantic segmentation;robustness", "primary_area": "", "supplementary_material": "", "author": "Xiaogang Xu;Hengshuang Zhao;Jiaya Jia", "authorids": "~Xiaogang_Xu2;~Hengshuang_Zhao2;~Jiaya_Jia1", "gender": "M;M;M", "homepage": "https://xiaogang00.github.io;https://hszhao.github.io;https://jiaya.me", "dblp": "118/2268-2;185/7848;31/5649", "google_scholar": "https://scholar.google.com.hk/citations?user=R65xDQwAAAAJ;4uE10I0AAAAJ;https://scholar.google.com.tw/citations?user=XPAkzTEAAAAJ", "orcid": "0000-0002-7928-7336;0000-0001-8277-2706;", "linkedin": ";hengshuang-zhao-347b8391/?originalSubdomain=hk;", "or_profile": "~Xiaogang_Xu2;~Hengshuang_Zhao2;~Jiaya_Jia1", "aff": "The Chinese University of Hong Kong;University of Oxford;Department of Computer Science and Engineering, Hong Kong University of Science and Technology", "aff_domain": "cuhk.edu.hk;ox.ac.uk;cse.ust.hk", "position": "PhD student;Postdoc;Full Professor", "bibtex": "", "github": "", "project": "", "reviewers": "AnonReviewer4;AnonReviewer3;AnonReviewer2;AnonReviewer1", "site": "https://openreview.net/forum?id=Ifvv3qpPsqB", "pdf_size": 0, "rating": "3;5;5;6", "confidence": "4;4;2;3", "wc_review": "658;327;215;160", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "0;0;0;0", "reply_reviewers": "0;0;0;0", "reply_authors": "0;0;0;0", "rating_avg": [ 4.75, 1.0897247358851685 ], "confidence_avg": [ 3.25, 0.82915619758885 ], "wc_review_avg": [ 340.0, 193.2084366687956 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 0, 0 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 0, 0 ], "replies_avg": [ 5, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": -0.48420012470625223, "gs_citation": 49, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=16765063000970752217&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 8, "aff_unique_index": "0;1;2", "aff_unique_norm": "Chinese University of Hong Kong;University of Oxford;Hong Kong University of Science and Technology", "aff_unique_dep": ";;Department of Computer Science and Engineering", "aff_unique_url": "https://www.cuhk.edu.hk;https://www.ox.ac.uk;https://www.ust.hk", "aff_unique_abbr": "CUHK;Oxford;HKUST", "aff_campus_unique_index": "0;0", "aff_campus_unique": "Hong Kong SAR;", "aff_country_unique_index": "0;1;0", "aff_country_unique": "China;United Kingdom" }, { "title": "Pruning Neural Networks at Initialization: Why Are We Missing the Mark?", "status": "Poster", "track": "main", "site": 
"https://iclr.cc/virtual/2021/poster/3159", "id": "Ig-VyQc-MLK", "poster": "", "openreview": "https://openreview.net/forum?id=Ig-VyQc-MLK", "slides": "https://iclr.cc/virtual/2021/poster/3159", "video": "https://iclr.cc/virtual/2021/poster/3159", "author_site": "Jonathan Frankle, Gintare Dziugaite, Anonymous A Author, Michael Carbin", "tldr": "", "abstract": "Recent work has explored the possibility of pruning neural networks at initialization. We assess proposals for doing so: SNIP (Lee et al., 2019), GraSP (Wang et al., 2020), SynFlow (Tanaka et al., 2020), and magnitude pruning. Although these methods surpass the trivial baseline of random pruning, they remain below the accuracy of magnitude pruning after training, and we endeavor to understand why. We show that, unlike pruning after training, randomly shuffling the weights these methods prune within each layer or sampling new initial values preserves or improves accuracy. As such, the per-weight pruning decisions made by these methods can be replaced by a per-layer choice of the fraction of weights to prune. This property suggests broader challenges with the underlying pruning heuristics, the desire to prune at initialization, or both.", "keywords": "Pruning;Sparsity;Lottery Ticket;Science", "primary_area": "", "supplementary_material": "/attachment/53da61b6607bb8f4e57707d3ae1dbba3d15d52f3.zip", "author": "Jonathan Frankle;Gintare Karolina Dziugaite;Daniel Roy;Michael Carbin", "authorids": "~Jonathan_Frankle1;~Gintare_Karolina_Dziugaite1;~Daniel_Roy1;~Michael_Carbin1", "gender": "M;F;M;M", "homepage": "http://www.jfrankle.com;http://gkdz.org/;http://people.csail.mit.edu/mcarbin/;http://danroy.org", "dblp": "169/9776;163/1774;07/3119;04/2068", "google_scholar": "MlLJapIAAAAJ;5K1QB_8AAAAJ;mtejbKYAAAAJ;https://scholar.google.ca/citations?user=vA6ZQ_AAAAAJ", "orcid": ";;;", "linkedin": "jfrankle/;;;", "or_profile": "~Jonathan_Frankle1;~Gintare_Karolina_Dziugaite1;~Michael_Carbin1;~Daniel_M_Roy1", "aff": "Massachusetts Institute of Technology;ServiceNow;Massachusetts Institute of Technology;University of Toronto", "aff_domain": "mit.edu;servicenow.com;mit.edu;utoronto.ca", "position": "PhD student;Researcher;Assistant Professor;Associate Professor", "bibtex": "@inproceedings{\nfrankle2021pruning,\ntitle={Pruning Neural Networks at Initialization: Why Are We Missing the Mark?},\nauthor={Jonathan Frankle and Gintare Karolina Dziugaite and Daniel Roy and Michael Carbin},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=Ig-VyQc-MLK}\n}", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer4;AnonReviewer3;AnonReviewer2", "pdf_size": 0, "rating": "4;6;7;9", "confidence": "4;5;3;5", "wc_review": "454;989;734;450", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "1304;2099;728;502", "reply_reviewers": "0;0;0;0", "reply_authors": "2;3;2;2", "rating_avg": [ 6.5, 1.8027756377319946 ], "confidence_avg": [ 4.25, 0.82915619758885 ], "wc_review_avg": [ 656.75, 223.72457956156717 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 1158.25, 616.85345707064 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 2.25, 0.4330127018922193 ], "replies_avg": [ 14, 0 ], "authors#_avg": [ 4, 0 ], "corr_rating_confidence": 0.25087260300212727, "gs_citation": 275, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=1954487645339953894&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 4, "pdf": "https://openreview.net/pdf?id=Ig-VyQc-MLK", "email": 
"mit.edu;servicenow.com;mit.edu;utoronto.ca", "author_num": 4, "aff_unique_index": "0;1;0;2", "aff_unique_norm": "Massachusetts Institute of Technology;ServiceNow;University of Toronto", "aff_unique_dep": ";;", "aff_unique_url": "https://web.mit.edu;https://www.servicenow.com;https://www.utoronto.ca", "aff_unique_abbr": "MIT;ServiceNow;U of T", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0;0;1", "aff_country_unique": "United States;Canada" }, { "title": "Flowtron: an Autoregressive Flow-based Generative Network for Text-to-Speech Synthesis", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/3204", "id": "Ig53hpHxS4", "poster": "", "openreview": "https://openreview.net/forum?id=Ig53hpHxS4", "slides": "https://iclr.cc/virtual/2021/poster/3204", "video": "https://iclr.cc/virtual/2021/poster/3204", "author_site": "Rafael Valle, Kevin J Shih, Ryan Prenger, Bryan Catanzaro", "tldr": "", "abstract": "In this paper we propose Flowtron: an autoregressive flow-based generative network for text-to-speech synthesis with style transfer and speech variation. Flowtron borrows insights from Autoregressive Flows and revamps Tacotron 2 in order to provide high-quality and expressive mel-spectrogram synthesis. Flowtron is optimized by maximizing the likelihood of the training data, which makes training simple and stable. Flowtron learns an invertible mapping of data to a latent space that can be used to modulate many aspects of speech synthesis (timbre, expressivity, accent). Our mean opinion scores (MOS) show that Flowtron matches state-of-the-art TTS models in terms of speech quality. We provide results on speech variation, interpolation over time between samples and style transfer between seen and unseen speakers. Code and pre-trained models are publicly available at \\href{https://github.com/NVIDIA/flowtron}{https://github.com/NVIDIA/flowtron}.", "keywords": "Text to speech synthesis;normalizing flows;deep learning", "primary_area": "", "supplementary_material": "/attachment/892261d3f4575982c87b250c4aafc1bff95871f9.zip", "author": "Rafael Valle;Kevin J. Shih;Ryan Prenger;Bryan Catanzaro", "authorids": "~Rafael_Valle1;~Kevin_J._Shih1;rprenger@nvidia.com;~Bryan_Catanzaro1", "gender": "Not Specified;;;M", "homepage": "http://rafaelvalle.github.io;;;https://ctnzr.io", "dblp": ";;;14/4826", "google_scholar": "SktxU8IAAAAJ;;;UZ6kI2AAAAAJ", "orcid": ";;;0000-0003-0034-7728", "linkedin": "vallerafael/;;;bryancatanzaro/", "or_profile": "~Rafael_Valle1;~Kevin_J._Shih1;rprenger@nvidia.com;~Bryan_Catanzaro1", "aff": "NVIDIA;;;NVIDIA", "aff_domain": "nvidia.com;;;nvidia.com", "position": "Senior Research Scientist;;;Vice President", "bibtex": "@inproceedings{\nvalle2021flowtron,\ntitle={Flowtron: an Autoregressive Flow-based Generative Network for Text-to-Speech Synthesis},\nauthor={Rafael Valle and Kevin J. 
Shih and Ryan Prenger and Bryan Catanzaro},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=Ig53hpHxS4}\n}", "github": "[![github](/images/github_icon.svg) NVIDIA/flowtron](https://github.com/NVIDIA/flowtron) + [![Papers with Code](/images/pwc_icon.svg) 2 community implementations](https://paperswithcode.com/paper/?openreview=Ig53hpHxS4)", "project": "", "reviewers": "AnonReviewer2;AnonReviewer1;AnonReviewer4;AnonReviewer3", "pdf_size": 0, "rating": "5;6;6;9", "confidence": "3;3;5;5", "wc_review": "309;633;260;907", "wc_reply_reviewers": "0;0;0;58", "wc_reply_authors": "292;601;330;673", "reply_reviewers": "0;0;0;1", "reply_authors": "1;1;1;1", "rating_avg": [ 6.5, 1.5 ], "confidence_avg": [ 4.0, 1.0 ], "wc_review_avg": [ 527.25, 261.93928208651715 ], "wc_reply_reviewers_avg": [ 14.5, 25.11473670974872 ], "wc_reply_authors_avg": [ 474.0, 165.52190187404204 ], "reply_reviewers_avg": [ 0.25, 0.4330127018922193 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 10, 0 ], "authors#_avg": [ 4, 0 ], "corr_rating_confidence": 0.6666666666666667, "gs_citation": 188, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=1579689582070242490&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 4, "pdf": "https://openreview.net/pdf?id=Ig53hpHxS4", "email": "nvidia.com;;;nvidia.com", "author_num": 4, "aff_unique_index": "0;0", "aff_unique_norm": "NVIDIA", "aff_unique_dep": "NVIDIA Corporation", "aff_unique_url": "https://www.nvidia.com", "aff_unique_abbr": "NVIDIA", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0", "aff_country_unique": "United States" }, { "title": "CompOFA \u2013 Compound Once-For-All Networks for Faster Multi-Platform Deployment", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/3296", "id": "IgIk8RRT-Z", "poster": "", "openreview": "https://openreview.net/forum?id=IgIk8RRT-Z", "slides": "https://iclr.cc/virtual/2021/poster/3296", "video": "https://iclr.cc/virtual/2021/poster/3296", "author_site": "Manas Sahni, Shreya Varshini, Alind Khare, Alexey Tumanov", "tldr": "", "abstract": "The emergence of CNNs in mainstream deployment has necessitated methods to design and train efficient architectures tailored to maximize the accuracy under diverse hardware and latency constraints. To scale these resource-intensive tasks with an increasing number of deployment targets, Once-For-All (OFA) proposed an approach to jointly train several models at once with a constant training cost. However, this cost remains as high as 40-50 GPU days and also suffers from a combinatorial explosion of sub-optimal model configurations. We seek to reduce this search space -- and hence the training budget -- by constraining search to models close to the accuracy-latency Pareto frontier. We incorporate insights of compound relationships between model dimensions to build CompOFA, a design space smaller by several orders of magnitude. Through experiments on ImageNet, we demonstrate that even with simple heuristics we can achieve a 2x reduction in training time and 216x speedup in model search/extraction time compared to the state of the art, without loss of Pareto optimality! We also show that this smaller design space is dense enough to support equally accurate models for a similar diversity of hardware and latency targets, while also reducing the complexity of the training and subsequent extraction algorithms. 
Our source code is available at https://github.com/gatech-sysml/CompOFA", "keywords": "Efficient Deep Learning;Latency-aware Neural Architecture Search;AutoML", "primary_area": "", "supplementary_material": "", "author": "Manas Sahni;Shreya Varshini;Alind Khare;Alexey Tumanov", "authorids": "~Manas_Sahni1;shreyavarshini@gatech.edu;~Alind_Khare1;atumanov@gatech.edu", "gender": ";;M;", "homepage": "https://sahnimanas.github.io/;;https://www.cc.gatech.edu/~akhare39/;", "dblp": ";;211/0360;", "google_scholar": ";;zOqYHzsAAAAJ;", "orcid": ";;0000-0003-4649-9022;", "linkedin": ";;;", "or_profile": "~Manas_Sahni1;shreyavarshini@gatech.edu;~Alind_Khare1;atumanov@gatech.edu", "aff": "Georgia Institute of Technology;;Georgia Institute of Technology;", "aff_domain": "gatech.edu;;gatech.edu;", "position": "Graduate student;;PhD student;", "bibtex": "@inproceedings{\nsahni2021compofa,\ntitle={Comp{\\{}OFA{\\}} {\\textendash} Compound Once-For-All Networks for Faster Multi-Platform Deployment},\nauthor={Manas Sahni and Shreya Varshini and Alind Khare and Alexey Tumanov},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=IgIk8RRT-Z}\n}", "github": "[![Papers with Code](/images/pwc_icon.svg) 2 community implementations](https://paperswithcode.com/paper/?openreview=IgIk8RRT-Z)", "project": "", "reviewers": "AnonReviewer4;AnonReviewer1;AnonReviewer3;AnonReviewer2", "pdf_size": 0, "rating": "4;5;6;7", "confidence": "4;2;3;4", "wc_review": "534;297;281;619", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "875;297;117;510", "reply_reviewers": "0;0;0;0", "reply_authors": "2;1;1;1", "rating_avg": [ 5.5, 1.118033988749895 ], "confidence_avg": [ 3.25, 0.82915619758885 ], "wc_review_avg": [ 432.75, 146.96662035986267 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 449.75, 282.1890988326799 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.25, 0.4330127018922193 ], "replies_avg": [ 11, 0 ], "authors#_avg": [ 4, 0 ], "corr_rating_confidence": 0.1348399724926484, "gs_citation": 36, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=17201248627661355143&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 3, "pdf": "https://openreview.net/pdf?id=IgIk8RRT-Z", "email": "gatech.edu;;gatech.edu;", "author_num": 4, "aff_unique_index": "0;0", "aff_unique_norm": "Georgia Institute of Technology", "aff_unique_dep": "", "aff_unique_url": "https://www.gatech.edu", "aff_unique_abbr": "Georgia Tech", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0", "aff_country_unique": "United States" }, { "id": "IhUeMfEmexK", "title": "ProxylessKD: Direct Knowledge Distillation with Inherited Classifier for Face Recognition", "track": "main", "status": "Reject", "tldr": "", "abstract": "Knowledge Distillation (KD) refers to transferring knowledge from a large model to a smaller one, which is widely used to enhance model performance in machine learning. It tries to align embedding spaces generated from the teacher and the student model (i.e. to make images corresponding to the same semantics share the same embedding across different models). In this work, we focus on its application in face recognition. We observe that existing knowledge distillation models optimize the proxy tasks that force the student to mimic the teacher\u2019s behavior, instead of directly optimizing the face recognition accuracy. 
Consequently, the obtained student models are not guaranteed to be optimal on the target task or able to benefit from advanced constraints, such as the large margin constraint (e.g. margin-based softmax). We then propose a novel method named ProxylessKD that directly optimizes face recognition accuracy by inheriting the teacher's classifier as the student's classifier to guide the student to learn discriminative embeddings in the teacher's embedding space. The proposed ProxylessKD is very easy to implement and sufficiently generic to be extended to other tasks beyond face recognition. We conduct extensive experiments on standard face recognition benchmarks, \nand the results demonstrate that ProxylessKD achieves superior performance over existing knowledge distillation methods.", "keywords": "inherited classifier;embedding space alignment;face recognition;knowledge distillation", "primary_area": "", "supplementary_material": "", "author": "Weidong Shi;Guanghui Ren;Yunpeng Chen;Shuicheng YAN", "authorids": "~Weidong_Shi2;~Guanghui_Ren1;~Yunpeng_Chen1;~Shuicheng_YAN3", "gender": "M;M;;M", "homepage": ";http://renguanghui.com;;https://yanshuicheng.ai/", "dblp": "73/5220.html;42/2707;;y/ShuichengYan", "google_scholar": ";oqN1dA8AAAAJ;;https://scholar.google.com.hk/citations?user=DNuiPHwAAAAJ", "orcid": ";;;", "linkedin": ";;;", "or_profile": "~Weidong_Shi2;~Guanghui_Ren1;~Yunpeng_Chen1;~Shuicheng_YAN3", "aff": "Northestern University of China;;;sea Group", "aff_domain": "neu.edu.cn;;;sea.com", "position": "MS student;;;Researcher", "bibtex": "@misc{\nshi2021proxylesskd,\ntitle={Proxyless{\\{}KD{\\}}: Direct Knowledge Distillation with Inherited Classifier for Face Recognition},\nauthor={Weidong Shi and Guanghui Ren and Yunpeng Chen and Shuicheng YAN},\nyear={2021},\nurl={https://openreview.net/forum?id=IhUeMfEmexK}\n}", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer4;AnonReviewer2", "site": "https://openreview.net/forum?id=IhUeMfEmexK", "pdf_size": 0, "rating": "4;5;6", "confidence": "4;3;3", "wc_review": "180;259;338", "wc_reply_reviewers": "0;0;0", "wc_reply_authors": "479;462;750", "reply_reviewers": "0;0;0", "reply_authors": "1;1;1", "rating_avg": [ 5.0, 0.816496580927726 ], "confidence_avg": [ 3.3333333333333335, 0.4714045207910317 ], "wc_review_avg": [ 259.0, 64.50322989329035 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 563.6666666666666, 131.9402221546645 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 7, 0 ], "authors#_avg": [ 4, 0 ], "corr_rating_confidence": -0.8660254037844385, "gs_citation": 8, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=10898867638970526481&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 3, "aff_unique_index": "0;1", "aff_unique_norm": "Northeastern University;Sea Group", "aff_unique_dep": ";", "aff_unique_url": "http://www.neu.edu.cn/;", "aff_unique_abbr": "NEU;", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0", "aff_country_unique": "China;" }, { "id": "IjIzIOkK2D6", "title": "Efficient Graph Neural Architecture Search", "track": "main", "status": "Reject", "tldr": "", "abstract": "Recently, graph neural networks (GNN) have been demonstrated effective in various graph-based tasks. \nTo obtain state-of-the-art (SOTA) data-specific GNN architectures, researchers turn to the neural architecture search (NAS) methods. 
\nHowever, it remains to be a challenging problem to conduct efficient architecture search for GNN.\nIn this work, we present a novel framework for Efficient GrAph Neural architecture search (EGAN).\nBy designing a novel and expressive search space, an efficient one-shot NAS method based on stochastic relaxation and natural gradient is proposed.\nFurther, to enable architecture search in large graphs, a transfer learning paradigm is designed.\nExtensive experiments, including node-level and graph-level tasks, are conducted. The results show that the proposed EGAN can obtain SOTA data-specific architectures, and reduce the search cost by two orders of magnitude compared to existing NAS baselines.", "keywords": "graph neural network;neural architecture search;automated machine learning", "primary_area": "", "supplementary_material": "", "author": "Huan Zhao;Lanning Wei;quanming yao;Zhiqiang He", "authorids": "~Huan_Zhao2;~Lanning_Wei1;~quanming_yao1;hezq@levono.com", "gender": "M;F;M;", "homepage": "https://hzhaoaf.github.io/;https://scholar.google.com/citations?hl=zh-CN&user=pqNTu0MAAAAJ&view_op=list_works&gmla=AJsN-F5e7HFjbb-6m2VSNuXrOozQPnpEUZiGWAK7RNUZphRV1qHwDvCzZVfPXLxVBJSIxWDpYCZHKDzENSq8bIZZiUCHYFycfDo-wBnM5yzV0uFjRM56NL7P6AwVs8NT1MByZRK7AylLyYt9uGsyIfkjT4oWo7JSRg;https://lars-group.github.io/;", "dblp": ";218/2441;158/1014;", "google_scholar": "Odk4NEkAAAAJ;https://scholar.google.com/citations?hl=zh-CN;https://scholar.google.com/schhp?hl=en;", "orcid": "0000-0002-0320-8718;;;", "linkedin": ";;;", "or_profile": "~Huan_Zhao2;~Lanning_Wei1;~quanming_yao1;hezq@levono.com", "aff": "4Paradigm Inc.;Institute of Computing Technology, Chinese Academy of Sciences;4Paradigm Inc.;", "aff_domain": "4paradigm.com;ict.ac.cn;4paradigm.com;", "position": "Senior Researcher;PhD student;Senior Scientist;", "bibtex": "@misc{\nzhao2021efficient,\ntitle={Efficient Graph Neural Architecture Search},\nauthor={Huan Zhao and Lanning Wei and quanming yao and Zhiqiang He},\nyear={2021},\nurl={https://openreview.net/forum?id=IjIzIOkK2D6}\n}", "github": "", "project": "", "reviewers": "AnonReviewer3;AnonReviewer1;AnonReviewer2;AnonReviewer4", "site": "https://openreview.net/forum?id=IjIzIOkK2D6", "pdf_size": 0, "rating": "3;5;5;5", "confidence": "5;3;5;4", "wc_review": "659;646;347;1398", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "675;800;533;1130", "reply_reviewers": "0;0;0;0", "reply_authors": "1;1;1;2", "rating_avg": [ 4.5, 0.8660254037844386 ], "confidence_avg": [ 4.25, 0.82915619758885 ], "wc_review_avg": [ 762.5, 387.5516094664038 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 784.5, 220.71078360605765 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.25, 0.4330127018922193 ], "replies_avg": [ 11, 0 ], "authors#_avg": [ 4, 0 ], "corr_rating_confidence": -0.5222329678670935, "gs_citation": -1, "gs_cited_by_link": "", "gs_version_total": -1, "aff_unique_index": "0;1;0", "aff_unique_norm": "4Paradigm;Chinese Academy of Sciences", "aff_unique_dep": ";Institute of Computing Technology", "aff_unique_url": "https://www.4paradigm.com/;http://www.ict.ac.cn", "aff_unique_abbr": "4Paradigm;CAS", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0;0", "aff_country_unique": "China" }, { "id": "IkYEJ5Cps5H", "title": "Succinct Network Channel and Spatial Pruning via Discrete Variable QCQP", "track": "main", "status": "Reject", "tldr": "", "abstract": " Reducing the heavy computational cost of large convolutional neural networks is crucial when 
deploying the networks to resource-constrained environments. In this context, recent works propose channel pruning via greedy channel selection to achieve practical acceleration and memory footprint reduction. We first show this channel-wise approach ignores the inherent quadratic coupling between channels in the neighboring layers and cannot safely remove inactive weights during the pruning procedure. Furthermore, we show that these pruning methods cannot guarantee the given resource constraints are satisfied and cause discrepancy with the true objective. To this end, we formulate a principled optimization framework with discrete variable QCQP, which provably prevents any inactive weights and enables the exact guarantee of meeting the resource constraints in terms of FLOPs and memory. Also, we extend the pruning granularity beyond channels and jointly prune individual 2D convolution filters spatially for greater efficiency. Our experiments show competitive pruning results under the target resource constraints on CIFAR-10 and ImageNet datasets on various network architectures.\n", "keywords": "Network Pruning;Channel pruning;Spatial pruning;Network Compression;MIQCQP;Specified target resource constraint", "primary_area": "", "supplementary_material": "/attachment/d177c5e28d50b406ec51b59701c6b3c3e1e3bcb2.zip", "author": "Yeonwoo Jeong;Deokjae Lee;Gaon An;Changyong Son;Hyun Oh Song", "authorids": "~Yeonwoo_Jeong1;~Deokjae_Lee1;~Gaon_An1;~Changyong_Son1;~Hyun_Oh_Song1", "gender": "M;M;;M;M", "homepage": ";https://badeok0716.github.io;;;https://mllab.snu.ac.kr/hyunoh", "dblp": ";https://dblp.org/rec/conf/aistats/JeongLASS22;241/6191;93/3010;05/10781", "google_scholar": "wSE0nWUAAAAJ;G8JsnZAAAAAJ;;N_hFOdoAAAAJ;ScoZZPsAAAAJ", "orcid": ";;;;", "linkedin": "https://kr.linkedin.com/in/yeonwoo-jeong-068a1113b;;;;hyun-oh-song-5a39b03", "or_profile": "~Yeonwoo_Jeong1;~Deokjae_Lee1;~Gaon_An1;~Changyong_Son1;~Hyun_Oh_Song1", "aff": "Seoul National University;Seoul National University;Seoul National University;;Seoul National University", "aff_domain": "snu.ac.kr;snu.ac.kr;snu.ac.kr;;snu.ac.kr", "position": "PhD student;PhD student;MS student;;Assistant Professor", "bibtex": "@misc{\njeong2021succinct,\ntitle={Succinct Network Channel and Spatial Pruning via Discrete Variable {\\{}QCQP{\\}}},\nauthor={Yeonwoo Jeong and Deokjae Lee and Gaon An and Changyong Son and Hyun Oh Song},\nyear={2021},\nurl={https://openreview.net/forum?id=IkYEJ5Cps5H}\n}", "github": "", "project": "", "reviewers": "AnonReviewer3;AnonReviewer5;AnonReviewer1;AnonReviewer2", "site": "https://openreview.net/forum?id=IkYEJ5Cps5H", "pdf_size": 0, "rating": "5;5;7;7", "confidence": "2;5;4;3", "wc_review": "265;221;155;275", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "529;1063;528;338", "reply_reviewers": "0;0;0;0", "reply_authors": "1;2;1;1", "rating_avg": [ 6.0, 1.0 ], "confidence_avg": [ 3.5, 1.118033988749895 ], "wc_review_avg": [ 229.0, 47.30750469005948 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 614.5, 270.3687297007551 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.25, 0.4330127018922193 ], "replies_avg": [ 11, 0 ], "authors#_avg": [ 5, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:IPfjoYQqnrgJ:scholar.google.com/&scioq=Succinct+Network+Channel+and+Spatial+Pruning+via+Discrete+Variable+QCQP&hl=en&as_sdt=0,5", "gs_version_total": 0, "aff_unique_index": "0;0;0;0", "aff_unique_norm": "Seoul National University", 
"aff_unique_dep": "", "aff_unique_url": "https://www.snu.ac.kr", "aff_unique_abbr": "SNU", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0;0;0", "aff_country_unique": "South Korea" }, { "id": "IlJbTsygaI6", "title": "Explainable Reinforcement Learning Through Goal-Based Interpretability", "track": "main", "status": "Reject", "tldr": "", "abstract": "Deep Reinforcement Learning agents achieve state-of-the-art performance in many tasks at the cost of making them black-boxes, hard to interpret and understand, making their use difficult in trusted applications, such as robotics or industrial applications. We introduce goal-based interpretability, where the agent produces goals which show the reason for its current actions (reach the current goal) and future goals indicate its desired future behavior without having to run the environment, a useful property in environments with no simulator. Additionally, in many environments, the goals can be visualized to make them easier to understand for non-experts. To have a goal-producing agent without requiring domain knowledge, we use 2-layer hierarchical agents where the top layer produces goals and the bottom layer attempts to reach those goals. \n\nMost classical reinforcement learning algorithms cannot be used train goal-producing hierarchical agents. We introduce a new algorithm to train these more interpretable agents, called HAC-General with Teacher, an extension of the Hindsight Actor-Critic (HAC) algorithm that adds 2 key improvements: (1) the goals now consist of a state $s$ to be reached and a reward $r$ to be collected, making it possible for the goal-producing policy to incentivize the goal-reaching policy to go through high-reward paths and (2) an expert teacher is leveraged to improve the training of the hierarchical agent, in a process similar but distinct to imitation learning and distillation. Contrarily to HAC, there is no requirement that environments need to provide the desired end state. 
Additionally, our experiments show that it has better performance and learns faster than HAC, and can solve environments that HAC fails to solve.", "keywords": "explainable reinforcement learning;hierarchical reinforcement learning;goal-based interpretability", "primary_area": "", "supplementary_material": "/attachment/d39e0aee6f08a7789d828b3065344fe62227fff0.zip", "author": "Gregory Bonaert;Youri Coppens;Denis Steckelmacher;Ann Nowe", "authorids": "~Gregory_Bonaert1;~Youri_Coppens1;~Denis_Steckelmacher1;~Ann_Nowe1", "gender": ";M;M;F", "homepage": "https://blog.gregbonaert.com/;;http://steckdenis.be;https://ai.vub.ac.be/team/ann-nowe/?utm_source=www.google.com&utm_medium=organic&utm_campaign=Google&referrer-analytics=1", "dblp": ";;173/5198;95/232.html", "google_scholar": ";https://scholar.google.be/citations?user=VQaoS70AAAAJ;;https://scholar.google.be/citations?user=LH5QKbgAAAAJ", "orcid": ";0000-0003-1124-0731;0000-0003-1521-8494;", "linkedin": "gregory-bonaert/;youricoppens/;;", "or_profile": "~Gregory_Bonaert1;~Youri_Coppens1;~Denis_Steckelmacher1;~Ann_Nowe1", "aff": ";Universit\u00e9 libre de Bruxelles;Vrije Universiteit Brussel;Vrije Universiteit Brussel", "aff_domain": ";ulb.ac.be;vub.be;vub.be", "position": ";PhD student;Postdoc;Full Professor", "bibtex": "@misc{\nbonaert2021explainable,\ntitle={Explainable Reinforcement Learning Through Goal-Based Interpretability},\nauthor={Gregory Bonaert and Youri Coppens and Denis Steckelmacher and Ann Nowe},\nyear={2021},\nurl={https://openreview.net/forum?id=IlJbTsygaI6}\n}", "github": "", "project": "", "reviewers": "AnonReviewer2;AnonReviewer1;AnonReviewer5;AnonReviewer4", "site": "https://openreview.net/forum?id=IlJbTsygaI6", "pdf_size": 0, "rating": "3;3;3;4", "confidence": "4;5;4;4", "wc_review": "470;687;601;570", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "718;794;811;538", "reply_reviewers": "0;0;0;0", "reply_authors": "1;1;1;1", "rating_avg": [ 3.25, 0.4330127018922193 ], "confidence_avg": [ 4.25, 0.4330127018922193 ], "wc_review_avg": [ 582.0, 77.57899200170108 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 715.25, 108.16047106036474 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 10, 0 ], "authors#_avg": [ 4, 0 ], "corr_rating_confidence": -0.3333333333333333, "gs_citation": 1, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=14875369964933197111&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 0, "aff_unique_index": "0;1;1", "aff_unique_norm": "Universit\u00e9 Libre de Bruxelles;Vrije Universiteit Brussel", "aff_unique_dep": ";", "aff_unique_url": "https://www.ulb.ac.be;https://www.vub.be", "aff_unique_abbr": "ULB;VUB", "aff_campus_unique_index": "1;1", "aff_campus_unique": ";Brussels", "aff_country_unique_index": "0;0;0", "aff_country_unique": "Belgium" }, { "id": "Im43P9kuaeP", "title": "Certified Watermarks for Neural Networks", "track": "main", "status": "Reject", "tldr": "", "abstract": "Watermarking is a commonly used strategy to protect creators' rights to digital images, videos and audio. Recently, watermarking methods have been extended to deep learning models -- in principle, the watermark should be preserved when an adversary tries to copy the model. However, in practice, watermarks can often be removed by an intelligent adversary. 
Several papers have proposed watermarking methods that claim to be empirically resistant to different types of removal attacks, but these new techniques often fail in the face of new or better-tuned adversaries. In this paper, we propose the first certifiable watermarking method. Using the randomized smoothing technique proposed in Chiang et al., we show that our watermark is guaranteed to be unremovable unless the model parameters are changed by more than a certain $\\ell_2$ threshold. In addition to being certifiable, our watermark is also empirically more robust compared to previous watermarking methods.", "keywords": "certified defense;watermarking;backdoor attack", "primary_area": "", "supplementary_material": "", "author": "Arpit Amit Bansal;Ping-yeh Chiang;Michael Curry;Hossein Souri;Rama Chellappa;John P Dickerson;Rajiv Jain;Tom Goldstein", "authorids": "~Arpit_Amit_Bansal1;~Ping-yeh_Chiang1;~Michael_Curry2;~Hossein_Souri1;~Rama_Chellappa1;~John_P_Dickerson1;~Rajiv_Jain1;~Tom_Goldstein1", "gender": "M;;M;M;;M;M;M", "homepage": "https://arpitbansal297.github.io/;;https://currymj.github.io;https://hsouri.github.io/;;https://jpdickerson.com/;;https://www.cs.umd.edu/~tomg/", "dblp": "190/9114;236/4288;255/4719;250/2286;;75/8479;;25/8184", "google_scholar": "Pchxm4IAAAAJ;WUoMq1IAAAAJ;EOlowBUAAAAJ;rurbhy0AAAAJ;;https://scholar.google.com.tw/citations?user=QgDpfCQAAAAJ;https://scholar.google.com/;KmSuVtgAAAAJ", "orcid": ";;;0000-0001-5264-798X;;0000-0003-2231-680X;;", "linkedin": "arpit-bansal-970865b1/;;;hossein-souri-b7574795/;;john-dickerson-83a74a7/;;", "or_profile": "~Arpit_Amit_Bansal1;~Ping-yeh_Chiang1;~Michael_Curry2;~Hossein_Souri1;~Rama_Chellappa1;~John_P_Dickerson1;~Rajiv_Jain1;~Tom_Goldstein1", "aff": "University of Maryland, College Park;University of Maryland, College Park;SalesForce.com;Johns Hopkins University;;University of Maryland, College Park;Adobe Systems;University of Maryland, College Park", "aff_domain": "umd.edu;umd.edu;salesforce.com;jhu.edu;;umd.edu;adobe.com;umd.edu", "position": "PhD student;PhD student;Research Intern;PhD student;;Assistant Professor;Senior Research Scientist;Associate Professor", "bibtex": "@misc{\nbansal2021certified,\ntitle={Certified Watermarks for Neural Networks},\nauthor={Arpit Amit Bansal and Ping-yeh Chiang and Michael Curry and Hossein Souri and Rama Chellappa and John P Dickerson and Rajiv Jain and Tom Goldstein},\nyear={2021},\nurl={https://openreview.net/forum?id=Im43P9kuaeP}\n}", "github": "", "project": "", "reviewers": "AnonReviewer4;AnonReviewer2;AnonReviewer1;AnonReviewer3", "site": "https://openreview.net/forum?id=Im43P9kuaeP", "pdf_size": 0, "rating": "4;4;5;6", "confidence": "4;4;4;4", "wc_review": "435;330;272;251", "wc_reply_reviewers": "0;75;61;0", "wc_reply_authors": "417;859;567;385", "reply_reviewers": "0;1;1;0", "reply_authors": "1;2;1;1", "rating_avg": [ 4.75, 0.82915619758885 ], "confidence_avg": [ 4.0, 0.0 ], "wc_review_avg": [ 322.0, 71.36876067300034 ], "wc_reply_reviewers_avg": [ 34.0, 34.35840508521896 ], "wc_reply_authors_avg": [ 557.0, 187.40864441108366 ], "reply_reviewers_avg": [ 0.5, 0.5 ], "reply_authors_avg": [ 1.25, 0.4330127018922193 ], "replies_avg": [ 12, 0 ], "authors#_avg": [ 8, 0 ], "corr_rating_confidence": 0.0, "gs_citation": -1, "gs_cited_by_link": "", "gs_version_total": -1, "aff_unique_index": "0;0;1;2;0;3;0", "aff_unique_norm": "University of Maryland;Salesforce;Johns Hopkins University;Adobe", "aff_unique_dep": ";;;Adobe Systems Incorporated", "aff_unique_url": 
"https://www/umd.edu;https://www.salesforce.com;https://www.jhu.edu;https://www.adobe.com", "aff_unique_abbr": "UMD;Salesforce;JHU;Adobe", "aff_campus_unique_index": "0;0;0;0", "aff_campus_unique": "College Park;", "aff_country_unique_index": "0;0;0;0;0;0;0", "aff_country_unique": "United States" }, { "id": "InGI-IMDL18", "title": "Secure Federated Learning of User Verification Models", "track": "main", "status": "Reject", "tldr": "", "abstract": "We consider the problem of training User Verification (UV) models in federated setup, where the conventional loss functions are not applicable due to the constraints that each user has access to the data of only one class and user embeddings cannot be shared with the server or other users. To address this problem, we propose Federated User Verification (FedUV), a framework for private and secure training of UV models. In FedUV, users jointly learn a set of vectors and maximize the correlation of their instance embeddings with a secret user-defined linear combination of those vectors. We show that choosing the linear combinations from the codewords of an error-correcting code allows users to collaboratively train the model without revealing their embedding vectors. We present the experimental results for user verification with voice, face, and handwriting data and show that FedUV is on par with existing approaches, while not sharing the embeddings with other users or the server.", "keywords": "Federated learning;User verification models", "primary_area": "", "supplementary_material": "", "author": "Hossein Hosseini;Hyunsin Park;Sungrack Yun;Christos Louizos;Joseph Soriaga;Max Welling", "authorids": "~Hossein_Hosseini4;hyunsinp@qti.qualcomm.com;~Sungrack_Yun1;~Christos_Louizos1;jsoriaga@qti.qualcomm.com;mwelling@qti.qualcomm.com", "gender": ";;M;;;", "homepage": ";;;;;", "dblp": ";;67/8053;;;", "google_scholar": ";;;;;", "orcid": ";;;;;", "linkedin": ";;;;;", "or_profile": "~Hossein_Hosseini4;hyunsinp@qti.qualcomm.com;~Sungrack_Yun1;~Christos_Louizos1;jsoriaga@qti.qualcomm.com;mwelling@qti.qualcomm.com", "aff": ";;Qualcomm;;;", "aff_domain": ";;qualcomm.com;;;", "position": ";;Researcher;;;", "bibtex": "@misc{\nhosseini2021secure,\ntitle={Secure Federated Learning of User Verification Models},\nauthor={Hossein Hosseini and Hyunsin Park and Sungrack Yun and Christos Louizos and Joseph Soriaga and Max Welling},\nyear={2021},\nurl={https://openreview.net/forum?id=InGI-IMDL18}\n}", "github": "", "project": "", "reviewers": "AnonReviewer4;AnonReviewer2;AnonReviewer3;AnonReviewer1", "site": "https://openreview.net/forum?id=InGI-IMDL18", "pdf_size": 0, "rating": "2;6;7;8", "confidence": "4;4;5;3", "wc_review": "306;617;174;450", "wc_reply_reviewers": "712;0;0;0", "wc_reply_authors": "1239;142;0;500", "reply_reviewers": "2;0;0;0", "reply_authors": "3;1;0;1", "rating_avg": [ 5.75, 2.277608394786075 ], "confidence_avg": [ 4.0, 0.7071067811865476 ], "wc_review_avg": [ 386.75, 164.92327761719994 ], "wc_reply_reviewers_avg": [ 178.0, 308.3050437472602 ], "wc_reply_authors_avg": [ 470.25, 479.7772269501753 ], "reply_reviewers_avg": [ 0.5, 0.8660254037844386 ], "reply_authors_avg": [ 1.25, 1.0897247358851685 ], "replies_avg": [ 13, 0 ], "authors#_avg": [ 6, 0 ], "corr_rating_confidence": -0.15523010514126656, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:3lII0frq4vQJ:scholar.google.com/&scioq=Secure+Federated+Learning+of+User+Verification+Models&hl=en&as_sdt=0,33", "gs_version_total": 0, "aff_unique_index": "0", 
"aff_unique_norm": "Qualcomm Incorporated", "aff_unique_dep": "", "aff_unique_url": "https://www.qualcomm.com", "aff_unique_abbr": "Qualcomm", "aff_country_unique_index": "0", "aff_country_unique": "United States" }, { "id": "IneoHhrfv5", "title": "Everybody's Talkin': Let Me Talk as You Want", "track": "main", "status": "Withdraw", "tldr": "", "abstract": "We present a method to edit a target portrait footage by taking a sequence of audio as input to synthesize a photo-realistic video. This method is unique because it is highly dynamic. It does not assume a person-specific rendering network yet capable of translating one source audio into one random chosen video output within a set of speech videos. Instead of learning a highly heterogeneous and nonlinear mapping from audio to the video directly, we first factorize each target video frame into orthogonal parameter spaces, i.e., expression, geometry, and pose, via monocular 3D face reconstruction. Next, a recurrent network is introduced to translate source audio into expression parameters that are primarily related to the audio content. The audio-translated expression parameters are then used to synthesize a photo-realistic human subject in each video frame, with the movement of the mouth regions precisely mapped to the source audio. The geometry and pose parameters of the target human portrait are retained, therefore preserving the con-text of the original video footage. Finally, we introduce a novel video rendering network and a dynamic programming method to construct a temporally coherent and photo-realistic video. Extensive experiments demonstrate the superiority of our method over existing approaches. Our method is end-to-end learnable and robust to voice variations in the source audio.", "keywords": "", "primary_area": "", "supplementary_material": "/attachment/11656c0208adeaf77aecdd1412f2e2a52e5456e0.zip", "author": "Linsen Song;Wayne Wu;Chen Qian;Ran He;Chen Change Loy", "authorids": "~Linsen_Song1;~Wayne_Wu1;~Chen_Qian1;~Ran_He1;~Chen_Change_Loy2", "gender": "M;;M;M;M", "homepage": ";;;https://rhe-web.github.io/;https://www.mmlab-ntu.com/person/ccloy/index.html", "dblp": "206/5247;;;61/6198-1;01/5855", "google_scholar": ";;AerkT0YAAAAJ;ayrg9AUAAAAJ;https://scholar.google.co.uk/citations?user=559LF80AAAAJ", "orcid": ";;;0000-0002-3807-991X;0000-0001-5345-1591", "linkedin": ";;;;", "or_profile": "~Linsen_Song1;~Wayne_Wu1;~Chen_Qian1;~Ran_He1;~Chen_Change_Loy2", "aff": "Institute of Automation, Chinese Academy of Sciences;;Tsinghua University;Institute of Automation, Chinese Academy of Sciences;Nanyang Technological University", "aff_domain": "ia.ac.cn;;mails.tsinghua.edu.cn;ia.ac.cn;ntu.edu.sg", "position": "MS student;;PhD student;Full Professor;Full Professor", "bibtex": "", "github": "", "project": "", "reviewers": "AnonReviewer3;AnonReviewer4;AnonReviewer1;AnonReviewer2", "site": "https://openreview.net/forum?id=IneoHhrfv5", "pdf_size": 0, "rating": "4;5;5;6", "confidence": "5;4;4;4", "wc_review": "451;686;761;558", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "0;0;0;0", "reply_reviewers": "0;0;0;0", "reply_authors": "0;0;0;0", "rating_avg": [ 5.0, 0.7071067811865476 ], "confidence_avg": [ 4.25, 0.4330127018922193 ], "wc_review_avg": [ 614.0, 118.84653970562205 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 0, 0 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 0, 0 ], "replies_avg": [ 5, 0 ], "authors#_avg": [ 5, 0 ], "corr_rating_confidence": -0.816496580927726, "gs_citation": 159, 
"gs_cited_by_link": "https://scholar.google.com/scholar?cites=17749484966340439703&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 6, "aff_unique_index": "0;1;0;2", "aff_unique_norm": "Chinese Academy of Sciences;Tsinghua University;Nanyang Technological University", "aff_unique_dep": "Institute of Automation;;", "aff_unique_url": "http://www.ia.cas.cn;https://www.tsinghua.edu.cn;https://www.ntu.edu.sg", "aff_unique_abbr": "CAS;THU;NTU", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0;0;1", "aff_country_unique": "China;Singapore" }, { "id": "Io8oYQb4LRK", "title": "Non-greedy Gradient-based Hyperparameter Optimization Over Long Horizons", "track": "main", "status": "Reject", "tldr": "", "abstract": "Gradient-based meta-learning has earned a widespread popularity in few-shot learning, but remains broadly impractical for tasks with long horizons (many gradient steps), due to memory scaling and gradient degradation issues. A common workaround is to learn meta-parameters online, but this introduces greediness which comes with a significant performance drop. In this work, we enable non-greedy meta-learning of hyperparameters over long horizons by sharing hyperparameters that are contiguous in time, and using the sign of hypergradients rather than their magnitude to indicate convergence. We implement this with forward-mode differentiation, which we extend to the popular momentum-based SGD optimizer. We demonstrate that the hyperparameters of this optimizer can be learned non-greedily without gradient degradation over $\\sim 10^4$ inner gradient steps, by only requiring $\\sim 10$ outer gradient steps. On CIFAR-10, we outperform greedy and random search methods for the same computational budget by nearly $10\\%$. 
Code will be available upon publication.", "keywords": "Hyperparameter optimization;Meta-learning", "primary_area": "", "supplementary_material": "/attachment/7384f79e20c23447211ca171930482192c531e65.zip", "author": "Paul Micaelli;Amos Storkey", "authorids": "~Paul_Micaelli1;~Amos_Storkey1", "gender": "M;Not Specified", "homepage": ";http://homepages.inf.ed.ac.uk/amos/", "dblp": ";", "google_scholar": "https://scholar.google.co.uk/citations?user=YCeFEJAAAAAJ;", "orcid": ";", "linkedin": ";", "or_profile": "~Paul_Micaelli1;~Amos_Storkey1", "aff": "University of Edinburgh;University of Edinburgh", "aff_domain": "ed.ac.uk;ed.ac.uk", "position": "PhD student;Full Professor", "bibtex": "@misc{\nmicaelli2021nongreedy,\ntitle={Non-greedy Gradient-based Hyperparameter Optimization Over Long Horizons},\nauthor={Paul Micaelli and Amos Storkey},\nyear={2021},\nurl={https://openreview.net/forum?id=Io8oYQb4LRK}\n}", "github": "", "project": "", "reviewers": "AnonReviewer3;AnonReviewer2;AnonReviewer1;AnonReviewer4", "site": "https://openreview.net/forum?id=Io8oYQb4LRK", "pdf_size": 0, "rating": "5;6;7;7", "confidence": "2;4;2;2", "wc_review": "495;387;421;97", "wc_reply_reviewers": "0;0;161;0", "wc_reply_authors": "476;549;730;42", "reply_reviewers": "0;0;1;0", "reply_authors": "1;1;2;1", "rating_avg": [ 6.25, 0.82915619758885 ], "confidence_avg": [ 2.5, 0.8660254037844386 ], "wc_review_avg": [ 350.0, 151.19854496654392 ], "wc_reply_reviewers_avg": [ 40.25, 69.71504500464731 ], "wc_reply_authors_avg": [ 449.25, 252.6552740395498 ], "reply_reviewers_avg": [ 0.25, 0.4330127018922193 ], "reply_authors_avg": [ 1.25, 0.4330127018922193 ], "replies_avg": [ 12, 0 ], "authors#_avg": [ 2, 0 ], "corr_rating_confidence": -0.17407765595569782, "gs_citation": 20, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=9570017519366621905&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 0, "aff_unique_index": "0;0", "aff_unique_norm": "University of Edinburgh", "aff_unique_dep": "", "aff_unique_url": "https://www.ed.ac.uk", "aff_unique_abbr": "Edinburgh", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0", "aff_country_unique": "United Kingdom" }, { "id": "IohHac70h3R", "title": "On the Marginal Regret Bound Minimization of Adaptive Methods", "track": "main", "status": "Reject", "tldr": "", "abstract": "Numerous adaptive algorithms such as AMSGrad and Radam have been proposed and applied to deep learning recently. However, these modifications do not improve the convergence rate of adaptive algorithms and whether a better algorithm exists still remains an open question. In this work, we propose a new motivation for designing the proximal function of adaptive algorithms, named as marginal regret bound minimization. Based on such an idea, we propose a new class of adaptive algorithms that not only achieves marginal optimality but can also potentially converge much faster than any existing adaptive algorithms in the long term. We show the superiority of the new class of adaptive algorithms both theoretically and empirically using experiments in deep learning. 
", "keywords": "Optimization Algorithm;Adaptive algorithms;Online Learning;Regret Minimization", "primary_area": "", "supplementary_material": "", "author": "Wenjie Li;Guang Cheng", "authorids": "~Wenjie_Li2;~Guang_Cheng1", "gender": "M;M", "homepage": "https://williamlwj.github.io/About//;http://www.stat.ucla.edu/~guangcheng/", "dblp": "33/3999;99/4812", "google_scholar": "4jlUpjEAAAAJ;", "orcid": ";", "linkedin": ";", "or_profile": "~Wenjie_Li2;~Guang_Cheng1", "aff": "Purdue University;University of California, Los Angeles", "aff_domain": "purdue.edu;ucla.edu", "position": "PhD student;Full Professor", "bibtex": "@misc{\nli2021on,\ntitle={On the Marginal Regret Bound Minimization of Adaptive Methods},\nauthor={Wenjie Li and Guang Cheng},\nyear={2021},\nurl={https://openreview.net/forum?id=IohHac70h3R}\n}", "github": "", "project": "", "reviewers": "AnonReviewer5;AnonReviewer1;AnonReviewer3;AnonReviewer2;AnonReviewer4", "site": "https://openreview.net/forum?id=IohHac70h3R", "pdf_size": 0, "rating": "3;4;5;5;8", "confidence": "4;4;3;3;3", "wc_review": "784;279;417;191;257", "wc_reply_reviewers": "0;0;0;0;0", "wc_reply_authors": "369;489;245;290;198", "reply_reviewers": "0;0;0;0;0", "reply_authors": "1;1;1;1;1", "rating_avg": [ 5.0, 1.6733200530681511 ], "confidence_avg": [ 3.4, 0.4898979485566356 ], "wc_review_avg": [ 385.6, 212.357811252612 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 318.2, 102.32770885737646 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 11, 0 ], "authors#_avg": [ 2, 0 ], "corr_rating_confidence": -0.7319250547114, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:PMWzONeEpiUJ:scholar.google.com/&scioq=On+the+Marginal+Regret+Bound+Minimization+of+Adaptive+Methods&hl=en&as_sdt=0,5", "gs_version_total": 0, "aff_unique_index": "0;1", "aff_unique_norm": "Purdue University;University of California, Los Angeles", "aff_unique_dep": ";", "aff_unique_url": "https://www.purdue.edu;https://www.ucla.edu", "aff_unique_abbr": "Purdue;UCLA", "aff_campus_unique_index": "1", "aff_campus_unique": ";Los Angeles", "aff_country_unique_index": "0;0", "aff_country_unique": "United States" }, { "id": "Ip195saXqIX", "title": "Knowledge Distillation By Sparse Representation Matching", "track": "main", "status": "Reject", "tldr": "", "abstract": "Knowledge Distillation refers to a class of methods that transfers the knowledge from a teacher network to a student network. In this paper, we propose Sparse Representation Matching (SRM), a method to transfer intermediate knowledge obtained from one Convolutional Neural Network (CNN) to another by utilizing sparse representation learning. SRM first extracts sparse representations of the hidden features of the teacher CNN, which are then used to generate both pixel-level and image-level labels for training intermediate feature maps of the student network. We formulate SRM as a neural processing block, which can be efficiently optimized using stochastic gradient descent and integrated into any CNN in a plug-and-play manner. Our experiments demonstrate that SRM is robust to architectural differences between the teacher and student networks, and outperforms other KD techniques across several datasets. 
", "keywords": "Knowledge Distillation;Sparse Representation;Transfer Learning", "primary_area": "", "supplementary_material": "/attachment/32a5a0efbe254dfbdf838f99f59128c85c5470b0.zip", "author": "Dat Thanh Tran;Moncef Gabbouj;Alexandros Iosifidis", "authorids": "~Dat_Thanh_Tran1;~Moncef_Gabbouj1;~Alexandros_Iosifidis2", "gender": "M;M;M", "homepage": ";https://www.tuni.fi/en/moncef-gabbouj;https://www.tuni.fi/en/people/alexandros-iosifidis", "dblp": "https://dblp.uni-trier.de/pers/hd/t/Tran:Dat_Thanh;08/6597;01/9539", "google_scholar": "tkOko_QAAAAJ;cHukfSUAAAAJ;KjsL0KEAAAAJ", "orcid": "0000-0002-5922-3458;0000-0002-9788-2323;0000-0003-4807-1345", "linkedin": ";moncef-gabbouj-2186282/?originalSubdomain=fi;", "or_profile": "~Dat_Thanh_Tran1;~Moncef_Gabbouj1;~Alexandros_Iosifidis2", "aff": "Tampere University;Tampere University;Aarhus University", "aff_domain": "tuni.fi;tuni.fi;au.dk", "position": "researcher;Full Professor;Associate Professor", "bibtex": "@misc{\ntran2021knowledge,\ntitle={Knowledge Distillation By Sparse Representation Matching},\nauthor={Dat Thanh Tran and Moncef Gabbouj and Alexandros Iosifidis},\nyear={2021},\nurl={https://openreview.net/forum?id=Ip195saXqIX}\n}", "github": "", "project": "", "reviewers": "AnonReviewer4;AnonReviewer2;AnonReviewer3;AnonReviewer1", "site": "https://openreview.net/forum?id=Ip195saXqIX", "pdf_size": 0, "rating": "3;4;5;5", "confidence": "4;5;5;3", "wc_review": "863;383;381;563", "wc_reply_reviewers": "486;0;0;365", "wc_reply_authors": "2110;597;862;1695", "reply_reviewers": "1;0;0;3", "reply_authors": "3;1;2;5", "rating_avg": [ 4.25, 0.82915619758885 ], "confidence_avg": [ 4.25, 0.82915619758885 ], "wc_review_avg": [ 547.5, 196.57250570718173 ], "wc_reply_reviewers_avg": [ 212.75, 217.00849637744602 ], "wc_reply_authors_avg": [ 1316.0, 611.7912225588072 ], "reply_reviewers_avg": [ 1.0, 1.224744871391589 ], "reply_authors_avg": [ 2.75, 1.479019945774904 ], "replies_avg": [ 20, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": -0.0909090909090909, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:gNEEULI2TwMJ:scholar.google.com/&scioq=Knowledge+Distillation+By+Sparse+Representation+Matching&hl=en&as_sdt=0,33", "gs_version_total": 3, "aff_unique_index": "0;0;1", "aff_unique_norm": "Tampere University;Aarhus University", "aff_unique_dep": ";", "aff_unique_url": "https://www.tuni.fi;https://au.dk", "aff_unique_abbr": "Tuni;AU", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0;1", "aff_country_unique": "Finland;Denmark" }, { "id": "IpPQmzj4T_", "title": "Teleport Graph Convolutional Networks", "track": "main", "status": "Reject", "tldr": "", "abstract": "We consider the limitations in message-passing graph neural networks. In message-passing operations, each node aggregates information from its neighboring nodes. To enlarge the receptive field, graph neural networks need to stack multiple message-passing graph convolution layers, which leads to the over-fitting issue and over-smoothing issue. To address these limitations, we propose a teleport graph convolution layer (TeleGCL) that uses teleport functions to enable each node to aggregate information from a much larger neighborhood. For each node, teleport functions select relevant nodes beyond the local neighborhood, thereby resulting in a larger receptive field. To apply our structure-aware teleport function, we propose a novel method to construct structural features for nodes in the graph. 
Based on our TeleGCL, we build a family of teleport graph convolutional networks. The empirical results on graph and node classification tasks demonstrate the effectiveness of our proposed methods.", "keywords": "over-smoothing", "primary_area": "", "supplementary_material": "", "author": "Hongyang Gao;Shuiwang Ji", "authorids": "~Hongyang_Gao1;~Shuiwang_Ji1", "gender": "M;M", "homepage": "https://faculty.sites.iastate.edu/hygao/;http://people.tamu.edu/~sji", "dblp": "200/7985;84/6405", "google_scholar": "jGmq0aEAAAAJ;BZGj6sAAAAAJ", "orcid": "0000-0002-9020-9080;0000-0002-4205-4563", "linkedin": "hongyang-gao-74924690/;shuiwang-ji-9a040715/", "or_profile": "~Hongyang_Gao1;~Shuiwang_Ji1", "aff": "Iowa State University;Texas A&M University", "aff_domain": "iastate.edu;tamu.edu", "position": "Assistant Professor;Associate Professor", "bibtex": "@misc{\ngao2021teleport,\ntitle={Teleport Graph Convolutional Networks},\nauthor={Hongyang Gao and Shuiwang Ji},\nyear={2021},\nurl={https://openreview.net/forum?id=IpPQmzj4T_}\n}", "github": "", "project": "", "reviewers": "AnonReviewer4;AnonReviewer1;AnonReviewer3;AnonReviewer2", "site": "https://openreview.net/forum?id=IpPQmzj4T_", "pdf_size": 0, "rating": "3;5;5;5", "confidence": "3;4;4;5", "wc_review": "639;374;596;301", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "0;0;0;0", "reply_reviewers": "0;0;0;0", "reply_authors": "0;0;0;0", "rating_avg": [ 4.5, 0.8660254037844386 ], "confidence_avg": [ 4.0, 0.7071067811865476 ], "wc_review_avg": [ 477.5, 143.1686068941093 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 0, 0 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 0, 0 ], "replies_avg": [ 5, 0 ], "authors#_avg": [ 2, 0 ], "corr_rating_confidence": 0.816496580927726, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:cv4mp7cnS7YJ:scholar.google.com/&scioq=Teleport+Graph+Convolutional+Networks&hl=en&as_sdt=0,5", "gs_version_total": 0, "aff_unique_index": "0;1", "aff_unique_norm": "Iowa State University;Texas A&M University", "aff_unique_dep": ";", "aff_unique_url": "https://www.iastate.edu;https://www.tamu.edu", "aff_unique_abbr": "ISU;TAMU", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0", "aff_country_unique": "United States" }, { "id": "IpsTSvfIB6", "title": "Approximate Birkhoff-von-Neumann decomposition: a differentiable approach", "track": "main", "status": "Reject", "tldr": "", "abstract": "The Birkhoff-von-Neumann (BvN) decomposition is a standard tool used to draw permutation matrices from a doubly stochastic (DS) matrix. The BvN decomposition represents such a DS matrix as a convex combination of several permutation matrices. Currently, most algorithms to compute the BvN decomposition employ either greedy strategies or custom-made heuristics. In this paper, we present a novel differentiable approach to approximate the BvN decomposition. Our algorithm builds upon recent advances in Riemannian optimization on Birkhoff polytopes. We offer an empirical evaluation of this approach in the fairness of exposure in rankings, where we show that the outcome of our method behaves similarly to greedy algorithms. Our approach is an excellent addition to existing methods for sampling from DS matrices, such as sampling from a Gumbel-Sinkhorn distribution. However, our approach is better suited for applications where the latency in prediction time is a constraint. Indeed, we can generally precompute an approximated BvN decomposition offline. 
Then, we select a permutation matrix at random with probability proportional to its coefficient. Finally, we provide an implementation of our method.", "keywords": "Birkhoff-von-Neumann decomposition;doubly stochastic matrices;Riemannian optimization;Fairness exposure in ranking", "primary_area": "", "supplementary_material": "", "author": "Andr\u00e9s Hoyos-Idrobo", "authorids": "~Andr\u00e9s_Hoyos-Idrobo1", "gender": "M", "homepage": "", "dblp": "", "google_scholar": "J3344dQAAAAJ", "orcid": "0000-0003-1729-1927", "linkedin": "andres-hoyos-idrobo-85b42024/", "or_profile": "~Andr\u00e9s_Hoyos-Idrobo1", "aff": "Rakuten Institute of Technology, The University of Tokyo", "aff_domain": "rakuten.co.jp", "position": "Postdoc", "bibtex": "@misc{\nhoyos-idrobo2021approximate,\ntitle={Approximate Birkhoff-von-Neumann decomposition: a differentiable approach},\nauthor={Andr{\\'e}s Hoyos-Idrobo},\nyear={2021},\nurl={https://openreview.net/forum?id=IpsTSvfIB6}\n}", "github": "", "project": "", "reviewers": "AnonReviewer2;AnonReviewer3;AnonReviewer4", "site": "https://openreview.net/forum?id=IpsTSvfIB6", "pdf_size": 0, "rating": "4;4;5", "confidence": "1;3;3", "wc_review": "740;764;448", "wc_reply_reviewers": "0;0;0", "wc_reply_authors": "0;0;0", "reply_reviewers": "0;0;0", "reply_authors": "0;0;0", "rating_avg": [ 4.333333333333333, 0.4714045207910317 ], "confidence_avg": [ 2.3333333333333335, 0.9428090415820634 ], "wc_review_avg": [ 650.6666666666666, 143.64152912333148 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 0, 0 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 0, 0 ], "replies_avg": [ 5, 0 ], "authors#_avg": [ 1, 0 ], "corr_rating_confidence": 0.4999999999999999, "gs_citation": 1, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=17956721600392867406&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 0, "aff_unique_index": "0", "aff_unique_norm": "University of Tokyo", "aff_unique_dep": "Rakuten Institute of Technology", "aff_unique_url": "https://www.u-tokyo.ac.jp", "aff_unique_abbr": "UTokyo", "aff_country_unique_index": "0", "aff_country_unique": "Japan" }, { "id": "IqVB8e0DlUd", "title": "Fair Differential Privacy Can Mitigate the Disparate Impact on Model Accuracy", "track": "main", "status": "Reject", "tldr": "", "abstract": "The techniques based on the theory of differential privacy (DP) has become a standard building block in the machine learning community. DP training mechanisms offer strong guarantees that an adversary cannot determine with high confidence about the training data based on analyzing the released model, let alone any details of the instances. However, DP may disproportionately affect the underrepresented and relatively complicated classes. That is, the reduction in utility is unequal for each class. This paper proposes a fair differential privacy algorithm (FairDP) to mitigate the disparate impact on model accuracy for each class. We cast the learning procedure as a two-stage optimization problem, which integrates differential privacy with fairness. FairDP establishes a self-adaptive DP mechanism and dynamically adjusts instance influence in each class depending on the theoretical bias-variance bound. 
Our experimental evaluation shows the effectiveness of FairDP in mitigating the disparate impact on model accuracy among the classes on several benchmark datasets and scenarios ranging from text to vision.", "keywords": "", "primary_area": "", "supplementary_material": "/attachment/d0a3b9828d138b4f7ba5d0488a63d44610e803a4.zip", "author": "Wenyan Liu;Xiangfeng Wang;Xingjian Lu;Junhong Cheng;Bo Jin;Xiaoling Wang;Hongyuan Zha", "authorids": "~Wenyan_Liu1;~Xiangfeng_Wang1;xjlu@cs.ecnu.edu.cn;jhcheng@stu.ecnu.edu.cn;~Bo_Jin1;xlwang@cs.ecnu.edu.cn;~Hongyuan_Zha1", "gender": "F;M;;;;;", "homepage": ";https://xfwang87.github.io/;;;;;", "dblp": ";84/4695;;;;;z/HongyuanZha", "google_scholar": "a8sqKFkAAAAJ;YpGMkgsAAAAJ;;;;;n1DQMIsAAAAJ", "orcid": ";;;;;;", "linkedin": "wenyan-liu/;;;;;;", "or_profile": "~Wenyan_Liu1;~Xiangfeng_Wang1;xjlu@cs.ecnu.edu.cn;jhcheng@stu.ecnu.edu.cn;~Bo_Jin1;xlwang@cs.ecnu.edu.cn;~Hongyuan_Zha1", "aff": "East China Normal University;East China Normal University;;;;;The Chinese University of Hong Kong, Shenzhen", "aff_domain": "ecnu.edu.cn;ecnu.edu.cn;;;;;cuhk.edu.cn", "position": "PhD student;Associate Professor;;;;;Full Professor", "bibtex": "@misc{\nliu2021fair,\ntitle={Fair Differential Privacy Can Mitigate the Disparate Impact on Model Accuracy},\nauthor={Wenyan Liu and Xiangfeng Wang and Xingjian Lu and Junhong Cheng and Bo Jin and Xiaoling Wang and Hongyuan Zha},\nyear={2021},\nurl={https://openreview.net/forum?id=IqVB8e0DlUd}\n}", "github": "", "project": "", "reviewers": "AnonReviewer3;AnonReviewer1;AnonReviewer4;AnonReviewer2", "site": "https://openreview.net/forum?id=IqVB8e0DlUd", "pdf_size": 0, "rating": "4;4;4;5", "confidence": "3;4;4;3", "wc_review": "379;298;467;306", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "99;105;100;207", "reply_reviewers": "0;0;0;0", "reply_authors": "1;1;1;1", "rating_avg": [ 4.25, 0.4330127018922193 ], "confidence_avg": [ 3.5, 0.5 ], "wc_review_avg": [ 362.5, 68.09001395212076 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 127.75, 45.811434162226355 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 10, 0 ], "authors#_avg": [ 7, 0 ], "corr_rating_confidence": -0.5773502691896257, "gs_citation": 7, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=8110138492650262000&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 0, "aff_unique_index": "0;0;1", "aff_unique_norm": "East China Normal University;Chinese University of Hong Kong", "aff_unique_dep": ";", "aff_unique_url": "http://www.ecnu.edu.cn;https://www.cuhk.edu.cn", "aff_unique_abbr": "ECNU;CUHK", "aff_campus_unique_index": "1", "aff_campus_unique": ";Shenzhen", "aff_country_unique_index": "0;0;0", "aff_country_unique": "China" }, { "id": "IqZpoAAt2oQ", "title": "Function Contrastive Learning of Transferable Representations", "track": "main", "status": "Reject", "tldr": "", "abstract": "Few-shot-learning seeks to find models that are capable of fast-adaptation to novel tasks which are not encountered during training. Unlike typical few-shot learning algorithms, we propose a contrastive learning method which is not trained to solve a set of tasks, but rather attempts to find a good representation of the underlying data-generating processes (\\emph{functions}). This allows for finding representations which are useful for an entire series of tasks sharing the same function. 
In particular, our training scheme is driven by the self-supervision signal indicating whether two sets of samples stem from the same underlying function. Our experiments on a number of synthetic and real-world datasets show that the representations we obtain can outperform strong baselines in terms of downstream performance and noise robustness, even when these baselines are trained in an end-to-end manner.", "keywords": "Representations Learning;Few-shot Learning;Contrastive Learning", "primary_area": "", "supplementary_material": "", "author": "Muhammad Waleed Gondal;Shruti Joshi;Nasim Rahaman;Stefan Bauer;Manuel Wuthrich;Bernhard Sch\u00f6lkopf", "authorids": "~Muhammad_Waleed_Gondal1;shruti.joshi@tuebingen.mpg.de;~Nasim_Rahaman1;~Stefan_Bauer1;~Manuel_Wuthrich1;~Bernhard_Sch\u00f6lkopf1", "gender": "M;;M;;M;", "homepage": "https://www.is.mpg.de/person/wgondal;;;https://cifar.ca/bios/stefan-bauer/;;", "dblp": ";;222/3165;;https://dblp.uni-trier.de/pers/hd/w/W=uuml=thrich:Manuel;", "google_scholar": "https://scholar.google.de/citations?user=KJTsSAQAAAAJ;;https://scholar.google.de/citations?user=iH9DuY0AAAAJ;O-oICE8AAAAJ;;", "orcid": ";;;;;", "linkedin": ";;https://de.linkedin.com/in/nasim-rahaman/de;;;", "or_profile": "~Muhammad_Waleed_Gondal1;shruti.joshi@tuebingen.mpg.de;~Nasim_Rahaman1;~Stefan_Bauer1;~Manuel_Wuthrich1;~Bernhard_Sch\u00f6lkopf1", "aff": "Max Planck Institute for Intelligent Systems, Max-Planck Institute;;Max Planck Institute for Intelligent Systems, Max-Planck Institute;Max Planck Institute for Intelligent Systems, Max-Planck Institute;Max Planck Institute for Intelligent Systems;", "aff_domain": "tuebingen.mpg.de;;tuebingen.mpg.de;tuebingen.mpg.de;mpg.tuebingen.de;", "position": "PhD student;;PhD student;Research Group Leader;Postdoc;", "bibtex": "@misc{\ngondal2021function,\ntitle={Function Contrastive Learning of Transferable Representations},\nauthor={Muhammad Waleed Gondal and Shruti Joshi and Nasim Rahaman and Stefan Bauer and Manuel Wuthrich and Bernhard Sch{\\\"o}lkopf},\nyear={2021},\nurl={https://openreview.net/forum?id=IqZpoAAt2oQ}\n}", "github": "", "project": "", "reviewers": "AnonReviewer3;AnonReviewer4;AnonReviewer2;AnonReviewer1", "site": "https://openreview.net/forum?id=IqZpoAAt2oQ", "pdf_size": 0, "rating": "5;5;5;5", "confidence": "3;3;4;4", "wc_review": "601;411;383;215", "wc_reply_reviewers": "244;0;0;0", "wc_reply_authors": "1188;1237;1072;794", "reply_reviewers": "2;0;0;0", "reply_authors": "3;3;3;2", "rating_avg": [ 5.0, 0.0 ], "confidence_avg": [ 3.5, 0.5 ], "wc_review_avg": [ 402.5, 136.9406805883482 ], "wc_reply_reviewers_avg": [ 61.0, 105.65509926170151 ], "wc_reply_authors_avg": [ 1072.75, 171.72852849774262 ], "reply_reviewers_avg": [ 0.5, 0.8660254037844386 ], "reply_authors_avg": [ 2.75, 0.4330127018922193 ], "replies_avg": [ 19, 0 ], "authors#_avg": [ 6, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 1, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=6300298791222318559&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 0, "aff_unique_index": "0;0;0;0", "aff_unique_norm": "Max Planck Institute for Intelligent Systems", "aff_unique_dep": "Intelligent Systems", "aff_unique_url": "https://www.mpi-is.mpg.de", "aff_unique_abbr": "MPI-IS", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0;0;0", "aff_country_unique": "Germany" }, { "title": "TropEx: An Algorithm for Extracting Linear Terms in Deep Neural Networks", "status": "Poster", "track": "main", "site": 
"https://iclr.cc/virtual/2021/poster/3324", "id": "IqtonxWI0V3", "poster": "", "openreview": "https://openreview.net/forum?id=IqtonxWI0V3", "slides": "https://iclr.cc/virtual/2021/poster/3324", "video": "https://iclr.cc/virtual/2021/poster/3324", "author_site": "Martin Trimmel, Henning Petzka, Cristian Sminchisescu", "tldr": "", "abstract": "Deep neural networks with rectified linear (ReLU) activations are piecewise linear functions, where hyperplanes partition the input space into an astronomically high number of linear regions. Previous work focused on counting linear regions to measure the network's expressive power and on analyzing geometric properties of the hyperplane configurations. In contrast, we aim to understand the impact of the linear terms on network performance, by examining the information encoded in their coefficients. To this end, we derive TropEx, a nontrivial tropical algebra-inspired algorithm to systematically extract linear terms based on data. Applied to convolutional and fully-connected networks, our algorithm uncovers significant differences in how the different networks utilize linear regions for generalization. This underlines the importance of systematic linear term exploration, to better understand generalization in neural networks trained with complex data sets.", "keywords": "linear regions;linear terms;deep learning theory;deep neural networks;rectified linear unit;relu network;piecewise linear function;tropical function", "primary_area": "", "supplementary_material": "/attachment/2c52c79903dc90d5adddc34b9866609cec3c215f.zip", "author": "Martin Trimmel;Henning Petzka;Cristian Sminchisescu", "authorids": "~Martin_Trimmel1;~Henning_Petzka1;~Cristian_Sminchisescu1", "gender": "M;M;", "homepage": "http://www.maths.lth.se/sminchisescu/research/profile/8/martin-trimmel;;http://www.maths.lth.se/sminchisescu/", "dblp": "295/5264;206/6748;96/3826", "google_scholar": "qLHPersAAAAJ;https://scholar.google.se/citations?hl=en;https://scholar.google.se/citations?hl=en", "orcid": "0000-0001-5991-9845;;", "linkedin": ";;", "or_profile": "~Martin_Trimmel1;~Henning_Petzka1;~Cristian_Sminchisescu1", "aff": "Lund University / Lund Institute of Technology;Lund University;Lund University", "aff_domain": "lth.se;math.lth.se;lth.se", "position": "PhD student;Postdoc;Professor", "bibtex": "@inproceedings{\ntrimmel2021tropex,\ntitle={TropEx: An Algorithm for Extracting Linear Terms in Deep Neural Networks},\nauthor={Martin Trimmel and Henning Petzka and Cristian Sminchisescu},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=IqtonxWI0V3}\n}", "github": "", "project": "", "reviewers": "AnonReviewer4;AnonReviewer1;AnonReviewer3;AnonReviewer2", "pdf_size": 0, "rating": "6;6;6;8", "confidence": "3;3;3;3", "wc_review": "453;324;560;289", "wc_reply_reviewers": "0;207;0;0", "wc_reply_authors": "381;1023;583;471", "reply_reviewers": "0;2;0;0", "reply_authors": "1;3;1;1", "rating_avg": [ 6.5, 0.8660254037844386 ], "confidence_avg": [ 3.0, 0.0 ], "wc_review_avg": [ 406.5, 107.63015376742709 ], "wc_reply_reviewers_avg": [ 51.75, 89.6336292916894 ], "wc_reply_authors_avg": [ 614.5, 246.46450048637837 ], "reply_reviewers_avg": [ 0.5, 0.8660254037844386 ], "reply_authors_avg": [ 1.5, 0.8660254037844386 ], "replies_avg": [ 13, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 14, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=11934097320371663387&as_sdt=2005&sciodt=0,5&hl=en", 
"gs_version_total": 2, "pdf": "https://openreview.net/pdf?id=IqtonxWI0V3", "email": "lth.se;math.lth.se;lth.se", "author_num": 3, "aff_unique_index": "0;0;0", "aff_unique_norm": "Lund University", "aff_unique_dep": "Lund Institute of Technology", "aff_unique_url": "https://www.lunduniversity.lu.se", "aff_unique_abbr": "LU", "aff_campus_unique_index": "0", "aff_campus_unique": "Lund;", "aff_country_unique_index": "0;0;0", "aff_country_unique": "Sweden" }, { "title": "On the role of planning in model-based deep reinforcement learning", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/3187", "id": "IrM64DGB21", "poster": "", "openreview": "https://openreview.net/forum?id=IrM64DGB21", "slides": "https://iclr.cc/virtual/2021/poster/3187", "video": "https://iclr.cc/virtual/2021/poster/3187", "author_site": "Jessica Hamrick, Abram Friesen, Feryal Behbahani, Arthur Guez, Fabio Viola, Sims Witherspoon, Thomas Anthony, Lars Buesing, Petar Veli\u010dkovi\u0107, Theophane Weber", "tldr": "", "abstract": "Model-based planning is often thought to be necessary for deep, careful reasoning and generalization in artificial agents. While recent successes of model-based reinforcement learning (MBRL) with deep function approximation have strengthened this hypothesis, the resulting diversity of model-based methods has also made it difficult to track which components drive success and why. In this paper, we seek to disentangle the contributions of recent methods by focusing on three questions: (1) How does planning benefit MBRL agents? (2) Within planning, what choices drive performance? (3) To what extent does planning improve generalization? To answer these questions, we study the performance of MuZero (Schrittwieser et al., 2019), a state-of-the-art MBRL algorithm with strong connections and overlapping components with many other MBRL algorithms. We perform a number of interventions and ablations of MuZero across a wide range of environments, including control tasks, Atari, and 9x9 Go. Our results suggest the following: (1) Planning is most useful in the learning process, both for policy updates and for providing a more useful data distribution. (2) Using shallow trees with simple Monte-Carlo rollouts is as performant as more complex methods, except in the most difficult reasoning tasks. (3) Planning alone is insufficient to drive strong generalization. These results indicate where and how to utilize planning in reinforcement learning settings, and highlight a number of open questions for future MBRL research.", "keywords": "model-based RL;planning;MuZero", "primary_area": "", "supplementary_material": "", "author": "Jessica B Hamrick;Abram L. 
Friesen;Feryal Behbahani;Arthur Guez;Fabio Viola;Sims Witherspoon;Thomas Anthony;Lars Holger Buesing;Petar Veli\u010dkovi\u0107;Theophane Weber", "authorids": "~Jessica_B_Hamrick1;~Abram_L._Friesen1;~Feryal_Behbahani1;~Arthur_Guez1;~Fabio_Viola2;switherspoon@google.com;~Thomas_Anthony1;~Lars_Holger_Buesing1;~Petar_Veli\u010dkovi\u01071;~Theophane_Weber1", "gender": "F;M;F;M;;;;M;M;M", "homepage": "http://www.jesshamrick.com;http://www.abramfriesen.com;https://feryal.github.io;https://www.gatsby.ucl.ac.uk/~aguez/;;;;;https://petar-v.com;http://www.thphn.com/", "dblp": "155/1885;47/11107;;;;;169/3283;https://dblp.uni-trier.de/pers/hd/b/Buesing:Lars;184/4786.html;", "google_scholar": "2ylcZSsAAAAJ;sfvCNiEAAAAJ;;https://scholar.google.co.uk/citations?user=iyD9aw8AAAAJ;;;;1h_mxPMAAAAJ;https://scholar.google.co.uk/citations?user=kcTK_FAAAAAJ;LZxqcX4AAAAJ", "orcid": ";;;;;;;;0000-0002-2820-4692;", "linkedin": ";;;;;;;;petarvelickovic;", "or_profile": "~Jessica_B_Hamrick1;~Abram_L._Friesen1;~Feryal_Behbahani1;~Arthur_Guez1;~Fabio_Viola2;switherspoon@google.com;~Thomas_Anthony1;~Lars_Holger_Buesing1;~Petar_Veli\u010dkovi\u01071;~Theophane_Weber1", "aff": "Google DeepMind;Google DeepMind;Google DeepMind;Google DeepMind;;;Google DeepMind;Deepmind;Google DeepMind;", "aff_domain": "google.com;google.com;google.com;google.com;;;deepmind.com;google.com;google.com;", "position": "Research Scientist;Research Scientist;Research Scientist;Research Scientist;;;Research Scientist;Postdoc;Senior Staff Research Scientist;", "bibtex": "@inproceedings{\nhamrick2021on,\ntitle={On the role of planning in model-based deep reinforcement learning},\nauthor={Jessica B Hamrick and Abram L. Friesen and Feryal Behbahani and Arthur Guez and Fabio Viola and Sims Witherspoon and Thomas Anthony and Lars Holger Buesing and Petar Veli{\\v{c}}kovi{\\'c} and Theophane Weber},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=IrM64DGB21}\n}", "github": "", "project": "", "reviewers": "AnonReviewer2;AnonReviewer1;AnonReviewer3;AnonReviewer4", "pdf_size": 0, "rating": "5;6;7;7", "confidence": "4;4;3;4", "wc_review": "708;389;286;315", "wc_reply_reviewers": "0;0;0;66", "wc_reply_authors": "770;456;334;263", "reply_reviewers": "0;0;0;1", "reply_authors": "1;1;1;1", "rating_avg": [ 6.25, 0.82915619758885 ], "confidence_avg": [ 3.75, 0.4330127018922193 ], "wc_review_avg": [ 424.5, 167.93227801706257 ], "wc_reply_reviewers_avg": [ 16.5, 28.578838324886476 ], "wc_reply_authors_avg": [ 455.75, 194.11900344891532 ], "reply_reviewers_avg": [ 0.25, 0.4330127018922193 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 13, 0 ], "authors#_avg": [ 10, 0 ], "corr_rating_confidence": -0.5222329678670935, "gs_citation": 95, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=2409550274044416235&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 3, "pdf": "https://openreview.net/pdf?id=IrM64DGB21", "email": "google.com;google.com;google.com;google.com;;;deepmind.com;google.com;google.com;", "author_num": 10, "aff_unique_index": "0;0;0;0;0;1;0", "aff_unique_norm": "Google;DeepMind", "aff_unique_dep": "Google DeepMind;", "aff_unique_url": "https://deepmind.com;https://deepmind.com", "aff_unique_abbr": "DeepMind;DeepMind", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0;0;0;0;0;0", "aff_country_unique": "United Kingdom" }, { "id": "IrofNLZuWF", "title": "Stochastic Optimization with Non-stationary Noise: The Power of Moment 
Estimation", "track": "main", "status": "Reject", "tldr": "", "abstract": "We investigate stochastic optimization under weaker assumptions on the distribution of noise than those used in usual analysis. Our assumptions are motivated by empirical observations in training neural networks. In particular, standard results on optimal convergence rates for stochastic optimization assume either there exists a uniform bound on the moments of the gradient noise, or that the noise decays as the algorithm progresses. These assumptions do not match the empirical behavior of optimization algorithms used in neural network training where the noise level in stochastic gradients could even increase with time. We address this nonstationary behavior of noise by analyzing convergence rates of stochastic gradient methods subject to changing second moment (or variance) of the stochastic oracle. When the noise variation is known, we show that it is always beneficial to adapt the step-size and exploit the noise variability. When the noise statistics are unknown, we obtain similar improvements by developing an online estimator of the noise level, thereby recovering close variants of RMSProp~\\citep{tieleman2012lecture}. Consequently, our results reveal why adaptive step size methods can outperform SGD, while still enjoying theoretical guarantees.", "keywords": "Stochastic optimization", "primary_area": "", "supplementary_material": "/attachment/4b195fa0d9588e15dab6e2d88116695f3520315e.zip", "author": "Jingzhao Zhang;Hongzhou Lin;Subhro Das;Suvrit Sra;Ali Jadbabaie", "authorids": "~Jingzhao_Zhang2;~Hongzhou_Lin1;~Subhro_Das1;~Suvrit_Sra1;~Ali_Jadbabaie1", "gender": "M;M;;;M", "homepage": "https://sites.google.com/view/jingzhao/home;;;https://optml.mit.edu;http://www.mit.edu/~jadbabai/www", "dblp": "220/5559;178/3313;;90/930;83/3158", "google_scholar": "8NudxYsAAAAJ;;;eyCw9goAAAAJ;ZBc_WwYAAAAJ", "orcid": ";;;;", "linkedin": ";;;;", "or_profile": "~Jingzhao_Zhang2;~Hongzhou_Lin1;~Subhro_Das1;~Suvrit_Sra1;~Ali_Jadbabaie1", "aff": "Massachusetts Institute of Technology;Amazon;;Massachusetts Institute of Technology;Massachusetts Institute of Technology", "aff_domain": "mit.edu;amazon.com;;mit.edu;mit.edu", "position": "PhD student;Applied Scientist;;Associate Professor;Full Professor", "bibtex": "@misc{\nzhang2021stochastic,\ntitle={Stochastic Optimization with Non-stationary Noise: The Power of Moment Estimation},\nauthor={Jingzhao Zhang and Hongzhou Lin and Subhro Das and Suvrit Sra and Ali Jadbabaie},\nyear={2021},\nurl={https://openreview.net/forum?id=IrofNLZuWF}\n}", "github": "", "project": "", "reviewers": "AnonReviewer4;AnonReviewer1;AnonReviewer3;AnonReviewer2", "site": "https://openreview.net/forum?id=IrofNLZuWF", "pdf_size": 0, "rating": "3;3;4;5", "confidence": "5;5;4;4", "wc_review": "2140;687;685;333", "wc_reply_reviewers": "0;0;0;52", "wc_reply_authors": "1837;483;876;258", "reply_reviewers": "0;0;0;1", "reply_authors": "3;1;2;1", "rating_avg": [ 3.75, 0.82915619758885 ], "confidence_avg": [ 4.5, 0.5 ], "wc_review_avg": [ 961.25, 695.6430029116947 ], "wc_reply_reviewers_avg": [ 13.0, 22.516660498395403 ], "wc_reply_authors_avg": [ 863.5, 604.0010347673256 ], "reply_reviewers_avg": [ 0.25, 0.4330127018922193 ], "reply_authors_avg": [ 1.75, 0.82915619758885 ], "replies_avg": [ 14, 0 ], "authors#_avg": [ 5, 0 ], "corr_rating_confidence": -0.9045340337332909, "gs_citation": 1, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=13818178009598360162&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 
0, "aff_unique_index": "0;1;0;0", "aff_unique_norm": "Massachusetts Institute of Technology;Amazon", "aff_unique_dep": ";Amazon.com, Inc.", "aff_unique_url": "https://web.mit.edu;https://www.amazon.com", "aff_unique_abbr": "MIT;Amazon", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0;0;0", "aff_country_unique": "United States" }, { "id": "IuBLMxWOXMR", "title": "Unsupervised Simultaneous Depth-from-defocus and Depth-from-focus", "track": "main", "status": "Withdraw", "tldr": "", "abstract": "If the accuracy of depth estimation from a single RGB image could be improved it would be possible to eliminate the need for expensive and bulky depth sensing hardware. The majority of efforts toward this end have been focused on utilizing geometric constraints, image sequences, or stereo image pairs with the help of a deep neural network. In this work, we propose a framework for simultaneous depth estimation from a single image and image focal stacks using depth-from-defocus and depth-from-focus algorithms. The proposed network is able to learn optimal depth mapping from the information contained in the blurring of a single image, generate a simulated image focal stack and all-in-focus image, and train a depth estimator from an image focal stack. As there is no large dataset specifically designed for our problem, we first learned on a synthetic indoor dataset: NYUv2. Then we compare the performance by comparing with other existing methods on DSLR dataset. Finally, we collected our own dataset using a DSLR and further verify on it. Experiments demonstrate that our system is able to provide comparable results compared with other state-of-the-art methods.", "keywords": "Depth-from-defocus;Depth-from-focus;Unsupervised learning", "primary_area": "", "supplementary_material": "/attachment/18250252be375f3443a6612469ddba70241857c1.zip", "author": "Yawen Lu;Guoyu Lu", "authorids": "~Yawen_Lu1;~Guoyu_Lu3", "gender": "M;M", "homepage": ";http://www.cis.rit.edu/~glpci/", "dblp": ";120/8962", "google_scholar": ";", "orcid": ";", "linkedin": "yawen-lu-9ba85b13;", "or_profile": "~Yawen_Lu1;~Guoyu_Lu3", "aff": "Rochester Institute of Technology;Rochester Institute of Technology", "aff_domain": "rit.edu;rit.edu", "position": "Researcher;Assistant Professor", "bibtex": "", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer2;AnonReviewer3;AnonReviewer4", "site": "https://openreview.net/forum?id=IuBLMxWOXMR", "pdf_size": 0, "rating": "3;4;4;6", "confidence": "5;5;4;3", "wc_review": "676;677;547;96", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "754;825;623;202", "reply_reviewers": "0;0;0;0", "reply_authors": "1;1;1;1", "rating_avg": [ 4.25, 1.0897247358851685 ], "confidence_avg": [ 4.25, 0.82915619758885 ], "wc_review_avg": [ 499.0, 238.60322713660014 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 601.0, 241.49016543122414 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 9, 0 ], "authors#_avg": [ 2, 0 ], "corr_rating_confidence": -0.899228803025897, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:0ognoTie-C0J:scholar.google.com/&scioq=Unsupervised+Simultaneous+Depth-from-defocus+and+Depth-from-focus&hl=en&as_sdt=0,33", "gs_version_total": 0, "aff_unique_index": "0;0", "aff_unique_norm": "Rochester Institute of Technology", "aff_unique_dep": "", "aff_unique_url": "https://www.rit.edu", "aff_unique_abbr": "RIT", "aff_campus_unique_index": "", "aff_campus_unique": 
"", "aff_country_unique_index": "0;0", "aff_country_unique": "United States" }, { "id": "Iuq6u10sCdl", "title": "$Graph Embedding via Topology and Functional Analysis$", "track": "main", "status": "Withdraw", "tldr": "", "abstract": "Graphs have been ubiquitous in Machine Learning due to their versatile nature in modelling real world situations .Graph embedding is an important precursor to using graphs in Machine Learning , and much of performance of algorithms developed later depends heavily on this. However very little theoretical work exists in this area , resulting in the proliferation of several benchmarks without any mathematical validation , which is detrimental .In this paper we present an analysis of deterministic graph embedding in general , using tools from Functional Analysis and Topology . We prove several important results pertaining to graph embedding which may have practical importance .One limitation of our work in it's present form is it's applicable to deterministic embedding approaches only, although we strongly hope to extend it to random graph embedding methods as well in future.We sincerely hope that this work will be beneficial to researchers working in field of graph embedding.", "keywords": "Graph embedding;Theory;Topology;Functional analysis", "primary_area": "", "supplementary_material": "", "author": "Phani raj Chinnalingu", "authorids": "~Phani_raj_Chinnalingu1", "gender": "M", "homepage": "https://ece.iisc.ac.in/~nextgenwrl/Members.html", "dblp": "", "google_scholar": "", "orcid": "", "linkedin": "", "or_profile": "~Phani_raj_Chinnalingu1", "aff": "Indian Institute of Science, Dhirubhai Ambani Institute Of Information and Communication Technology", "aff_domain": "iisc.ac.in", "position": "MS student", "bibtex": "", "github": "", "project": "", "reviewers": "AnonReviewer3;AnonReviewer1;AnonReviewer2;AnonReviewer4", "site": "https://openreview.net/forum?id=Iuq6u10sCdl", "pdf_size": 0, "rating": "2;2;2;3", "confidence": "5;5;5;3", "wc_review": "674;438;283;790", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "0;0;0;0", "reply_reviewers": "0;0;0;0", "reply_authors": "0;0;0;0", "rating_avg": [ 2.25, 0.4330127018922193 ], "confidence_avg": [ 4.5, 0.8660254037844386 ], "wc_review_avg": [ 546.25, 197.9600654172452 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 0, 0 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 0, 0 ], "replies_avg": [ 5, 0 ], "authors#_avg": [ 1, 0 ], "corr_rating_confidence": -1.0, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:24H_blEzNssJ:scholar.google.com/&scioq=%24Graph+Embedding+via+Topology+and+Functional+Analysis%24&hl=en&as_sdt=0,33", "gs_version_total": 0, "aff_unique_index": "0", "aff_unique_norm": "Indian Institute of Science", "aff_unique_dep": "", "aff_unique_url": "https://www.iisc.ac.in", "aff_unique_abbr": "IISc", "aff_country_unique_index": "0", "aff_country_unique": "India" }, { "title": "NOVAS: Non-convex Optimization via Adaptive Stochastic Search for End-to-end Learning and Control", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/3070", "id": "Iw4ZGwenbXf", "poster": "", "openreview": "https://openreview.net/forum?id=Iw4ZGwenbXf", "slides": "https://iclr.cc/virtual/2021/poster/3070", "video": "https://iclr.cc/virtual/2021/poster/3070", "author_site": "Ioannis Exarchos, Marcus A Pereira, Ziyi Wang, Evangelos Theodorou", "tldr": "", "abstract": "In this work we propose the use of adaptive stochastic search as a building 
block for general, non-convex optimization operations within deep neural network architectures. Specifically, for an objective function located at some layer in the network and parameterized by some network parameters, we employ adaptive stochastic search to perform optimization over its output. This operation is differentiable and does not obstruct the passing of gradients during backpropagation, thus enabling us to incorporate it as a component in end-to-end learning. We study the proposed optimization module's properties and benchmark it against two existing alternatives on a synthetic energy-based structured prediction task, and further showcase its use in stochastic optimal control applications.", "keywords": "deep neural networks;nested optimization;stochastic control;deep FBSDEs", "primary_area": "", "supplementary_material": "/attachment/5344cbd3208a33aa57b3eee2edee22b6df97cd27.zip", "author": "Ioannis Exarchos;Marcus Aloysius Pereira;Ziyi Wang;Evangelos Theodorou", "authorids": "~Ioannis_Exarchos1;~Marcus_Aloysius_Pereira1;~Ziyi_Wang1;~Evangelos_Theodorou1", "gender": "M;M;M;M", "homepage": ";;;", "dblp": ";;;155/9964", "google_scholar": "http://scholar.google.com/citations?user=Nj44yVYAAAAJ;;ZBq4JcoAAAAJ;", "orcid": "0000-0002-5836-4750;;;", "linkedin": "http://www.linkedin.com/pub/ioannis-exarchos/49/b4b/291/;;;", "or_profile": "~Ioannis_Exarchos1;~Marcus_Aloysius_Pereira1;~Ziyi_Wang1;~Evangelos_Theodorou1", "aff": "Stanford University;Georgia Institute of Technology;Georgia Institute of Technology;Georgia Institute of Technology", "aff_domain": "stanford.edu;gatech.edu;gatech.edu;gatech.edu", "position": "Postdoc;PhD student;PhD student;Assistant Professor", "bibtex": "@inproceedings{\nexarchos2021novas,\ntitle={{\\{}NOVAS{\\}}: Non-convex Optimization via Adaptive Stochastic Search for End-to-end Learning and Control},\nauthor={Ioannis Exarchos and Marcus Aloysius Pereira and Ziyi Wang and Evangelos Theodorou},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=Iw4ZGwenbXf}\n}", "github": "", "project": "", "reviewers": "AnonReviewer2;AnonReviewer4;AnonReviewer1;AnonReviewer3", "pdf_size": 0, "rating": "4;6;6;6", "confidence": "2;3;2;2", "wc_review": "937;294;257;491", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "1255;155;192;456", "reply_reviewers": "0;0;0;0", "reply_authors": "2;1;1;1", "rating_avg": [ 5.5, 0.8660254037844386 ], "confidence_avg": [ 2.25, 0.4330127018922193 ], "wc_review_avg": [ 494.75, 270.38155909750947 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 514.5, 443.0036681563709 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.25, 0.4330127018922193 ], "replies_avg": [ 11, 0 ], "authors#_avg": [ 4, 0 ], "corr_rating_confidence": 0.3333333333333333, "gs_citation": 6, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=13833271149008816947&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 3, "pdf": "https://openreview.net/pdf?id=Iw4ZGwenbXf", "email": "stanford.edu;gatech.edu;gatech.edu;gatech.edu", "author_num": 4, "aff_unique_index": "0;1;1;1", "aff_unique_norm": "Stanford University;Georgia Institute of Technology", "aff_unique_dep": ";", "aff_unique_url": "https://www.stanford.edu;https://www.gatech.edu", "aff_unique_abbr": "Stanford;Georgia Tech", "aff_campus_unique_index": "0", "aff_campus_unique": "Stanford;", "aff_country_unique_index": "0;0;0;0", "aff_country_unique": "United States" }, { "title": "AdamP: Slowing Down the Slowdown for 
Momentum Optimizers on Scale-invariant Weights", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/2880", "id": "Iz3zU3M316D", "poster": "", "openreview": "https://openreview.net/forum?id=Iz3zU3M316D", "slides": "https://iclr.cc/virtual/2021/poster/2880", "video": "https://iclr.cc/virtual/2021/poster/2880", "author_site": "Byeongho Heo, Sanghyuk Chun, Seong Joon Oh, Dongyoon Han, Sangdoo Yun, Gyuwan Kim, Youngjung Uh, Jung-Woo Ha", "tldr": "", "abstract": "Normalization techniques, such as batch normalization (BN), are a boon for modern deep learning. They let weights converge more quickly with often better generalization performances. It has been argued that the normalization-induced scale invariance among the weights provides an advantageous ground for gradient descent (GD) optimizers: the effective step sizes are automatically reduced over time, stabilizing the overall training procedure. It is often overlooked, however, that the additional introduction of momentum in GD optimizers results in a far more rapid reduction in effective step sizes for scale-invariant weights, a phenomenon that has not yet been studied and may have caused unwanted side effects in the current practice. This is a crucial issue because arguably the vast majority of modern deep neural networks consist of (1) momentum-based GD (e.g. SGD or Adam) and (2) scale-invariant parameters (e.g. more than 90% of the weights in ResNet are scale-invariant due to BN). In this paper, we verify that the widely-adopted combination of the two ingredients lead to the premature decay of effective step sizes and sub-optimal model performances. We propose a simple and effective remedy, SGDP and AdamP: get rid of the radial component, or the norm-increasing direction, at each optimizer step. Because of the scale invariance, this modification only alters the effective step sizes without changing the effective update directions, thus enjoying the original convergence properties of GD optimizers. Given the ubiquity of momentum GD and scale invariance in machine learning, we have evaluated our methods against the baselines on 13 benchmarks. They range from vision tasks like classification (e.g. ImageNet), retrieval (e.g. CUB and SOP), and detection (e.g. COCO) to language modelling (e.g. WikiText) and audio classification (e.g. DCASE) tasks. We verify that our solution brings about uniform gains in performances in those benchmarks. 
Source code is available at https://github.com/clovaai/adamp", "keywords": "momentum optimizer;scale-invariant weights;normalize layer;effective learning rate", "primary_area": "", "supplementary_material": "/attachment/5edaddd83ee27b35b98f888c67f01d20fb4c8e8f.zip", "author": "Byeongho Heo;Sanghyuk Chun;Seong Joon Oh;Dongyoon Han;Sangdoo Yun;Gyuwan Kim;Youngjung Uh;Jung-Woo Ha", "authorids": "~Byeongho_Heo1;~Sanghyuk_Chun1;~Seong_Joon_Oh1;~Dongyoon_Han1;~Sangdoo_Yun1;~Gyuwan_Kim1;~Youngjung_Uh2;~Jung-Woo_Ha1", "gender": "M;M;M;M;M;M;;M", "homepage": "https://sites.google.com/view/byeongho-heo/home;https://sanghyukchun.github.io/home/;https://seongjoonoh.com;https://dongyoonhan.github.io/;https://sangdooyun.github.io/;https://gyuwankim.github.io/;https://vilab.yonsei.ac.kr/member/professor;https://aidljwha.wordpress.com/", "dblp": "142/2705;213/1095.html;168/8835;151/8876;124/3009.html;172/0889;57/10511;66/867-1", "google_scholar": "https://scholar.google.co.kr/citations?user=4_7rLDIAAAAJ;https://scholar.google.co.kr/citations?user=4_uj0xcAAAAJ;https://scholar.google.de/citations?user=kmXOOdsAAAAJ;jcP7m1QAAAAJ;o0qtjzYAAAAJ;LAl0EukAAAAJ;BWBGrEEAAAAJ;https://scholar.google.co.kr/citations?user=eGj3ay4AAAAJ", "orcid": ";0000-0002-4533-2610;0000-0002-8985-7689;0000-0002-9130-8195;;;;0000-0002-7400-7681", "linkedin": "byeongho-heo-1a7756122/;https://kr.linkedin.com/in/sanghyukchun/en;seong-joon-oh-32113479/;https://linkedin.com/in/dongyoon-han-04961a120/en;;gyuwankim;youngjung-uh-78b459b5/;jung-woo-ha-b2782862?trk=hp-identity-name", "or_profile": "~Byeongho_Heo1;~Sanghyuk_Chun1;~Seong_Joon_Oh1;~Dongyoon_Han1;~Sangdoo_Yun1;~Gyuwan_Kim1;~Youngjung_Uh2;~Jung-Woo_Ha1", "aff": "NAVER AI Lab;NAVER AI Lab;NAVER;NAVER;NAVER;NAVER Clova & AI LAB;Yonsei University;NAVER AI Lab", "aff_domain": "navercorp.com;navercorp.com;navercorp.com;navercorp.com;navercorp.com;navercorp.com;yonsei.ac.kr;navercorp.com", "position": "Researcher;Lead research scientist;Research scientist;Research Scientist;Research Scientist;Research Scientist;Associate Professor;Head (Executive Director)", "bibtex": "@inproceedings{\nheo2021adamp,\ntitle={AdamP: Slowing Down the Slowdown for Momentum Optimizers on Scale-invariant Weights},\nauthor={Byeongho Heo and Sanghyuk Chun and Seong Joon Oh and Dongyoon Han and Sangdoo Yun and Gyuwan Kim and Youngjung Uh and Jung-Woo Ha},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=Iz3zU3M316D}\n}", "github": "[![github](/images/github_icon.svg) clovaai/AdamP](https://github.com/clovaai/AdamP) + [![Papers with Code](/images/pwc_icon.svg) 3 community implementations](https://paperswithcode.com/paper/?openreview=Iz3zU3M316D)", "project": "", "reviewers": "AnonReviewer1;AnonReviewer2;AnonReviewer3;AnonReviewer4", "pdf_size": 0, "rating": "5;6;6;7", "confidence": "4;4;2;4", "wc_review": "897;284;110;751", "wc_reply_reviewers": "355;0;0;69", "wc_reply_authors": "2221;1372;186;1293", "reply_reviewers": "1;0;0;1", "reply_authors": "5;3;1;3", "rating_avg": [ 6.0, 0.7071067811865476 ], "confidence_avg": [ 3.5, 0.8660254037844386 ], "wc_review_avg": [ 510.5, 323.62207897484376 ], "wc_reply_reviewers_avg": [ 106.0, 146.4940271819981 ], "wc_reply_authors_avg": [ 1268.0, 722.9062871493096 ], "reply_reviewers_avg": [ 0.5, 0.5 ], "reply_authors_avg": [ 3.0, 1.4142135623730951 ], "replies_avg": [ 24, 0 ], "authors#_avg": [ 8, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 204, "gs_cited_by_link": 
"https://scholar.google.com/scholar?cites=6696661613902754889&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 8, "pdf": "https://openreview.net/pdf?id=Iz3zU3M316D", "email": "navercorp.com;navercorp.com;navercorp.com;navercorp.com;navercorp.com;navercorp.com;yonsei.ac.kr;navercorp.com", "author_num": 8, "aff_unique_index": "0;0;0;0;0;0;1;0", "aff_unique_norm": "NAVER Corporation;Yonsei University", "aff_unique_dep": "NAVER AI Lab;", "aff_unique_url": "https://www.naver.com;https://www.yonsei.ac.kr", "aff_unique_abbr": "NAVER;Yonsei", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0;0;0;0;0;0;0", "aff_country_unique": "South Korea" }, { "id": "J150Q1eQfJ4", "title": "Fully Convolutional Approach for Simulating Wave Dynamics", "track": "main", "status": "Reject", "tldr": "", "abstract": "We investigate the performance of fully convolutional networks to predict the motion and interaction of surface waves in open and closed complex geometries. We focus on a U-Net type architecture and assess its ability to capture and extrapolate wave propagation in time as well as the reflection, interference and diffraction of waves. We investigate how well the network generalises both to long-time predictions and to geometric configurations not seen during training. We demonstrate that this neural network is capable of accurately predicting the height distribution of waves on a liquid surface within curved and multi-faceted open and closed geometries, when only simple box and right-angled corner geometries were seen during training. We found that the RMSE of the predictions remained of order $1\\times10^{-4}$ times the characteristic length of the domain for at least 20 time-steps.", "keywords": "Convolutional neural network;spatio-temporal forecasting;data-driven physics;wave dynamics", "primary_area": "", "supplementary_material": "", "author": "Mario Lino Valencia;Chris D Cantwell;Eduardo Pignatelli;Stathi Fotiadis;Anil Anthony Bharath", "authorids": "~Mario_Lino_Valencia1;~Chris_D_Cantwell1;~Eduardo_Pignatelli1;~Stathi_Fotiadis1;~Anil_Anthony_Bharath2", "gender": "M;M;M;M;M", "homepage": ";http://www.imperial.ac.uk/people/c.cantwell;https://epignatelli.com;https://www.linkedin.com/in/stathifotiadis/;", "dblp": ";;;;71/4319", "google_scholar": ";https://scholar.google.co.uk/citations?user=gBCrORQAAAAJ;https://scholar.google.co.uk/citations?user=d-TVZ1YAAAAJ;ZHZczW8AAAAJ;", "orcid": ";0000-0002-2448-3540;0000-0003-0730-2303;;", "linkedin": "mario-lino-valencia-b004ba17;chrisdcantwell/;eduardo-pignatelli/;;", "or_profile": "~Mario_Lino_Valencia1;~Chris_D_Cantwell1;~Eduardo_Pignatelli1;~Stathi_Fotiadis1;~Anil_A._Bharath1", "aff": "Imperial College London;Imperial College London;University College London;Imperial College London;", "aff_domain": "imperial.ac.uk;imperial.ac.uk;ucl.ac.uk;imperial.ac.uk;", "position": "PhD student;Associate Professor;PhD student;PhD student;", "bibtex": "@misc{\nvalencia2021fully,\ntitle={Fully Convolutional Approach for Simulating Wave Dynamics},\nauthor={Mario Lino Valencia and Chris D Cantwell and Eduardo Pignatelli and Stathi Fotiadis and Anil Anthony Bharath},\nyear={2021},\nurl={https://openreview.net/forum?id=J150Q1eQfJ4}\n}", "github": "", "project": "", "reviewers": "AnonReviewer2;AnonReviewer4;AnonReviewer1;AnonReviewer3", "site": "https://openreview.net/forum?id=J150Q1eQfJ4", "pdf_size": 0, "rating": "3;4;5;7", "confidence": "4;4;4;4", "wc_review": "458;500;277;540", "wc_reply_reviewers": "0;0;0;241", "wc_reply_authors": 
"0;0;0;310", "reply_reviewers": "0;0;0;1", "reply_authors": "0;0;0;1", "rating_avg": [ 4.75, 1.479019945774904 ], "confidence_avg": [ 4.0, 0.0 ], "wc_review_avg": [ 443.75, 100.54445534190336 ], "wc_reply_reviewers_avg": [ 60.25, 104.35606115602485 ], "wc_reply_authors_avg": [ 77.5, 134.23393758658798 ], "reply_reviewers_avg": [ 0.25, 0.4330127018922193 ], "reply_authors_avg": [ 0.25, 0.4330127018922193 ], "replies_avg": [ 8, 0 ], "authors#_avg": [ 5, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:WbVj-2ktb-8J:scholar.google.com/&scioq=Fully+Convolutional+Approach+for+Simulating+Wave+Dynamics&hl=en&as_sdt=0,33", "gs_version_total": 0, "aff_unique_index": "0;0;1;0", "aff_unique_norm": "Imperial College London;University College London", "aff_unique_dep": ";", "aff_unique_url": "https://www.imperial.ac.uk;https://www.ucl.ac.uk", "aff_unique_abbr": "ICL;UCL", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0;0;0", "aff_country_unique": "United Kingdom" }, { "title": "Mapping the Timescale Organization of Neural Language Models", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/3232", "id": "J3OUycKwz-", "poster": "", "openreview": "https://openreview.net/forum?id=J3OUycKwz-", "slides": "https://iclr.cc/virtual/2021/poster/3232", "video": "https://iclr.cc/virtual/2021/poster/3232", "author_site": "Hsiang-Yun Sherry Chien, Jinhan Zhang, Christopher Honey", "tldr": "", "abstract": "In the human brain, sequences of language input are processed within a distributed and hierarchical architecture, in which higher stages of processing encode contextual information over longer timescales. In contrast, in recurrent neural networks which perform natural language processing, we know little about how the multiple timescales of contextual information are functionally organized. Therefore, we applied tools developed in neuroscience to map the \u201cprocessing timescales\u201d of individual units within a word-level LSTM language model. This timescale-mapping method assigned long timescales to units previously found to track long-range syntactic dependencies. Additionally, the mapping revealed a small subset of the network (less than 15% of units) with long timescales and whose function had not previously been explored. We next probed the functional organization of the network by examining the relationship between the processing timescale of units and their network connectivity. We identified two classes of long-timescale units: \u201ccontroller\u201d units composed a densely interconnected subnetwork and strongly projected to the rest of the network, while \u201cintegrator\u201d units showed the longest timescales in the network, and expressed projection profiles closer to the mean projection profile. Ablating integrator and controller units affected model performance at different positions within a sentence, suggesting distinctive functions of these two sets of units. Finally, we tested the generalization of these results to a character-level LSTM model and models with different architectures. 
In summary, we demonstrated a model-free technique for mapping the timescale organization in recurrent neural networks, and we applied this method to reveal the timescale and functional organization of neural language models", "keywords": "natural language processing;LSTM;timescale;hierarchy;temporal context", "primary_area": "", "supplementary_material": "", "author": "Hsiang-Yun Sherry Chien;Jinhan Zhang;Christopher Honey", "authorids": "~Hsiang-Yun_Sherry_Chien1;~Jinhan_Zhang1;~Christopher_Honey1", "gender": "F;;", "homepage": "https://sherrychien.github.io/;;https://www.honeylab.org", "dblp": ";;60/11540", "google_scholar": "E9iWhoYAAAAJ;;https://scholar.google.com/citations?hl=en", "orcid": ";;0000-0002-0745-5089", "linkedin": "hsiang-yun-sherry-chien-55868a6b/;jinhan-zhang-a66134186?challengeId=AQFtCUY4mWd3pwAAAXTS01oNsZeHnYAL9dR73kx3DEoT0gp0-kqUHOhswZx4m7rm9lQWuyZVnknSgVEQQ0edJUImI7spz5s4nA&submissionId=fa2ab56e-f1d5-3816-012e-a061115fb254;", "or_profile": "~Hsiang-Yun_Sherry_Chien1;~Jinhan_Zhang1;~Christopher_Honey1", "aff": "Johns Hopkins University;Johns Hopkins University;Johns Hopkins University", "aff_domain": "jhu.edu;jhu.edu;jhu.edu", "position": "PhD student;Undergrad student;Assistant Professor", "bibtex": "@inproceedings{\nchien2021mapping,\ntitle={Mapping the Timescale Organization of Neural Language Models},\nauthor={Hsiang-Yun Sherry Chien and Jinhan Zhang and Christopher Honey},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=J3OUycKwz-}\n}", "github": "", "project": "", "reviewers": "AnonReviewer3;AnonReviewer1;AnonReviewer2;AnonReviewer4", "pdf_size": 0, "rating": "3;6;6;7", "confidence": "4;4;4;3", "wc_review": "1591;540;644;214", "wc_reply_reviewers": "0;117;0;0", "wc_reply_authors": "2787;1985;2182;495", "reply_reviewers": "0;1;0;0", "reply_authors": "5;4;4;1", "rating_avg": [ 5.5, 1.5 ], "confidence_avg": [ 3.75, 0.4330127018922193 ], "wc_review_avg": [ 747.25, 512.3189314284609 ], "wc_reply_reviewers_avg": [ 29.25, 50.66248612138966 ], "wc_reply_authors_avg": [ 1862.25, 842.8883007255469 ], "reply_reviewers_avg": [ 0.25, 0.4330127018922193 ], "reply_authors_avg": [ 3.5, 1.5 ], "replies_avg": [ 20, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": -0.5773502691896258, "gs_citation": 4, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=13680660324075458803&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 3, "pdf": "https://openreview.net/pdf?id=J3OUycKwz-", "email": "jhu.edu;jhu.edu;jhu.edu", "author_num": 3, "aff_unique_index": "0;0;0", "aff_unique_norm": "Johns Hopkins University", "aff_unique_dep": "", "aff_unique_url": "https://www.jhu.edu", "aff_unique_abbr": "JHU", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0;0", "aff_country_unique": "United States" }, { "id": "J40FkbdldTX", "title": "Exploring single-path Architecture Search ranking correlations", "track": "main", "status": "Reject", "tldr": "", "abstract": "Recently presented benchmarks for Neural Architecture Search (NAS) provide the results of training thousands of different architectures in a specific search space, thus enabling the fair and rapid comparison of different methods.\nBased on these results, we quantify the ranking correlations of single-path architecture search methods\nin different search space subsets and under several training variations;\nstudying their impact on the expected search results.\nThe experiments support the few-shot approach and Linear 
Transformers,\nprovide evidence against disabling cell topology sharing during the training phase or using strong regularization in the NAS-Bench-201 search space,\nand show the necessity of further research regarding super-network size and path sampling strategies.", "keywords": "Neural Architecture Search;AutoML;Neural Networks", "primary_area": "", "supplementary_material": "/attachment/735bf091f7d64b75f2be16b590bcac601c1be822.zip", "author": "Kevin Alexander Laube;Andreas Zell", "authorids": "~Kevin_Alexander_Laube1;~Andreas_Zell1", "gender": "M;M", "homepage": ";https://uni-tuebingen.de/fakultaeten/mathematisch-naturwissenschaftliche-fakultaet/fachbereiche/informatik/lehrstuehle/kognitive-systeme/", "dblp": "232/1731;05/4192", "google_scholar": ";", "orcid": ";", "linkedin": "laubeke/;", "or_profile": "~Kevin_Alexander_Laube1;~Andreas_Zell1", "aff": "University of Tuebingen;Eberhard-Karls-Universit\u00e4t T\u00fcbingen", "aff_domain": "uni-tuebingen.de;uni-tuebingen.de", "position": "PhD student;Full Professor", "bibtex": "@misc{\nlaube2021exploring,\ntitle={Exploring single-path Architecture Search ranking correlations},\nauthor={Kevin Alexander Laube and Andreas Zell},\nyear={2021},\nurl={https://openreview.net/forum?id=J40FkbdldTX}\n}", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer4;AnonReviewer2;AnonReviewer3", "site": "https://openreview.net/forum?id=J40FkbdldTX", "pdf_size": 0, "rating": "5;5;5;8", "confidence": "4;4;4;5", "wc_review": "265;410;297;1316", "wc_reply_reviewers": "0;78;0;0", "wc_reply_authors": "990;748;637;1704", "reply_reviewers": "0;1;0;0", "reply_authors": "2;1;1;5", "rating_avg": [ 5.75, 1.299038105676658 ], "confidence_avg": [ 4.25, 0.4330127018922193 ], "wc_review_avg": [ 572.0, 432.91280877331405 ], "wc_reply_reviewers_avg": [ 19.5, 33.77499074759311 ], "wc_reply_authors_avg": [ 1019.75, 415.15923150039674 ], "reply_reviewers_avg": [ 0.25, 0.4330127018922193 ], "reply_authors_avg": [ 2.25, 1.6393596310755 ], "replies_avg": [ 19, 0 ], "authors#_avg": [ 2, 0 ], "corr_rating_confidence": 1.0, "gs_citation": 3, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=3483593064505202813&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 0, "aff_unique_index": "0;1", "aff_unique_norm": "University of Tuebingen;Eberhard Karls University of T\u00fcbingen", "aff_unique_dep": ";", "aff_unique_url": "https://www.uni-tuebingen.de/;https://www.uni-tuebingen.de/", "aff_unique_abbr": "Uni T\u00fcbingen;Uni T\u00fcbingen", "aff_campus_unique_index": "1", "aff_campus_unique": ";T\u00fcbingen", "aff_country_unique_index": "0;0", "aff_country_unique": "Germany" }, { "id": "J4XaMT9OcZ", "title": "Mitigating Deep Double Descent by Concatenating Inputs", "track": "main", "status": "Reject", "tldr": "", "abstract": "The double descent curve is one of the most intriguing properties of deep neural networks. It contrasts the classical bias-variance curve with the behavior of modern neural networks, occurring where the number of samples nears the number of parameters. In this work, we explore the connection between the double descent phenomena and the number of samples in the deep neural network setting. In particular, we propose a construction which augments the existing dataset by artificially increasing the number of samples. This construction empirically mitigates the double descent curve in this setting. We reproduce existing work on deep double descent, and observe a smooth descent into the overparameterized region for our construction. 
This occurs both with respect to the model size, and with respect to the number of epochs.", "keywords": "deep double descent;feedforward neural network;image classification", "primary_area": "", "supplementary_material": "", "author": "John Chen;Qihan Wang;Anastasios Kyrillidis", "authorids": "~John_Chen3;~Qihan_Wang1;~Anastasios_Kyrillidis2", "gender": ";;M", "homepage": "https://johnchenresearch.github.io/;http://wangqihan.com/;http://akyrillidis.github.io", "dblp": "71/1897;;53/9879", "google_scholar": "NbcgY4oAAAAJ;;TEGzkZMAAAAJ", "orcid": ";;", "linkedin": "john-c/;;", "or_profile": "~John_Chen3;~Qihan_Wang1;~Anastasios_Kyrillidis2", "aff": "Rice University;Rice University;Rice University", "aff_domain": "rice.edu;rice.edu;rice.edu", "position": "PhD student;Undergrad student;Assistant Professor", "bibtex": "@misc{\nchen2021mitigating,\ntitle={Mitigating Deep Double Descent by Concatenating Inputs},\nauthor={John Chen and Qihan Wang and Anastasios Kyrillidis},\nyear={2021},\nurl={https://openreview.net/forum?id=J4XaMT9OcZ}\n}", "github": "", "project": "", "reviewers": "AnonReviewer4;AnonReviewer1;AnonReviewer3;AnonReviewer2", "site": "https://openreview.net/forum?id=J4XaMT9OcZ", "pdf_size": 0, "rating": "2;3;4;5", "confidence": "4;3;3;4", "wc_review": "355;334;629;345", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "0;0;0;0", "reply_reviewers": "0;0;0;0", "reply_authors": "0;0;0;0", "rating_avg": [ 3.5, 1.118033988749895 ], "confidence_avg": [ 3.5, 0.5 ], "wc_review_avg": [ 415.75, 123.34377771091657 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 0, 0 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 0, 0 ], "replies_avg": [ 5, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 4, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=9007958993033305419&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 6, "aff_unique_index": "0;0;0", "aff_unique_norm": "Rice University", "aff_unique_dep": "", "aff_unique_url": "https://www.rice.edu", "aff_unique_abbr": "Rice", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0;0", "aff_country_unique": "United States" }, { "id": "J5LS3YJH7Zi", "title": "CaLFADS: latent factor analysis of dynamical systems in calcium imaging data", "track": "main", "status": "Reject", "tldr": "", "abstract": "Dynamic latent variable modelling has been a hugely powerful tool in understanding how spiking activity in populations of neurons can perform computations necessary for adaptive behaviour. The success of such approaches has been enabled by the ability to construct models derived with the characterization of spiking activity as point-processes since spiking dynamics occur on a much faster time-scale than the computational dynamics being inferred. Other experimental techniques, such as calcium imaging, pose a problem for latent variable modelling of computational dynamics, since the time-scales of calcium dynamics and computational dynamics overlap. As such, the success of dynamic latent variable modelling in calcium imaging data rests on being able to disentangle the contribution of these two sources of variation. Here we extend recent advances using variational autoencoders to analyze neural data, by incorporating a ladder architecture that can infer a hierarchy of dynamical systems. Using built-in inductive biases for calcium dynamics, we can capture calcium flux as well as underlying dynamics of neural computation. 
First, we demonstrate with synthetic calcium data that we can correctly infer an underlying Lorenz attractor at the same time as calcium dynamics. Next, we show that we can infer appropriate rotational dynamics in spiking data from macaque motor cortex after it has been converted into calcium fluorescence data via a calcium dynamics model. Finally, we show that our method applied to real calcium imaging data from primary visual cortex in mice allows us to infer latent factors that carry salient sensory information about unexpected stimuli. These results demonstrate that variational ladder autoencoders are a promising approach for inferring hierarchical dynamics in experimental settings where the measured variable has its own slow dynamics, such as calcium imaging data, thereby providing the neuroscience community with a new analysis tool for a wider array of data modalities.", "keywords": "latent variable modelling;lfads;neuroscience;variational autoencoders;dynamical systems;calcium imaging;neural data analysis", "primary_area": "", "supplementary_material": "", "author": "Luke Yuri Prince;Shahab Bakhtiari;Colleen J Gillon;Blake Aaron Richards", "authorids": "~Luke_Yuri_Prince1;~Shahab_Bakhtiari1;~Colleen_J_Gillon1;~Blake_Aaron_Richards1", "gender": "M;M;F;M", "homepage": ";;;http://linclab.org", "dblp": "182/3837;315/3050;;70/10850", "google_scholar": "eIaKr8IAAAAJ;f_JDOhEAAAAJ;https://scholar.google.ca/citations?user=IYc7UKwAAAAJ;https://scholar.google.ca/citations?user=1CPY1LsAAAAJ", "orcid": ";;0000-0002-2253-7816;0000-0001-9662-2151", "linkedin": ";ShahabBakht/;;", "or_profile": "~Luke_Yuri_Prince1;~Shahab_Bakhtiari1;~Colleen_J_Gillon1;~Blake_Aaron_Richards1", "aff": "Mila;McGill University;Toronto University;Mila - Quebec Artificial Intelligence Institute", "aff_domain": "mila.quebec;mcgill.ca;utoronto.ca;mila.quebec", "position": "Postdoc;Postdoc;PhD student;Associate Professor", "bibtex": "@misc{\nprince2021calfads,\ntitle={Ca{\\{}LFADS{\\}}: latent factor analysis of dynamical systems in calcium imaging data},\nauthor={Luke Yuri Prince and Shahab Bakhtiari and Colleen J Gillon and Blake Aaron Richards},\nyear={2021},\nurl={https://openreview.net/forum?id=J5LS3YJH7Zi}\n}", "github": "", "project": "", "reviewers": "AnonReviewer2;AnonReviewer1;AnonReviewer4;AnonReviewer3", "site": "https://openreview.net/forum?id=J5LS3YJH7Zi", "pdf_size": 0, "rating": "4;5;5;7", "confidence": "4;4;4;4", "wc_review": "518;1009;1183;897", "wc_reply_reviewers": "276;514;213;0", "wc_reply_authors": "1294;2024;1340;647", "reply_reviewers": "1;2;1;0", "reply_authors": "2;3;2;1", "rating_avg": [ 5.25, 1.0897247358851685 ], "confidence_avg": [ 4.0, 0.0 ], "wc_review_avg": [ 901.75, 243.87022676005367 ], "wc_reply_reviewers_avg": [ 250.75, 183.19303343740995 ], "wc_reply_authors_avg": [ 1326.25, 487.20240916892027 ], "reply_reviewers_avg": [ 1.0, 0.7071067811865476 ], "reply_authors_avg": [ 2.0, 0.7071067811865476 ], "replies_avg": [ 20, 0 ], "authors#_avg": [ 4, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 1, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=12958563888246675176&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 0, "aff_unique_index": "0;1;2;3", "aff_unique_norm": "Mila;McGill University;University of Toronto;Quebec Artificial Intelligence Institute", "aff_unique_dep": "Quebec Artificial Intelligence Institute;;;Artificial Intelligence", "aff_unique_url": "https://mila.quebec;https://www.mcgill.ca;https://www.utoronto.ca;https://mila.quebec", "aff_unique_abbr": 
"Mila;McGill;U of T;Mila", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0;0;0", "aff_country_unique": "Canada" }, { "id": "J7bUsLCb0zf", "title": "Compute- and Memory-Efficient Reinforcement Learning with Latent Experience Replay", "track": "main", "status": "Reject", "tldr": "", "abstract": "Recent advances in off-policy deep reinforcement learning (RL) have led to impressive success in complex tasks from visual observations. Experience replay improves sample-efficiency by reusing experiences from the past, and convolutional neural networks (CNNs) process high-dimensional inputs effectively. However, such techniques demand high memory and computational bandwidth. In this paper, we present Latent Vector Experience Replay (LeVER), a simple modification of existing off-policy RL methods, to address these computational and memory requirements without sacrificing the performance of RL agents. To reduce the computational overhead of gradient updates in CNNs, we freeze the lower layers of CNN encoders early in training due to early convergence of their parameters. Additionally, we reduce memory requirements by storing the low-dimensional latent vectors for experience replay instead of high-dimensional images, enabling an adaptive increase in the replay buffer capacity, a useful technique in constrained-memory settings. In our experiments, we show that LeVER does not degrade the performance of RL agents while significantly saving computation and memory across a diverse set of DeepMind Control environments and Atari games. Finally, we show that LeVER is useful for computation-efficient transfer learning in RL because lower layers of CNNs extract generalizable features, which can be used for different tasks and domains.", "keywords": "reinforcement learning;deep learning;computational efficiency;memory efficiency", "primary_area": "", "supplementary_material": "", "author": "Lili Chen;Kimin Lee;Aravind Srinivas;Pieter Abbeel", "authorids": "~Lili_Chen1;~Kimin_Lee1;~Aravind_Srinivas1;~Pieter_Abbeel2", "gender": ";M;;M", "homepage": "http://www.lilichen.me;https://sites.google.com/view/kiminlee;https://people.eecs.berkeley.edu/~aravind/;https://people.eecs.berkeley.edu/~pabbeel/", "dblp": "92/169;183/6849;218/5157;", "google_scholar": "https://scholar.google.com/citations?hl=en;92M8xv4AAAAJ;GhrKC1gAAAAJ;https://scholar.google.com.tw/citations?user=vtwH6GkAAAAJ", "orcid": ";;;", "linkedin": "lili-chen/;;;", "or_profile": "~Lili_Chen1;~Kimin_Lee1;~Aravind_Srinivas1;~Pieter_Abbeel2", "aff": "University of California, Berkeley;University of California, Berkeley;University of California, Berkeley;Covariant", "aff_domain": "berkeley.edu;berkeley.edu;berkeley.edu;covariant.ai", "position": "Undergrad student;Postdoc;PhD student;Founder", "bibtex": "@misc{\nchen2021compute,\ntitle={Compute- and Memory-Efficient Reinforcement Learning with Latent Experience Replay},\nauthor={Lili Chen and Kimin Lee and Aravind Srinivas and Pieter Abbeel},\nyear={2021},\nurl={https://openreview.net/forum?id=J7bUsLCb0zf}\n}", "github": "", "project": "", "reviewers": "AnonReviewer2;AnonReviewer3;AnonReviewer4;AnonReviewer1", "site": "https://openreview.net/forum?id=J7bUsLCb0zf", "pdf_size": 0, "rating": "5;6;6;7", "confidence": "5;4;3;4", "wc_review": "556;614;232;398", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "655;710;526;469", "reply_reviewers": "0;0;0;0", "reply_authors": "1;1;1;1", "rating_avg": [ 6.0, 0.7071067811865476 ], "confidence_avg": [ 4.0, 0.7071067811865476 ], 
"wc_review_avg": [ 450.0, 148.62705002791384 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 590.0, 96.64626221432466 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 12, 0 ], "authors#_avg": [ 4, 0 ], "corr_rating_confidence": -0.5, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:Y8DENc1LqusJ:scholar.google.com/&scioq=Compute-+and+Memory-Efficient+Reinforcement+Learning+with+Latent+Experience+Replay&hl=en&as_sdt=0,5", "gs_version_total": 0, "aff_unique_index": "0;0;0;1", "aff_unique_norm": "University of California, Berkeley;Covariant", "aff_unique_dep": ";", "aff_unique_url": "https://www.berkeley.edu;", "aff_unique_abbr": "UC Berkeley;", "aff_campus_unique_index": "0;0;0", "aff_campus_unique": "Berkeley;", "aff_country_unique_index": "0;0;0", "aff_country_unique": "United States;" }, { "title": "Trajectory Prediction using Equivariant Continuous Convolution", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/3189", "id": "J8_GttYLFgr", "poster": "", "openreview": "https://openreview.net/forum?id=J8_GttYLFgr", "slides": "https://iclr.cc/virtual/2021/poster/3189", "video": "https://iclr.cc/virtual/2021/poster/3189", "author_site": "Robin Walters, Jinxi Li, Rose Yu", "tldr": "", "abstract": "Trajectory prediction is a critical part of many AI applications, for example, the safe operation of autonomous vehicles. However, current methods are prone to making inconsistent and physically unrealistic predictions. We leverage insights from fluid dynamics to overcome this limitation by considering internal symmetry in real-world trajectories. We propose a novel model, Equivariant Continous COnvolution (ECCO) for improved trajectory prediction. ECCO uses rotationally-equivariant continuous convolutions to embed the symmetries of the system. On both vehicle and pedestrian trajectory datasets, ECCO attains competitive accuracy with significantly fewer parameters. It is also more sample efficient, generalizing automatically from few data points in any orientation. Lastly, ECCO improves generalization with equivariance, resulting in more physically consistent predictions. Our method provides a fresh perspective towards increasing trust and transparency in deep learning models. 
Our code and data can be found at https://github.com/Rose-STL-Lab/ECCO.", "keywords": "equivariant;symmetry;trajectory prediction;continuous convolution;argoverse", "primary_area": "", "supplementary_material": "/attachment/a94e7095669db49b0d0ad074f11baa558f4f7bbb.zip", "author": "Robin Walters;Jinxi Li;Rose Yu", "authorids": "~Robin_Walters1;li.jinxi1@northeastern.edu;~Rose_Yu1", "gender": "M;;F", "homepage": "http://www.robinwalters.com;;http://roseyu.com", "dblp": "258/3416;;164/7314", "google_scholar": "fnprJmUAAAAJ;;", "orcid": ";;", "linkedin": ";;", "or_profile": "~Robin_Walters1;li.jinxi1@northeastern.edu;~Rose_Yu1", "aff": ";;University of California, San Diego", "aff_domain": ";;ucsd.edu", "position": ";;Assistant Professor", "bibtex": "@inproceedings{\nwalters2021trajectory,\ntitle={Trajectory Prediction using Equivariant Continuous Convolution},\nauthor={Robin Walters and Jinxi Li and Rose Yu},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=J8_GttYLFgr}\n}", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer3;AnonReviewer4;AnonReviewer2", "pdf_size": 0, "rating": "5;6;6;7", "confidence": "2;4;2;3", "wc_review": "400;685;148;331", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "997;724;675;684", "reply_reviewers": "0;0;0;0", "reply_authors": "2;1;1;2", "rating_avg": [ 6.0, 0.7071067811865476 ], "confidence_avg": [ 2.75, 0.82915619758885 ], "wc_review_avg": [ 391.0, 193.11007223860696 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 770.0, 132.34991499808376 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.5, 0.5 ], "replies_avg": [ 12, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": 0.4264014327112209, "gs_citation": 59, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=7554778751681531662&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 5, "pdf": "https://openreview.net/pdf?id=J8_GttYLFgr", "email": ";;ucsd.edu", "author_num": 3, "aff_unique_index": "0", "aff_unique_norm": "University of California, San Diego", "aff_unique_dep": "", "aff_unique_url": "https://www.ucsd.edu", "aff_unique_abbr": "UCSD", "aff_campus_unique_index": "0", "aff_campus_unique": "San Diego", "aff_country_unique_index": "0", "aff_country_unique": "United States" }, { "id": "JAlqRs9duhz", "title": "Straight to the Gradient: Learning to Use Novel Tokens for Neural Text Generation", "track": "main", "status": "Reject", "tldr": "", "abstract": "Advanced large-scale neural language models have led to significant success in many natural language generation tasks. However, the most commonly used training objective, Maximum Likelihood Estimation (MLE), has been shown to be problematic, where the trained model prefers using dull and repetitive phrases. In this work, we introduce ScaleGrad, a modification straight to the gradient of the loss function, to remedy the degeneration issues of the standard MLE objective. By directly maneuvering the gradient information, ScaleGrad makes the model learn to use novel tokens during training. Empirical results show the effectiveness of our method not only in open-ended generation, but also in directed generation. 
With the simplicity in architecture, our method can serve as a general training objective that is applicable to most of the neural text generation tasks.", "keywords": "text generation;text degeneration;language model;summarization;image captioning", "primary_area": "", "supplementary_material": "/attachment/986342bac3092252dc21c0bfec5358b85b6c8a6e.zip", "author": "Xiang Lin;SIMENG HAN;Shafiq Joty", "authorids": "~Xiang_Lin2;~SIMENG_HAN1;~Shafiq_Joty1", "gender": "M;F;M", "homepage": "https://shawnlimn.github.io;https://shirleyhan6.github.io/;https://raihanjoty.github.io/", "dblp": "29/6347;;62/2078", "google_scholar": "R4ZlMwIAAAAJ;D0dpploAAAAJ;hR249csAAAAJ", "orcid": ";;", "linkedin": ";simeng-sophia-han-746135159/;", "or_profile": "~Xiang_Lin2;~SIMENG_HAN1;~Shafiq_Joty1", "aff": "Nanyang Technological University;Nanyang Technological University;Nanyang Technological University", "aff_domain": "ntu.edu.sg;ntu.edu;ntu.edu.sg", "position": "PhD student;Undergrad student;Assistant Professor", "bibtex": "@misc{\nlin2021straight,\ntitle={Straight to the Gradient: Learning to Use Novel Tokens for Neural Text Generation},\nauthor={Xiang Lin and SIMENG HAN and Shafiq Joty},\nyear={2021},\nurl={https://openreview.net/forum?id=JAlqRs9duhz}\n}", "github": "", "project": "", "reviewers": "AnonReviewer4;AnonReviewer3;AnonReviewer1;AnonReviewer2", "site": "https://openreview.net/forum?id=JAlqRs9duhz", "pdf_size": 0, "rating": "4;5;6;6", "confidence": "4;5;4;3", "wc_review": "1139;385;961;197", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "0;0;0;0", "reply_reviewers": "0;0;0;0", "reply_authors": "0;0;0;0", "rating_avg": [ 5.25, 0.82915619758885 ], "confidence_avg": [ 4.0, 0.7071067811865476 ], "wc_review_avg": [ 670.5, 390.38282492958115 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 0, 0 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 0, 0 ], "replies_avg": [ 5, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": -0.42640143271122083, "gs_citation": 26, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=743520526432802506&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 5, "aff_unique_index": "0;0;0", "aff_unique_norm": "Nanyang Technological University", "aff_unique_dep": "", "aff_unique_url": "https://www.ntu.edu.sg", "aff_unique_abbr": "NTU", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0;0", "aff_country_unique": "Singapore" }, { "title": "Individually Fair Gradient Boosting", "status": "Spotlight", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/2749", "id": "JBAa9we1AL", "poster": "", "openreview": "https://openreview.net/forum?id=JBAa9we1AL", "slides": "https://iclr.cc/virtual/2021/poster/2749", "video": "https://iclr.cc/virtual/2021/poster/2749", "author_site": "Alexander Vargo, Fan Zhang, Mikhail Yurochkin, Yuekai Sun", "tldr": "", "abstract": "We consider the task of enforcing individual fairness in gradient boosting. Gradient boosting is a popular method for machine learning from tabular data, which arise often in applications where algorithmic fairness is a concern. At a high level, our approach is a functional gradient descent on a (distributionally) robust loss function that encodes our intuition of algorithmic fairness for the ML task at hand. Unlike prior approaches to individual fairness that only work with smooth ML models, our approach also works with non-smooth models such as decision trees. We show that our algorithm converges globally and generalizes. 
We also demonstrate the efficacy of our algorithm on three ML problems susceptible to algorithmic bias.", "keywords": "Algorithmic fairness;boosting;non-smooth models", "primary_area": "", "supplementary_material": "/attachment/9c42d4ae144146dd6c1c7264db703ca449cf9fb6.zip", "author": "Alexander Vargo;Fan Zhang;Mikhail Yurochkin;Yuekai Sun", "authorids": "ahsvargo@umich.edu;zhangfan4@shanghaitech.edu.cn;~Mikhail_Yurochkin1;~Yuekai_Sun1", "gender": ";;M;", "homepage": ";;https://moonfolk.github.io/;https://yuekai.github.io/", "dblp": ";;191/6719;", "google_scholar": ";;QjBF9sUAAAAJ;6T1XtW8AAAAJ", "orcid": ";;;", "linkedin": ";;mikhail-yurochkin-a45659114/;", "or_profile": "ahsvargo@umich.edu;zhangfan4@shanghaitech.edu.cn;~Mikhail_Yurochkin1;~Yuekai_Sun1", "aff": ";;IBM Research;University of Michigan - Ann Arbor", "aff_domain": ";;ibm.com;umich.edu", "position": ";;Researcher;Assistant \u2192 Associate Professor of Statistics", "bibtex": "@inproceedings{\nvargo2021individually,\ntitle={Individually Fair Gradient Boosting},\nauthor={Alexander Vargo and Fan Zhang and Mikhail Yurochkin and Yuekai Sun},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=JBAa9we1AL}\n}", "github": "", "project": "", "reviewers": "AnonReviewer3;AnonReviewer2;AnonReviewer4", "pdf_size": 0, "rating": "7;7;7", "confidence": "4;4;2", "wc_review": "524;854;213", "wc_reply_reviewers": "0;0;0", "wc_reply_authors": "265;677;223", "reply_reviewers": "0;0;0", "reply_authors": "1;1;1", "rating_avg": [ 7.0, 0.0 ], "confidence_avg": [ 3.3333333333333335, 0.9428090415820634 ], "wc_review_avg": [ 530.3333333333334, 261.72547109943696 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 388.3333333333333, 204.83706261861457 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 8, 0 ], "authors#_avg": [ 4, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 19, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=9721608171776658872&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 5, "pdf": "https://openreview.net/pdf?id=JBAa9we1AL", "email": ";;ibm.com;umich.edu", "author_num": 4, "aff_unique_index": "0;1", "aff_unique_norm": "IBM;University of Michigan", "aff_unique_dep": "IBM Research;", "aff_unique_url": "https://www.ibm.com/research;https://www.umich.edu", "aff_unique_abbr": "IBM;UM", "aff_campus_unique_index": "1", "aff_campus_unique": ";Ann Arbor", "aff_country_unique_index": "0;0", "aff_country_unique": "United States" }, { "title": "Fantastic Four: Differentiable and Efficient Bounds on Singular Values of Convolution Layers", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/2548", "id": "JCRblSgs34Z", "poster": "", "openreview": "https://openreview.net/forum?id=JCRblSgs34Z", "slides": "https://iclr.cc/virtual/2021/poster/2548", "video": "https://iclr.cc/virtual/2021/poster/2548", "author_site": "Sahil Singla, Soheil Feizi", "tldr": "", "abstract": "In deep neural networks, the spectral norm of the Jacobian of a layer bounds the factor by which the norm of a signal changes during forward/backward propagation. Spectral norm regularizations have been shown to improve generalization, robustness and optimization of deep learning methods. Existing methods to compute the spectral norm of convolution layers either rely on heuristics that are efficient in computation but lack guarantees or are theoretically-sound but computationally expensive. 
In this work, we obtain the best of both worlds by deriving {\\it four} provable upper bounds on the spectral norm of a standard 2D multi-channel convolution layer. These bounds are differentiable and can be computed efficiently during training with negligible overhead. One of these bounds is in fact the popular heuristic method of Miyato et al. (multiplied by a constant factor depending on filter sizes). Each of these four bounds can achieve the tightest gap depending on convolution filters. Thus, we propose to use the minimum of these four bounds as a tight, differentiable and efficient upper bound on the spectral norm of convolution layers. Moreover, our spectral bound is an effective regularizer and can be used to bound either the lipschitz constant or curvature values (eigenvalues of the Hessian) of neural networks. Through experiments on MNIST and CIFAR-10, we demonstrate the effectiveness of our spectral bound in improving generalization and robustness of deep networks.", "keywords": "spectral regularization;spectral normalization", "primary_area": "", "supplementary_material": "/attachment/70f5843d86434a7bbc31c7222f198b47c1176821.zip", "author": "Sahil Singla;Soheil Feizi", "authorids": "~Sahil_Singla1;~Soheil_Feizi2", "gender": "M;M", "homepage": "https://singlasahil14.github.io/;https://www.cs.umd.edu/~sfeizi/", "dblp": "55/8911-2;57/2132", "google_scholar": "jjjbOI4AAAAJ;lptAmrMAAAAJ", "orcid": ";", "linkedin": ";", "or_profile": "~Sahil_Singla1;~Soheil_Feizi2", "aff": "University of Maryland, College Park;University of Maryland, College Park", "aff_domain": "umd.edu;umd.edu", "position": "PhD student;Assistant Professor", "bibtex": "@inproceedings{\nsingla2021fantastic,\ntitle={Fantastic Four: Differentiable and Efficient Bounds on Singular Values of Convolution Layers},\nauthor={Sahil Singla and Soheil Feizi},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=JCRblSgs34Z}\n}", "github": "", "project": "", "reviewers": "AnonReviewer3;AnonReviewer1;AnonReviewer2;AnonReviewer4", "pdf_size": 0, "rating": "3;4;5;8", "confidence": "4;4;4;5", "wc_review": "423;858;332;307", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "271;274;197;27", "reply_reviewers": "0;0;0;0", "reply_authors": "1;1;1;1", "rating_avg": [ 5.0, 1.8708286933869707 ], "confidence_avg": [ 4.25, 0.4330127018922193 ], "wc_review_avg": [ 480.0, 222.46685146331353 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 192.25, 100.26807817047258 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 9, 0 ], "authors#_avg": [ 2, 0 ], "corr_rating_confidence": 0.9258200997725515, "gs_citation": 41, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=7739339146709410018&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 6, "pdf": "https://openreview.net/pdf?id=JCRblSgs34Z", "email": "umd.edu;umd.edu", "author_num": 2, "aff_unique_index": "0;0", "aff_unique_norm": "University of Maryland", "aff_unique_dep": "", "aff_unique_url": "https://www/umd.edu", "aff_unique_abbr": "UMD", "aff_campus_unique_index": "0;0", "aff_campus_unique": "College Park", "aff_country_unique_index": "0;0", "aff_country_unique": "United States" }, { "id": "JCz05AtXO3y", "title": "Structural Landmarking and Interaction Modelling: on Resolution Dilemmas in Graph Classification", "track": "main", "status": "Reject", "tldr": "", "abstract": "Graph neural networks are promising architecture for learning and inference with 
graph-structured data. However, generating informative graph level features has long been a challenge. Current practice of graph-pooling typically summarizes a graph by squeezing it into a single vector. This may lead to significant loss of predictive, interpretable structural information, because properties of a complex system are believed to arise largely from the interaction among its components. In this paper, we analyze the intrinsic difficulty in graph classification under the unified concept of \"resolution dilemmas\" and propose SLIM, an inductive neural network model for Structural Landmarking and Interaction Modelling, to remedy the information loss in graph pooling. We show that, by projecting graphs onto end-to-end optimizable, and well-aligned substructure landmarks (representatives), the resolution dilemmas can be resolved effectively, so that explicit interacting relation between component parts of a graph can be leveraged directly in explaining its complexity and predicting its property. Empirical evaluations, in comparison with state-of-the-art, demonstrate promising results of our approach on a number of benchmark datasets for graph classification.\n", "keywords": "Graph Pooling;Graph Classification;Interaction Preserving Graph Pooling;Structure Landmarking", "primary_area": "", "supplementary_material": "/attachment/34a227989f5ea17da45d174029dd45c9ffa61a72.zip", "author": "Kai Zhang;Yaokang Zhu;Jun Wang;Haibin Ling;Jie Zhang;Hongyuan Zha", "authorids": "~Kai_Zhang1;~Yaokang_Zhu1;~Jun_Wang4;~Haibin_Ling1;~Jie_Zhang10;~Hongyuan_Zha1", "gender": "M;M;M;M;M;", "homepage": "https://cis.temple.edu/user/635;https://openreview.net/forum?id=JCz05AtXO3y;;https://www3.cs.stonybrook.edu/~hling/;https://istbi.fudan.edu.cn/lnen/info/1157/1639.htm;", "dblp": "55/957-1.html;;;93/3488;84/6889-12;z/HongyuanZha", "google_scholar": "I6ifR7YAAAAJ;;;https://scholar.google.com/citations?hl=en;https://scholar.google.com.hk/citations?user=epTfECgAAAAJ;n1DQMIsAAAAJ", "orcid": "0000-0001-6297-4423;;;;;", "linkedin": "kai-zhang-1b939430/;;;;;", "or_profile": "~Kai_Zhang1;~Yaokang_Zhu1;~Jun_Wang4;~Haibin_Ling1;~Jie_Zhang10;~Hongyuan_Zha1", "aff": "Temple University;East China Normal University;;State University of New York, Stony Brook;Fudan University;The Chinese University of Hong Kong, Shenzhen", "aff_domain": "temple.edu;ecnu.edu.cn;;stonybrook.edu;fudan.edu.cn;cuhk.edu.cn", "position": "Associate Professor;PhD student;;Professor;Full Professor;Full Professor", "bibtex": "@misc{\nzhang2021structural,\ntitle={Structural Landmarking and Interaction Modelling: on Resolution Dilemmas in Graph Classification},\nauthor={Kai Zhang and Yaokang Zhu and Jun Wang and Haibin Ling and Jie Zhang and Hongyuan Zha},\nyear={2021},\nurl={https://openreview.net/forum?id=JCz05AtXO3y}\n}", "github": "", "project": "", "reviewers": "AnonReviewer4;AnonReviewer1;AnonReviewer2;AnonReviewer3", "site": "https://openreview.net/forum?id=JCz05AtXO3y", "pdf_size": 0, "rating": "6;6;6;6", "confidence": "4;4;2;3", "wc_review": "621;540;565;399", "wc_reply_reviewers": "21;0;0;0", "wc_reply_authors": "805;625;584;760", "reply_reviewers": "1;0;0;0", "reply_authors": "3;1;1;1", "rating_avg": [ 6.0, 0.0 ], "confidence_avg": [ 3.25, 0.82915619758885 ], "wc_review_avg": [ 531.25, 81.79356637291224 ], "wc_reply_reviewers_avg": [ 5.25, 9.093266739736606 ], "wc_reply_authors_avg": [ 693.5, 91.56555029048862 ], "reply_reviewers_avg": [ 0.25, 0.4330127018922193 ], "reply_authors_avg": [ 1.5, 0.8660254037844386 ], "replies_avg": [ 12, 0 ], 
"authors#_avg": [ 6, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:z8NM5T73vssJ:scholar.google.com/&scioq=Structural+Landmarking+and+Interaction+Modelling:+on+Resolution+Dilemmas+in+Graph+Classification&hl=en&as_sdt=0,33", "gs_version_total": 3, "aff_unique_index": "0;1;2;3;4", "aff_unique_norm": "Temple University;East China Normal University;State University of New York;Fudan University;Chinese University of Hong Kong", "aff_unique_dep": ";;;;", "aff_unique_url": "https://www.temple.edu;http://www.ecnu.edu.cn;https://www.stonybrook.edu;https://www.fudan.edu.cn;https://www.cuhk.edu.cn", "aff_unique_abbr": "Temple;ECNU;SUNY Stony Brook;Fudan;CUHK", "aff_campus_unique_index": "1;2", "aff_campus_unique": ";Stony Brook;Shenzhen", "aff_country_unique_index": "0;1;0;1;1", "aff_country_unique": "United States;China" }, { "id": "JE7a-YejzfN", "title": "Geometry matters: Exploring language examples at the decision boundary", "track": "main", "status": "Reject", "tldr": "", "abstract": " A growing body of recent evidence has highlighted the limitations of natural language processing (NLP) datasets and classifiers. These include the presence of annotation artifacts in datasets, classifiers relying on shallow features like a single word (e.g., if a movie review has the word \"romantic\", the review tends to be positive), or unnecessary words (e.g., learning a proper noun to classify a movie as positive or negative). The presence of such artifacts has subsequently led to the development of challenging datasets to force the model to generalize better. While a variety of heuristic strategies, such as counterfactual examples and contrast sets, have been proposed, the theoretical justification about what makes these examples difficult for the classifier is often lacking or unclear. In this paper, using tools from information geometry, we propose a theoretical way to quantify the difficulty of an example in NLP. Using our approach, we explore difficult examples for several deep learning architectures. We discover that BERT, CNN and fasttext are susceptible to word substitutions in high difficulty examples. These classifiers tend to perform poorly on the FIM test set. (generated by sampling and perturbing difficult examples, with accuracy dropping below 50%). We replicate our experiments on 5 NLP datasets (YelpReviewPolarity, AGNEWS, SogouNews, YelpReviewFull and Yahoo Answers). On YelpReviewPolarity we observe a correlation coefficient of -0.4 between resilience to perturbations and the difficulty score. Similarly we observe a correlation of 0.35 between the difficulty score and the empirical success probability of random substitutions. Our approach is simple, architecture agnostic and can be used to study the fragilities of text classification models. 
All the code used will be made publicly available, including a tool to explore the difficult examples for other datasets.\n ", "keywords": "Natural Language Processing;Text Classification;Information Geometry;Sentiment Analysis", "primary_area": "", "supplementary_material": "/attachment/11133cdb73ce1422d4aae45ebc88deed513c655c.zip", "author": "Debajyoti Datta;Shashwat Kumar;Laura Barnes;Tom Fletcher", "authorids": "~Debajyoti_Datta1;sk9epp@virginia.edu;~Laura_Barnes1;~Tom_Fletcher1", "gender": ";;F;M", "homepage": ";;https://www.s2helab.com/;http://www.sci.utah.edu/~fletcher/", "dblp": "147/8345;;;20/546.html", "google_scholar": "L6lx408AAAAJ;;;", "orcid": ";;;", "linkedin": ";;;", "or_profile": "~Debajyoti_Datta1;sk9epp@virginia.edu;~Laura_Barnes1;~Tom_Fletcher1", "aff": "University of Virginia;;University of Virginia;University of Virginia", "aff_domain": "virginia.edu;;virginia.edu;virginia.edu", "position": "PhD student;;Professor;Associate Professor", "bibtex": "@misc{\ndatta2021geometry,\ntitle={Geometry matters: Exploring language examples at the decision boundary},\nauthor={Debajyoti Datta and Shashwat Kumar and Laura Barnes and Tom Fletcher},\nyear={2021},\nurl={https://openreview.net/forum?id=JE7a-YejzfN}\n}", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer4;AnonReviewer3;AnonReviewer2", "site": "https://openreview.net/forum?id=JE7a-YejzfN", "pdf_size": 0, "rating": "3;4;5;5", "confidence": "4;5;2;4", "wc_review": "1315;750;480;706", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "270;349;214;112", "reply_reviewers": "0;0;0;0", "reply_authors": "1;1;1;1", "rating_avg": [ 4.25, 0.82915619758885 ], "confidence_avg": [ 3.75, 1.0897247358851685 ], "wc_review_avg": [ 812.75, 307.5348557480924 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 236.25, 86.29129446241956 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 10, 0 ], "authors#_avg": [ 4, 0 ], "corr_rating_confidence": -0.48420012470625223, "gs_citation": 4, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=11399775648719515697&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 4, "aff_unique_index": "0;0;0", "aff_unique_norm": "University of Virginia", "aff_unique_dep": "", "aff_unique_url": "https://www.virginia.edu", "aff_unique_abbr": "UVA", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0;0", "aff_country_unique": "United States" }, { "title": "Neural Jump Ordinary Differential Equations: Consistent Continuous-Time Prediction and Filtering", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/3339", "id": "JFKR3WqwyXR", "poster": "", "openreview": "https://openreview.net/forum?id=JFKR3WqwyXR", "slides": "https://iclr.cc/virtual/2021/poster/3339", "video": "https://iclr.cc/virtual/2021/poster/3339", "author_site": "Calypso Herrera, Florian Krach, Josef Teichmann", "tldr": "", "abstract": "Combinations of neural ODEs with recurrent neural networks (RNN), like GRU-ODE-Bayes or ODE-RNN are well suited to model irregularly observed time series. While those models outperform existing discrete-time approaches, no theoretical guarantees for their predictive capabilities are available. Assuming that the irregularly-sampled time series data originates from a continuous stochastic process, the $L^2$-optimal online prediction is the conditional expectation given the currently available information. 
We introduce the Neural Jump ODE (NJ-ODE) that provides a data-driven approach to learn, continuously in time, the conditional expectation of a stochastic process. Our approach models the conditional expectation between two observations with a neural ODE and jumps whenever a new observation is made. We define a novel training framework, which allows us to prove theoretical guarantees for the first time. In particular, we show that the output of our model converges to the $L^2$-optimal prediction. This can be interpreted as solution to a special filtering problem. We provide experiments showing that the theoretical results also hold empirically. Moreover, we experimentally show that our model outperforms the baselines in more complex learning tasks and give comparisons on real-world datasets.", "keywords": "Neural ODE;conditional expectation;irregular-observed data modelling", "primary_area": "", "supplementary_material": "", "author": "Calypso Herrera;Florian Krach;Josef Teichmann", "authorids": "~Calypso_Herrera1;~Florian_Krach1;jteichma@math.ethz.ch", "gender": ";M;", "homepage": "https://people.math.ethz.ch/~cherrera/index.html;https://floriankrach.github.io/;", "dblp": ";;", "google_scholar": "LPdzJWgAAAAJ;CZMnoE4AAAAJ;", "orcid": ";;", "linkedin": ";;", "or_profile": "~Calypso_Herrera1;~Florian_Krach1;jteichma@math.ethz.ch", "aff": "Swiss Federal Institute of Technology;Swiss Federal Institute of Technology;", "aff_domain": "ethz.ch;ethz.ch;", "position": "PhD student;PhD student;", "bibtex": "@inproceedings{\nherrera2021neural,\ntitle={Neural Jump Ordinary Differential Equations: Consistent Continuous-Time Prediction and Filtering},\nauthor={Calypso Herrera and Florian Krach and Josef Teichmann},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=JFKR3WqwyXR}\n}", "github": "[![github](/images/github_icon.svg) HerreraKrachTeichmann/ControlledODERNN](https://github.com/HerreraKrachTeichmann/ControlledODERNN) + [![Papers with Code](/images/pwc_icon.svg) 1 community implementation](https://paperswithcode.com/paper/?openreview=JFKR3WqwyXR)", "project": "", "reviewers": "AnonReviewer3;AnonReviewer1;AnonReviewer2;AnonReviewer4", "pdf_size": 0, "rating": "4;6;7;7", "confidence": "3;4;5;2", "wc_review": "259;340;765;374", "wc_reply_reviewers": "0;517;0;16", "wc_reply_authors": "540;1754;800;417", "reply_reviewers": "0;1;0;1", "reply_authors": "1;3;1;2", "rating_avg": [ 6.0, 1.224744871391589 ], "confidence_avg": [ 3.5, 1.118033988749895 ], "wc_review_avg": [ 434.5, 195.33368885064348 ], "wc_reply_reviewers_avg": [ 133.25, 221.65443261978768 ], "wc_reply_authors_avg": [ 877.75, 524.4579940281204 ], "reply_reviewers_avg": [ 0.5, 0.5 ], "reply_authors_avg": [ 1.75, 0.82915619758885 ], "replies_avg": [ 15, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": 0.18257418583505536, "gs_citation": 48, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=15466207538708166602&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 11, "pdf": "https://openreview.net/pdf?id=JFKR3WqwyXR", "email": "ethz.ch;ethz.ch;", "author_num": 3, "aff_unique_index": "0;0", "aff_unique_norm": "Swiss Federal Institute of Technology", "aff_unique_dep": "", "aff_unique_url": "https://www.ethz.ch", "aff_unique_abbr": "ETH Zurich", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0", "aff_country_unique": "Switzerland" }, { "title": "Accurate Learning of Graph Representations with Graph Multiset Pooling", 
"status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/3311", "id": "JHcqXGaqiGn", "poster": "", "openreview": "https://openreview.net/forum?id=JHcqXGaqiGn", "slides": "https://iclr.cc/virtual/2021/poster/3311", "video": "https://iclr.cc/virtual/2021/poster/3311", "author_site": "Jinheon Baek, Minki Kang, Sung Ju Hwang", "tldr": "", "abstract": "Graph neural networks have been widely used on modeling graph data, achieving impressive results on node classification and link prediction tasks. Yet, obtaining an accurate representation for a graph further requires a pooling function that maps a set of node representations into a compact form. A simple sum or average over all node representations considers all node features equally without consideration of their task relevance, and any structural dependencies among them. Recently proposed hierarchical graph pooling methods, on the other hand, may yield the same representation for two different graphs that are distinguished by the Weisfeiler-Lehman test, as they suboptimally preserve information from the node features. To tackle these limitations of existing graph pooling methods, we first formulate the graph pooling problem as a multiset encoding problem with auxiliary information about the graph structure, and propose a Graph Multiset Transformer (GMT) which is a multi-head attention based global pooling layer that captures the interaction between nodes according to their structural dependencies. We show that GMT satisfies both injectiveness and permutation invariance, such that it is at most as powerful as the Weisfeiler-Lehman graph isomorphism test. Moreover, our methods can be easily extended to the previous node clustering approaches for hierarchical graph pooling. Our experimental results show that GMT significantly outperforms state-of-the-art graph pooling methods on graph classification benchmarks with high memory and time efficiency, and obtains even larger performance gain on graph reconstruction and generation tasks.", "keywords": "Graph representation learning;Graph pooling", "primary_area": "", "supplementary_material": "", "author": "Jinheon Baek;Minki Kang;Sung Ju Hwang", "authorids": "~Jinheon_Baek1;~Minki_Kang1;~Sung_Ju_Hwang1", "gender": "M;M;", "homepage": "https://jinheonbaek.github.io;https://nardien.github.io;", "dblp": "262/6003;232/2406;", "google_scholar": "U1FHaSUAAAAJ;90G751oAAAAJ;", "orcid": "0000-0002-9367-560X;;", "linkedin": "jinheon-baek-8100a8144/;;", "or_profile": "~Jinheon_Baek1;~Minki_Kang1;~Sung_Ju_Hwang1", "aff": "Korea Advanced Institute of Science & Technology;Korea Advanced Institute of Science & Technology;", "aff_domain": "kaist.ac.kr;kaist.ac.kr;", "position": "MS student;MS student;", "bibtex": "@inproceedings{\nbaek2021accurate,\ntitle={Accurate Learning of Graph Representations with Graph Multiset Pooling},\nauthor={Jinheon Baek and Minki Kang and Sung Ju Hwang},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=JHcqXGaqiGn}\n}", "github": "[![github](/images/github_icon.svg) JinheonBaek/GMT](https://github.com/JinheonBaek/GMT)", "project": "", "reviewers": "AnonReviewer3;AnonReviewer2;AnonReviewer1;AnonReviewer4", "pdf_size": 0, "rating": "4;6;7;7", "confidence": "4;5;4;5", "wc_review": "289;437;623;455", "wc_reply_reviewers": "0;236;0;467", "wc_reply_authors": "2504;2941;825;2688", "reply_reviewers": "0;1;0;3", "reply_authors": "7;6;3;7", "rating_avg": [ 6.0, 1.224744871391589 ], "confidence_avg": [ 
4.5, 0.5 ], "wc_review_avg": [ 451.0, 118.36384583140241 ], "wc_reply_reviewers_avg": [ 175.75, 193.79934855411668 ], "wc_reply_authors_avg": [ 2239.5, 831.2678569510553 ], "reply_reviewers_avg": [ 1.0, 1.224744871391589 ], "reply_authors_avg": [ 5.75, 1.6393596310755 ], "replies_avg": [ 34, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": 0.40824829046386296, "gs_citation": 251, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=8033778925255724792&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 5, "pdf": "https://openreview.net/pdf?id=JHcqXGaqiGn", "email": "kaist.ac.kr;kaist.ac.kr;", "author_num": 3, "aff_unique_index": "0;0", "aff_unique_norm": "Korea Advanced Institute of Science and Technology", "aff_unique_dep": "", "aff_unique_url": "https://www.kaist.ac.kr", "aff_unique_abbr": "KAIST", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0", "aff_country_unique": "South Korea" }, { "id": "JHx9ZDCQEA", "title": "PolyRetro: Few-shot Polymer Retrosynthesis via Domain Adaptation", "track": "main", "status": "Reject", "tldr": "", "abstract": "Polymers appear everywhere in our daily lives -- fabrics, plastics, rubbers, etc. -- and we could hardly live without them. To make polymers, chemists develop processes that combine smaller building blocks~(monomers) to form long chains or complex networks~(polymers). These processes are called polymerizations and will usually take lots of human efforts to develop. Although machine learning models for small molecules have generated lots of promising results, the prediction problem for polymerization is new and suffers from the scarcity of polymerization datasets available in the field. Furthermore, the problem is made even more challenging by the large size of the polymers and the additional recursive constraints, which are not present in the small molecule problem. In this paper, we make an initial step towards this challenge and propose a learning-based search framework that can automatically identify a sequence of reactions that lead to the polymerization of a target polymer with minimal polymerization data involved. Our method transfers models trained on small molecule datasets for retrosynthesis to check the validity of polymerization reaction. Furthermore, our method also incorporates a template prior learned on a limited amount of polymer data into the framework to adapt the model from small molecule to the polymer domain. 
We demonstrate that our method is able to propose high-quality polymerization plans for a dataset of 52 real-world polymers, of which a significant portion successfully recovers the currently-in-used polymerization processes in the real world.", "keywords": "ML for Chemistry;Polymer Retrosynthesis;Few-show Learning;Domain Adaptation", "primary_area": "", "supplementary_material": "/attachment/08ac1c689024b04b414f3165dce43ee06d5bde2c.zip", "author": "Binghong Chen;Chengtao Li;Hanjun Dai;Rampi Ramprasad;Le Song", "authorids": "~Binghong_Chen1;~Chengtao_Li1;~Hanjun_Dai1;rrampi790@gmail.com;~Le_Song1", "gender": "M;;M;;M", "homepage": "http://binghongchen.net/;;https://hanjun-dai.github.io;;http://www.cc.gatech.edu/~lsong", "dblp": "192/2022;;144/7311;;94/3481", "google_scholar": "6Px5HxsAAAAJ;;obpl7GQAAAAJ;;Xl4E0CsAAAAJ", "orcid": ";;;;", "linkedin": "binghong-chen-91b697181/;;hanjun-dai;;", "or_profile": "~Binghong_Chen1;~Chengtao_Li1;~Hanjun_Dai1;rrampi790@gmail.com;~Le_Song1", "aff": "Georgia Institute of Technology;;Google Research;;College of Computing, Georgia Institute of Technology", "aff_domain": "gatech.edu;;google.com;;cc.gatech.edu", "position": "PhD student;;Researcher;;Associate Professor", "bibtex": "@misc{\nchen2021polyretro,\ntitle={PolyRetro: Few-shot Polymer Retrosynthesis via Domain Adaptation},\nauthor={Binghong Chen and Chengtao Li and Hanjun Dai and Rampi Ramprasad and Le Song},\nyear={2021},\nurl={https://openreview.net/forum?id=JHx9ZDCQEA}\n}", "github": "", "project": "", "reviewers": "AnonReviewer3;AnonReviewer2;AnonReviewer4;AnonReviewer1", "site": "https://openreview.net/forum?id=JHx9ZDCQEA", "pdf_size": 0, "rating": "5;6;6;7", "confidence": "3;3;3;3", "wc_review": "843;212;598;410", "wc_reply_reviewers": "182;14;0;64", "wc_reply_authors": "860;43;533;97", "reply_reviewers": "1;1;0;1", "reply_authors": "2;1;1;1", "rating_avg": [ 6.0, 0.7071067811865476 ], "confidence_avg": [ 3.0, 0.0 ], "wc_review_avg": [ 515.75, 233.07978784098805 ], "wc_reply_reviewers_avg": [ 65.0, 71.61703707917552 ], "wc_reply_authors_avg": [ 383.25, 334.44908057879303 ], "reply_reviewers_avg": [ 0.75, 0.4330127018922193 ], "reply_authors_avg": [ 1.25, 0.4330127018922193 ], "replies_avg": [ 16, 0 ], "authors#_avg": [ 5, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:hlKF30iwF2wJ:scholar.google.com/&scioq=PolyRetro:+Few-shot+Polymer+Retrosynthesis+via+Domain+Adaptation&hl=en&as_sdt=0,33", "gs_version_total": 0, "aff_unique_index": "0;1;0", "aff_unique_norm": "Georgia Institute of Technology;Google", "aff_unique_dep": ";Google Research", "aff_unique_url": "https://www.gatech.edu;https://research.google", "aff_unique_abbr": "Georgia Tech;Google Research", "aff_campus_unique_index": "1;2", "aff_campus_unique": ";Mountain View;Atlanta", "aff_country_unique_index": "0;0;0", "aff_country_unique": "United States" }, { "id": "JI2TGOehNT0", "title": "Combining Imitation and Reinforcement Learning with Free Energy Principle", "track": "main", "status": "Reject", "tldr": "", "abstract": "Imitation Learning (IL) and Reinforcement Learning (RL) from high dimensional sensory inputs are often introduced as separate problems, but a more realistic problem setting is how to merge the techniques so that the agent can reduce exploration costs by partially imitating experts at the same time it maximizes its return. Even when the experts are suboptimal (e.g. 
Experts learned halfway with other RL methods or human-crafted experts), it is expected that the agent outperforms the suboptimal experts\u2019 performance. In this paper, we propose to address the issue by using and theoretically extending Free Energy Principle, a unified brain theory that explains perception, action and model learning in a Bayesian probabilistic way. We find that both IL and RL can be achieved based on the same free energy objective function. Our results show that our approach is promising in visual control tasks especially with sparse-reward environments.", "keywords": "Imitation;Reinforcement Learning;Free Energy Principle", "primary_area": "", "supplementary_material": "/attachment/3938ad292c9c9665152d7f19e0a7c48b5dbc2ee8.zip", "author": "Ryoya Ogishima;Izumi Karino;Yasuo Kuniyoshi", "authorids": "~Ryoya_Ogishima1;karino@isi.imi.i.u-tokyo.ac.jp;~Yasuo_Kuniyoshi1", "gender": ";;M", "homepage": "http://www.isi.imi.i.u-tokyo.ac.jp/member/;;http://www.isi.imi.i.u-tokyo.ac.jp/", "dblp": ";;42/4337", "google_scholar": ";;https://scholar.google.co.jp/citations?hl=ja", "orcid": ";;0000-0001-8443-4161", "linkedin": ";;", "or_profile": "~Ryoya_Ogishima1;karino@isi.imi.i.u-tokyo.ac.jp;~Yasuo_Kuniyoshi1", "aff": ";;The University of Tokyo", "aff_domain": ";;u-tokyo.ac.jp", "position": ";;Full Professor", "bibtex": "@misc{\nogishima2021combining,\ntitle={Combining Imitation and Reinforcement Learning with Free Energy Principle},\nauthor={Ryoya Ogishima and Izumi Karino and Yasuo Kuniyoshi},\nyear={2021},\nurl={https://openreview.net/forum?id=JI2TGOehNT0}\n}", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer2;AnonReviewer3;AnonReviewer4", "site": "https://openreview.net/forum?id=JI2TGOehNT0", "pdf_size": 0, "rating": "4;5;5;6", "confidence": "4;2;2;5", "wc_review": "284;748;205;697", "wc_reply_reviewers": "120;0;0;0", "wc_reply_authors": "818;335;314;459", "reply_reviewers": "1;0;0;0", "reply_authors": "2;1;1;1", "rating_avg": [ 5.0, 0.7071067811865476 ], "confidence_avg": [ 3.25, 1.299038105676658 ], "wc_review_avg": [ 483.5, 241.30116037847807 ], "wc_reply_reviewers_avg": [ 30.0, 51.96152422706632 ], "wc_reply_authors_avg": [ 481.5, 202.0253696940065 ], "reply_reviewers_avg": [ 0.25, 0.4330127018922193 ], "reply_authors_avg": [ 1.25, 0.4330127018922193 ], "replies_avg": [ 11, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": 0.2721655269759087, "gs_citation": 3, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=9796117524050811468&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 0, "aff_unique_index": "0", "aff_unique_norm": "University of Tokyo", "aff_unique_dep": "", "aff_unique_url": "https://www.u-tokyo.ac.jp", "aff_unique_abbr": "UTokyo", "aff_country_unique_index": "0", "aff_country_unique": "Japan" }, { "id": "JNP-CqSjkDb", "title": "Transforming Recurrent Neural Networks with Attention and Fixed-point Equations", "track": "main", "status": "Reject", "tldr": "", "abstract": "Transformer has achieved state of the art performance in multiple Natural Language Processing tasks recently. Yet the Feed Forward Network(FFN) in a Transformer block is computationally expensive. In this paper, we present a framework to transform Recurrent Neural Networks(RNNs) and their variants into self-attention-style models, with an approximation of Banach Fixed-point Theorem. 
Within this framework, we propose a new model, StarSaber, by solving a set of equations obtained from RNN with Fixed-point Theorem and further approximate it with a Multi-layer Perceptron. It provides a view of stacking layers. StarSaber achieves better performance than both the vanilla Transformer and an improved version called ReZero on three datasets and is more computationally efficient, due to the reduction of Transformer's FFN layer. It has two major parts. One is a way to encode position information with two different matrices. For every position in a sequence, we have a matrix operating on positions before it and another matrix operating on positions after it. The other is the introduction of direct paths from the input layer to the rest of layers. Ablation studies show the effectiveness of these two parts. We additionally show that other RNN variants such as RNNs with gates can also be transformed in the same way, outperforming the two kinds of Transformers as well.", "keywords": "Fixed-point;Attention;Feed Forward Network;Transformer;Recurrent Neural Network;Deep Learning", "primary_area": "", "supplementary_material": "", "author": "Zhaobin Xu;Baotian Hu;Buzhou Tang", "authorids": "~Zhaobin_Xu2;~Baotian_Hu1;~Buzhou_Tang1", "gender": "M;M;M", "homepage": "http://www.hitsz.edu.cn/index.html;;", "dblp": ";155/1902;00/7437", "google_scholar": ";5NiJ1VoAAAAJ;https://scholar.google.com/citations?hl=zh-CN", "orcid": ";0000-0001-7490-684X;", "linkedin": ";;", "or_profile": "~Zhaobin_Xu2;~Baotian_Hu1;~Buzhou_Tang1", "aff": ";Harbin Institute of Technology, Shenzhen;Harbin Institute of Technology", "aff_domain": ";hit.edu.cn;hit.edu.cn", "position": ";Assistant Professor;Associate Professor", "bibtex": "@misc{\nxu2021transforming,\ntitle={Transforming Recurrent Neural Networks with Attention and Fixed-point Equations},\nauthor={Zhaobin Xu and Baotian Hu and Buzhou Tang},\nyear={2021},\nurl={https://openreview.net/forum?id=JNP-CqSjkDb}\n}", "github": "", "project": "", "reviewers": "AnonReviewer4;AnonReviewer2;AnonReviewer3;AnonReviewer1", "site": "https://openreview.net/forum?id=JNP-CqSjkDb", "pdf_size": 0, "rating": "3;4;4;5", "confidence": "4;4;4;4", "wc_review": "487;331;259;360", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "0;0;0;0", "reply_reviewers": "0;0;0;0", "reply_authors": "0;0;0;0", "rating_avg": [ 4.0, 0.7071067811865476 ], "confidence_avg": [ 4.0, 0.0 ], "wc_review_avg": [ 359.25, 82.41472865938466 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 0, 0 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 0, 0 ], "replies_avg": [ 5, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:H4f3zSYs74oJ:scholar.google.com/&scioq=Transforming+Recurrent+Neural+Networks+with+Attention+and+Fixed-point+Equations&hl=en&as_sdt=0,5", "gs_version_total": 0, "aff_unique_index": "0;0", "aff_unique_norm": "Harbin Institute of Technology", "aff_unique_dep": "", "aff_unique_url": "http://en.hhit.edu.cn/", "aff_unique_abbr": "HIT", "aff_campus_unique_index": "0;1", "aff_campus_unique": "Shenzhen;Harbin", "aff_country_unique_index": "0;0", "aff_country_unique": "China" }, { "id": "JNtw9rUJnV", "title": "Real-Time AutoML", "track": "main", "status": "Reject", "tldr": "", "abstract": "We present a new zero-shot approach to automated machine learning (AutoML) that predicts a high-quality model for a supervised learning task and dataset in real-time without fitting a single model. 
In contrast, most AutoML systems require tens or hundreds of model evaluations. Hence our approach accelerates AutoML by orders of magnitude. Our method uses a transformer-based language embedding to represent datasets and algorithms using their free-text descriptions and a meta-feature extractor to represent the data. We train a graph neural network in which each node represents a dataset to predict the best machine learning pipeline for a new test dataset. The graph neural network generalizes to new datasets and new sets of datasets. Our approach leverages the progress of unsupervised representation learning in natural language processing to provide a significant boost to AutoML. Performance is competitive with state-of-the-art AutoML systems while reducing running time from minutes to seconds and prediction time from minutes to milliseconds, providing AutoML in real-time.", "keywords": "Automated machine learning;zero-shot learning;graph neural networks;transformers", "primary_area": "", "supplementary_material": "/attachment/e3bd579ab28308325ec1ed2e6dfcc1f641b86366.zip", "author": "Iddo Drori;Brandon Kates;Anant Kharkar;Lu Liu;Qiang Ma;Jonah Deykin;Nihar Sidhu;Madeleine Udell", "authorids": "~Iddo_Drori1;bjk224@cornell.edu;agk2151@columbia.edu;ll3252@columbia.edu;~Qiang_Ma3;jd3599@columbia.edu;ns625@cornell.edu;~Madeleine_Udell1", "gender": "M;;;;M;;;F", "homepage": "https://www.cs.columbia.edu/~idrori;;;;;;;https://people.orie.cornell.edu/mru8", "dblp": "86/2557;;;;;;;153/2166", "google_scholar": "https://scholar.google.com/citations?hl=en;;;;;;;tZ9pEDMAAAAJ", "orcid": "0000-0001-9797-3885;;;;;;;0000-0002-3985-915X", "linkedin": "iddodrori;;;;;;;", "or_profile": "~Iddo_Drori1;bjk224@cornell.edu;agk2151@columbia.edu;ll3252@columbia.edu;~Qiang_Ma3;jd3599@columbia.edu;ns625@cornell.edu;~Madeleine_Udell1", "aff": "Massachusetts Institute of Technology;;;;;;;Google", "aff_domain": "mit.edu;;;;;;;google.com", "position": "Lecturer;;;;;;;Visiting researcher", "bibtex": "@misc{\ndrori2021realtime,\ntitle={Real-Time Auto{\\{}ML{\\}}},\nauthor={Iddo Drori and Brandon Kates and Anant Kharkar and Lu Liu and Qiang Ma and Jonah Deykin and Nihar Sidhu and Madeleine Udell},\nyear={2021},\nurl={https://openreview.net/forum?id=JNtw9rUJnV}\n}", "github": "", "project": "", "reviewers": "AnonReviewer2;AnonReviewer3;AnonReviewer4;AnonReviewer1;AnonReviewer5", "site": "https://openreview.net/forum?id=JNtw9rUJnV", "pdf_size": 0, "rating": "2;4;4;4;4", "confidence": "5;5;4;4;4", "wc_review": "558;609;690;836;478", "wc_reply_reviewers": "0;0;0;0;0", "wc_reply_authors": "0;0;0;0;0", "reply_reviewers": "0;0;0;0;0", "reply_authors": "0;0;0;0;0", "rating_avg": [ 3.6, 0.8000000000000002 ], "confidence_avg": [ 4.4, 0.48989794855663565 ], "wc_review_avg": [ 634.2, 122.21031053065859 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 0, 0 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 0, 0 ], "replies_avg": [ 7, 0 ], "authors#_avg": [ 8, 0 ], "corr_rating_confidence": -0.6123724356957944, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:wMUR1YNy9mUJ:scholar.google.com/&scioq=Real-Time+AutoML&hl=en&as_sdt=0,5", "gs_version_total": 0, "aff_unique_index": "0;1", "aff_unique_norm": "Massachusetts Institute of Technology;Google", "aff_unique_dep": ";Google", "aff_unique_url": "https://web.mit.edu;https://www.google.com", "aff_unique_abbr": "MIT;Google", "aff_campus_unique_index": "1", "aff_campus_unique": ";Mountain View", "aff_country_unique_index": "0;0", 
"aff_country_unique": "United States" }, { "id": "JRJTVcG0f-N", "title": "Hierarchical Binding in Convolutional Neural Networks Confers Adversarial Robustness", "track": "main", "status": "Withdraw", "tldr": "", "abstract": "We approach the issue of robust machine vision by presenting a novel deep-learning architecture, inspired by work in theoretical neuroscience on how the primate brain performs visual 'feature binding'. Feature binding describes how separately represented features are encoded in a relationally meaningful way, such as a small edge composing part of the larger contour of an object, or the ear of a cat forming part of its head representation. We propose that the absence of such representations from current models such as convolutional neural networks might partly explain their vulnerability to small, often humanly-imperceptible changes to images known as adversarial examples. It has been proposed that adversarial examples are a result of 'off-manifold' perturbations of images, as the decision boundary is often unpredictable in these directions. Our novel architecture is designed to capture hierarchical feature binding, providing representations in these otherwise vulnerable directions. Having introduced these representations into convolutional neural networks, we provide empirical evidence of enhanced robustness against a broad range of $L_0$, $L_2$ and $L_\\infty$ attacks in both the black-box and white-box setting on MNIST, Fashion-MNIST, and CIFAR-10. We further provide evidence, through the controlled manipulation of a key hyperparameter, synthetic data-sets, and ablation analyses, that this robustness is dependent on the introduction of the hierarchical binding representations.", "keywords": "adversarial examples;robust representations;feature binding", "primary_area": "", "supplementary_material": "", "author": "Niels Leadholm;Simon Stringer", "authorids": "~Niels_Leadholm1;simon.stringer@psy.ox.ac.uk", "gender": "M;", "homepage": "https://github.com/nielsleadholm;", "dblp": ";", "google_scholar": ";", "orcid": ";", "linkedin": ";", "or_profile": "~Niels_Leadholm1;simon.stringer@psy.ox.ac.uk", "aff": "University of Oxford;", "aff_domain": "oxford.ac.uk;", "position": "PhD student;", "bibtex": "", "github": "", "project": "", "reviewers": "AnonReviewer2;AnonReviewer1;AnonReviewer3;AnonReviewer4", "site": "https://openreview.net/forum?id=JRJTVcG0f-N", "pdf_size": 0, "rating": "3;4;5;5", "confidence": "3;4;3;3", "wc_review": "793;288;444;707", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "261;42;95;139", "reply_reviewers": "0;0;0;0", "reply_authors": "1;1;1;1", "rating_avg": [ 4.25, 0.82915619758885 ], "confidence_avg": [ 3.25, 0.4330127018922193 ], "wc_review_avg": [ 558.0, 202.06558341291077 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 134.25, 80.83741398634669 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 13, 0 ], "authors#_avg": [ 2, 0 ], "corr_rating_confidence": -0.17407765595569782, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:IsmrMAAYqzIJ:scholar.google.com/&scioq=Hierarchical+Binding+in+Convolutional+Neural+Networks+Confers+Adversarial+Robustness&hl=en&as_sdt=0,33", "gs_version_total": 0, "aff_unique_index": "0", "aff_unique_norm": "University of Oxford", "aff_unique_dep": "", "aff_unique_url": "https://www.ox.ac.uk", "aff_unique_abbr": "Oxford", "aff_country_unique_index": "0", "aff_country_unique": "United Kingdom" }, { "id": "JU8ceIgm5xB", "title": 
"Decomposing Mutual Information for Representation Learning", "track": "main", "status": "Reject", "tldr": "", "abstract": "Many self-supervised representation learning methods maximize mutual information (MI) across views. In this paper, we transform each view into a set of subviews and then decompose the original MI bound into a sum of bounds involving conditional MI between the subviews. E.g.,~given two views $x$ and $y$ of the same input example, we can split $x$ into two subviews, $x^{\\prime}$ and $x^{\\prime\\prime}$, which depend only on $x$ but are otherwise unconstrained. The following holds: $I(x; y) \\geq I(x^{\\prime\\prime}; y) + I(x^{\\prime}; y | x^{\\prime\\prime})$, due to the chain rule and information processing inequality. By maximizing both terms in the decomposition, our approach explicitly rewards the encoder for any information about $y$ which it extracts from $x^{\\prime\\prime}$, and for information about $y$ extracted from $x^{\\prime}$ in excess of the information from $x^{\\prime\\prime}$. We provide a novel contrastive lower-bound on conditional MI, that relies on sampling contrast sets from $p(y|x^{\\prime\\prime})$. By decomposing the original MI into a sum of increasingly challenging MI bounds between sets of increasingly informed views, our representations can capture more of the total information shared between the original views. We empirically test the method in a vision domain and for dialogue generation.", "keywords": "Mutual Information;Self-supervised learning", "primary_area": "", "supplementary_material": "", "author": "Alessandro Sordoni;Nouha Dziri;Hannes Schulz;Geoff Gordon;Remi Tachet des Combes;Philip Bachman", "authorids": "~Alessandro_Sordoni2;dziri@ualberta.ca;~Hannes_Schulz1;~Geoff_Gordon2;~Remi_Tachet_des_Combes1;~Philip_Bachman1", "gender": ";;M;;M;M", "homepage": ";;;;;", "dblp": ";;12/2966;;146/0392;", "google_scholar": ";;tg-4hxoAAAAJ;;1MZF70cAAAAJ;", "orcid": ";;;;;", "linkedin": ";;;;;", "or_profile": "~Alessandro_Sordoni2;dziri@ualberta.ca;~Hannes_Schulz1;~Geoff_Gordon2;~Remi_Tachet_des_Combes1;~Philip_Bachman1", "aff": ";;;;Microsoft Research;Microsoft", "aff_domain": ";;;;microsoft.com;microsoft.com", "position": ";;;;Researcher;Researcher", "bibtex": "@misc{\nsordoni2021decomposing,\ntitle={Decomposing Mutual Information for Representation Learning},\nauthor={Alessandro Sordoni and Nouha Dziri and Hannes Schulz and Geoff Gordon and Remi Tachet des Combes and Philip Bachman},\nyear={2021},\nurl={https://openreview.net/forum?id=JU8ceIgm5xB}\n}", "github": "", "project": "", "reviewers": "AnonReviewer4;AnonReviewer3;AnonReviewer1", "site": "https://openreview.net/forum?id=JU8ceIgm5xB", "pdf_size": 0, "rating": "5;5;6", "confidence": "5;3;3", "wc_review": "1632;237;150", "wc_reply_reviewers": "375;0;0", "wc_reply_authors": "1411;402;294", "reply_reviewers": "1;0;0", "reply_authors": "2;1;1", "rating_avg": [ 5.333333333333333, 0.4714045207910317 ], "confidence_avg": [ 3.6666666666666665, 0.9428090415820634 ], "wc_review_avg": [ 673.0, 679.0449175128255 ], "wc_reply_reviewers_avg": [ 125.0, 176.7766952966369 ], "wc_reply_authors_avg": [ 702.3333333333334, 503.0389867815637 ], "reply_reviewers_avg": [ 0.3333333333333333, 0.4714045207910317 ], "reply_authors_avg": [ 1.3333333333333333, 0.4714045207910317 ], "replies_avg": [ 10, 0 ], "authors#_avg": [ 6, 0 ], "corr_rating_confidence": -0.4999999999999999, "gs_citation": 0, "gs_cited_by_link": 
"https://scholar.google.com/scholar?q=related:Gtg1wMPizYQJ:scholar.google.com/&scioq=Decomposing+Mutual+Information+for+Representation+Learning&hl=en&as_sdt=0,5", "gs_version_total": 0, "aff_unique_index": "0;0", "aff_unique_norm": "Microsoft", "aff_unique_dep": "Microsoft Research", "aff_unique_url": "https://www.microsoft.com/en-us/research", "aff_unique_abbr": "MSR", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0", "aff_country_unique": "United States" }, { "id": "JUc6-1xuOX", "title": "DeepLTRS: A Deep Latent Recommender System based on User Ratings and Reviews", "track": "main", "status": "Withdraw", "tldr": "", "abstract": "We introduce a deep latent recommender system named deepLTRS in order to provide users with high quality recommendations based on observed user ratings and texts of product reviews. The underlying motivation is that, when a user scores only a few products, the texts used in the reviews represent a significant source of information. The addition of review information can alleviate data sparsity, thereby enhancing the predictive ability of the model. Our approach adopts a variational auto-encoder architecture as a generative deep latent variable model for both an ordinal matrix encoding users scores about products, and a document-term matrix encoding the reviews. Moreover, different from unique user-based or item-based models, deepLTRS assumes latent representations for both users and products. An alternated user/product mini-batching optimization structure is proposed to jointly capture user and product preferences. Numerical experiments on simulated and real-world data sets demonstrate that deepLTRS outperforms the state-of-the-art, in particular in contexts of extreme data sparsity.", "keywords": "representation learning for recommender system;optimization for representation learning;variational auto-encoder;topic modeling", "primary_area": "", "supplementary_material": "", "author": "Dingge LIANG;Marco Corneli;pierre Latouche;Charles Bouveyron", "authorids": "~Dingge_LIANG1;~Marco_Corneli1;pierre.latouche@parisdescartes.fr;~Charles_Bouveyron2", "gender": "F;M;;M", "homepage": ";https://math.unice.fr/~mcorneli/;;http://math.unice.fr/~cbouveyr/", "dblp": ";;;", "google_scholar": ";;;", "orcid": ";;;", "linkedin": "dinggeliang/;;;", "or_profile": "~Dingge_LIANG1;~Marco_Corneli1;pierre.latouche@parisdescartes.fr;~Charles_Bouveyron2", "aff": "INRIA;;;Universit\u00e9 C\u00f4te d'Azur", "aff_domain": "inria.fr;;;univ-cotedazur.fr", "position": "PhD student;;;Full Professor", "bibtex": "", "github": "", "project": "", "reviewers": "AnonReviewer4;AnonReviewer1;AnonReviewer2;AnonReviewer3", "site": "https://openreview.net/forum?id=JUc6-1xuOX", "pdf_size": 0, "rating": "3;4;5;5", "confidence": "3;4;3;5", "wc_review": "137;210;346;447", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "0;0;0;0", "reply_reviewers": "0;0;0;0", "reply_authors": "0;0;0;0", "rating_avg": [ 4.25, 0.82915619758885 ], "confidence_avg": [ 3.75, 0.82915619758885 ], "wc_review_avg": [ 285.0, 119.88953248720257 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 0, 0 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 0, 0 ], "replies_avg": [ 5, 0 ], "authors#_avg": [ 4, 0 ], "corr_rating_confidence": 0.4545454545454545, "gs_citation": 5, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=8978657100285630490&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 8, "aff_unique_index": "0;1", "aff_unique_norm": 
"INRIA;Universit\u00e9 C\u00f4te d'Azur", "aff_unique_dep": ";", "aff_unique_url": "https://www.inria.fr;https://www.univ-cotedazur.fr", "aff_unique_abbr": "INRIA;UCA", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0", "aff_country_unique": "France" }, { "id": "JUgC3lqn6r2", "title": "Noisy Differentiable Architecture Search", "track": "main", "status": "Withdraw", "tldr": "", "abstract": "Simplicity is the ultimate sophistication. Differentiable Architecture Search (DARTS) has now become one of the mainstream paradigms of neural architecture search. However, it largely suffers from the well-known performance collapse issue. Such aggregation is thought to have overly benefited from the residual structure which accelerates the information flow. To weaken this impact, we propose to inject unbiased random noise to allow fair competition for candidate operations. We name this novel approach as NoisyDARTS. In effect, a network optimizer should perceive this difficulty at each training step and refrain from overshooting, especially on skip connections. In the long run, since we add no bias to the gradient in terms of expectation, it is still likely to converge to the right solution area. We also prove that the injected noise plays a role in smoothing the loss landscape, which makes the optimization easier. Compared with the existing work, our method features extreme simplicity and acts as a new strong baseline. \n", "keywords": "neural architecture search;stabilize DARTS;noise injection", "primary_area": "", "supplementary_material": "/attachment/4b3f2c7a0ef3807af8d5cc7f7c18fa7709dee300.zip", "author": "Xiangxiang Chu;Bo Zhang", "authorids": "~Xiangxiang_Chu1;~Bo_Zhang7", "gender": "M;M", "homepage": "https://cxxgtxy.github.io/;", "dblp": "207/8002;36/2259-46", "google_scholar": "jn21pUsAAAAJ;uUNQnu0AAAAJ", "orcid": "0000-0003-2548-0605;0000-0003-0564-617X", "linkedin": ";bo-zhang-20a86588/", "or_profile": "~Xiangxiang_Chu1;~Bo_Zhang7", "aff": "MeiTuan;Meituan Inc.", "aff_domain": "meituan.com;meituan.com", "position": "Senior Engineer;Senior Software Engineer", "bibtex": "", "github": "", "project": "", "reviewers": "AnonReviewer4;AnonReviewer3;AnonReviewer2;AnonReviewer1", "site": "https://openreview.net/forum?id=JUgC3lqn6r2", "pdf_size": 0, "rating": "2;5;5;5", "confidence": "5;4;4;3", "wc_review": "815;515;302;537", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "643;558;176;526", "reply_reviewers": "0;0;0;0", "reply_authors": "1;1;1;1", "rating_avg": [ 4.25, 1.299038105676658 ], "confidence_avg": [ 4.0, 0.7071067811865476 ], "wc_review_avg": [ 542.25, 182.26543144546088 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 475.75, 178.2643752969168 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 9, 0 ], "authors#_avg": [ 2, 0 ], "corr_rating_confidence": -0.816496580927726, "gs_citation": 68, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=9320725238475689815&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 5, "aff_unique_index": "0;1", "aff_unique_norm": "Meituan;Meituan Inc.", "aff_unique_dep": ";", "aff_unique_url": "https://www.meituan.com;https://www.meituan.com", "aff_unique_abbr": "MeiTuan;Meituan", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0", "aff_country_unique": "China" }, { "id": "JVs1OrQgR3A", "title": "Time Series Counterfactual Inference with Hidden Confounders", "track": "main", "status": "Reject", "tldr": "", 
"abstract": "We present augmented counterfactual ordinary differential equations (ACODEs), a new approach to counterfactual inference on time series data with a focus on healthcare applications. ACODEs model interventions in continuous time with differential equations, augmented by auxiliary confounding variables to reduce inference bias. Experiments on tumor growth simulation and sepsis patient treatment response show that ACODEs outperform other methods like counterfactual Gaussian processes, recurrent marginal structural networks, and time series deconfounders in the accuracy of counterfactual inference. The learned auxiliary variables also reveal new insights into causal interventions and hidden confounders.", "keywords": "Time Series Analysis;Counterfactual Inference;Differential Equations.", "primary_area": "", "supplementary_material": "", "author": "Guangyu Li;Jiahao Chen;Samuel A Assefa;Yan Liu", "authorids": "~Guangyu_Li2;~Jiahao_Chen1;samuel.a.assefa@jpmorgan.com;~Yan_Liu1", "gender": ";M;;F", "homepage": ";https://jiahao.github.io;;http://www-bcf.usc.edu/~liu32/", "dblp": "131/6213;149/2661-1;;150/4295", "google_scholar": ";TQYNuFAAAAAJ;;UUKLPMYAAAAJ", "orcid": ";0000-0002-4357-6574;;0000-0002-7055-9518", "linkedin": ";https://linkedin.com/in/jiahao;;", "or_profile": "~Guangyu_Li2;~Jiahao_Chen1;samuel.a.assefa@jpmorgan.com;~Yan_Liu1", "aff": "University of Southern California;J.P. Morgan Chase;;University of Southern California", "aff_domain": "usc.edu;jpmorgan.com;;usc.edu", "position": "PhD student;AI Research Director;;Professor", "bibtex": "@misc{\nli2021time,\ntitle={Time Series Counterfactual Inference with Hidden Confounders},\nauthor={Guangyu Li and Jiahao Chen and Samuel A Assefa and Yan Liu},\nyear={2021},\nurl={https://openreview.net/forum?id=JVs1OrQgR3A}\n}", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer3;AnonReviewer2;AnonReviewer4", "site": "https://openreview.net/forum?id=JVs1OrQgR3A", "pdf_size": 0, "rating": "4;5;5;5", "confidence": "4;3;4;3", "wc_review": "244;489;206;330", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "0;0;0;0", "reply_reviewers": "0;0;0;0", "reply_authors": "0;0;0;0", "rating_avg": [ 4.75, 0.4330127018922193 ], "confidence_avg": [ 3.5, 0.5 ], "wc_review_avg": [ 317.25, 108.86086303166992 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 0, 0 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 0, 0 ], "replies_avg": [ 5, 0 ], "authors#_avg": [ 4, 0 ], "corr_rating_confidence": -0.5773502691896257, "gs_citation": 1, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=1356614609741269263&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 0, "aff_unique_index": "0;1;0", "aff_unique_norm": "University of Southern California;JPMorgan Chase & Co.", "aff_unique_dep": ";", "aff_unique_url": "https://www.usc.edu;https://www.jpmorganchase.com", "aff_unique_abbr": "USC;JPM", "aff_campus_unique_index": "0;0", "aff_campus_unique": "Los Angeles;", "aff_country_unique_index": "0;0;0", "aff_country_unique": "United States" }, { "title": "Free Lunch for Few-shot Learning: Distribution Calibration", "status": "Oral", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/2633", "id": "JWOiYxMG92s", "poster": "", "openreview": "https://openreview.net/forum?id=JWOiYxMG92s", "slides": "https://iclr.cc/virtual/2021/poster/2633", "video": "https://iclr.cc/virtual/2021/poster/2633", "author_site": "Shuo Yang, Lu Liu, Min Xu", "tldr": "", "abstract": "Learning from a limited number of samples is 
challenging since the learned model can easily become overfitted based on the biased distribution formed by only a few training examples. In this paper, we calibrate the distribution of these few-sample classes by transferring statistics from the classes with sufficient examples. Then an adequate number of examples can be sampled from the calibrated distribution to expand the inputs to the classifier. We assume every dimension in the feature representation follows a Gaussian distribution so that the mean and the variance of the distribution can borrow from that of similar classes whose statistics are better estimated with an adequate number of samples. Our method can be built on top of off-the-shelf pretrained feature extractors and classification models without extra parameters. We show that a simple logistic regression classifier trained using the features sampled from our calibrated distribution can outperform the state-of-the-art accuracy on three datasets (~5% improvement on miniImageNet compared to the next best). The visualization of these generated features demonstrates that our calibrated distribution is an accurate estimation. ", "keywords": "few-shot learning;image classification;distribution estimation", "primary_area": "", "supplementary_material": "", "author": "Shuo Yang;Lu Liu;Min Xu", "authorids": "~Shuo_Yang5;~Lu_Liu7;~Min_Xu5", "gender": "M;F;F", "homepage": "https://faculty.hitsz.edu.cn/yangshuo;https://www.uts.edu.au/staff/min.xu;https://liulu112601.github.io/", "dblp": "78/1102-6;09/0-1.html;", "google_scholar": "mVtxxCkAAAAJ;https://scholar.google.com.au/citations?user=Ac6VCMkAAAAJ;epMGJ28AAAAJ", "orcid": ";0000-0001-9581-8849;", "linkedin": ";;lu-liu-2b5b93187/", "or_profile": "~Shuo_Yang5;~Min_Xu5;~Lu_Liu4", "aff": "University of Technology Sydney, Australia;University of Technology Sydney;University of Technology Sydney", "aff_domain": "student.uts.edu.au;uts.edu.au;uts.edu.au", "position": "PhD student;Associate Professor;PhD student", "bibtex": "@inproceedings{\nyang2021free,\ntitle={Free Lunch for Few-shot Learning: Distribution Calibration},\nauthor={Shuo Yang and Lu Liu and Min Xu},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=JWOiYxMG92s}\n}", "github": "[![github](/images/github_icon.svg) ShuoYang-1998/ICLR2021-Oral_Distribution_Calibration](https://github.com/ShuoYang-1998/ICLR2021-Oral_Distribution_Calibration) + [![Papers with Code](/images/pwc_icon.svg) 5 community implementations](https://paperswithcode.com/paper/?openreview=JWOiYxMG92s)", "project": "", "reviewers": "AnonReviewer1;AnonReviewer3;AnonReviewer4", "pdf_size": 0, "rating": "7;7;7", "confidence": "4;4;5", "wc_review": "252;473;509", "wc_reply_reviewers": "0;0;0", "wc_reply_authors": "459;722;870", "reply_reviewers": "0;0;0", "reply_authors": "1;1;2", "rating_avg": [ 7.0, 0.0 ], "confidence_avg": [ 4.333333333333333, 0.4714045207910317 ], "wc_review_avg": [ 411.3333333333333, 113.62022511663827 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 683.6666666666666, 169.96535594709357 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.3333333333333333, 0.4714045207910317 ], "replies_avg": [ 13, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 448, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=8416872865746615998&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 6, "pdf": "https://openreview.net/pdf?id=JWOiYxMG92s", "email": 
"student.uts.edu.au;uts.edu.au;uts.edu.au", "author_num": 3, "aff_unique_index": "0;0;0", "aff_unique_norm": "University of Technology Sydney", "aff_unique_dep": "", "aff_unique_url": "https://www.uts.edu.au", "aff_unique_abbr": "UTS", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0;0", "aff_country_unique": "Australia" }, { "id": "JYVODnDjU20", "title": "UNSUPERVISED ANOMALY DETECTION FROM SEMANTIC SIMILARITY SCORES", "track": "main", "status": "Reject", "tldr": "", "abstract": "In this paper we present SemSAD, a simple and generic framework for detecting examples that lie out-of-distribution (OOD) for a given training set. The approach is based on learning a semantic similarity measure to find for a given test example the semantically closest example in the training set and then using a discriminator to classify whether the two examples show sufficient semantic dissimilarity such that the test example can be rejected as OOD. We are able to outperform previous approaches for anomaly, novelty, or out-of-distribution detection in the visual domain by a large margin. In particular we obtain AUROC values close to one for the challenging task of detecting examples from CIFAR-10 as out-of-distribution given CIFAR-100 as in-distribution, without making use of label information. ", "keywords": "Anomaly Detection;Out-of-Distribution Detection;Novelty Detection", "primary_area": "", "supplementary_material": "", "author": "Nima Rafiee;Rahil Gholamipoor;Markus Kollmann", "authorids": "nima.rafiee@hhu.de;rahil.gholamipoorfard@hhu.de;~Markus_Kollmann1", "gender": ";;M", "homepage": ";;https://www.mathmodeling.hhu.de/en.html", "dblp": ";;26/10996", "google_scholar": ";;", "orcid": ";;", "linkedin": ";;", "or_profile": "nima.rafiee@hhu.de;rahil.gholamipoorfard@hhu.de;~Markus_Kollmann1", "aff": ";;Institute for Mathematical Modelling of Biological Systems", "aff_domain": ";;hhu.de", "position": ";;Full Professor", "bibtex": "@misc{\nrafiee2021unsupervised,\ntitle={{\\{}UNSUPERVISED{\\}} {\\{}ANOMALY{\\}} {\\{}DETECTION{\\}} {\\{}FROM{\\}} {\\{}SEMANTIC{\\}} {\\{}SIMILARITY{\\}} {\\{}SCORES{\\}}},\nauthor={Nima Rafiee and Rahil Gholamipoor and Markus Kollmann},\nyear={2021},\nurl={https://openreview.net/forum?id=JYVODnDjU20}\n}", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer2;AnonReviewer3;AnonReviewer4", "site": "https://openreview.net/forum?id=JYVODnDjU20", "pdf_size": 0, "rating": "5;5;5;7", "confidence": "3;4;4;3", "wc_review": "555;977;504;370", "wc_reply_reviewers": "123;0;0;0", "wc_reply_authors": "676;270;532;110", "reply_reviewers": "1;0;0;0", "reply_authors": "2;1;1;1", "rating_avg": [ 5.5, 0.8660254037844386 ], "confidence_avg": [ 3.5, 0.5 ], "wc_review_avg": [ 601.5, 227.07983177728488 ], "wc_reply_reviewers_avg": [ 30.75, 53.26056233274298 ], "wc_reply_authors_avg": [ 397.0, 220.54704713507275 ], "reply_reviewers_avg": [ 0.25, 0.4330127018922193 ], "reply_authors_avg": [ 1.25, 0.4330127018922193 ], "replies_avg": [ 12, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": -0.5773502691896257, "gs_citation": 3, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=6903671104175378437&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 4, "aff_unique_index": "0", "aff_unique_norm": "Institute for Mathematical Modelling of Biological Systems", "aff_unique_dep": "", "aff_unique_url": "https://www.imbs.uzh.ch", "aff_unique_abbr": "", "aff_country_unique_index": "0", "aff_country_unique": "Germany" }, { "id": "J_pvI6ap5Mn", 
"title": "Transfer Learning of Graph Neural Networks with Ego-graph Information Maximization", "track": "main", "status": "Reject", "tldr": "", "abstract": "Graph neural networks (GNNs) have been shown with superior performance in various applications, but training dedicated GNNs can be costly for large-scale graphs. Some recent work started to study the pre-training of GNNs. However, none of them provide theoretical insights into the design of their frameworks, or clear requirements and guarantees towards the transferability of GNNs. In this work, we establish a theoretically grounded and practically useful framework for the transfer learning of GNNs. Firstly, we propose a novel view towards the essential graph information and advocate the capturing of it as the goal of transferable GNN training, which motivates the design of EGI (ego-graph information maximization) to analytically achieve this goal. Secondly, we specify the requirement of structure-respecting node features as the GNN input, and conduct a rigorous analysis of GNN transferability based on the difference between the local graph Laplacians of the source and target graphs. Finally, we conduct controlled synthetic experiments to directly justify our theoretical conclusions. Extensive experiments on real-world networks towards role identification show consistent results in the rigorously analyzed setting of direct-transfering (freezing parameters), while those towards large-scale relation prediction show promising results in the more generalized and practical setting of transfering with fine-tuning.", "keywords": "Transfer learning;graph neural networks", "primary_area": "", "supplementary_material": "/attachment/3e920c0ae838ca213170faeef9708fbb5707f70b.zip", "author": "Qi Zhu;Yidan Xu;Haonan Wang;Chao Zhang;Jiawei Han;Carl Yang", "authorids": "~Qi_Zhu7;~Yidan_Xu1;~Haonan_Wang1;~Chao_Zhang9;~Jiawei_Han1;~Carl_Yang1", "gender": "M;;M;;M;M", "homepage": "https://gentlezhu.github.io/;;http://charles-haonan-wang.me/;;http://hanj.cs.illinois.edu/;https://cs.emory.edu/~jyang71/", "dblp": "66/5923-8;;;;h/JiaweiHan.html;305/0254", "google_scholar": "xCHy4c8AAAAJ;;cLziVZMAAAAJ;;https://scholar.google.com.tw/citations?user=Kv9AbjMAAAAJ;mOINlwcAAAAJ", "orcid": "0000-0003-0129-8542;;0009-0006-6963-8987;;0000-0002-3629-2696;0000-0001-9145-4531", "linkedin": "qi-zhu-22633598/;;;;;", "or_profile": "~Qi_Zhu7;~Yidan_Xu1;~Haonan_Wang1;~Chao_Zhang9;~Jiawei_Han1;~Carl_Yang1", "aff": "University of Illinois, Urbana Champaign;;University of Illinois, Urbana Champaign;;University of Illinois at Urbana-Champaign (UIUC);Emory University", "aff_domain": "illinois.edu;;illinois.edu;;illinois.edu;emory.edu", "position": "PhD student;;PhD student;;Full Professor;Assistant Professor", "bibtex": "@misc{\nzhu2021transfer,\ntitle={Transfer Learning of Graph Neural Networks with Ego-graph Information Maximization},\nauthor={Qi Zhu and Yidan Xu and Haonan Wang and Chao Zhang and Jiawei Han and Carl Yang},\nyear={2021},\nurl={https://openreview.net/forum?id=J_pvI6ap5Mn}\n}", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer4;AnonReviewer3;AnonReviewer2", "site": "https://openreview.net/forum?id=J_pvI6ap5Mn", "pdf_size": 0, "rating": "4;6;6;7", "confidence": "5;3;4;3", "wc_review": "1004;179;375;98", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "1457;62;530;248", "reply_reviewers": "0;0;0;0", "reply_authors": "2;1;1;1", "rating_avg": [ 5.75, 1.0897247358851685 ], "confidence_avg": [ 3.75, 0.82915619758885 ], "wc_review_avg": [ 414.0, 
355.2119085841577 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 574.25, 536.2006970342355 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.25, 0.4330127018922193 ], "replies_avg": [ 10, 0 ], "authors#_avg": [ 6, 0 ], "corr_rating_confidence": -0.899228803025897, "gs_citation": 147, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=5328682952509931138&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 14, "aff_unique_index": "0;0;0;1", "aff_unique_norm": "University of Illinois Urbana-Champaign;Emory University", "aff_unique_dep": ";", "aff_unique_url": "https://illinois.edu;https://www.emory.edu", "aff_unique_abbr": "UIUC;Emory", "aff_campus_unique_index": "0;0;0", "aff_campus_unique": "Urbana-Champaign;", "aff_country_unique_index": "0;0;0;0", "aff_country_unique": "United States" }, { "title": "Interpreting and Boosting Dropout from a Game-Theoretic View", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/3349", "id": "Jacdvfjicf7", "poster": "", "openreview": "https://openreview.net/forum?id=Jacdvfjicf7", "slides": "https://iclr.cc/virtual/2021/poster/3349", "video": "https://iclr.cc/virtual/2021/poster/3349", "author_site": "Hao Zhang, Sen Li, YinChao Ma, Mingjie Li, Yichen Xie, Quanshi Zhang", "tldr": "", "abstract": "This paper aims to understand and improve the utility of the dropout operation from the perspective of game-theoretical interactions. We prove that dropout can suppress the strength of interactions between input variables of deep neural networks (DNNs). The theoretical proof is also verified by various experiments. Furthermore, we find that such interactions were strongly related to the over-fitting problem in deep learning. So, the utility of dropout can be regarded as decreasing interactions to alleviating the significance of over-fitting. Based on this understanding, we propose the interaction loss to further improve the utility of dropout. 
Experimental results on various DNNs and datasets have shown that the interaction loss can effectively improve the utility of dropout and boost the performance of DNNs.", "keywords": "Dropout;Interpretability;Interactions", "primary_area": "", "supplementary_material": "/attachment/171b677c01f7f08be17656c278682f7247b93f02.zip", "author": "Hao Zhang;Sen Li;YinChao Ma;Mingjie Li;Yichen Xie;Quanshi Zhang", "authorids": "~Hao_Zhang22;~Sen_Li2;~YinChao_Ma1;~Mingjie_Li3;~Yichen_Xie1;~Quanshi_Zhang1", "gender": "M;;M;M;M;M", "homepage": "https://haozhang37.github.io;;https://www.linkedin.com/in/morale-yc;http://lmjjjjjj.github.io;;http://qszhang.com", "dblp": "55/2270-63;;;48/10103;;http://dblp.uni-trier.de/pers/hd/z/Zhang:Quanshi", "google_scholar": "3g6LlgwAAAAJ;;9K1_K1wAAAAJ;7dXDygoAAAAJ;SdX6DaEAAAAJ;iFFhHK0AAAAJ", "orcid": ";0009-0002-3661-4744;;;;", "linkedin": ";;;;;", "or_profile": "~Hao_Zhang22;~Sen_Li2;~YinChao_Ma1;~Mingjie_Li3;~Yichen_Xie1;~Quanshi_Zhang1", "aff": "Shanghai Jiaotong University;;Huazhong University of Science and Technology;Shanghai Jiaotong University;Shanghai Jiaotong University;Shanghai Jiaotong University", "aff_domain": "sjtu.edu.cn;;hust.edu.cn;sjtu.edu.cn;sjtu.edu.cn;sjtu.edu.cn", "position": "MS student;;Undergrad student;Undergrad student;Undergrad student;Associate Professor", "bibtex": "@inproceedings{\nzhang2021interpreting,\ntitle={Interpreting and Boosting Dropout from a Game-Theoretic View},\nauthor={Hao Zhang and Sen Li and YinChao Ma and Mingjie Li and Yichen Xie and Quanshi Zhang},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=Jacdvfjicf7}\n}", "github": "", "project": "", "reviewers": "AnonReviewer4;AnonReviewer1;AnonReviewer3;AnonReviewer2", "pdf_size": 0, "rating": "5;7;7;7", "confidence": "1;5;5;4", "wc_review": "163;602;247;386", "wc_reply_reviewers": "0;0;0;234", "wc_reply_authors": "266;1357;426;982", "reply_reviewers": "0;0;0;1", "reply_authors": "1;2;1;2", "rating_avg": [ 6.5, 0.8660254037844386 ], "confidence_avg": [ 3.75, 1.6393596310755 ], "wc_review_avg": [ 349.5, 166.11517089056014 ], "wc_reply_reviewers_avg": [ 58.5, 101.32497224277932 ], "wc_reply_authors_avg": [ 757.75, 436.2524355232874 ], "reply_reviewers_avg": [ 0.25, 0.4330127018922193 ], "reply_authors_avg": [ 1.5, 0.5 ], "replies_avg": [ 13, 0 ], "authors#_avg": [ 6, 0 ], "corr_rating_confidence": 0.9684959969581861, "gs_citation": 55, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=16247101326677026750&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 4, "pdf": "https://openreview.net/pdf?id=Jacdvfjicf7", "email": "sjtu.edu.cn;;hust.edu.cn;sjtu.edu.cn;sjtu.edu.cn;sjtu.edu.cn", "author_num": 6, "aff_unique_index": "0;1;0;0;0", "aff_unique_norm": "Shanghai Jiao Tong University;Huazhong University of Science and Technology", "aff_unique_dep": ";", "aff_unique_url": "https://www.sjtu.edu.cn;http://www.hust.edu.cn", "aff_unique_abbr": "SJTU;HUST", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0;0;0;0", "aff_country_unique": "China" }, { "id": "JbAqsfbYsJy", "title": "Action and Perception as Divergence Minimization", "track": "main", "status": "Reject", "tldr": "", "abstract": "We introduce a unified objective for action and perception of intelligent agents. Extending representation learning and control, we minimize the joint divergence between the combined system of agent and environment and a target distribution. 
Intuitively, such agents use perception to align their beliefs with the world, and use actions to align the world with their beliefs. Minimizing the joint divergence to an expressive target maximizes the mutual information between the agent's representations and inputs, thus inferring representations that are informative of past inputs and exploring future inputs that are informative of the representations. This lets us explain intrinsic objectives, such as representation learning, information gain, empowerment, and skill discovery from minimal assumptions. Moreover, interpreting the target distribution as a latent variable model suggests powerful world models as a path toward highly adaptive agents that seek large niches in their environments, rendering task rewards optional. The framework provides a common language for comparing a wide range of objectives, advances the understanding of latent variables for decision making, and offers a recipe for designing novel objectives. We recommend deriving future agent objectives from the joint divergence to facilitate comparison, to point out the agent's target distribution, and to identify the intrinsic objective terms needed to reach that distribution.", "keywords": "objective functions;reinforcement learning;information theory;probabilistic modeling;control as inference;exploration;intrinsic motivation;world models", "primary_area": "", "supplementary_material": "", "author": "Danijar Hafner;Pedro A Ortega;Jimmy Ba;Thomas Parr;Karl Friston;Nicolas Heess", "authorids": "~Danijar_Hafner1;~Pedro_A_Ortega1;~Jimmy_Ba1;thomas.parr.12@ucl.ac.uk;~Karl_Friston1;~Nicolas_Heess1", "gender": ";M;M;;M;", "homepage": "https://danijar.com;http://www.adaptiveagents.org;http://jimmylba.github.io;;https://www.fil.ion.ucl.ac.uk/~karl/;", "dblp": "184/8088;07/2797;https://dblp.org/pers/b/Ba:Jimmy.html;;;76/9181", "google_scholar": "VINmGpYAAAAJ;GK-j94AAAAAJ;https://scholar.google.ca/citations?user=ymzxRhAAAAAJ;;https://scholar.google.co.uk/citations?user=q_4u0aoAAAAJ;79k7bGEAAAAJ", "orcid": "0000-0002-9534-7271;;;;0000-0001-7984-8909;", "linkedin": ";https://uk.linkedin.com/in/pedro-a-ortega;;;;", "or_profile": "~Danijar_Hafner1;~Pedro_A_Ortega1;~Jimmy_Ba1;thomas.parr.12@ucl.ac.uk;~Karl_Friston1;~Nicolas_Heess1", "aff": "University of Toronto;Google DeepMind;Department of Computer Science, University of Toronto;;University College London;Google DeepMind", "aff_domain": "cs.toronto;deepmind.com;cs.toronto.edu;;ucl.ac.uk;google.com", "position": "PhD student;Researcher;Assistant Professor;;Principal Researcher;Research Scientist", "bibtex": "@misc{\nhafner2021action,\ntitle={Action and Perception as Divergence Minimization},\nauthor={Danijar Hafner and Pedro A Ortega and Jimmy Ba and Thomas Parr and Karl Friston and Nicolas Heess},\nyear={2021},\nurl={https://openreview.net/forum?id=JbAqsfbYsJy}\n}", "github": "", "project": "", "reviewers": "AnonReviewer3;AnonReviewer1;AnonReviewer4;AnonReviewer2", "site": "https://openreview.net/forum?id=JbAqsfbYsJy", "pdf_size": 0, "rating": "3;6;6;7", "confidence": "4;3;4;3", "wc_review": "194;304;356;667", "wc_reply_reviewers": "322;153;0;16", "wc_reply_authors": "1641;1041;1060;903", "reply_reviewers": "1;2;0;1", "reply_authors": "3;4;2;3", "rating_avg": [ 5.5, 1.5 ], "confidence_avg": [ 3.5, 0.5 ], "wc_review_avg": [ 380.25, 175.58242366478484 ], "wc_reply_reviewers_avg": [ 122.75, 129.49782816711638 ], "wc_reply_authors_avg": [ 1161.25, 283.5333975037156 ], "reply_reviewers_avg": [ 1.0, 0.7071067811865476 ], "reply_authors_avg": 
[ 3.0, 0.7071067811865476 ], "replies_avg": [ 21, 0 ], "authors#_avg": [ 6, 0 ], "corr_rating_confidence": -0.6666666666666667, "gs_citation": 74, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=6714302475483888091&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 6, "aff_unique_index": "0;1;0;2;1", "aff_unique_norm": "University of Toronto;Google;University College London", "aff_unique_dep": ";Google DeepMind;", "aff_unique_url": "https://www.utoronto.ca;https://deepmind.com;https://www.ucl.ac.uk", "aff_unique_abbr": "U of T;DeepMind;UCL", "aff_campus_unique_index": "1", "aff_campus_unique": ";Toronto", "aff_country_unique_index": "0;1;0;1;1", "aff_country_unique": "Canada;United Kingdom" }, { "title": "Directed Acyclic Graph Neural Networks", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/3236", "id": "JbuYF437WB6", "poster": "", "openreview": "https://openreview.net/forum?id=JbuYF437WB6", "slides": "https://iclr.cc/virtual/2021/poster/3236", "video": "https://iclr.cc/virtual/2021/poster/3236", "author_site": "Veronika Thost, Jie Chen", "tldr": "", "abstract": "Graph-structured data ubiquitously appears in science and engineering. Graph neural networks (GNNs) are designed to exploit the relational inductive bias exhibited in graphs; they have been shown to outperform other forms of neural networks in scenarios where structure information supplements node features. The most common GNN architecture aggregates information from neighborhoods based on message passing. Its generality has made it broadly applicable. In this paper, we focus on a special, yet widely used, type of graphs---DAGs---and inject a stronger inductive bias---partial ordering---into the neural network design. We propose the directed acyclic graph neural network, DAGNN, an architecture that processes information according to the flow defined by the partial order. DAGNN can be considered a framework that entails earlier works as special cases (e.g., models for trees and models updating node representations recurrently), but we identify several crucial components that prior architectures lack. 
We perform comprehensive experiments, including ablation studies, on representative DAG datasets (i.e., source code, neural architectures, and probabilistic graphical models) and demonstrate the superiority of DAGNN over simpler DAG architectures as well as general graph architectures.", "keywords": "Graph Neural Networks;Graph Representation Learning;Directed Acyclic Graphs;DAG;Inductive Bias", "primary_area": "", "supplementary_material": "", "author": "Veronika Thost;Jie Chen", "authorids": "~Veronika_Thost1;~Jie_Chen1", "gender": "F;", "homepage": "https://mitibmwatsonailab.mit.edu/people/veronika-thost/;https://jiechenjiechen.github.io", "dblp": "132/3874;92/6289-7", "google_scholar": "TyScgJ0AAAAJ;Z-lkme8AAAAJ", "orcid": "0000-0003-4984-1532;", "linkedin": ";", "or_profile": "~Veronika_Thost1;~Jie_Chen1", "aff": "IBM Research;International Business Machines", "aff_domain": "ibm.com;ibm.com", "position": "Research Scientist;Research Staff Member", "bibtex": "@inproceedings{\nthost2021directed,\ntitle={Directed Acyclic Graph Neural Networks},\nauthor={Veronika Thost and Jie Chen},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=JbuYF437WB6}\n}", "github": "[![github](/images/github_icon.svg) vthost/DAGNN](https://github.com/vthost/DAGNN)", "project": "", "reviewers": "AnonReviewer4;AnonReviewer3;AnonReviewer2", "pdf_size": 0, "rating": "6;7;7", "confidence": "4;3;3", "wc_review": "586;292;469", "wc_reply_reviewers": "0;25;189", "wc_reply_authors": "783;217;1059", "reply_reviewers": "0;1;2", "reply_authors": "1;1;3", "rating_avg": [ 6.666666666666667, 0.4714045207910317 ], "confidence_avg": [ 3.3333333333333335, 0.4714045207910317 ], "wc_review_avg": [ 449.0, 120.85528536228773 ], "wc_reply_reviewers_avg": [ 71.33333333333333, 83.82654048026131 ], "wc_reply_authors_avg": [ 686.3333333333334, 350.4752329179465 ], "reply_reviewers_avg": [ 1.0, 0.816496580927726 ], "reply_authors_avg": [ 1.6666666666666667, 0.9428090415820634 ], "replies_avg": [ 20, 0 ], "authors#_avg": [ 2, 0 ], "corr_rating_confidence": -0.9999999999999997, "gs_citation": 136, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=13529849835566425247&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 5, "pdf": "https://openreview.net/pdf?id=JbuYF437WB6", "email": "ibm.com;ibm.com", "author_num": 2, "aff_unique_index": "0;1", "aff_unique_norm": "IBM;International Business Machines Corporation", "aff_unique_dep": "IBM Research;", "aff_unique_url": "https://www.ibm.com/research;https://www.ibm.com", "aff_unique_abbr": "IBM;IBM", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0", "aff_country_unique": "United States" }, { "id": "JdCUjf9xvlc", "title": "Fourier Representations for Black-Box Optimization over Categorical Variables", "track": "main", "status": "Reject", "tldr": "", "abstract": "Optimization of real-world black-box functions defined over purely categorical variables is an active area of research. In particular, optimization and design of biological sequences with specific functional or structural properties have a profound impact in medicine, materials science, and biotechnology. Standalone acquisition methods, such as simulated annealing (SA) and Monte Carlo tree search (MCTS), are typically used for such optimization problems. 
In order to improve the performance and sample efficiency of such acquisition methods, we propose to use existing acquisition methods in conjunction with a surrogate model for the black-box evaluations over purely categorical variables. To this end, we present two different representations, a group-theoretic Fourier expansion and an abridged one-hot encoded Boolean Fourier expansion. To learn such models, characters of each representation are considered as experts and their respective coefficients are updated via an exponential weight update rule each time the black box is evaluated. Numerical experiments over synthetic benchmarks as well as real-world RNA sequence optimization and design problems demonstrate the representational power of the proposed methods, which achieve competitive or superior performance compared to state-of-the-art counterparts, while improving the computational cost and/or sample efficiency substantially.", "keywords": "", "primary_area": "", "supplementary_material": "/attachment/077faf49d4fd7e7592f8624de833939b4a2168e2.zip", "author": "Hamid Dadkhahi;Jesus Rios;Karthikeyan Shanmugam;Payel Das", "authorids": "~Hamid_Dadkhahi1;jriosal@us.ibm.com;~Karthikeyan_Shanmugam1;~Payel_Das1", "gender": ";;M;F", "homepage": ";;https://sites.google.com/corp/view/karthikeyan-shanmugam/;", "dblp": "124/3214;;;56/7926", "google_scholar": "https://scholar.google.com/citations?hl=en;;https://scholar.google.ca/citations?user=m4DyPcUAAAAJ;", "orcid": ";;0009-0008-2879-5868;", "linkedin": ";;;", "or_profile": "~Hamid_Dadkhahi1;jriosal@us.ibm.com;~Karthikeyan_Shanmugam1;~Payel_Das1", "aff": "Amazon;;International Business Machines;IBM, International Business Machines", "aff_domain": "amazon.com;;ibm.com;us.ibm.com", "position": "Applied Scientist;;Research Staff Member;Principal Researcher", "bibtex": "@misc{\ndadkhahi2021fourier,\ntitle={Fourier Representations for Black-Box Optimization over Categorical Variables},\nauthor={Hamid Dadkhahi and Jesus Rios and Karthikeyan Shanmugam and Payel Das},\nyear={2021},\nurl={https://openreview.net/forum?id=JdCUjf9xvlc}\n}", "github": "", "project": "", "reviewers": "AnonReviewer3;AnonReviewer4;AnonReviewer2;AnonReviewer1", "site": "https://openreview.net/forum?id=JdCUjf9xvlc", "pdf_size": 0, "rating": "5;6;6;6", "confidence": "3;3;1;3", "wc_review": "814;565;254;341", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "1146;741;631;737", "reply_reviewers": "0;0;0;0", "reply_authors": "2;1;1;1", "rating_avg": [ 5.75, 0.4330127018922193 ], "confidence_avg": [ 2.5, 0.8660254037844386 ], "wc_review_avg": [ 493.5, 217.05356481753532 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 813.75, 196.83162220537633 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.25, 0.4330127018922193 ], "replies_avg": [ 11, 0 ], "authors#_avg": [ 4, 0 ], "corr_rating_confidence": -0.3333333333333333, "gs_citation": 9, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=7718597832067085230&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 9, "aff_unique_index": "0;1;2", "aff_unique_norm": "Amazon;International Business Machines Corporation;International Business Machines", "aff_unique_dep": "Amazon.com, Inc.;;", "aff_unique_url": "https://www.amazon.com;https://www.ibm.com;https://www.ibm.com", "aff_unique_abbr": "Amazon;IBM;IBM", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0;0", "aff_country_unique": "United States" }, { "id": "JeweO9-QqV-", "title": "SoGCN: Second-Order Graph Convolutional 
Networks", "track": "main", "status": "Reject", "tldr": "", "abstract": "We introduce a second-order graph convolution (SoGC), a maximally localized kernel, that can express a polynomial spectral filter with arbitrary coefficients. We contrast our SoGC with vanilla GCN, first-order (one-hop) aggregation, and higher-order (multi-hop) aggregation by analyzing graph convolutional layers via generalized filter space. We argue that SoGC is a simple design capable of forming the basic building block of graph convolution, playing the same role as $3 \\times 3$ kernels in CNNs. We build purely topological Second-Order Graph Convolutional Networks (SoGCN) and demonstrate that SoGCN consistently achieves state-of-the-art performance on the latest benchmark. Moreover, we introduce the Gated Recurrent Unit (GRU) to spectral GCNs. This explorative attempt further improves our experimental results.", "keywords": "Graph Convolutional Networks;Filter Representation Power;Graph Polynomial Filters", "primary_area": "", "supplementary_material": "", "author": "Peihao Wang;Yuehao Wang;Hua Lin;Jianbo Shi", "authorids": "~Peihao_Wang1;~Yuehao_Wang1;~Hua_Lin1;~Jianbo_Shi1", "gender": "M;;M;M", "homepage": "https://peihaowang.github.io/;;http://linhuavvv.com/;http://www.cs.cmu.edu/~jshi/", "dblp": "239/4075;;;71/3879", "google_scholar": "fqf2tBsAAAAJ;;;", "orcid": ";;;", "linkedin": "peihao-wang-25a411162/;;hualin95/;", "or_profile": "~Peihao_Wang1;~Yuehao_Wang1;~Hua_Lin1;~Jianbo_Shi1", "aff": "ShanghaiTech University;;;University of Pennsylvania", "aff_domain": "shanghaitech.edu.cn;;;upenn.edu", "position": "Undergrad student;;;Professor", "bibtex": "@misc{\nwang2021sogcn,\ntitle={So{\\{}GCN{\\}}: Second-Order Graph Convolutional Networks},\nauthor={Peihao Wang and Yuehao Wang and Hua Lin and Jianbo Shi},\nyear={2021},\nurl={https://openreview.net/forum?id=JeweO9-QqV-}\n}", "github": "", "project": "", "reviewers": "AnonReviewer4;AnonReviewer1;AnonReviewer3;AnonReviewer2", "site": "https://openreview.net/forum?id=JeweO9-QqV-", "pdf_size": 0, "rating": "5;5;5;7", "confidence": "4;3;5;4", "wc_review": "440;188;431;207", "wc_reply_reviewers": "0;0;0;145", "wc_reply_authors": "997;701;973;291", "reply_reviewers": "0;0;0;1", "reply_authors": "2;1;2;1", "rating_avg": [ 5.5, 0.8660254037844386 ], "confidence_avg": [ 4.0, 0.7071067811865476 ], "wc_review_avg": [ 316.5, 119.23191686792593 ], "wc_reply_reviewers_avg": [ 36.25, 62.7868417743718 ], "wc_reply_authors_avg": [ 740.5, 284.36728011499497 ], "reply_reviewers_avg": [ 0.25, 0.4330127018922193 ], "reply_authors_avg": [ 1.5, 0.5 ], "replies_avg": [ 13, 0 ], "authors#_avg": [ 4, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 5, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=10515331628276633407&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 3, "aff_unique_index": "0;1", "aff_unique_norm": "ShanghaiTech University;University of Pennsylvania", "aff_unique_dep": ";", "aff_unique_url": "https://www.shanghaitech.edu.cn;https://www.upenn.edu", "aff_unique_abbr": "ShanghaiTech;UPenn", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;1", "aff_country_unique": "China;United States" }, { "id": "Jf24xdaAwF9", "title": "Self-Activating Neural Ensembles for Continual Reinforcement Learning", "track": "main", "status": "Reject", "tldr": "", "abstract": "The ability for an agent to continuously learn new skills without catastrophically forgetting existing knowledge is of critical importance for the development of 
generally intelligent agents. Most methods devised to address this problem depend heavily on well-defined task boundaries which simplify the problem considerably. Our task-agnostic method, Self-Activating Neural Ensembles (SANE), uses a hierarchical modular architecture designed to avoid catastrophic forgetting without making any such assumptions. At each timestep a path through the SANE tree is activated; during training only activated nodes are updated, ensuring that unused nodes do not undergo catastrophic forgetting. Additionally, new nodes are created as needed, allowing the system to leverage and retain old skills while growing and learning new ones. We demonstrate our approach on MNIST and a set of grid world environments, demonstrating that SANE does not undergo catastrophic forgetting where existing methods do.", "keywords": "continual reinforcement learning;lifelong learning;deep reinforcement learning", "primary_area": "", "supplementary_material": "/attachment/0b8e80420a68d9c42a0f81ddbd628c895e35c40a.zip", "author": "Sam Powers;Abhinav Gupta", "authorids": "~Sam_Powers1;~Abhinav_Gupta1", "gender": ";M", "homepage": "https://www.ri.cmu.edu/ri-people/samantha-powers/;http://www.cs.cmu.edu/~abhinavg", "dblp": ";36/7024-1", "google_scholar": ";https://scholar.google.com.tw/citations?user=bqL73OkAAAAJ", "orcid": ";", "linkedin": ";", "or_profile": "~Sam_Powers1;~Abhinav_Gupta1", "aff": "Carnegie Mellon University;Meta Facebook", "aff_domain": "cmu.edu;fb.com", "position": "PhD student;Researcher", "bibtex": "@misc{\npowers2021selfactivating,\ntitle={Self-Activating Neural Ensembles for Continual Reinforcement Learning},\nauthor={Sam Powers and Abhinav Gupta},\nyear={2021},\nurl={https://openreview.net/forum?id=Jf24xdaAwF9}\n}", "github": "", "project": "", "reviewers": "AnonReviewer3;AnonReviewer2;AnonReviewer1;AnonReviewer4", "site": "https://openreview.net/forum?id=Jf24xdaAwF9", "pdf_size": 0, "rating": "4;5;5;6", "confidence": "4;4;4;3", "wc_review": "353;1376;1265;480", "wc_reply_reviewers": "54;51;0;0", "wc_reply_authors": "354;1196;759;717", "reply_reviewers": "1;1;0;0", "reply_authors": "1;2;1;1", "rating_avg": [ 5.0, 0.7071067811865476 ], "confidence_avg": [ 3.75, 0.4330127018922193 ], "wc_review_avg": [ 868.5, 455.9169332235862 ], "wc_reply_reviewers_avg": [ 26.25, 26.271419832205492 ], "wc_reply_authors_avg": [ 756.5, 298.635647570748 ], "reply_reviewers_avg": [ 0.5, 0.5 ], "reply_authors_avg": [ 1.25, 0.4330127018922193 ], "replies_avg": [ 14, 0 ], "authors#_avg": [ 2, 0 ], "corr_rating_confidence": -0.816496580927726, "gs_citation": 5, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=6156387137989122698&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 5, "aff_unique_index": "0;1", "aff_unique_norm": "Carnegie Mellon University;Meta", "aff_unique_dep": ";Meta Platforms, Inc.", "aff_unique_url": "https://www.cmu.edu;https://meta.com", "aff_unique_abbr": "CMU;Meta", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0", "aff_country_unique": "United States" }, { "id": "JiNvAGORcMW", "title": "Cross-State Self-Constraint for Feature Generalization in Deep Reinforcement Learning", "track": "main", "status": "Reject", "tldr": "", "abstract": "Representation learning on visualized input is an important yet challenging task for deep reinforcement learning (RL). 
The feature space learned from visualized input not only dominates the agent's generalization ability in new environments but also affects the data efficiency during training. To help the RL agent learn general and discriminative representations among various states, we present cross-state self-constraint (CSSC), a novel constraint that regularizes the representation feature space by comparing the similarity of different pairs of representations. Based on the representation-behavior connection derived from the agent's experience, this constraint helps reinforce general feature recognition during the learning process and thus enhances generalization to unseen environments. We test our proposed method on the OpenAI ProcGen benchmark and see significant improvements in generalization performance across most ProcGen games.", "keywords": "reinforcement learning;generalization;regularization", "primary_area": "", "supplementary_material": "", "author": "Guan Ting Liu;Pu-Jen Cheng;GuanYu Lin", "authorids": "~Guan_Ting_Liu1;~Pu-Jen_Cheng1;r09944017@csie.ntu.edu.tw", "gender": "M;M;", "homepage": "https://dannyliu15.github.io/;https://www.csie.ntu.edu.tw/~pjcheng/;", "dblp": "71/7317;45/160;", "google_scholar": "https://scholar.google.com/citations?hl=zh-TW;https://scholar.google.com.tw/citations?user=uYdM_rwAAAAJ;", "orcid": "0000-0002-7300-9036;;", "linkedin": ";;", "or_profile": "~Guan_Ting_Liu1;~Pu-Jen_Cheng1;r09944017@csie.ntu.edu.tw", "aff": "Department of computer science and information engineering, National Taiwan University;National Taiwan University;", "aff_domain": "csie.ntu.edu.tw;ntu.edu.tw;", "position": "PhD student;Full Professor;", "bibtex": "@misc{\nliu2021crossstate,\ntitle={Cross-State Self-Constraint for Feature Generalization in Deep Reinforcement Learning},\nauthor={Guan Ting Liu and Pu-Jen Cheng and GuanYu Lin},\nyear={2021},\nurl={https://openreview.net/forum?id=JiNvAGORcMW}\n}", "github": "", "project": "", "reviewers": "AnonReviewer3;AnonReviewer2;AnonReviewer4;AnonReviewer1", "site": "https://openreview.net/forum?id=JiNvAGORcMW", "pdf_size": 0, "rating": "5;5;5;6", "confidence": "4;4;4;3", "wc_review": "262;1008;325;237", "wc_reply_reviewers": "89;199;0;84", "wc_reply_authors": "208;614;255;234", "reply_reviewers": "1;1;0;1", "reply_authors": "1;2;1;1", "rating_avg": [ 5.25, 0.4330127018922193 ], "confidence_avg": [ 3.75, 0.4330127018922193 ], "wc_review_avg": [ 458.0, 319.15748463728687 ], "wc_reply_reviewers_avg": [ 93.0, 70.67885115082163 ], "wc_reply_authors_avg": [ 327.75, 166.1029424784522 ], "reply_reviewers_avg": [ 0.75, 0.4330127018922193 ], "reply_authors_avg": [ 1.25, 0.4330127018922193 ], "replies_avg": [ 14, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": -1.0, "gs_citation": 2, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=6721923301688346124&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 0, "aff_unique_index": "0;0", "aff_unique_norm": "National Taiwan University", "aff_unique_dep": "Department of Computer Science and Information Engineering", "aff_unique_url": "https://www.ntu.edu.tw", "aff_unique_abbr": "NTU", "aff_campus_unique_index": "0;0", "aff_campus_unique": "Taiwan", "aff_country_unique_index": "0;0", "aff_country_unique": "China" }, { "title": "On Statistical Bias In Active Learning: How and When to Fix It", "status": "Spotlight", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/3037", "id": "JiYq3eqTKY", "poster": "", "openreview": "https://openreview.net/forum?id=JiYq3eqTKY", "slides": 
"https://iclr.cc/virtual/2021/poster/3037", "video": "https://iclr.cc/virtual/2021/poster/3037", "author_site": "Sebastian Farquhar, Yarin Gal, Tom Rainforth", "tldr": "", "abstract": "Active learning is a powerful tool when labelling data is expensive, but it introduces a bias because the training data no longer follows the population distribution. We formalize this bias and investigate the situations in which it can be harmful and sometimes even helpful. We further introduce novel corrective weights to remove bias when doing so is beneficial. Through this, our work not only provides a useful mechanism that can improve the active learning approach, but also an explanation for the empirical successes of various existing approaches which ignore this bias. In particular, we show that this bias can be actively helpful when training overparameterized models---like neural networks---with relatively modest dataset sizes.", "keywords": "Active Learning;Monte Carlo;Risk Estimation", "primary_area": "", "supplementary_material": "", "author": "Sebastian Farquhar;Yarin Gal;Tom Rainforth", "authorids": "~Sebastian_Farquhar1;~Yarin_Gal1;~Tom_Rainforth1", "gender": ";;M", "homepage": "https://sebastianfarquhar.com/;http://www.cs.ox.ac.uk/people/yarin.gal/website//;http://www.robots.ox.ac.uk/~twgr", "dblp": "215/5432;67/9076;166/1198", "google_scholar": "bvShhTEAAAAJ;https://scholar.google.co.uk/citations?user=SIayDoQAAAAJ;https://scholar.google.co.uk/citations?user=ieLRNKMAAAAJ", "orcid": ";;", "linkedin": ";;", "or_profile": "~Sebastian_Farquhar1;~Yarin_Gal1;~Tom_Rainforth1", "aff": "University of Oxford;University of Oxford;", "aff_domain": "ox.ac.uk;ox.ac.uk;ox.ac.uk", "position": "PhD student;Associate Professor;Postdoc", "bibtex": "@inproceedings{\nfarquhar2021on,\ntitle={On Statistical Bias In Active Learning: How and When to Fix It},\nauthor={Sebastian Farquhar and Yarin Gal and Tom Rainforth},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=JiYq3eqTKY}\n}", "github": "", "project": "", "reviewers": "AnonReviewer3;AnonReviewer4;AnonReviewer1;AnonReviewer2", "pdf_size": 0, "rating": "4;7;7;8", "confidence": "4;4;3;4", "wc_review": "333;1080;274;561", "wc_reply_reviewers": "0;392;0;0", "wc_reply_authors": "988;1360;446;610", "reply_reviewers": "0;1;0;0", "reply_authors": "2;2;1;1", "rating_avg": [ 6.5, 1.5 ], "confidence_avg": [ 3.75, 0.4330127018922193 ], "wc_review_avg": [ 562.0, 317.6908874991538 ], "wc_reply_reviewers_avg": [ 98.0, 169.74097914175 ], "wc_reply_authors_avg": [ 851.0, 353.5378339018329 ], "reply_reviewers_avg": [ 0.25, 0.4330127018922193 ], "reply_authors_avg": [ 1.5, 0.5 ], "replies_avg": [ 13, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": -0.19245008972987526, "gs_citation": 111, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=3347578608688188589&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 3, "pdf": "https://openreview.net/pdf?id=JiYq3eqTKY", "email": "ox.ac.uk;ox.ac.uk;ox.ac.uk", "author_num": 3, "aff_unique_index": "0;0", "aff_unique_norm": "University of Oxford", "aff_unique_dep": "", "aff_unique_url": "https://www.ox.ac.uk", "aff_unique_abbr": "Oxford", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0", "aff_country_unique": "United Kingdom" }, { "title": "Group Equivariant Stand-Alone Self-Attention For Vision", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/3118", "id": "JkfYjnOEo6M", "poster": 
"", "openreview": "https://openreview.net/forum?id=JkfYjnOEo6M", "slides": "https://iclr.cc/virtual/2021/poster/3118", "video": "https://iclr.cc/virtual/2021/poster/3118", "author_site": "David W. Romero, Jean-Baptiste Cordonnier", "tldr": "", "abstract": "We provide a general self-attention formulation to impose group equivariance to arbitrary symmetry groups. This is achieved by defining positional encodings that are invariant to the action of the group considered. Since the group acts on the positional encoding directly, group equivariant self-attention networks (GSA-Nets) are steerable by nature. Our experiments on vision benchmarks demonstrate consistent improvements of GSA-Nets over non-equivariant self-attention networks.", "keywords": "group equivariant transformers;group equivariant self-attention;group equivariance;self-attention;transformers", "primary_area": "", "supplementary_material": "/attachment/451605a5d602c17829a8fb2c95e8e91e89a69e3e.zip", "author": "David W. Romero;Jean-Baptiste Cordonnier", "authorids": "~David_W._Romero1;~Jean-Baptiste_Cordonnier2", "gender": "M;M", "homepage": "https://davidwromero.xyz/;http://jbcordonnier.com", "dblp": "254/1396;227/3062", "google_scholar": "7tdzmVoAAAAJ;3YUTuIUAAAAJ", "orcid": ";", "linkedin": "david-w-romero-05893567/;", "or_profile": "~David_W._Romero1;~Jean-Baptiste_Cordonnier2", "aff": "Qualcomm AI Research;Swiss Federal Institute of Technology Lausanne", "aff_domain": "qti.qualcomm.com;epfl.ch", "position": "Intern;PhD student", "bibtex": "@inproceedings{\nromero2021group,\ntitle={Group Equivariant Stand-Alone Self-Attention For Vision},\nauthor={David W. Romero and Jean-Baptiste Cordonnier},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=JkfYjnOEo6M}\n}", "github": "[![github](/images/github_icon.svg) dwromero/g_selfatt](https://github.com/dwromero/g_selfatt)", "project": "", "reviewers": "AnonReviewer1;AnonReviewer5;AnonReviewer3;AnonReviewer4", "pdf_size": 0, "rating": "6;6;7;8", "confidence": "5;3;4;5", "wc_review": "712;640;597;365", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "1346;1570;1038;428", "reply_reviewers": "0;0;0;0", "reply_authors": "2;2;2;1", "rating_avg": [ 6.75, 0.82915619758885 ], "confidence_avg": [ 4.25, 0.82915619758885 ], "wc_review_avg": [ 578.5, 129.93171283408836 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 1095.5, 429.1744983104192 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.75, 0.4330127018922193 ], "replies_avg": [ 16, 0 ], "authors#_avg": [ 2, 0 ], "corr_rating_confidence": 0.4545454545454545, "gs_citation": 67, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=6833601088061308138&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 3, "pdf": "https://openreview.net/pdf?id=JkfYjnOEo6M", "email": "qti.qualcomm.com;epfl.ch", "author_num": 2, "aff_unique_index": "0;1", "aff_unique_norm": "Qualcomm;Swiss Federal Institute of Technology Lausanne", "aff_unique_dep": "Qualcomm AI Research;", "aff_unique_url": "https://www.qualcomm.com/research;https://www.epfl.ch", "aff_unique_abbr": "QAI;EPFL", "aff_campus_unique_index": "1", "aff_campus_unique": ";Lausanne", "aff_country_unique_index": "0;1", "aff_country_unique": "United States;Switzerland" }, { "id": "JmnFvgMSjgg", "title": "Diversity Augmented Conditional Generative Adversarial Network for Enhanced Multimodal Image-to-Image Translation", "track": "main", "status": "Withdraw", "tldr": "", "abstract": "Conditional 
generative adversarial networks (cGANs) play an important role in multimodal image-to-image translation. We propose Diversity Augmented conditional Generative Adversarial Network (DivAugGAN), a highly effective solution to further resolve the mode collapse problem and enhance the diversity of the generated images. DivAugGAN functions as a regularizer to maximize the distinction between the generated samples when different noise vectors are injected. We also exert an extra constraint on the generator to ensure relative variation consistency in the translation process. This guarantees that the changing scale of the generated images in the image space is coherent with the difference of the injected noise vectors in the latent space. It also reduces the chance of unexpected mode override and mode fusion issues. Experimental results on both two-domain and multi-domain multimodal image-to-image translation tasks demonstrate its effectiveness. DivAugGAN leads to consistent diversity augmentations and visual quality improvements for the developed models. We also achieve state-of-the-art performance on multiple datasets in terms of widely used quantitative evaluation metrics. DivAugGAN can be easily integrated into any objectives in conditional generative models as a regularizer for diversity augmentations and quality enhancements without any additional computational overhead. The source code and pre-trained models of our method are available at https://github.com/anomymous-gan/DivAugGAN. ", "keywords": "Conditional Generative Adversarial Network;Multimodal Image-to-Image Translation", "primary_area": "", "supplementary_material": "/attachment/d12eb0e7aaa79b0742082cadd2153945ca81a616.zip", "author": "Yunlong MENG;Lin Xu", "authorids": "~Yunlong_MENG1;~Lin_Xu2", "gender": "M;M", "homepage": ";", "dblp": "275/7843.html;40/1068", "google_scholar": ";", "orcid": ";", "linkedin": ";", "or_profile": "~Yunlong_MENG1;~Lin_Xu2", "aff": "Em-Data Technology;the Institute of Artificial Intelligence, Shanghai Em-Data Technology Co., Ltd.", "aff_domain": "em-data.com.cn;em-data.com.cn", "position": "Researcher;Research Scientist", "bibtex": "", "github": "", "project": "", "reviewers": "AnonReviewer2;AnonReviewer1;AnonReviewer4;AnonReviewer3", "site": "https://openreview.net/forum?id=JmnFvgMSjgg", "pdf_size": 0, "rating": "4;5;5;5", "confidence": "4;4;4;3", "wc_review": "194;652;337;501", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "0;0;0;0", "reply_reviewers": "0;0;0;0", "reply_authors": "0;0;0;0", "rating_avg": [ 4.75, 0.4330127018922193 ], "confidence_avg": [ 3.75, 0.4330127018922193 ], "wc_review_avg": [ 421.0, 172.0072672883329 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 0, 0 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 0, 0 ], "replies_avg": [ 5, 0 ], "authors#_avg": [ 2, 0 ], "corr_rating_confidence": -0.3333333333333333, "gs_citation": 2, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=17593035864510778639&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 0, "aff_unique_index": "0;1", "aff_unique_norm": "Em-Data Technology;Shanghai Em-Data Technology Co., Ltd.", "aff_unique_dep": ";Institute of Artificial Intelligence", "aff_unique_url": ";", "aff_unique_abbr": ";", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "1", "aff_country_unique": ";China" }, { "title": "Gauge Equivariant Mesh CNNs: Anisotropic convolutions on geometric graphs", "status": "Spotlight", "track": "main", "site": 
"https://iclr.cc/virtual/2021/poster/3367", "id": "Jnspzp-oIZE", "poster": "", "openreview": "https://openreview.net/forum?id=Jnspzp-oIZE", "slides": "https://iclr.cc/virtual/2021/poster/3367", "video": "https://iclr.cc/virtual/2021/poster/3367", "author_site": "Pim De Haan, Maurice Weiler, Taco Cohen, Max Welling", "tldr": "", "abstract": "A common approach to define convolutions on meshes is to interpret them as a graph and apply graph convolutional networks (GCNs). Such GCNs utilize isotropic kernels and are therefore insensitive to the relative orientation of vertices and thus to the geometry of the mesh as a whole. We propose Gauge Equivariant Mesh CNNs which generalize GCNs to apply anisotropic gauge equivariant kernels. Since the resulting features carry orientation information, we introduce a geometric message passing scheme defined by parallel transporting features over mesh edges. Our experiments validate the significantly improved expressivity of the proposed model over conventional GCNs and other methods.", "keywords": "symmetry;equivariance;mesh;geometric;convolution", "primary_area": "", "supplementary_material": "/attachment/34214db9419f1031f30c2ec75f64bd146bdf3c39.zip", "author": "Pim De Haan;Maurice Weiler;Taco Cohen;Max Welling", "authorids": "~Pim_De_Haan1;~Maurice_Weiler1;~Taco_Cohen1;~Max_Welling1", "gender": "M;;M;M", "homepage": "https://pimdehaan.com;https://maurice-weiler.gitlab.io/;http://www.ta.co.nl;https://staff.fnwi.uva.nl/m.welling/", "dblp": ";210/0855;142/2903;16/2286", "google_scholar": "AZeK-REAAAAJ;uQePx6EAAAAJ;a3q4YxEAAAAJ;https://scholar.google.nl/citations?user=8200InoAAAAJ", "orcid": ";;;0000-0003-1484-2121", "linkedin": "https://nl.linkedin.com/in/pim-de-haan;maurice-weiler-78b6931a6/;;", "or_profile": "~Pim_De_Haan1;~Maurice_Weiler1;~Taco_Cohen1;~Max_Welling1", "aff": "Qualcomm;University of Amsterdam;University of Amsterdam;University of Amsterdam", "aff_domain": "qualcomm.com;uva.nl;uva.nl;uva.nl", "position": "Researcher;PhD student;PhD student;Full Professor", "bibtex": "@inproceedings{\nhaan2021gauge,\ntitle={Gauge Equivariant Mesh {\\{}CNN{\\}}s: Anisotropic convolutions on geometric graphs},\nauthor={Pim De Haan and Maurice Weiler and Taco Cohen and Max Welling},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=Jnspzp-oIZE}\n}", "github": "[![github](/images/github_icon.svg) qualcomm-ai-research/gauge-equivariant-mesh-cnn](https://github.com/qualcomm-ai-research/gauge-equivariant-mesh-cnn)", "project": "", "reviewers": "AnonReviewer4;AnonReviewer1;AnonReviewer2;AnonReviewer5", "pdf_size": 0, "rating": "7;7;7;9", "confidence": "4;4;3;4", "wc_review": "922;606;684;287", "wc_reply_reviewers": "35;0;0;0", "wc_reply_authors": "468;382;596;44", "reply_reviewers": "1;0;0;0", "reply_authors": "3;2;2;1", "rating_avg": [ 7.5, 0.8660254037844386 ], "confidence_avg": [ 3.75, 0.4330127018922193 ], "wc_review_avg": [ 624.75, 227.09840928549016 ], "wc_reply_reviewers_avg": [ 8.75, 15.155444566227676 ], "wc_reply_authors_avg": [ 372.5, 204.37404434027331 ], "reply_reviewers_avg": [ 0.25, 0.4330127018922193 ], "reply_authors_avg": [ 2.0, 0.7071067811865476 ], "replies_avg": [ 16, 0 ], "authors#_avg": [ 4, 0 ], "corr_rating_confidence": 0.3333333333333333, "gs_citation": 141, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=17703338276692634777&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 5, "pdf": "https://openreview.net/pdf?id=Jnspzp-oIZE", "email": 
"qualcomm.com;uva.nl;uva.nl;uva.nl", "author_num": 4, "aff_unique_index": "0;1;1;1", "aff_unique_norm": "Qualcomm Incorporated;University of Amsterdam", "aff_unique_dep": ";", "aff_unique_url": "https://www.qualcomm.com;https://www.uva.nl", "aff_unique_abbr": "Qualcomm;UvA", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;1;1;1", "aff_country_unique": "United States;Netherlands" }, { "title": "ARMOURED: Adversarially Robust MOdels using Unlabeled data by REgularizing Diversity", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/3369", "id": "JoCR4h9O3Ew", "poster": "", "openreview": "https://openreview.net/forum?id=JoCR4h9O3Ew", "slides": "https://iclr.cc/virtual/2021/poster/3369", "video": "https://iclr.cc/virtual/2021/poster/3369", "author_site": "Kangkang Lu, Cuong Nguyen, Xun Xu, Kiran Chari, Yu Jing Goh, Chuan-Sheng Foo", "tldr": "", "abstract": "Adversarial attacks pose a major challenge for modern deep neural networks. Recent advancements show that adversarially robust generalization requires a large amount of labeled data for training. If annotation becomes a burden, can unlabeled data help bridge the gap? In this paper, we propose ARMOURED, an adversarially robust training method based on semi-supervised learning that consists of two components. The first component applies multi-view learning to simultaneously optimize multiple independent networks and utilizes unlabeled data to enforce labeling consistency. The second component reduces adversarial transferability among the networks via diversity regularizers inspired by determinantal point processes and entropy maximization. Experimental results show that under small perturbation budgets, ARMOURED is robust against strong adaptive adversaries. Notably, ARMOURED does not rely on generating adversarial samples during training. 
When used in combination with adversarial training, ARMOURED yields competitive performance with the state-of-the-art adversarially-robust benchmarks on SVHN and outperforms them on CIFAR-10, while offering higher clean accuracy.", "keywords": "Adversarial Robustness;Semi-supervised Learning;Multi-view Learning;Diversity Regularization;Entropy Maximization", "primary_area": "", "supplementary_material": "", "author": "Kangkang Lu;Cuong Manh Nguyen;Xun Xu;Kiran Krishnamachari;Yu Jing Goh;Chuan-Sheng Foo", "authorids": "~Kangkang_Lu1;~Cuong_Manh_Nguyen1;~Xun_Xu1;~Kiran_Krishnamachari1;~Yu_Jing_Goh1;~Chuan-Sheng_Foo1", "gender": "M;M;Not Specified;F;M;", "homepage": ";;https://alex-xun-xu.github.io/;;http://ai.stanford.edu/~csfoo;", "dblp": ";;47/3944-2;295/5280.html;73/1823;", "google_scholar": "QYkJHCYAAAAJ;;https://scholar.google.com.sg/citations?user=pi0SGQUAAAAJ;;AgbeqGkAAAAJ;", "orcid": ";0000-0002-6342-1393;;;0000-0002-4748-5792;", "linkedin": ";alfred-nguyen-2905/;;;;kiranchari", "or_profile": "~Kangkang_Lu1;~Cuong_Manh_Nguyen1;~Xun_Xu1;~Yu_Jing_Goh1;~Chuan-Sheng_Foo1;~Kiran_Chari1", "aff": "A*STAR;Institute for Infocomm Research, A*STAR;A*STAR;National University of Singapore;Institute for Infocomm Research, A*STAR;National University of Singapore", "aff_domain": "a-star.edu.sg;i2r.a-star.edu.sg;i2r.a-star.edu.sg;nus.edu.sg;i2r.a-star.edu.sg;nus.edu", "position": "Senior Research Engineer;Researcher;Scientist;Undergrad student;Scientist;PhD student", "bibtex": "@inproceedings{\nlu2021armoured,\ntitle={{\\{}ARMOURED{\\}}: Adversarially Robust {\\{}MO{\\}}dels using Unlabeled data by {\\{}RE{\\}}gularizing Diversity},\nauthor={Kangkang Lu and Cuong Manh Nguyen and Xun Xu and Kiran Krishnamachari and Yu Jing Goh and Chuan-Sheng Foo},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=JoCR4h9O3Ew}}", "github": "", "project": "", "reviewers": "AnonReviewer2;AnonReviewer3;AnonReviewer4;AnonReviewer1", "pdf_size": 0, "rating": "7;7;7;7", "confidence": "4;5;5;3", "wc_review": "219;1002;247;383", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "7;661;206;278", "reply_reviewers": "0;0;0;0", "reply_authors": "1;1;1;1", "rating_avg": [ 7.0, 0.0 ], "confidence_avg": [ 4.25, 0.82915619758885 ], "wc_review_avg": [ 462.75, 317.45580401057407 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 288.0, 237.1254942008556 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 11, 0 ], "authors#_avg": [ 6, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 3, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=3129694295178791768&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 0, "pdf": "https://openreview.net/pdf?id=JoCR4h9O3Ew", "email": "a-star.edu.sg;i2r.a-star.edu.sg;i2r.a-star.edu.sg;nus.edu.sg;i2r.a-star.edu.sg;nus.edu", "author_num": 6, "aff_unique_index": "0;1;0;2;1;2", "aff_unique_norm": "Agency for Science, Technology and Research;Institute for Infocomm Research;National University of Singapore", "aff_unique_dep": ";;", "aff_unique_url": "https://www.a-star.edu.sg;https://www.i2r.a-star.edu.sg;https://www.nus.edu.sg", "aff_unique_abbr": "A*STAR;I2R;NUS", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0;0;0;0;0", "aff_country_unique": "Singapore" }, { "id": "Jq8JGA89sDa", "title": "Detecting Hallucinated Content in Conditional Neural Sequence Generation", "track": "main", "status": "Reject", "tldr": "", "abstract": "Neural 
sequence models can generate highly fluent sentences but recent studies have also shown that they are also prone to hallucinate additional content not supported by the input, which can cause a lack of trust in the model.\nTo better assess the faithfulness of the machine outputs, we propose a new task to predict whether each token in the output sequence is hallucinated conditioned on the source input, and collect new manually annotated evaluation sets for this task.\nWe also introduce a novel method for learning to model hallucination detection, based on pretrained language models fine tuned on synthetic data that includes automatically inserted hallucinations. \nExperiments on machine translation and abstract text summarization demonstrate the effectiveness of our proposed approach -- we obtain an average F1 of around 0.6 across all the benchmark datasets.\nFurthermore, we demonstrate how to use the token-level hallucination labels to define a fine-grained loss over the target sequence in the low-resource machine translation and achieve significant improvements over strong baseline methods.\nWe will also release our annotated data and code for future research.", "keywords": "conditional text generation;hallucination detection;sequence generation evaluation;neural machine translation;abstractive text summarization", "primary_area": "", "supplementary_material": "", "author": "Chunting Zhou;Jiatao Gu;Mona T. Diab;Paco Guzm\u00e1n;Luke Zettlemoyer;Marjan Ghazvininejad", "authorids": "~Chunting_Zhou1;~Jiatao_Gu1;~Mona_T._Diab1;fguzman@fb.com;~Luke_Zettlemoyer1;~Marjan_Ghazvininejad1", "gender": "F;M;F;;M;", "homepage": "https://violet-zct.github.io/;http://jiataogu.me;https://www.seas.gwu.edu/~mtdiab/;;https://www.cs.washington.edu/people/faculty/lsz/;", "dblp": "161/2679;164/5848.html;15/4305;;21/6793;", "google_scholar": "mR5W7EgAAAAJ;https://scholar.google.com.sg/citations?user=cB1mFBsAAAAJ;https://scholar.google.com.tw/citations?user=-y6SIhQAAAAJ;;https://scholar.google.com.tw/citations?user=UjpbO6IAAAAJ;", "orcid": ";;;;;", "linkedin": ";jiatao-gu-204b2672/;mona-diab-55946614/;;luke-zettlemoyer-a0109b226/;", "or_profile": "~Chunting_Zhou1;~Jiatao_Gu1;~Mona_T._Diab1;fguzman@fb.com;~Luke_Zettlemoyer1;~Marjan_Ghazvininejad1", "aff": "Language Technologies Institute, Carnegie Mellon University;Meta;George Washington University;;Meta;", "aff_domain": "cs.cmu.edu;fb.com;gwu.edu;;meta.com;", "position": "PhD student;Researcher;Professor;;Researcher;", "bibtex": "@misc{\nzhou2021detecting,\ntitle={Detecting Hallucinated Content in Conditional Neural Sequence Generation},\nauthor={Chunting Zhou and Jiatao Gu and Mona T. 
Diab and Paco Guzm{\\'a}n and Luke Zettlemoyer and Marjan Ghazvininejad},\nyear={2021},\nurl={https://openreview.net/forum?id=Jq8JGA89sDa}\n}", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer4;AnonReviewer3;AnonReviewer2", "site": "https://openreview.net/forum?id=Jq8JGA89sDa", "pdf_size": 0, "rating": "5;5;5;6", "confidence": "4;4;4;4", "wc_review": "288;598;1031;945", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "469;401;497;580", "reply_reviewers": "0;0;0;0", "reply_authors": "1;1;1;1", "rating_avg": [ 5.25, 0.4330127018922193 ], "confidence_avg": [ 4.0, 0.0 ], "wc_review_avg": [ 715.5, 295.28503179131854 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 486.75, 64.16531383855299 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 10, 0 ], "authors#_avg": [ 6, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 222, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=3199888748342174467&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 7, "aff_unique_index": "0;1;2;1", "aff_unique_norm": "Carnegie Mellon University;Meta;George Washington University", "aff_unique_dep": "Language Technologies Institute;Meta Platforms, Inc.;", "aff_unique_url": "https://www.cmu.edu;https://meta.com;https://www.gwu.edu", "aff_unique_abbr": "CMU;Meta;GWU", "aff_campus_unique_index": "0", "aff_campus_unique": "Pittsburgh;", "aff_country_unique_index": "0;0;0;0", "aff_country_unique": "United States" }, { "id": "Jr8XGtK04Pw", "title": "Hippocampal representations emerge when training recurrent neural networks on a memory dependent maze navigation task", "track": "main", "status": "Reject", "tldr": "", "abstract": "Can neural networks learn goal-directed behaviour using similar strategies to the brain, by combining the relationships between the current state of the organism and the consequences of future actions? Recent work has shown that recurrent neural networks trained on goal based tasks can develop representations resembling those found in the brain, entorhinal cortex grid cells, for instance. Here we explore the evolution of the dynamics of their internal representations and compare this with experimental data. We observe that once a recurrent network is trained to learn the structure of its environment solely based on sensory prediction, an attractor based landscape forms in the network's representation, which parallels hippocampal place cells in structure and function. Next, we extend the predictive objective to include Q-learning for a reward task, where rewarding actions are dependent on delayed cue modulation. Mirroring experimental findings in hippocampus recordings in rodents performing the same task, this training paradigm causes nonlocal neural activity to sweep forward in space at decision points, anticipating the future path to a rewarded location. Moreover, prevalent choice and cue-selective neurons form in this network, again recapitulating experimental findings. 
Together, these results indicate that combining predictive, unsupervised learning of the structure of an environment with reinforcement learning can help understand the formation of hippocampus-like representations containing both spatial and task-relevant information.", "keywords": "recurrent neural network;place cell;hippocampus;neural dynamics", "primary_area": "", "supplementary_material": "", "author": "Justin Jude;Matthias Hennig", "authorids": "~Justin_Jude1;m.hennig@ed.ac.uk", "gender": "M;", "homepage": "http://justinjude.io;", "dblp": ";", "google_scholar": "https://scholar.google.com/citations?hl=en;", "orcid": ";", "linkedin": ";", "or_profile": "~Justin_Jude1;m.hennig@ed.ac.uk", "aff": "University of Edinburgh;", "aff_domain": "ed.ac.uk;", "position": "PhD student;", "bibtex": "@misc{\njude2021hippocampal,\ntitle={Hippocampal representations emerge when training recurrent neural networks on a memory dependent maze navigation task},\nauthor={Justin Jude and Matthias Hennig},\nyear={2021},\nurl={https://openreview.net/forum?id=Jr8XGtK04Pw}\n}", "github": "", "project": "", "reviewers": "AnonReviewer4;AnonReviewer2;AnonReviewer3;AnonReviewer1", "site": "https://openreview.net/forum?id=Jr8XGtK04Pw", "pdf_size": 0, "rating": "4;5;7;7", "confidence": "4;5;4;4", "wc_review": "1191;599;378;339", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "1981;1331;983;1136", "reply_reviewers": "0;0;0;0", "reply_authors": "3;2;2;2", "rating_avg": [ 5.75, 1.299038105676658 ], "confidence_avg": [ 4.25, 0.4330127018922193 ], "wc_review_avg": [ 626.75, 340.52340227949094 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 1357.75, 380.3836057192791 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 2.25, 0.4330127018922193 ], "replies_avg": [ 15, 0 ], "authors#_avg": [ 2, 0 ], "corr_rating_confidence": -0.3333333333333333, "gs_citation": 1, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=18353081423280821366&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 5, "aff_unique_index": "0", "aff_unique_norm": "University of Edinburgh", "aff_unique_dep": "", "aff_unique_url": "https://www.ed.ac.uk", "aff_unique_abbr": "Edinburgh", "aff_country_unique_index": "0", "aff_country_unique": "United Kingdom" }, { "id": "JthLaV0RsV", "title": "Refine and Imitate: Reducing Repetition and Inconsistency in Dialogue Generation via Reinforcement Learning and Human Demonstration", "track": "main", "status": "Withdraw", "tldr": "", "abstract": "Despite the recent success of large-scale language models on various downstream NLP tasks, the repetition and inconsistency problems still persist in dialogue response generation. Previous approaches have attempted to avoid repetition by penalizing the language model's undesirable behaviors in the loss function. However, these methods focus on token-level information and can lead to incoherent responses and uninterpretable behaviors. To alleviate these issues, we propose to apply reinforcement learning to refine an MLE-based language model without user simulators, and distill sentence-level information about repetition, inconsistency and task relevance through rewards. In addition, to better accomplish the dialogue task, the model learns from human demonstration to imitate intellectual activities such as persuasion, and selects the most persuasive responses. 
Experiments show that our model outperforms previous state-of-the-art dialogue models on both automatic metrics and human evaluation results on a donation persuasion task, and generates more diverse, consistent and persuasive conversations according to the user feedback. We will release the code and data upon acceptance.", "keywords": "Dialogue System;Persuasion;Conversation", "primary_area": "", "supplementary_material": "/attachment/80f1878618fd63cb9ecbd4335cb02b8ef5633ef1.zip", "author": "Weiyan Shi;Yu Li;Saurav Sahay;Zhou Yu", "authorids": "~Weiyan_Shi2;~Yu_Li6;~Saurav_Sahay1;~Zhou_Yu1", "gender": "F;M;F;M", "homepage": "https://sites.google.com/ucdavis.edu/wyshi/;http://home.cc.gatech.edu/ssahay;http://www.cs.columbia.edu/~zhouyu/;https://yooli23.github.io/", "dblp": "218/5722;18/4070;83/3205;34/2997-13.html", "google_scholar": "xj666rUAAAAJ;A_Kss_UAAAAJ;https://scholar.google.com.tw/citations?user=jee2Dy0AAAAJ;gCoIftIAAAAJ", "orcid": ";;;", "linkedin": ";sauravsahay/;;yu-li-443a59104/", "or_profile": "~Weiyan_Shi2;~Saurav_Sahay1;~Zhou_Yu1;~Yu_Li12", "aff": "Columbia University;Intel;Columbia University;University of California, Davis", "aff_domain": "columbia.edu;intel.com;columbia.edu;ucdavis.edu", "position": "PhD student;Staff Scientist;Assistant Professor;PhD student", "bibtex": "", "github": "", "project": "", "reviewers": "AnonReviewer2;AnonReviewer4;AnonReviewer3", "site": "https://openreview.net/forum?id=JthLaV0RsV", "pdf_size": 0, "rating": "3;4;6", "confidence": "4;4;4", "wc_review": "310;707;191", "wc_reply_reviewers": "0;0;0", "wc_reply_authors": "0;0;0", "reply_reviewers": "0;0;0", "reply_authors": "0;0;0", "rating_avg": [ 4.333333333333333, 1.247219128924647 ], "confidence_avg": [ 4.0, 0.0 ], "wc_review_avg": [ 402.6666666666667, 220.61177565024846 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 0, 0 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 0, 0 ], "replies_avg": [ 5, 0 ], "authors#_avg": [ 4, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:4QZEf4ldqzQJ:scholar.google.com/&scioq=Refine+and+Imitate:+Reducing+Repetition+and+Inconsistency+in+Dialogue+Generation+via+Reinforcement+Learning+and+Human+Demonstration&hl=en&as_sdt=0,5", "gs_version_total": 0, "aff_unique_index": "0;1;0;2", "aff_unique_norm": "Columbia University;Intel;University of California, Davis", "aff_unique_dep": ";Intel Corporation;", "aff_unique_url": "https://www.columbia.edu;https://www.intel.com;https://www.ucdavis.edu", "aff_unique_abbr": "Columbia;Intel;UC Davis", "aff_campus_unique_index": "1", "aff_campus_unique": ";Davis", "aff_country_unique_index": "0;0;0;0", "aff_country_unique": "United States" }, { "id": "JvPsKam58LX", "title": "Robust Multi-Agent Reinforcement Learning Driven by Correlated Equilibrium", "track": "main", "status": "Reject", "tldr": "", "abstract": "In this paper we deal with robust cooperative multi-agent reinforcement learning (CMARL). While CMARL has many potential applications, only a trained policy that is robust enough can be confidently deployed in the real world. Existing works on robust MARL mainly apply vanilla adversarial training in the centralized training and decentralized execution paradigm. We, however, find that if a CMARL environment contains an adversarial agent, the performance of the decentralized equilibrium might be significantly poor in achieving such adversarial robustness. 
To tackle this issue, we suggest that, during execution, the non-adversarial agents must jointly make decisions to improve robustness, therefore solving for a correlated equilibrium instead. We theoretically demonstrate the superiority of correlated equilibrium over the decentralized one in adversarial MARL settings. Therefore, to achieve robust CMARL, we introduce novel strategies to encourage agents to learn a correlated equilibrium while maximally preserving the convenience of decentralized execution. Global variables with mutual information are proposed to help agents learn robust policies with MARL algorithms. The experimental results show that our method can dramatically boost performance on the SMAC environments.", "keywords": "Robust;Multi-agent;Reinforcement Learning;Correlated Equilibrium", "primary_area": "", "supplementary_material": "", "author": "Yizheng Hu;Kun Shao;Dong Li;Jianye HAO;Wulong Liu;Yaodong Yang;Jun Wang;Zhanxing Zhu", "authorids": "~Yizheng_Hu1;~Kun_Shao1;~Dong_Li10;~Jianye_HAO1;~Wulong_Liu1;~Yaodong_Yang1;~Jun_Wang2;~Zhanxing_Zhu1", "gender": ";;M;M;M;M;M;M", "homepage": "https://huyz-git.github.io/;;;http://www.icdai.org/jianye.html;;https://www.yangyaodong.com;http://www0.cs.ucl.ac.uk/staff/jun.wang/;https://zhanxingzhu.github.io/", "dblp": ";;47/4826-16;21/7664.html;36/9257.html;170/1496-1;w/JunWang12;87/7756.html", "google_scholar": ";;;;https://scholar.google.ca/citations?user=od00FfIAAAAJ;https://scholar.google.co.uk/citations?user=6yL0xw8AAAAJ;https://scholar.google.co.uk/citations?user=wIE1tY4AAAAJ;a2sHceIAAAAJ", "orcid": ";;;0000-0002-0422-8235;;0000-0001-8132-5613;;", "linkedin": ";;;;wulong-liu-28006155/;yaodong-yang;;", "or_profile": "~Yizheng_Hu1;~Kun_Shao1;~Dong_Li10;~Jianye_HAO1;~Wulong_Liu1;~Yaodong_Yang1;~Jun_Wang2;~Zhanxing_Zhu1", "aff": "Peking University;;Huawei Technologies Ltd.;Tianjin University;Huawei Noah's Ark Lab;King's College London;University College London;Peking University", "aff_domain": "pku.edu.cn;;huawei.com;tju.edu.cn;huawei.com;kcl.ac.uk;ucl.ac.uk;pku.edu.cn", "position": "PhD student;;Principal Researcher;Associate Professor;Researcher;Assistant Professor;Professor;Assistant Professor", "bibtex": "@misc{\nhu2021robust,\ntitle={Robust Multi-Agent Reinforcement Learning Driven by Correlated Equilibrium},\nauthor={Yizheng Hu and Kun Shao and Dong Li and Jianye HAO and Wulong Liu and Yaodong Yang and Jun Wang and Zhanxing Zhu},\nyear={2021},\nurl={https://openreview.net/forum?id=JvPsKam58LX}\n}", "github": "", "project": "", "reviewers": "AnonReviewer2;AnonReviewer3;AnonReviewer5;AnonReviewer1;AnonReviewer4", "site": "https://openreview.net/forum?id=JvPsKam58LX", "pdf_size": 0, "rating": "3;4;4;5;6", "confidence": "4;3;3;4;3", "wc_review": "465;224;497;536;511", "wc_reply_reviewers": "0;0;0;0;0", "wc_reply_authors": "498;387;406;360;265", "reply_reviewers": "0;0;0;0;0", "reply_authors": "1;1;1;1;1", "rating_avg": [ 4.4, 1.0198039027185568 ], "confidence_avg": [ 3.4, 0.4898979485566356 ], "wc_review_avg": [ 446.6, 113.63907778576875 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 383.2, 75.13028683560313 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 11, 0 ], "authors#_avg": [ 8, 0 ], "corr_rating_confidence": -0.32025630761017426, "gs_citation": 5, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=14201224595589713963&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 0, "aff_unique_index": "0;1;2;1;3;4;0", "aff_unique_norm": "Peking 
University;Huawei;Tianjin University;King's College London;University College London", "aff_unique_dep": ";Huawei Technologies;;;", "aff_unique_url": "http://www.pku.edu.cn;https://www.huawei.com;http://www.tju.edu.cn;https://www.kcl.ac.uk;https://www.ucl.ac.uk", "aff_unique_abbr": "Peking U;Huawei;TJU;KCL;UCL", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0;0;0;1;1;0", "aff_country_unique": "China;United Kingdom" }, { "id": "JyDnXkeJpjU", "title": "Task-similarity Aware Meta-learning through Nonparametric Kernel Regression", "track": "main", "status": "Reject", "tldr": "", "abstract": "This paper investigates the use of nonparametric kernel-regression to obtain a task-similarity aware meta-learning algorithm. Our hypothesis is that the use of task-similarity helps meta-learning when the available tasks are limited and may contain outlier/dissimilar tasks. While existing meta-learning approaches implicitly assume the tasks as being similar, it is generally unclear how this task-similarity could be quantified and used in the learning. As a result, most popular meta-learning approaches do not actively use the similarity/dissimilarity between the tasks, but rely on availability of huge number of tasks for their working. Our contribution is a novel framework for meta-learning that explicitly uses task-similarity in the form of kernels and an associated meta-learning algorithm. We model the task-specific parameters to belong to a reproducing kernel Hilbert space where the kernel function captures the similarity across tasks. The proposed algorithm iteratively learns a meta-parameter which is used to assign a task-specific descriptor for every task. The task descriptors are then used to quantify the task-similarity through the kernel function. We show how our approach conceptually generalizes the popular meta-learning approaches of model-agnostic meta-learning (MAML) and Meta-stochastic gradient descent (Meta-SGD) approaches. Numerical experiments with regression and classification tasks show that our algorithm outperforms these approaches when the number of tasks is limited, even in the presence of outlier or dissimilar tasks. 
This supports our hypothesis that task-similarity helps improve the meta-learning performance in task-limited and adverse settings.", "keywords": "Task-similarity;Meta-learning;Kernel regression;Nonparametric regression;Task-descriptors", "primary_area": "", "supplementary_material": "", "author": "Arun Venkitaraman;Anders Hansson;Bo Wahlberg", "authorids": "~Arun_Venkitaraman1;anders.g.hansson@liu.se;~Bo_Wahlberg1", "gender": "M;;M", "homepage": "https://www.kth.se/profile/arunv;;https://www.kth.se/profile/bo", "dblp": "118/9327.html;;87/1451", "google_scholar": "https://scholar.google.com/citations?hl=sv;;https://scholar.google.se/citations?user=fDeSgLwAAAAJ", "orcid": ";;0000-0002-1927-1690", "linkedin": ";;", "or_profile": "~Arun_Venkitaraman1;anders.g.hansson@liu.se;~Bo_Wahlberg1", "aff": "KTH Royal Institute of Technology, Stockholm, Sweden;;KTH Royal Institute of Technology, Stockholm, Sweden", "aff_domain": "kth.se;;kth.se", "position": "Postdoc;;Full Professor", "bibtex": "@misc{\nvenkitaraman2021tasksimilarity,\ntitle={Task-similarity Aware Meta-learning through Nonparametric Kernel Regression},\nauthor={Arun Venkitaraman and Anders Hansson and Bo Wahlberg},\nyear={2021},\nurl={https://openreview.net/forum?id=JyDnXkeJpjU}\n}", "github": "", "project": "", "reviewers": "AnonReviewer2;AnonReviewer1;AnonReviewer4;AnonReviewer3", "site": "https://openreview.net/forum?id=JyDnXkeJpjU", "pdf_size": 0, "rating": "3;4;4;4", "confidence": "5;4;4;3", "wc_review": "460;519;829;375", "wc_reply_reviewers": "0;76;0;0", "wc_reply_authors": "550;757;750;316", "reply_reviewers": "0;1;0;0", "reply_authors": "1;1;1;1", "rating_avg": [ 3.75, 0.4330127018922193 ], "confidence_avg": [ 4.0, 0.7071067811865476 ], "wc_review_avg": [ 545.75, 171.35835987777193 ], "wc_reply_reviewers_avg": [ 19.0, 32.90896534380867 ], "wc_reply_authors_avg": [ 593.25, 180.3626555027398 ], "reply_reviewers_avg": [ 0.25, 0.4330127018922193 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 10, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": -0.816496580927726, "gs_citation": 9, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=16235838820432457909&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 5, "aff_unique_index": "0;0", "aff_unique_norm": "KTH Royal Institute of Technology", "aff_unique_dep": "", "aff_unique_url": "https://www.kth.se", "aff_unique_abbr": "KTH", "aff_campus_unique_index": "0;0", "aff_campus_unique": "Stockholm", "aff_country_unique_index": "0;0", "aff_country_unique": "Sweden" }, { "id": "JydXRRDoDTv", "title": "Optimistic Policy Optimization with General Function Approximations", "track": "main", "status": "Reject", "tldr": "", "abstract": "Although policy optimization with neural networks has a track record of achieving state-of-the-art results in reinforcement learning on various domains, the theoretical understanding of the computational and sample efficiency of policy optimization remains restricted to linear function approximations with finite-dimensional feature representations, which hinders the design of principled, effective, and efficient algorithms. To this end, we propose an optimistic model-based policy optimization algorithm, which allows general function approximations while incorporating~exploration. In the episodic setting, we establish a $\\sqrt{T}$-regret that scales polynomially in the eluder dimension of the general model class. Here $T$ is the number of steps taken by the agent. 
In particular, we specialize such a regret to handle two nonparametric model classes; one based on reproducing kernel Hilbert spaces and another based on overparameterized neural networks.", "keywords": "", "primary_area": "", "supplementary_material": "", "author": "Qi Cai;Zhuoran Yang;Csaba Szepesvari;Zhaoran Wang", "authorids": "~Qi_Cai2;~Zhuoran_Yang1;~Csaba_Szepesvari1;~Zhaoran_Wang1", "gender": "M;M;M;Not Specified", "homepage": ";https://zhuoranyang.github.io/;https://sites.ualberta.ca/~szepesva/;https://zhaoranwang.github.io/", "dblp": ";;http://dblp.uni-trier.de/pers/hd/s/Szepesv=aacute=ri:Csaba;117/2756", "google_scholar": "FX6bV4UAAAAJ;;https://scholar.google.ca/citations?user=zvC19mQAAAAJ;https://scholar.google.com.tw/citations?user=HSx0BgQAAAAJ", "orcid": ";;;", "linkedin": ";;csaba-szepesvari-09376b1?trk=hp-identity-name;", "or_profile": "~Qi_Cai2;~Zhuoran_Yang1;~Csaba_Szepesvari1;~Zhaoran_Wang1", "aff": "Northwestern University;University of California, Berkeley;Google DeepMind;", "aff_domain": "u.northwestern.edu;berkeley.edu;google.com;", "position": "PhD student;Postdoc;Research Scientist;", "bibtex": "@misc{\ncai2021optimistic,\ntitle={Optimistic Policy Optimization with General Function Approximations},\nauthor={Qi Cai and Zhuoran Yang and Csaba Szepesvari and Zhaoran Wang},\nyear={2021},\nurl={https://openreview.net/forum?id=JydXRRDoDTv}\n}", "github": "", "project": "", "reviewers": "AnonReviewer5;AnonReviewer4;AnonReviewer3;AnonReviewer2", "site": "https://openreview.net/forum?id=JydXRRDoDTv", "pdf_size": 0, "rating": "4;5;6;7", "confidence": "4;4;1;4", "wc_review": "330;679;62;290", "wc_reply_reviewers": "0;466;0;0", "wc_reply_authors": "1022;3065;22;132", "reply_reviewers": "0;1;0;0", "reply_authors": "2;6;1;1", "rating_avg": [ 5.5, 1.118033988749895 ], "confidence_avg": [ 3.25, 1.299038105676658 ], "wc_review_avg": [ 340.25, 220.683455428811 ], "wc_reply_reviewers_avg": [ 116.5, 201.7839190817742 ], "wc_reply_authors_avg": [ 1060.25, 1220.6654691192014 ], "reply_reviewers_avg": [ 0.25, 0.4330127018922193 ], "reply_authors_avg": [ 2.5, 2.0615528128088303 ], "replies_avg": [ 17, 0 ], "authors#_avg": [ 4, 0 ], "corr_rating_confidence": -0.25819888974716115, "gs_citation": 1, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=4791237027943816856&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 0, "aff_unique_index": "0;1;2", "aff_unique_norm": "Northwestern University;University of California, Berkeley;Google", "aff_unique_dep": ";;Google DeepMind", "aff_unique_url": "https://www.northwestern.edu;https://www.berkeley.edu;https://deepmind.com", "aff_unique_abbr": "NU;UC Berkeley;DeepMind", "aff_campus_unique_index": "1", "aff_campus_unique": ";Berkeley", "aff_country_unique_index": "0;0;1", "aff_country_unique": "United States;United Kingdom" }, { "id": "JywMsiz_NtO", "title": "Enforcing Predictive Invariance across Structured Biomedical Domains", "track": "main", "status": "Reject", "tldr": "", "abstract": "Many biochemical applications such as molecular property prediction require models to generalize beyond their training domains (environments). Moreover, natural environments in these tasks are structured, defined by complex descriptors such as molecular scaffolds or protein families. Therefore, most environments are either never seen during training, or contain only a single training example. To address these challenges, we propose a new regret minimization (RGM) algorithm and its extension for structured environments. 
RGM builds from invariant risk minimization (IRM) by recasting simultaneous optimality condition in terms of predictive regret, finding a representation that enables the predictor to compete against an oracle with hindsight access to held-out environments. The structured extension adaptively highlights variation due to complex environments via specialized domain perturbations. We evaluate our method on multiple applications: molecular property prediction, protein homology and stability prediction and show that RGM significantly outperforms previous state-of-the-art baselines.", "keywords": "Domain Generalization;Molecular Property Prediction", "primary_area": "", "supplementary_material": "/attachment/e94dba143236ece3608204d0097710106f0b1c42.zip", "author": "Wengong Jin;Regina Barzilay;Tommi S. Jaakkola", "authorids": "~Wengong_Jin1;~Regina_Barzilay1;~Tommi_S._Jaakkola1", "gender": ";female;", "homepage": "http://people.csail.mit.edu/wengong;https://www.regina.csail.mit.edu/;", "dblp": "173/6620;b/ReginaBarzilay;", "google_scholar": "IE5D8_QAAAAJ;;", "orcid": ";;", "linkedin": ";;", "or_profile": "~Wengong_Jin1;~Regina_Barzilay1;~Tommi_S._Jaakkola1", "aff": "Massachusetts Institute of Technology;Massachusetts Institute of Technology;", "aff_domain": "mit.edu;mit.edu;", "position": "PhD student;Professor;", "bibtex": "@misc{\njin2021enforcing,\ntitle={Enforcing Predictive Invariance across Structured Biomedical Domains},\nauthor={Wengong Jin and Regina Barzilay and Tommi S. Jaakkola},\nyear={2021},\nurl={https://openreview.net/forum?id=JywMsiz_NtO}\n}", "github": "", "project": "", "reviewers": "AnonReviewer2;AnonReviewer1;AnonReviewer4;AnonReviewer3", "site": "https://openreview.net/forum?id=JywMsiz_NtO", "pdf_size": 0, "rating": "4;5;5;6", "confidence": "3;4;3;4", "wc_review": "531;342;276;303", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "645;525;438;460", "reply_reviewers": "0;0;0;0", "reply_authors": "1;1;1;1", "rating_avg": [ 5.0, 0.7071067811865476 ], "confidence_avg": [ 3.5, 0.5 ], "wc_review_avg": [ 363.0, 99.79228427087938 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 517.0, 80.5263931888173 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 10, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": 0.7071067811865476, "gs_citation": 7, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=11036953564573018348&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 3, "aff_unique_index": "0;0", "aff_unique_norm": "Massachusetts Institute of Technology", "aff_unique_dep": "", "aff_unique_url": "https://web.mit.edu", "aff_unique_abbr": "MIT", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0", "aff_country_unique": "United States" }, { "id": "JzG0n48hRf", "title": "Uncertainty for deep image classifiers on out of distribution data.", "track": "main", "status": "Reject", "tldr": "", "abstract": "In addition to achieving high accuracy, in many applications, it is important to estimate the probability that a model prediction is correct. Predictive uncertainty is particularly important on out of distribution (OOD) data where accuracy degrades. However, models are typically overconfident, and model calibration on OOD data remains a challenge. In this paper we propose a simple post hoc calibration method that significantly improves on benchmark results [Ovadia et al 2019] on a wide range of corrupted data. 
Our method uses outlier exposure to properly calibrate the model probabilities.", "keywords": "uncertainty;confidence;out of distribution;outlier exposure;classification", "primary_area": "", "supplementary_material": "/attachment/c0b07dd632bdef11b4f36518930c9bf8d591ad2f.zip", "author": "Tiago Salvador;Alexander Iannantuono;Adam M Oberman", "authorids": "tiago.saldanhasalvador@mail.mcgill.ca;alexander.iannantuono@mail.mcgill.ca;~Adam_M_Oberman1", "gender": ";;M", "homepage": ";;https://www.adamoberman.net/", "dblp": ";;31/8186", "google_scholar": ";;https://scholar.google.ca/citations?user=LPAZlL8AAAAJ", "orcid": ";;", "linkedin": ";;adam-oberman-527348107/", "or_profile": "tiago.saldanhasalvador@mail.mcgill.ca;alexander.iannantuono@mail.mcgill.ca;~Adam_M_Oberman1", "aff": ";;McGill University", "aff_domain": ";;mcgill.ca", "position": ";;Full Professor", "bibtex": "@misc{\nsalvador2021uncertainty,\ntitle={Uncertainty for deep image classifiers on out of distribution data. },\nauthor={Tiago Salvador and Alexander Iannantuono and Adam M Oberman},\nyear={2021},\nurl={https://openreview.net/forum?id=JzG0n48hRf}\n}", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer2;AnonReviewer4;AnonReviewer3", "site": "https://openreview.net/forum?id=JzG0n48hRf", "pdf_size": 0, "rating": "4;5;6;6", "confidence": "4;4;4;3", "wc_review": "377;402;650;449", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "438;674;887;505", "reply_reviewers": "0;0;0;0", "reply_authors": "1;1;2;1", "rating_avg": [ 5.25, 0.82915619758885 ], "confidence_avg": [ 3.75, 0.4330127018922193 ], "wc_review_avg": [ 469.5, 107.36968846001184 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 626.0, 173.50072045959925 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.25, 0.4330127018922193 ], "replies_avg": [ 12, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": -0.5222329678670935, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:xZSKFq53g0kJ:scholar.google.com/&scioq=Uncertainty+for+deep+image+classifiers+on+out+of+distribution+data.&hl=en&as_sdt=0,33", "gs_version_total": 0, "aff_unique_index": "0", "aff_unique_norm": "McGill University", "aff_unique_dep": "", "aff_unique_url": "https://www.mcgill.ca", "aff_unique_abbr": "McGill", "aff_country_unique_index": "0", "aff_country_unique": "Canada" }, { "id": "K398CuAKVKB", "title": "Removing Dimensional Restrictions on Complex/Hyper-complex Convolutions", "track": "main", "status": "Reject", "tldr": "", "abstract": "It has been shown that the core reasons that complex and hypercomplex valued neural networks offer improvements over their real-valued counterparts is the fact that aspects of their algebra forces treating multi-dimensional data as a single entity (forced local relationship encoding) with an added benefit of reducing parameter count via weight sharing. However, both are constrained to a set number of dimensions, two for complex and four for quaternions. These observations motivate us to introduce novel vector map convolutions which capture both of these properties provided by complex/hypercomplex convolutions, while dropping the unnatural dimensionality constraints their algebra imposes. This is achieved by introducing a system that mimics the unique linear combination of input dimensions via the Hamilton product using a permutation function, as well as batch normalization and weight initialization for the system. 
We perform three experiments using three different network architectures to show that these novel vector map convolutions seem to capture all the benefits of complex and hyper-complex networks, such as their ability to capture internal latent relations, while avoiding the dimensionality restriction.", "keywords": "CNNs;complex;hypercomplex", "primary_area": "", "supplementary_material": "", "author": "Chase John Gaudet;Anthony S. Maida", "authorids": "~Chase_John_Gaudet1;~Anthony_S._Maida1", "gender": "M;M", "homepage": ";https://people.cmix.louisiana.edu/maida/", "dblp": ";98/4656", "google_scholar": "SLqfgB8AAAAJ;yCwFNtMAAAAJ", "orcid": ";0000-0003-2586-2865", "linkedin": ";", "or_profile": "~Chase_John_Gaudet1;~Anthony_S._Maida1", "aff": "University of Louisiana at Lafayette;University of Louisiana at Lafeyette", "aff_domain": "louisiana.edu;louisiana.edu", "position": "PhD student;Associate Professor", "bibtex": "@misc{\ngaudet2021removing,\ntitle={Removing Dimensional Restrictions on Complex/Hyper-complex Convolutions},\nauthor={Chase John Gaudet and Anthony S. Maida},\nyear={2021},\nurl={https://openreview.net/forum?id=K398CuAKVKB}\n}", "github": "", "project": "", "reviewers": "AnonReviewer4;AnonReviewer2;AnonReviewer3;AnonReviewer1", "site": "https://openreview.net/forum?id=K398CuAKVKB", "pdf_size": 0, "rating": "4;4;5;6", "confidence": "4;2;5;4", "wc_review": "574;415;150;516", "wc_reply_reviewers": "129;0;0;0", "wc_reply_authors": "303;178;150;318", "reply_reviewers": "1;0;0;0", "reply_authors": "1;1;1;1", "rating_avg": [ 4.75, 0.82915619758885 ], "confidence_avg": [ 3.75, 1.0897247358851685 ], "wc_review_avg": [ 413.75, 162.55825878742672 ], "wc_reply_reviewers_avg": [ 32.25, 55.858638544096294 ], "wc_reply_authors_avg": [ 237.25, 74.10592081608594 ], "reply_reviewers_avg": [ 0.25, 0.4330127018922193 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 10, 0 ], "authors#_avg": [ 2, 0 ], "corr_rating_confidence": 0.48420012470625223, "gs_citation": 2, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=5993607476565777&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 0, "aff_unique_index": "0;0", "aff_unique_norm": "University of Louisiana at Lafayette", "aff_unique_dep": "", "aff_unique_url": "https://www.louisiana.edu", "aff_unique_abbr": "ULL", "aff_campus_unique_index": "0;0", "aff_campus_unique": "Lafayette", "aff_country_unique_index": "0;0", "aff_country_unique": "United States" }, { "id": "K3qa-sMHpQX", "title": "ForceNet: A Graph Neural Network for Large-Scale Quantum Chemistry Simulation", "track": "main", "status": "Reject", "tldr": "", "abstract": "Machine Learning (ML) has a potential to dramatically accelerate large-scale physics-based simulations. However, practical models for real large-scale and complex problems remain out of reach. Here we present ForceNet, a model for accurate and fast quantum chemistry simulations to accelerate catalyst discovery for renewable energy applications. ForceNet is a graph neural network that uses surrounding 3D molecular structure to estimate per-atom forces---a central capability for performing atomic simulations. The key challenge is to accurately capture highly complex and non-linear quantum interactions of atoms in 3D space, on which forces are dependent. To this end, ForceNet adopts (1) expressive message passing architecture, (2) appropriate choice of basis and non-linear activation functions, and (3) model scaling in terms of network depth and width. 
We show ForceNet reduces the estimation error of atomic forces by 30% compared to existing ML models, and generalizes well to out-of-distribution structures. Finally, we apply ForceNet to the large-scale catalyst dataset, OC20. We use ForceNet to perform quantum chemistry simulations, where ForceNet is able to achieve 4x higher success rate than existing ML models. Overall, we demonstrate the potential for ML-based simulations to achieve practical usefulness while being orders of magnitude faster than physics-based simulations.", "keywords": "Graph Neural Networks;Physical simulation;Quantum chemistry;Catalysis", "primary_area": "", "supplementary_material": "/attachment/0a811f8cfd41f5175d54fb12be1216bcbe130375.zip", "author": "Weihua Hu;Muhammed Shuaibi;Abhishek Das;Siddharth Goyal;Anuroop Sriram;Jure Leskovec;Devi Parikh;Larry Zitnick", "authorids": "~Weihua_Hu1;~Muhammed_Shuaibi1;~Abhishek_Das1;~Siddharth_Goyal2;~Anuroop_Sriram1;~Jure_Leskovec1;~Devi_Parikh1;~Larry_Zitnick1", "gender": "M;M;M;M;M;;F;", "homepage": "http://web.stanford.edu/~weihuahu/;https://mshuaibii.github.io/;https://abhishekdas.com/;;https://anuroopsriram.com;http://cs.stanford.edu/~jure/;https://www.cc.gatech.edu/~parikh/;http://larryzitnick.org/", "dblp": "42/1232;;40/5262;;200/7951;l/JureLeskovec;64/2121;10/6888", "google_scholar": "wAFMjfkAAAAJ;lphfYeIAAAAJ;t6exkOAAAAAJ;vxjELqYAAAAJ;D4uRc_UAAAAJ;Q_kKkIUAAAAJ;ijpYJQwAAAAJ;ZeJjFQMAAAAJ", "orcid": ";;;;;0000-0002-5411-923X;;", "linkedin": "weihua-hu-a8284228/;mshuaibii/;;;anuroopsriram/;leskovec/;;", "or_profile": "~Weihua_Hu1;~Muhammed_Shuaibi1;~Abhishek_Das1;~Siddharth_Goyal2;~Anuroop_Sriram1;~Jure_Leskovec1;~Devi_Parikh1;~Larry_Zitnick1", "aff": "Stanford University;Carnegie Mellon University;Facebook AI Research (FAIR);;Meta Facebook;;FAIR, Meta;Meta", "aff_domain": "stanford.edu;cmu.edu;fb.com;;facebook.com;;fb.com;meta.com", "position": "PhD student;PhD student;Research Scientist;;Researcher;;Research Scientist;Researcher", "bibtex": "@misc{\nhu2021forcenet,\ntitle={ForceNet: A Graph Neural Network for Large-Scale Quantum Chemistry Simulation},\nauthor={Weihua Hu and Muhammed Shuaibi and Abhishek Das and Siddharth Goyal and Anuroop Sriram and Jure Leskovec and Devi Parikh and Larry Zitnick},\nyear={2021},\nurl={https://openreview.net/forum?id=K3qa-sMHpQX}\n}", "github": "", "project": "", "reviewers": "AnonReviewer2;AnonReviewer1;AnonReviewer4;AnonReviewer3", "site": "https://openreview.net/forum?id=K3qa-sMHpQX", "pdf_size": 0, "rating": "5;6;7;7", "confidence": "5;4;4;5", "wc_review": "536;322;201;479", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "557;284;279;400", "reply_reviewers": "0;0;0;0", "reply_authors": "1;1;1;1", "rating_avg": [ 6.25, 0.82915619758885 ], "confidence_avg": [ 4.5, 0.5 ], "wc_review_avg": [ 384.5, 131.77727421676317 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 380.0, 113.07740711565684 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 10, 0 ], "authors#_avg": [ 8, 0 ], "corr_rating_confidence": -0.30151134457776363, "gs_citation": 1, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=7584550451927177975&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 0, "aff_unique_index": "0;1;2;2;2;2", "aff_unique_norm": "Stanford University;Carnegie Mellon University;Meta", "aff_unique_dep": ";;Facebook AI Research", "aff_unique_url": "https://www.stanford.edu;https://www.cmu.edu;https://research.facebook.com", "aff_unique_abbr": "Stanford;CMU;FAIR", 
"aff_campus_unique_index": "0", "aff_campus_unique": "Stanford;", "aff_country_unique_index": "0;0;0;0;0;0", "aff_country_unique": "United States" }, { "id": "K4wkUp5xNK", "title": "Invariant Causal Representation Learning", "track": "main", "status": "Reject", "tldr": "", "abstract": "Due to spurious correlations, machine learning systems often fail to generalize to environments whose distributions differ from the ones used at training time. Prior work addressing this, either explicitly or implicitly, attempted to find a data representation that has an invariant causal relationship with the outcome. This is done by leveraging a diverse set of training environments to reduce the effect of spurious features, on top of which an invariant classifier is then built. However, these methods have generalization guarantees only when both data representation and classifiers come from a linear model class. As an alternative, we propose Invariant Causal Representation Learning (ICRL), a learning paradigm that enables out-of-distribution generalization in the nonlinear setting (i.e., nonlinear representations and nonlinear classifiers). It builds upon a practical and general assumption: data representations factorize when conditioning on the outcome and the environment. Based on this, we show identifiability up to a permutation and pointwise transformation. We also prove that all direct causes of the outcome can be fully discovered, which further enables us to obtain generalization guarantees in the nonlinear setting. Extensive experiments on both synthetic and real-world datasets show that our approach significantly outperforms a variety of baseline methods.", "keywords": "", "primary_area": "", "supplementary_material": "", "author": "Chaochao Lu;Yuhuai Wu;Jos\u00e9 Miguel Hern\u00e1ndez-Lobato;Bernhard Sch\u00f6lkopf", "authorids": "~Chaochao_Lu1;~Yuhuai_Wu1;~Jos\u00e9_Miguel_Hern\u00e1ndez-Lobato1;~Bernhard_Sch\u00f6lkopf1", "gender": ";M;;", "homepage": "https://causallu.com/;http://www.cs.toronto.edu/~ywu/;;", "dblp": "142/2790;;;", "google_scholar": "C_Qxt0IAAAAJ;https://scholar.google.ca/citations?user=bOQGfFIAAAAJ;;", "orcid": ";;;", "linkedin": ";;;", "or_profile": "~Chaochao_Lu1;~Yuhuai_Wu1;~Jos\u00e9_Miguel_Hern\u00e1ndez-Lobato1;~Bernhard_Sch\u00f6lkopf1", "aff": "University of Cambridge;Department of Computer Science, University of Toronto;;", "aff_domain": "cam.ac.uk;cs.toronto.edu;;", "position": "PhD student;PhD student;;", "bibtex": "@misc{\nlu2021invariant,\ntitle={Invariant Causal Representation Learning},\nauthor={Chaochao Lu and Yuhuai Wu and Jos{\\'e} Miguel Hern{\\'a}ndez-Lobato and Bernhard Sch{\\\"o}lkopf},\nyear={2021},\nurl={https://openreview.net/forum?id=K4wkUp5xNK}\n}", "github": "", "project": "", "reviewers": "AnonReviewer2;AnonReviewer3;AnonReviewer4", "site": "https://openreview.net/forum?id=K4wkUp5xNK", "pdf_size": 0, "rating": "4;4;5", "confidence": "4;4;4", "wc_review": "332;285;487", "wc_reply_reviewers": "0;0;0", "wc_reply_authors": "594;329;631", "reply_reviewers": "0;0;0", "reply_authors": "1;1;1", "rating_avg": [ 4.333333333333333, 0.4714045207910317 ], "confidence_avg": [ 4.0, 0.0 ], "wc_review_avg": [ 368.0, 86.30565836992767 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 518.0, 134.494113873681 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 7, 0 ], "authors#_avg": [ 4, 0 ], "corr_rating_confidence": 0.0, "gs_citation": -1, "gs_cited_by_link": "", "gs_version_total": -1, "aff_unique_index": "0;1", 
"aff_unique_norm": "University of Cambridge;University of Toronto", "aff_unique_dep": ";Department of Computer Science", "aff_unique_url": "https://www.cam.ac.uk;https://www.utoronto.ca", "aff_unique_abbr": "Cambridge;U of T", "aff_campus_unique_index": "0;1", "aff_campus_unique": "Cambridge;Toronto", "aff_country_unique_index": "0;1", "aff_country_unique": "United Kingdom;Canada" }, { "title": "Tilted Empirical Risk Minimization", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/2679", "id": "K5YasWXZT3O", "poster": "", "openreview": "https://openreview.net/forum?id=K5YasWXZT3O", "slides": "https://iclr.cc/virtual/2021/poster/2679", "video": "https://iclr.cc/virtual/2021/poster/2679", "author_site": "Tian Li, Ahmad Beirami, Maziar Sanjabi, Virginia Smith", "tldr": "", "abstract": "Empirical risk minimization (ERM) is typically designed to perform well on the average loss, which can result in estimators that are sensitive to outliers, generalize poorly, or treat subgroups unfairly. While many methods aim to address these problems individually, in this work, we explore them through a unified framework---tilted empirical risk minimization (TERM). In particular, we show that it is possible to flexibly tune the impact of individual losses through a straightforward extension to ERM using a hyperparameter called the tilt. We provide several interpretations of the resulting framework: We show that TERM can increase or decrease the influence of outliers, respectively, to enable fairness or robustness; has variance-reduction properties that can benefit generalization; and can be viewed as a smooth approximation to a superquantile method. We develop batch and stochastic first-order optimization methods for solving TERM, and show that the problem can be efficiently solved relative to common alternatives. Finally, we demonstrate that TERM can be used for a multitude of applications, such as enforcing fairness between subgroups, mitigating the effect of outliers, and handling class imbalance. 
TERM is not only competitive with existing solutions tailored to these individual problems, but can also enable entirely new applications, such as simultaneously addressing outliers and promoting fairness.", "keywords": "exponential tilting;models of learning and generalization;label noise robustness;fairness", "primary_area": "", "supplementary_material": "", "author": "Tian Li;Ahmad Beirami;Maziar Sanjabi;Virginia Smith", "authorids": "~Tian_Li1;~Ahmad_Beirami1;~Maziar_Sanjabi1;~Virginia_Smith1", "gender": ";M;M;F", "homepage": "https://litian96.github.io/;https://beirami.github.io/;https://sites.google.com/view/maziar;", "dblp": "91/7844-5;41/9367;21/8577;120/0921", "google_scholar": "https://scholar.google.com/citations?hl=en;VuKWbMMAAAAJ;bc_N2-oAAAAJ;", "orcid": ";;;", "linkedin": ";ahmad-beirami-97001962;;", "or_profile": "~Tian_Li1;~Ahmad_Beirami1;~Maziar_Sanjabi1;~Virginia_Smith1", "aff": "Carnegie Mellon University;Facebook AI;Meta;Carnegie Mellon University", "aff_domain": "cmu.edu;fb.com;meta.com;cmu.edu", "position": "PhD student;Research Scientist;Researcher;Associate Professor", "bibtex": "@inproceedings{\nli2021tilted,\ntitle={Tilted Empirical Risk Minimization},\nauthor={Tian Li and Ahmad Beirami and Maziar Sanjabi and Virginia Smith},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=K5YasWXZT3O}\n}", "github": "[![github](/images/github_icon.svg) litian96/TERM](https://github.com/litian96/TERM) + [![Papers with Code](/images/pwc_icon.svg) 1 community implementation](https://paperswithcode.com/paper/?openreview=K5YasWXZT3O)", "project": "", "reviewers": "AnonReviewer2;AnonReviewer3;AnonReviewer1;AnonReviewer4", "pdf_size": 0, "rating": "6;6;6;8", "confidence": "3;3;3;4", "wc_review": "361;202;270;937", "wc_reply_reviewers": "82;0;0;0", "wc_reply_authors": "502;188;587;748", "reply_reviewers": "1;0;0;0", "reply_authors": "2;1;1;1", "rating_avg": [ 6.5, 0.8660254037844386 ], "confidence_avg": [ 3.25, 0.4330127018922193 ], "wc_review_avg": [ 442.5, 291.0193292549483 ], "wc_reply_reviewers_avg": [ 20.5, 35.50704155516198 ], "wc_reply_authors_avg": [ 506.25, 203.87787398342175 ], "reply_reviewers_avg": [ 0.25, 0.4330127018922193 ], "reply_authors_avg": [ 1.25, 0.4330127018922193 ], "replies_avg": [ 15, 0 ], "authors#_avg": [ 4, 0 ], "corr_rating_confidence": 1.0, "gs_citation": 167, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=13273330371410515607&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 5, "pdf": "https://openreview.net/pdf?id=K5YasWXZT3O", "email": "cmu.edu;fb.com;meta.com;cmu.edu", "author_num": 4, "aff_unique_index": "0;1;1;0", "aff_unique_norm": "Carnegie Mellon University;Meta", "aff_unique_dep": ";Facebook AI", "aff_unique_url": "https://www.cmu.edu;https://www.facebook.com", "aff_unique_abbr": "CMU;Facebook AI", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0;0;0", "aff_country_unique": "United States" }, { "id": "K5a_QFEUzA1", "title": "Cross-model Back-translated Distillation for Unsupervised Machine Translation", "track": "main", "status": "Reject", "tldr": "", "abstract": "Recent unsupervised machine translation (UMT) systems usually employ three main principles: initialization, language modeling and iterative back-translation, though they may apply them differently. Crucially, iterative back-translation and denoising auto-encoding for language modeling provide data diversity to train the UMT systems. 
However, these diversification processes may have reached their limit. We introduce a novel component to the standard UMT framework called Cross-model Back-translated Distillation (CBD), that is aimed to induce another level of data diversification that existing principles lack. CBD is applicable to all previous UMT approaches. In our experiments, it boosts the performance of the standard UMT methods by 1.5-2.0 BLEU. In particular, in WMT'14 English-French, WMT'16 German-English and English-Romanian, CBD outperforms cross-lingual masked language model (XLM) by 2.3, 2.2 and 1.6 BLEU, respectively. It also yields 1.5-3.3 BLEU improvements in IWSLT English-French and English-German tasks. Through extensive experimental analyses, we show that CBD is effective because it embraces data diversity while other similar variants do not.", "keywords": "unsupervised machine translation;NMT;machine translation", "primary_area": "", "supplementary_material": "/attachment/408838f51c2d56185d95452e8b4a4ba4a543b549.zip", "author": "Phi Xuan Nguyen;Shafiq Joty;Kui Wu;AiTi Aw", "authorids": "~Phi_Xuan_Nguyen1;~Shafiq_Joty1;wuk@i2r.a-star.edu.sg;~AiTi_Aw1", "gender": "M;M;;", "homepage": "https://nxphi47.github.io/;https://raihanjoty.github.io/;;", "dblp": "252/5270;62/2078;;", "google_scholar": "HN8VxX4AAAAJ;hR249csAAAAJ;;", "orcid": ";;;", "linkedin": "xuanphinguyen/;;;", "or_profile": "~Phi_Xuan_Nguyen1;~Shafiq_Joty1;wuk@i2r.a-star.edu.sg;~AiTi_Aw1", "aff": "Nanyang Technological University;Nanyang Technological University;;", "aff_domain": "ntu.edu.sg;ntu.edu.sg;;", "position": "PhD student;Assistant Professor;;", "bibtex": "@misc{\nnguyen2021crossmodel,\ntitle={Cross-model Back-translated Distillation for Unsupervised Machine Translation},\nauthor={Phi Xuan Nguyen and Shafiq Joty and Kui Wu and AiTi Aw},\nyear={2021},\nurl={https://openreview.net/forum?id=K5a_QFEUzA1}\n}", "github": "", "project": "", "reviewers": "AnonReviewer4;AnonReviewer2;AnonReviewer3;AnonReviewer1", "site": "https://openreview.net/forum?id=K5a_QFEUzA1", "pdf_size": 0, "rating": "5;6;7;7", "confidence": "5;3;5;4", "wc_review": "658;187;434;477", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "0;0;0;0", "reply_reviewers": "0;0;0;0", "reply_authors": "0;0;0;0", "rating_avg": [ 6.25, 0.82915619758885 ], "confidence_avg": [ 4.25, 0.82915619758885 ], "wc_review_avg": [ 439.0, 168.02827143073276 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 0, 0 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 0, 0 ], "replies_avg": [ 8, 0 ], "authors#_avg": [ 4, 0 ], "corr_rating_confidence": -0.0909090909090909, "gs_citation": 17, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=12269896059746732525&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 7, "aff_unique_index": "0;0", "aff_unique_norm": "Nanyang Technological University", "aff_unique_dep": "", "aff_unique_url": "https://www.ntu.edu.sg", "aff_unique_abbr": "NTU", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0", "aff_country_unique": "Singapore" }, { "title": "Disambiguating Symbolic Expressions in Informal Documents", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/2931", "id": "K5j7D81ABvt", "poster": "", "openreview": "https://openreview.net/forum?id=K5j7D81ABvt", "slides": "https://iclr.cc/virtual/2021/poster/2931", "video": "https://iclr.cc/virtual/2021/poster/2931", "author_site": "Dennis M\u00fcller, Cezary Kaliszyk", "tldr": "", "abstract": "We propose the task of 
\\emph{disambiguating} symbolic expressions in informal STEM documents in the form of \\LaTeX files -- that is, determining their precise semantics and abstract syntax tree -- as a neural machine translation task. We discuss the distinct challenges involved and present a dataset with roughly 33,000 entries. We evaluated several baseline models on this dataset, which failed to yield even syntactically valid \\LaTeX before overfitting. Consequently, we describe a methodology using a \\emph{transformer} language model pre-trained on sources obtained from \\url{arxiv.org}, which yields promising results despite the small size of the dataset. We evaluate our model using a plurality of dedicated techniques, taking syntax and semantics of symbolic expressions into account.", "keywords": "", "primary_area": "", "supplementary_material": "/attachment/a764a6e83929e29f550cd8a215e908c069402a8d.zip", "author": "Dennis M\u00fcller;Cezary Kaliszyk", "authorids": "~Dennis_M\u00fcller1;~Cezary_Kaliszyk1", "gender": "M;M", "homepage": "https://kwarc.info/people/dmueller/;https://ckaliszyk.github.io/", "dblp": "92/2576-1.html;30/5217.html", "google_scholar": ";bp68Q1kAAAAJ", "orcid": "0000-0002-4482-4912;0000-0002-8273-6059", "linkedin": ";cezary-kaliszyk-801884/", "or_profile": "~Dennis_M\u00fcller1;~Cezary_Kaliszyk1", "aff": "Friedrich-Alexander University Erlangen-N\u00fcrnberg;University of Warsaw", "aff_domain": "fau.de;mimuw.edu.pl", "position": "Postdoc;Adjunct Professor", "bibtex": "@inproceedings{\nm{\\\"u}ller2021disambiguating,\ntitle={Disambiguating Symbolic Expressions in Informal Documents},\nauthor={Dennis M{\\\"u}ller and Cezary Kaliszyk},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=K5j7D81ABvt}\n}", "github": "", "project": "", "reviewers": "AnonReviewer3;AnonReviewer1;AnonReviewer2;AnonReviewer4", "pdf_size": 0, "rating": "4;6;7;8", "confidence": "3;3;4;4", "wc_review": "1456;301;540;361", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "807;370;215;265", "reply_reviewers": "0;0;0;0", "reply_authors": "1;1;1;1", "rating_avg": [ 6.25, 1.479019945774904 ], "confidence_avg": [ 3.5, 0.5 ], "wc_review_avg": [ 664.5, 465.3538975876317 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 414.25, 233.55232283152313 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 9, 0 ], "authors#_avg": [ 2, 0 ], "corr_rating_confidence": 0.8451542547285166, "gs_citation": 2, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=481176685290279829&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 12, "pdf": "https://openreview.net/pdf?id=K5j7D81ABvt", "email": "fau.de;mimuw.edu.pl", "author_num": 2, "aff_unique_index": "0;1", "aff_unique_norm": "Friedrich-Alexander University Erlangen-N\u00fcrnberg;University of Warsaw", "aff_unique_dep": ";", "aff_unique_url": "https://www fau.de;https://www.uw.edu.pl", "aff_unique_abbr": "FAU;UW", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;1", "aff_country_unique": "Germany;Poland" }, { "id": "K6YbHUIWHOy", "title": "Memory Augmented Design of Graph Neural Networks", "track": "main", "status": "Reject", "tldr": "", "abstract": "The expressive power of graph neural networks (GNN) has drawn much interest recently. Most existent work focused on measuring the expressiveness of GNN through the task of distinguishing between graphs. 
In this paper, we inspect the representation limits of locally unordered messaging passing (LUMP) GNN architecture through the lens of \\emph{node classification}. For GNNs based on permutation invariant local aggregators, we characterize graph-theoretic conditions under which such GNNs fail to discriminate simple instances, regardless of underlying architecture or network depth. To overcome this limitation, we propose a novel framework to augment GNNs with global graph information called \\emph{memory augmentation}. Specifically, we allow every node in the original graph to interact with a group of memory nodes. For each node, information from all the other nodes in the graph can be gleaned through the relay of the memory nodes. For proper backbone architectures like GAT and GCN, memory augmented GNNs are theoretically shown to be more expressive than LUMP GNNs. Empirical evaluations demonstrate the significant improvement of memory augmentation. In particular, memory augmented GAT and GCN are shown to either outperform or closely match state-of-the-art performance across various benchmark datasets. ", "keywords": "", "primary_area": "", "supplementary_material": "/attachment/a22532596c2ed5e501e525304fa192cb7d0fe93e.zip", "author": "Tao Xiong;Liang Zhu;Ruofan Wu;Yuan Qi", "authorids": "~Tao_Xiong3;tailiang.zl@antgroup.com;~Ruofan_Wu1;yuan.qi@antgroup.com", "gender": ";;M;", "homepage": ";;https://rorschach1989.github.io/;", "dblp": ";;;", "google_scholar": ";;;", "orcid": ";;;", "linkedin": ";;;", "or_profile": "~Tao_Xiong3;tailiang.zl@antgroup.com;~Ruofan_Wu1;yuan.qi@antgroup.com", "aff": ";;Ant Group;", "aff_domain": ";;antgroup.com;", "position": ";;Researcher;", "bibtex": "@misc{\nxiong2021memory,\ntitle={Memory Augmented Design of Graph Neural Networks},\nauthor={Tao Xiong and Liang Zhu and Ruofan Wu and Yuan Qi},\nyear={2021},\nurl={https://openreview.net/forum?id=K6YbHUIWHOy}\n}", "github": "", "project": "", "reviewers": "AnonReviewer3;AnonReviewer2;AnonReviewer4;AnonReviewer1", "site": "https://openreview.net/forum?id=K6YbHUIWHOy", "pdf_size": 0, "rating": "3;5;5;5", "confidence": "4;4;4;4", "wc_review": "290;357;379;421", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "105;293;300;298", "reply_reviewers": "0;0;0;0", "reply_authors": "1;1;1;1", "rating_avg": [ 4.5, 0.8660254037844386 ], "confidence_avg": [ 4.0, 0.0 ], "wc_review_avg": [ 361.75, 47.378133141777546 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 249.0, 83.17752100177067 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 10, 0 ], "authors#_avg": [ 4, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 3, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=72004562215694054&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 0, "aff_unique_index": "0", "aff_unique_norm": "Ant Group", "aff_unique_dep": "", "aff_unique_url": "https://www.antgroup.com", "aff_unique_abbr": "Ant Group", "aff_country_unique_index": "0", "aff_country_unique": "China" }, { "title": "Learning N:M Fine-grained Structured Sparse Neural Networks From Scratch", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/3174", "id": "K9bw7vqp_s", "poster": "", "openreview": "https://openreview.net/forum?id=K9bw7vqp_s", "slides": "https://iclr.cc/virtual/2021/poster/3174", "video": "https://iclr.cc/virtual/2021/poster/3174", "author_site": "Aojun Zhou, Yukun Ma, Junnan Zhu, Jianbo Liu, Zhijie Zhang, Kun Yuan, Wenxiu Sun, Hongsheng Li", "tldr": "", 
"abstract": "Sparsity in Deep Neural Networks (DNNs) has been widely studied to compress and accelerate the models on resource-constrained environments. It can be generally categorized into unstructured fine-grained sparsity that zeroes out multiple individual weights distributed across the neural network, and structured coarse-grained sparsity which prunes blocks of sub-networks of a neural network. Fine-grained sparsity can achieve a high compression ratio but is not hardware friendly and hence receives limited speed gains. On the other hand, coarse-grained sparsity cannot simultaneously achieve both apparent acceleration on modern GPUs and\ndecent performance. In this paper, we are the first to study training from scratch an N:M fine-grained structured sparse network, which can maintain the advantages of both unstructured fine-grained sparsity and structured coarse-grained sparsity simultaneously on specifically designed GPUs. Specifically, a 2 : 4 sparse network could achieve 2\u00d7 speed-up without performance drop on Nvidia A100 GPUs. Furthermore, we propose a novel and effective ingredient, sparse-refined straight-through estimator (SR-STE), to alleviate the negative influence of the approximated gradients computed by vanilla STE during optimization. We also define a metric, Sparse Architecture Divergence (SAD), to measure the sparse network\u2019s topology change during the training process. Finally, We justify SR-STE\u2019s advantages with SAD and demonstrate the effectiveness of SR-STE by performing\ncomprehensive experiments on various tasks. Anonymous code and model will be at available at https://github.com/anonymous-NM-sparsity/NM-sparsity.", "keywords": "sparsity;efficient training and inference.", "primary_area": "", "supplementary_material": "", "author": "Aojun Zhou;Yukun Ma;Junnan Zhu;Jianbo Liu;Zhijie Zhang;Kun Yuan;Wenxiu Sun;Hongsheng Li", "authorids": "~Aojun_Zhou2;~Yukun_Ma2;junnan.zhu@nlpr.ia.ac.cn;~Jianbo_Liu3;~Zhijie_Zhang1;~Kun_Yuan1;~Wenxiu_Sun1;~Hongsheng_Li3", "gender": ";M;;M;M;M;F;M", "homepage": ";;;;;https://yoookoo.github.io/yuankun/;http://wenxiusun.com/;http://www.ee.cuhk.edu.hk/~hsli", "dblp": ";;;91/5164;;;16/9879;27/7402-1", "google_scholar": ";tfwxYiAAAAAJ;;;;fCeZ32EAAAAJ;X9lE6O4AAAAJ;BN2Ze-QAAAAJ", "orcid": ";;;;;;;", "linkedin": ";qwertier;;;http://linkedin.com/in/zhijie-zhang-763824171;;;", "or_profile": "~Aojun_Zhou2;~Yukun_Ma2;junnan.zhu@nlpr.ia.ac.cn;~Jianbo_Liu3;~Zhijie_Zhang1;~Kun_Yuan1;~Wenxiu_Sun1;~Hongsheng_Li3", "aff": ";;;The Chinese University of Hong Kong;;SenseTime Research;SenseTime Group Limited;The Chinese University of Hong Kong", "aff_domain": ";;;cuhk.edu.hk;;sensetime.com;sensetime.com;cuhk.edu.hk", "position": ";;;PhD student;;Researcher;Principal Researcher;Assistant Professor", "bibtex": "@inproceedings{\nzhou2021learning,\ntitle={Learning N:M Fine-grained Structured Sparse Neural Networks From Scratch},\nauthor={Aojun Zhou and Yukun Ma and Junnan Zhu and Jianbo Liu and Zhijie Zhang and Kun Yuan and Wenxiu Sun and Hongsheng Li},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=K9bw7vqp_s}\n}", "github": "[![github](/images/github_icon.svg) anonymous-NM-sparsity/NM-sparsity](https://github.com/anonymous-NM-sparsity/NM-sparsity) + [![Papers with Code](/images/pwc_icon.svg) 3 community implementations](https://paperswithcode.com/paper/?openreview=K9bw7vqp_s)", "project": "", "reviewers": "AnonReviewer2;AnonReviewer4;AnonReviewer3;AnonReviewer1", 
"pdf_size": 0, "rating": "5;6;6;6", "confidence": "5;3;3;4", "wc_review": "238;254;280;469", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "988;900;489;345", "reply_reviewers": "0;0;0;0", "reply_authors": "2;2;1;1", "rating_avg": [ 5.75, 0.4330127018922193 ], "confidence_avg": [ 3.75, 0.82915619758885 ], "wc_review_avg": [ 310.25, 92.87188756561375 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 680.5, 270.17077932300526 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.5, 0.5 ], "replies_avg": [ 12, 0 ], "authors#_avg": [ 8, 0 ], "corr_rating_confidence": -0.8703882797784891, "gs_citation": 296, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=15803248310712339133&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 9, "pdf": "https://openreview.net/pdf?id=K9bw7vqp_s", "email": ";;;cuhk.edu.hk;;sensetime.com;sensetime.com;cuhk.edu.hk", "author_num": 8, "aff_unique_index": "0;1;2;0", "aff_unique_norm": "Chinese University of Hong Kong;SenseTime;SenseTime Group Limited", "aff_unique_dep": ";SenseTime Research;", "aff_unique_url": "https://www.cuhk.edu.hk;https://www.sensetime.com;https://www.sensetime.com", "aff_unique_abbr": "CUHK;SenseTime;SenseTime", "aff_campus_unique_index": "0;0", "aff_campus_unique": "Hong Kong SAR;", "aff_country_unique_index": "0;0;0;0", "aff_country_unique": "China" }, { "id": "KBWK5Y92BRh", "title": "Neighborhood-Aware Neural Architecture Search", "track": "main", "status": "Reject", "tldr": "", "abstract": "Existing neural architecture search (NAS) methods often return an architecture with good search performance but generalizes poorly to the test setting. To achieve better generalization, we propose a novel neighborhood-aware NAS formulation to identify flat-minima architectures in the search space, with the assumption that flat minima generalize better than sharp minima. The phrase ``flat-minima architecture'' refers to architectures whose performance is stable under small perturbations in the architecture (\\emph{e.g.}, replacing a convolution with a skip connection). Our formulation takes the ``flatness'' of an architecture into account by aggregating the performance over the neighborhood of this architecture. We demonstrate a principled way to apply our formulation to existing search algorithms, including sampling-based algorithms and gradient-based algorithms. To facilitate the application to gradient-based algorithms, we also propose a differentiable representation for the neighborhood of architectures. Based on our formulation, we propose neighborhood-aware random search (NA-RS) and neighborhood-aware differentiable architecture search (NA-DARTS). Notably, by simply augmenting DARTS~\\cite{liu2018darts} with our formulation, NA-DARTS finds architectures that perform better or on par with those found by state-of-the-art NAS methods on established benchmarks, including CIFAR-10, CIAFR-100 and ImageNet.", "keywords": "Neural architecture search;Flat minima", "primary_area": "", "supplementary_material": "", "author": "Xiaofang Wang;Shengcao Cao;Mengtian Li;Kris M. 
Kitani", "authorids": "~Xiaofang_Wang1;~Shengcao_Cao1;~Mengtian_Li1;~Kris_M._Kitani1", "gender": "M;M;M;M", "homepage": "http://www.cs.cmu.edu/~xiaofan2/;https://shengcao-cao.github.io/;https://mtli.github.io/;http://www.cs.cmu.edu/~kkitani/", "dblp": ";236/4681;;42/163", "google_scholar": "YQomDVsAAAAJ;yMYTz3AAAAAJ;;yv3sH74AAAAJ", "orcid": ";;;0000-0002-9389-4060", "linkedin": ";;;", "or_profile": "~Xiaofang_Wang1;~Shengcao_Cao1;~Mengtian_Li1;~Kris_M._Kitani1", "aff": "Carnegie Mellon University;Carnegie Mellon University;School of Computer Science, Carnegie Mellon University;Carnegie Mellon University", "aff_domain": "cmu.edu;cmu.edu;cs.cmu.edu;cmu.edu", "position": "PhD student;MS student;PhD student;Associate Professor", "bibtex": "@misc{\nwang2021neighborhoodaware,\ntitle={Neighborhood-Aware Neural Architecture Search},\nauthor={Xiaofang Wang and Shengcao Cao and Mengtian Li and Kris M. Kitani},\nyear={2021},\nurl={https://openreview.net/forum?id=KBWK5Y92BRh}\n}", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer4;AnonReviewer2;AnonReviewer3", "site": "https://openreview.net/forum?id=KBWK5Y92BRh", "pdf_size": 0, "rating": "4;5;6;6", "confidence": "5;3;4;4", "wc_review": "388;428;375;456", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "828;447;284;255", "reply_reviewers": "0;0;0;0", "reply_authors": "2;1;1;1", "rating_avg": [ 5.25, 0.82915619758885 ], "confidence_avg": [ 4.0, 0.7071067811865476 ], "wc_review_avg": [ 411.75, 32.158785735783 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 453.5, 228.26793467326942 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.25, 0.4330127018922193 ], "replies_avg": [ 10, 0 ], "authors#_avg": [ 4, 0 ], "corr_rating_confidence": -0.42640143271122083, "gs_citation": 6, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=7513641288182613033&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 5, "aff_unique_index": "0;0;0;0", "aff_unique_norm": "Carnegie Mellon University", "aff_unique_dep": "", "aff_unique_url": "https://www.cmu.edu", "aff_unique_abbr": "CMU", "aff_campus_unique_index": "1", "aff_campus_unique": ";Pittsburgh", "aff_country_unique_index": "0;0;0;0", "aff_country_unique": "United States" }, { "id": "KCzRX9N8BIH", "title": "It Is Likely That Your Loss Should be a Likelihood", "track": "main", "status": "Reject", "tldr": "", "abstract": "Many common loss functions such as mean-squared-error, cross-entropy, and reconstruction loss are unnecessarily rigid. Under a probabilistic interpretation, these common losses correspond to distributions with fixed shapes and scales. We instead argue for optimizing full likelihoods that include parameters like the normal variance and softmax temperature. Joint optimization of these ``likelihood parameters'' with model parameters can adaptively tune the scales and shapes of losses in addition to the strength of regularization. We explore and systematically evaluate how to parameterize and apply likelihood parameters for robust modeling, outlier-detection, and re-calibration. Additionally, we propose adaptively tuning $L_2$ and $L_1$ weights by fitting the scale parameters of normal and Laplace priors and introduce more flexible element-wise regularizers.", "keywords": "Adaptive Losses;Outlier Detection;Adaptive Regularization;Recalibration;Robust Modelling", "primary_area": "", "supplementary_material": "/attachment/81cd7e1d0a067d21e29d40d2e53a3ce1c0938323.zip", "author": "Mark Hamilton;Evan Shelhamer;William T. 
Freeman", "authorids": "~Mark_Hamilton1;~Evan_Shelhamer2;~William_T._Freeman1", "gender": "M;M;M", "homepage": "https://mhamilton.net;https://billf.mit.edu/;http://imaginarynumber.net", "dblp": "91/631;86/6650;150/6541", "google_scholar": "kgZtMGsAAAAJ;https://scholar.google.com.tw/citations?user=0zZnyMEAAAAJ;-ltRSM0AAAAJ", "orcid": ";;", "linkedin": ";;", "or_profile": "~Mark_Hamilton1;~William_T._Freeman1;~Evan_G_Shelhamer1", "aff": "Massachusetts Institute of Technology;Massachusetts Institute of Technology;Google DeepMind", "aff_domain": "mit.edu;mit.edu;deepmind.com", "position": "PhD student;Professor;Research Scientist", "bibtex": "@misc{\nhamilton2021it,\ntitle={It Is Likely That Your Loss Should be a Likelihood},\nauthor={Mark Hamilton and Evan Shelhamer and William T. Freeman},\nyear={2021},\nurl={https://openreview.net/forum?id=KCzRX9N8BIH}\n}", "github": "", "project": "", "reviewers": "AnonReviewer2;AnonReviewer1;AnonReviewer4;AnonReviewer3", "site": "https://openreview.net/forum?id=KCzRX9N8BIH", "pdf_size": 0, "rating": "4;5;6;6", "confidence": "4;5;4;2", "wc_review": "562;392;709;390", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "799;671;486;205", "reply_reviewers": "0;0;0;0", "reply_authors": "1;1;1;1", "rating_avg": [ 5.25, 0.82915619758885 ], "confidence_avg": [ 3.75, 1.0897247358851685 ], "wc_review_avg": [ 513.25, 132.84083521267095 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 540.25, 223.26147786844018 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 9, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": -0.48420012470625223, "gs_citation": 5, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=16194926374931083632&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 3, "aff_unique_index": "0;0;1", "aff_unique_norm": "Massachusetts Institute of Technology;Google", "aff_unique_dep": ";Google DeepMind", "aff_unique_url": "https://web.mit.edu;https://deepmind.com", "aff_unique_abbr": "MIT;DeepMind", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0;1", "aff_country_unique": "United States;United Kingdom" }, { "id": "KG4igOosnw8", "title": "Discriminative Representation Loss (DRL): A More Efficient Approach than Gradient Re-Projection in Continual Learning", "track": "main", "status": "Reject", "tldr": "", "abstract": "The use of episodic memories in continual learning has been shown to be effective in terms of alleviating catastrophic forgetting. In recent studies, several gradient-based approaches have been developed to make more efficient use of compact episodic memories, which constrain the gradients resulting from new samples with those from memorized samples, aiming to reduce the diversity of gradients from different tasks. In this paper, we reveal the relation between diversity of gradients and discriminativeness of representations, demonstrating connections between Deep Metric Learning and continual learning. Based on these findings, we propose a simple yet efficient method -- Discriminative Representation Loss (DRL) -- for continual learning. 
In comparison with several state-of-the-art methods, this method shows effectiveness with low computational cost on multiple benchmark experiments in the setting of online continual learning.", "keywords": "continual learning;episodic memory;GEM;experience replay;deep metric learning", "primary_area": "", "supplementary_material": "/attachment/c2bd9d746eb908d4b9ac768d0b9a7535419ab105.zip", "author": "Yu Chen;Tom Diethe;Peter Flach", "authorids": "~Yu_Chen10;~Tom_Diethe1;~Peter_Flach1", "gender": "M;M;F", "homepage": "http://www.tomdiethe.com;http://people.cs.bris.ac.uk/~flach/;", "dblp": "33/1098;https://dblp.uni-trier.de/pers/f/Flach:Peter_A=.html;", "google_scholar": "https://scholar.google.co.uk/citations?user=oWGk9c8AAAAJ;o9ggd4sAAAAJ;https://scholar.google.co.uk/citations?user=M_lKV1sAAAAJ", "orcid": "0000-0002-0776-5407;0000-0001-6857-5810;", "linkedin": "tomdiethe/;;", "or_profile": "~Tom_Diethe1;~Peter_Flach1;~YU_CHEN1", "aff": "Amazon;University of Bristol;University of Bristol", "aff_domain": "amazon.com;bristol.ac.uk;bristol.ac.uk", "position": "Principal Researcher;Full Professor;PhD student", "bibtex": "@misc{\nchen2021discriminative,\ntitle={Discriminative Representation Loss ({\\{}DRL{\\}}): A More Efficient Approach than Gradient Re-Projection in Continual Learning},\nauthor={Yu Chen and Tom Diethe and Peter Flach},\nyear={2021},\nurl={https://openreview.net/forum?id=KG4igOosnw8}\n}", "github": "", "project": "", "reviewers": "AnonReviewer5;AnonReviewer1;AnonReviewer3", "site": "https://openreview.net/forum?id=KG4igOosnw8", "pdf_size": 0, "rating": "5;6;6", "confidence": "4;3;4", "wc_review": "2386;203;439", "wc_reply_reviewers": "1628;0;190", "wc_reply_authors": "4719;462;1092", "reply_reviewers": "5;0;1", "reply_authors": "9;1;2", "rating_avg": [ 5.666666666666667, 0.4714045207910317 ], "confidence_avg": [ 3.6666666666666665, 0.4714045207910317 ], "wc_review_avg": [ 1009.3333333333334, 978.2066357484098 ], "wc_reply_reviewers_avg": [ 606.0, 726.8140523315896 ], "wc_reply_authors_avg": [ 2091.0, 1875.990938144425 ], "reply_reviewers_avg": [ 2.0, 2.160246899469287 ], "reply_authors_avg": [ 4.0, 3.559026084010437 ], "replies_avg": [ 22, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": -0.4999999999999999, "gs_citation": -1, "gs_cited_by_link": "", "gs_version_total": -1, "aff_unique_index": "0;1;1", "aff_unique_norm": "Amazon;University of Bristol", "aff_unique_dep": "Amazon.com, Inc.;", "aff_unique_url": "https://www.amazon.com;https://www.bristol.ac.uk", "aff_unique_abbr": "Amazon;Bristol", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;1;1", "aff_country_unique": "United States;United Kingdom" }, { "id": "KIS8jqLp4fQ", "title": "On Dynamic Noise Influence in Differential Private Learning", "track": "main", "status": "Reject", "tldr": "", "abstract": "Protecting privacy in learning while maintaining the model performance has become increasingly critical in many applications that involve sensitive data. Private Gradient Descent (PGD) is a commonly used private learning framework, which adds noise according to the Differential Privacy protocol. Recent studies show that dynamic privacy schedules of decreasing noise magnitudes can improve loss at the final iteration, and yet theoretical understandings of the effectiveness of such schedules and their connections to optimization algorithms remain limited. 
In this paper, we provide comprehensive analysis of noise influence in dynamic privacy schedules to answer these critical questions. We first present a dynamic noise schedule minimizing the utility upper bound of PGD, and show how the noise influence from each optimization step collectively impacts utility of the final model. Our study also reveals how impacts from dynamic noise influence change when momentum is used. We empirically show the connection exists for general non-convex losses, and the influence is greatly impacted by the loss curvature.", "keywords": "privacy;private learning;dynamic policy", "primary_area": "", "supplementary_material": "", "author": "Junyuan Hong;Zhangyang Wang;Jiayu Zhou", "authorids": "~Junyuan_Hong1;~Zhangyang_Wang1;~Jiayu_Zhou1", "gender": "M;M;M", "homepage": "https://jyhong.gitlab.io/;https://vita-group.github.io;http://jiayuzhou.github.io/", "dblp": "185/1316;119/4026;73/1353", "google_scholar": "7Cbv6doAAAAJ;pxFyKAIAAAAJ;https://scholar.google.com.tw/citations?user=yQKlLTQAAAAJ", "orcid": "0000-0002-5718-5187;;0000-0003-4336-6777", "linkedin": ";;jiayuzhou/", "or_profile": "~Junyuan_Hong1;~Zhangyang_Wang1;~Jiayu_Zhou1", "aff": "Michigan State University;University of Texas, Austin;Michigan State University", "aff_domain": "msu.edu;utexas.edu;msu.edu", "position": "PhD student;Assistant Professor;Assistant Professor", "bibtex": "@misc{\nhong2021on,\ntitle={On Dynamic Noise Influence in Differential Private Learning},\nauthor={Junyuan Hong and Zhangyang Wang and Jiayu Zhou},\nyear={2021},\nurl={https://openreview.net/forum?id=KIS8jqLp4fQ}\n}", "github": "", "project": "", "reviewers": "AnonReviewer2;AnonReviewer1;AnonReviewer4;AnonReviewer3", "site": "https://openreview.net/forum?id=KIS8jqLp4fQ", "pdf_size": 0, "rating": "4;5;6;7", "confidence": "3;4;4;3", "wc_review": "316;214;494;204", "wc_reply_reviewers": "199;95;0;0", "wc_reply_authors": "949;745;739;682", "reply_reviewers": "2;1;0;0", "reply_authors": "5;2;1;1", "rating_avg": [ 5.5, 1.118033988749895 ], "confidence_avg": [ 3.5, 0.5 ], "wc_review_avg": [ 307.0, 116.52038448271615 ], "wc_reply_reviewers_avg": [ 73.5, 82.1842442320911 ], "wc_reply_authors_avg": [ 778.75, 101.32219648231083 ], "reply_reviewers_avg": [ 0.75, 0.82915619758885 ], "reply_authors_avg": [ 2.25, 1.6393596310755 ], "replies_avg": [ 18, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:4buwPxFOLU8J:scholar.google.com/&scioq=On+Dynamic+Noise+Influence+in+Differential+Private+Learning&hl=en&as_sdt=0,5", "gs_version_total": 2, "aff_unique_index": "0;1;0", "aff_unique_norm": "Michigan State University;University of Texas at Austin", "aff_unique_dep": ";", "aff_unique_url": "https://www.msu.edu;https://www.utexas.edu", "aff_unique_abbr": "MSU;UT Austin", "aff_campus_unique_index": "1", "aff_campus_unique": ";Austin", "aff_country_unique_index": "0;0;0", "aff_country_unique": "United States" }, { "id": "KIfbqntFnOc", "title": "Robust Ensembles of Neural Networks using It\u00f4 Processes", "track": "main", "status": "Withdraw", "tldr": "", "abstract": "Residual neural networks (ResNets) can be modeled as dynamical systems where the evolution of dynamical systems represents the inference in ResNets. We exploit this connection and the theory of stochastic dynamical systems to construct a novel ensemble of It\u00f4 processes as a new deep learning representation that is more robust than classical residual networks. 
An It\u00f4 process obtained by solving a suitably-formulated stochastic differential equation derived from a residual network has a probability density function that is not readily perturbed by small changes in the neural network\u2019s inputs. Our robust stochastic It\u00f4 ensemble of neural networks achieve an accuracy of 73.91% on the CIFAR-10 dataset against the PGD attack with \u03b5 = 2.0 under the L2 norm, while the accuracy of Madry\u2019s robustness toolbox on the same attack is 18.59%. Similarly, our stochastic It\u00f4 ensemble of neural networks achieves an accuracy of 79.66% on PGD attack with \u03b5 = 16/255 under the L\u221e norm, while the accuracy of Madry\u2019s robustness toolbox on the same attack is 18.13%. The It\u00f4 ensemble trained on ImageNet achieves an accuracy of 28.53% against PGD attacks under the L\u221e norm with \u03b5 = 16/255 and accuracy of 65.74% under the L2 norm with \u03b5 = 3.0, respectively. This significantly improves state-of-the-art accuracy of 5% and 35.16% for Madry\u2019s robustness tool against the same PGD attacks under the L\u221e and L2 norms, respectively. Further, our approach achieves these high robustness values without any explicit adversarial training or a significant loss of accuracy on benign inputs.", "keywords": "Robustness;Ito Process;Stochastic", "primary_area": "", "supplementary_material": "", "author": "Sumit Kumar Jha;Susmit Jha;Rickard Ewetz;Alvaro Velasquez", "authorids": "~Sumit_Kumar_Jha2;~Susmit_Jha1;~Rickard_Ewetz1;alvarovelasquezucf@gmail.com", "gender": ";;M;", "homepage": "http://www.sumitkumarjha.com;http://susmitjha.github.io/;https://ewetz.ece.ufl.edu/;", "dblp": "05/5046-1;;127/9041;", "google_scholar": "3kJbs98AAAAJ;https://scholar.google.com/citations?hl=en;h_RaG-8AAAAJ;", "orcid": "0000-0003-0354-2940;0000-0001-5983-9095;;", "linkedin": "sumit-jha-572a45180/;susmitjha/;;", "or_profile": "~Sumit_Kumar_Jha2;~Susmit_Jha1;~Rickard_Ewetz1;alvarovelasquezucf@gmail.com", "aff": "University of Texas, San Antonio;SRI International;University of Central Florida;", "aff_domain": "utsa.edu;sri.com;ucf.edu;", "position": "Full Professor;Principal Scientist;Assistant Professor;", "bibtex": "", "github": "", "project": "", "reviewers": "AnonReviewer2;AnonReviewer3;AnonReviewer1;AnonReviewer4", "site": "https://openreview.net/forum?id=KIfbqntFnOc", "pdf_size": 0, "rating": "1;5;6;7", "confidence": "5;3;2;2", "wc_review": "556;100;402;479", "wc_reply_reviewers": "695;0;101;0", "wc_reply_authors": "804;267;596;305", "reply_reviewers": "4;0;1;0", "reply_authors": "4;1;2;1", "rating_avg": [ 4.75, 2.277608394786075 ], "confidence_avg": [ 3.0, 1.224744871391589 ], "wc_review_avg": [ 384.25, 172.90803191292184 ], "wc_reply_reviewers_avg": [ 199.0, 289.31902806417696 ], "wc_reply_authors_avg": [ 493.0, 220.0852107707376 ], "reply_reviewers_avg": [ 1.25, 1.6393596310755 ], "reply_authors_avg": [ 2.0, 1.224744871391589 ], "replies_avg": [ 22, 0 ], "authors#_avg": [ 4, 0 ], "corr_rating_confidence": -0.9858435728860858, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:Arqxdd8BTh8J:scholar.google.com/&scioq=Robust+Ensembles+of+Neural+Networks+using+It%C3%B4+Processes&hl=en&as_sdt=0,5", "gs_version_total": 0, "aff_unique_index": "0;1;2", "aff_unique_norm": "University of Texas at San Antonio;SRI International;University of Central Florida", "aff_unique_dep": ";;", "aff_unique_url": "https://www.utsa.edu;https://www.sri.com;https://www.ucf.edu", "aff_unique_abbr": "UTSA;SRI;UCF", 
"aff_campus_unique_index": "0", "aff_campus_unique": "San Antonio;", "aff_country_unique_index": "0;0;0", "aff_country_unique": "United States" }, { "title": "Do Wide and Deep Networks Learn the Same Things? Uncovering How Neural Network Representations Vary with Width and Depth", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/2813", "id": "KJNcAkY8tY4", "poster": "", "openreview": "https://openreview.net/forum?id=KJNcAkY8tY4", "slides": "https://iclr.cc/virtual/2021/poster/2813", "video": "https://iclr.cc/virtual/2021/poster/2813", "author_site": "Thao Nguyen, Maithra Raghu, Simon Kornblith", "tldr": "", "abstract": "A key factor in the success of deep neural networks is the ability to scale models to improve performance by varying the architecture depth and width. This simple property of neural network design has resulted in highly effective architectures for a variety of tasks. Nevertheless, there is limited understanding of effects of depth and width on the learned representations. In this paper, we study this fundamental question. We begin by investigating how varying depth and width affects model hidden representations, finding a characteristic block structure in the hidden representations of larger capacity (wider or deeper) models. We demonstrate that this block structure arises when model capacity is large relative to the size of the training set, and is indicative of the underlying layers preserving and propagating the dominant principal component of their representations. This discovery has important ramifications for features learned by different models, namely, representations outside the block structure are often similar across architectures with varying widths and depths, but the block structure is unique to each model. We analyze the output predictions of different model architectures, finding that even when the overall accuracy is similar, wide and deep models exhibit distinctive error patterns and variations across classes.", "keywords": "Representation learning", "primary_area": "", "supplementary_material": "", "author": "Thao Nguyen;Maithra Raghu;Simon Kornblith", "authorids": "~Thao_Nguyen3;~Maithra_Raghu1;~Simon_Kornblith1", "gender": "F;F;M", "homepage": "https://thaonguyen19.github.io/;http://maithraraghu.com/;", "dblp": "77/2922;;220/4059", "google_scholar": "DvJG-_8AAAAJ;tiE4g64AAAAJ;1O3RPmsAAAAJ", "orcid": ";;", "linkedin": ";;", "or_profile": "~Thao_Nguyen3;~Maithra_Raghu1;~Simon_Kornblith1", "aff": "Google;Google Brain;Google", "aff_domain": "google.com;cornell.edu;google.com", "position": "AI Resident;Senior Research Scientist;Research Scientist", "bibtex": "@inproceedings{\nnguyen2021do,\ntitle={Do Wide and Deep Networks Learn the Same Things? 
Uncovering How Neural Network Representations Vary with Width and Depth},\nauthor={Thao Nguyen and Maithra Raghu and Simon Kornblith},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=KJNcAkY8tY4}\n}", "github": "[![github](/images/github_icon.svg) google-research/google-research](https://github.com/google-research/google-research/tree/master/do_wide_and_deep_networks_learn_the_same_things) + [![Papers with Code](/images/pwc_icon.svg) 3 community implementations](https://paperswithcode.com/paper/?openreview=KJNcAkY8tY4)", "project": "", "reviewers": "AnonReviewer1;AnonReviewer3;AnonReviewer2;AnonReviewer4", "pdf_size": 0, "rating": "6;6;7;8", "confidence": "3;3;3;5", "wc_review": "377;338;347;218", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "232;622;366;687", "reply_reviewers": "0;0;0;0", "reply_authors": "1;1;1;1", "rating_avg": [ 6.75, 0.82915619758885 ], "confidence_avg": [ 3.5, 0.8660254037844386 ], "wc_review_avg": [ 320.0, 60.63414879422321 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 476.75, 185.3852407825391 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 9, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": 0.8703882797784891, "gs_citation": 342, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=4021644032965421898&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 5, "pdf": "https://openreview.net/pdf?id=KJNcAkY8tY4", "email": "google.com;cornell.edu;google.com", "author_num": 3, "aff_unique_index": "0;0;0", "aff_unique_norm": "Google", "aff_unique_dep": "Google", "aff_unique_url": "https://www.google.com", "aff_unique_abbr": "Google", "aff_campus_unique_index": "0;0;0", "aff_campus_unique": "Mountain View", "aff_country_unique_index": "0;0;0", "aff_country_unique": "United States" }, { "id": "KJSC_AsN14", "title": "Contrastive Learning with Stronger Augmentations", "track": "main", "status": "Reject", "tldr": "", "abstract": "Representation learning has been greatly improved with the advance of contrastive learning methods with the performance being closer to their supervised learning counterparts. Those methods have greatly benefited from various data augmentations that are carefully designated to maintain their identities so that the images transformed from the same instance can still be retrieved. Although stronger augmentations could expose novel patterns of representations to improve their generalizability, directly using stronger augmentations in instance discrimination-based contrastive learning may even deteriorate the performance, because the distortions induced from the stronger augmentations could ridiculously change the image structures and thus the transformed images cannot be viewed as the same as the original ones any more. Additional efforts are needed for us to explore the role of the stronger augmentations in further pushing the performance of unsupervised learning to the fully supervised upper bound. Instead of applying the stronger augmentations directly to minimize the contrastive loss, we propose to minimize the distribution divergence between the weakly and strongly augmented images over the representation bank to supervise the retrieval of strongly augmented queries from a pool of candidates. 
This avoids an overoptimistic assumption that could overfit the strongly augmented queries containing distorted visual structures into the positive targets in the representation bank, while still being able to distinguish them from the negative samples by leveraging the distributions of weakly augmented counterparts. The proposed method achieves top-1 accuracy of 76.2% on ImageNet with a standard ResNet-50 architecture with a single-layer classifier fine-tuned. This is almost the same as 76.5% of top-1 accuracy with a fully supervised ResNet-50. Moreover, it outperforms the previous self-supervised and supervised methods on both the transfer learning and object detection tasks.\n", "keywords": "Contrastive learning;Self-supervised learning;Unsupervised learning;Stronger augmentations", "primary_area": "", "supplementary_material": "", "author": "Xiao Wang;Guo-Jun Qi", "authorids": "~Xiao_Wang6;~Guo-Jun_Qi1", "gender": "M;M", "homepage": "https://wang3702.github.io/;http://maple-lab.net/gqi/", "dblp": "49/67-13;41/943", "google_scholar": "AGS_dK8AAAAJ;https://scholar.google.com.tw/citations?user=Nut-uvoAAAAJ", "orcid": "0000-0003-4435-7098;0000-0003-3508-1851", "linkedin": ";", "or_profile": "~Xiao_Wang6;~Guo-Jun_Qi1", "aff": "Purdue University;Futurewei Technologies", "aff_domain": "purdue.edu;futurewei.com", "position": "PhD student;Chief AI Scientist and Technical VP", "bibtex": "@misc{\nwang2021contrastive,\ntitle={Contrastive Learning with Stronger Augmentations},\nauthor={Xiao Wang and Guo-Jun Qi},\nyear={2021},\nurl={https://openreview.net/forum?id=KJSC_AsN14}\n}", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer2;AnonReviewer3;AnonReviewer4", "site": "https://openreview.net/forum?id=KJSC_AsN14", "pdf_size": 0, "rating": "4;6;6;7", "confidence": "4;4;3;4", "wc_review": "185;253;355;475", "wc_reply_reviewers": "0;0;0;55", "wc_reply_authors": "193;365;486;443", "reply_reviewers": "0;0;0;1", "reply_authors": "1;1;1;2", "rating_avg": [ 5.75, 1.0897247358851685 ], "confidence_avg": [ 3.75, 0.4330127018922193 ], "wc_review_avg": [ 317.0, 109.46232228488486 ], "wc_reply_reviewers_avg": [ 13.75, 23.81569860407206 ], "wc_reply_authors_avg": [ 371.75, 111.94501998749207 ], "reply_reviewers_avg": [ 0.25, 0.4330127018922193 ], "reply_authors_avg": [ 1.25, 0.4330127018922193 ], "replies_avg": [ 13, 0 ], "authors#_avg": [ 2, 0 ], "corr_rating_confidence": -0.13245323570650439, "gs_citation": 281, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=11350484156998561290&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 8, "aff_unique_index": "0;1", "aff_unique_norm": "Purdue University;Futurewei Technologies", "aff_unique_dep": ";", "aff_unique_url": "https://www.purdue.edu;https://www.futurewei.com", "aff_unique_abbr": "Purdue;Futurewei", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0", "aff_country_unique": "United States" }, { "title": "DARTS-: Robustly Stepping out of Performance Collapse Without Indicators", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/2641", "id": "KLH36ELmwIB", "poster": "", "openreview": "https://openreview.net/forum?id=KLH36ELmwIB", "slides": "https://iclr.cc/virtual/2021/poster/2641", "video": "https://iclr.cc/virtual/2021/poster/2641", "author_site": "Xiangxiang Chu, Victor Wang, Bo Zhang, Shun Lu, Xiaolin Wei, Junchi Yan", "tldr": "", "abstract": "Despite the fast development of differentiable architecture search (DARTS), it suffers from a standing instability 
issue regarding searching performance, which extremely limits its application. Existing robustifying methods draw clues from the outcome instead of finding out the causing factor. Various indicators such as Hessian eigenvalues are proposed as a signal of performance collapse, and the searching should be stopped once an indicator reaches a preset threshold.\nHowever, these methods tend to easily reject good architectures if thresholds are inappropriately set, let alone the searching is intrinsically noisy. In this paper, we undertake a more subtle and direct approach to resolve the collapse. \nWe first demonstrate that skip connections with a learnable architectural coefficient can easily recover from a disadvantageous state and become dominant. We conjecture that skip connections profit too much from this privilege, hence causing the collapse for the derived model. Therefore, we propose to factor out this benefit with an auxiliary skip connection, ensuring a fairer competition for all operations. Extensive experiments on various datasets verify that our approach can substantially improve the robustness of DARTS. Our code is available at https://github.com/Meituan-AutoML/DARTS-", "keywords": "neural architecture search;DARTS stability", "primary_area": "", "supplementary_material": "/attachment/29f9cbb28ac64928a3a5cd33d1bd1ce6711a061a.zip", "author": "Xiangxiang Chu;Xiaoxing Wang;Bo Zhang;Shun Lu;Xiaolin Wei;Junchi Yan", "authorids": "~Xiangxiang_Chu1;figure1_wxx@sjtu.edu.cn;~Bo_Zhang7;~Shun_Lu1;weixiaolin02@meituan.com;~Junchi_Yan2", "gender": "M;;M;M;;", "homepage": "https://cxxgtxy.github.io/;;;https://shunlu91.github.io/;;", "dblp": "207/8002;;36/2259-46;;;", "google_scholar": "jn21pUsAAAAJ;;uUNQnu0AAAAJ;-zX83WMAAAAJ;;", "orcid": "0000-0003-2548-0605;;0000-0003-0564-617X;;;", "linkedin": ";;bo-zhang-20a86588/;;;", "or_profile": "~Xiangxiang_Chu1;figure1_wxx@sjtu.edu.cn;~Bo_Zhang7;~Shun_Lu1;weixiaolin02@meituan.com;~Junchi_Yan2", "aff": "MeiTuan;;Meituan Inc.;Institute of Computing Technology, Chinese Academy of Sciences ;;", "aff_domain": "meituan.com;;meituan.com;ucas.ac.cn;;", "position": "Senior Engineer;;Senior Software Engineer;PhD student;;", "bibtex": "@inproceedings{\nchu2021darts,\ntitle={{\\{}DARTS{\\}}-: Robustly Stepping out of Performance Collapse Without Indicators},\nauthor={Xiangxiang Chu and Xiaoxing Wang and Bo Zhang and Shun Lu and Xiaolin Wei and Junchi Yan},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=KLH36ELmwIB}\n}", "github": "[![github](/images/github_icon.svg) Meituan-AutoML/DARTS-](https://github.com/Meituan-AutoML/DARTS-)", "project": "", "reviewers": "AnonReviewer1;AnonReviewer3;AnonReviewer4;AnonReviewer2", "pdf_size": 0, "rating": "6;6;6;8", "confidence": "5;3;2;5", "wc_review": "449;379;218;463", "wc_reply_reviewers": "0;0;0;19", "wc_reply_authors": "715;532;390;488", "reply_reviewers": "0;0;0;1", "reply_authors": "1;1;1;1", "rating_avg": [ 6.5, 0.8660254037844386 ], "confidence_avg": [ 3.75, 1.299038105676658 ], "wc_review_avg": [ 377.25, 97.29433436742347 ], "wc_reply_reviewers_avg": [ 4.75, 8.227241335952167 ], "wc_reply_authors_avg": [ 531.25, 117.88421225931825 ], "reply_reviewers_avg": [ 0.25, 0.4330127018922193 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 11, 0 ], "authors#_avg": [ 6, 0 ], "corr_rating_confidence": 0.5555555555555555, "gs_citation": 219, "gs_cited_by_link": 
"https://scholar.google.com/scholar?cites=14536849517699271582&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 3, "pdf": "https://openreview.net/pdf?id=KLH36ELmwIB", "email": "meituan.com;;meituan.com;ucas.ac.cn;;", "author_num": 6, "aff_unique_index": "0;1;2", "aff_unique_norm": "Meituan;Meituan Inc.;Chinese Academy of Sciences", "aff_unique_dep": ";;Institute of Computing Technology", "aff_unique_url": "https://www.meituan.com;https://www.meituan.com;http://www.ict.ac.cn", "aff_unique_abbr": "MeiTuan;Meituan;CAS", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0;0", "aff_country_unique": "China" }, { "id": "KOtxfjpQsq", "title": "Meta-Model-Based Meta-Policy Optimization", "track": "main", "status": "Reject", "tldr": "", "abstract": "Model-based reinforcement learning (MBRL) has been applied to meta-learning settings and has demonstrated its high sample efficiency. \nHowever, in previous MBRL for meta-learning settings, policies are optimized via rollouts that fully rely on a predictive model of an environment. \nThus, its performance in a real environment tends to degrade when the predictive model is inaccurate. \nIn this paper, we prove that performance degradation can be suppressed by using branched meta-rollouts. \nOn the basis of this theoretical analysis, we propose Meta-Model-based Meta-Policy Optimization (M3PO), in which the branched meta-rollouts are used for policy optimization. \nWe demonstrate that M3PO outperforms existing meta reinforcement learning methods in continuous-control benchmarks. ", "keywords": "", "primary_area": "", "supplementary_material": "", "author": "Takuya Hiraoka;Takahisa Imagawa;Voot Tangkaratt;Takayuki Osa;Takashi Onishi;Yoshimasa Tsuruoka", "authorids": "takuya-h1@nec.com;~Takahisa_Imagawa1;~Voot_Tangkaratt1;~Takayuki_Osa1;takashi.onishi@nec.com;~Yoshimasa_Tsuruoka1", "gender": ";M;M;M;;M", "homepage": ";;;;;https://www.logos.t.u-tokyo.ac.jp/~tsuruoka/", "dblp": ";;125/2327;27/1571;;18/3787", "google_scholar": ";https://scholar.google.co.jp/citations?user=MgEUW3EAAAAJ;;https://scholar.google.co.jp/citations?user=LqVev6MAAAAJ;;J2CkFngAAAAJ", "orcid": ";;;;;", "linkedin": ";;;;;", "or_profile": "takuya-h1@nec.com;~Takahisa_Imagawa1;~Voot_Tangkaratt1;~Takayuki_Osa1;takashi.onishi@nec.com;~Yoshimasa_Tsuruoka1", "aff": ";AIST;RIKEN center for Advanced Intelligence Project;Kyushu Institute of Technology, Japan;;The University of Tokyo", "aff_domain": ";aist.go.jp;riken.jp;kyutech.ac.jp;;u-tokyo.ac.jp", "position": ";Postdoc;Postdoc;Associate Professor;;Full Professor", "bibtex": "@misc{\nhiraoka2021metamodelbased,\ntitle={Meta-Model-Based Meta-Policy Optimization},\nauthor={Takuya Hiraoka and Takahisa Imagawa and Voot Tangkaratt and Takayuki Osa and Takashi Onishi and Yoshimasa Tsuruoka},\nyear={2021},\nurl={https://openreview.net/forum?id=KOtxfjpQsq}\n}", "github": "", "project": "", "reviewers": "AnonReviewer4;AnonReviewer3;AnonReviewer2;AnonReviewer5", "site": "https://openreview.net/forum?id=KOtxfjpQsq", "pdf_size": 0, "rating": "5;5;5;6", "confidence": "3;3;3;3", "wc_review": "713;705;390;406", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "639;1035;648;434", "reply_reviewers": "0;0;0;0", "reply_authors": "1;2;1;1", "rating_avg": [ 5.25, 0.4330127018922193 ], "confidence_avg": [ 3.0, 0.0 ], "wc_review_avg": [ 553.5, 155.62856421621322 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 689.0, 217.32579230270852 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.25, 
0.4330127018922193 ], "replies_avg": [ 11, 0 ], "authors#_avg": [ 6, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 14, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=13486215565327963611&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 6, "aff_unique_index": "0;1;2;3", "aff_unique_norm": "Advanced Industrial Science and Technology;RIKEN;Kyushu Institute of Technology;University of Tokyo", "aff_unique_dep": ";center for Advanced Intelligence Project;;", "aff_unique_url": "https://www.aist.go.jp;https://www.riken.jp/en/;https://www.kyutech.ac.jp;https://www.u-tokyo.ac.jp", "aff_unique_abbr": "AIST;RIKEN;Kyutech;UTokyo", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0;0;0", "aff_country_unique": "Japan" }, { "id": "KRKGJrbPcKE", "title": "Distribution Based MIL Pooling Filters are Superior to Point Estimate Based Counterparts", "track": "main", "status": "Withdraw", "tldr": "", "abstract": "Multiple instance learning (MIL) is a machine learning paradigm which learns the mapping between bags of instances and bag labels. There are different MIL tasks which can be solved by different MIL methods. One common component of all MIL methods is the MIL pooling filter, which obtains bag level representations from extracted features of instances. Here, we recommend and discuss a grouping scheme for MIL pooling filters: point estimate based pooling filters and distribution based pooling filters. The point estimate based pooling filters include the standard pooling filters, such as \u2018max\u2019, \u2018mean\u2019 and \u2018attention\u2019 pooling. The distribution based pooling filters include recently proposed \u2018distribution\u2019 pooling and newly designed \u2018distribution with attention\u2019 pooling. In this paper, we perform the first systematic analysis of different pooling filters. We theoretically showed that the distribution based pooling filters are superior to the point estimate based counterparts in terms of amount of information captured while obtaining bag level representations from extracted features. Then, we empirically study the performance of the 5 pooling filters, namely \u2018max\u2019, \u2018mean\u2019, \u2018attention\u2019, \u2018distribution\u2019 and \u2018distribution with attention\u2019, on distinct real world MIL tasks. We showed that the performance of different pooling filters are different for different MIL tasks. 
Moreover, consistent with our theoretical analysis, models with distribution based pooling filters almost always performed equal or better than that with point estimate based pooling filters.", "keywords": "multiple instance learning;mil;mil pooling filters;distribution pooling;point estimate based pooling", "primary_area": "", "supplementary_material": "", "author": "Mustafa Umit Oner;Jared Marc Song;Hwee Kuan Lee;Wing-Kin Sung", "authorids": "~Mustafa_Umit_Oner1;~Jared_Marc_Song1;~Hwee_Kuan_Lee1;~Wing-Kin_Sung1", "gender": "M;M;M;M", "homepage": "https://www.comp.nus.edu.sg/~umitoner/;http://www.bii.a-star.edu.sg;https://web.bii.a-star.edu.sg/~leehk/index.html;https://www.comp.nus.edu.sg/~ksung/", "dblp": ";;;s/WingKinSung", "google_scholar": "https://scholar.google.com.sg/citations?user=ELQAk70AAAAJ;;;https://scholar.google.com.tw/citations?user=KaCbE9MAAAAJ", "orcid": ";;;", "linkedin": ";jared-marc-song-kye-jet-52526713a;;", "or_profile": "~Mustafa_Umit_Oner1;~Jared_Marc_Song1;~Hwee_Kuan_Lee1;~Ken_Sung1", "aff": "A*STAR;Singapore Institute of Technology;BII;National University of Singapore", "aff_domain": "a-star.edu.sg;singaporetech.edu.sg;astar.edu.sg;nus.edu.sg", "position": "PhD student;Undergrad student;Principal Researcher;Full Professor", "bibtex": "", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer4;AnonReviewer2", "site": "https://openreview.net/forum?id=KRKGJrbPcKE", "pdf_size": 0, "rating": "4;4;5", "confidence": "4;5;3", "wc_review": "985;837;390", "wc_reply_reviewers": "0;168;0", "wc_reply_authors": "918;1459;783", "reply_reviewers": "0;1;0", "reply_authors": "2;3;1", "rating_avg": [ 4.333333333333333, 0.4714045207910317 ], "confidence_avg": [ 4.0, 0.816496580927726 ], "wc_review_avg": [ 737.3333333333334, 252.9246703840669 ], "wc_reply_reviewers_avg": [ 56.0, 79.19595949289332 ], "wc_reply_authors_avg": [ 1053.3333333333333, 292.09625506367286 ], "reply_reviewers_avg": [ 0.3333333333333333, 0.4714045207910317 ], "reply_authors_avg": [ 2.0, 0.816496580927726 ], "replies_avg": [ 12, 0 ], "authors#_avg": [ 4, 0 ], "corr_rating_confidence": -0.8660254037844385, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:k6ul-q0RquQJ:scholar.google.com/&scioq=Distribution+Based+MIL+Pooling+Filters+are+Superior+to+Point+Estimate+Based+Counterparts&hl=en&as_sdt=0,33", "gs_version_total": 0, "aff_unique_index": "0;1;2;3", "aff_unique_norm": "Agency for Science, Technology and Research;Singapore Institute of Technology;Bioinformatics Institute;National University of Singapore", "aff_unique_dep": ";;;", "aff_unique_url": "https://www.a-star.edu.sg;https://www.singaporetech.edu.sg;https://www.bii.a-star.edu.sg;https://www.nus.edu.sg", "aff_unique_abbr": "A*STAR;SIT;BII;NUS", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0;0;0", "aff_country_unique": "Singapore" }, { "id": "KTEde38blNB", "title": "Intervention Generative Adversarial Nets", "track": "main", "status": "Reject", "tldr": "", "abstract": "In this paper we propose a novel approach for stabilizing the training process of Generative Adversarial Networks as well as alleviating the mode collapse problem. The main idea is to incorporate a regularization term that we call intervention into the objective. We refer to the resulting generative model as Intervention Generative Adversarial Networks (IVGAN). 
By perturbing the latent representations of real images obtained from an auxiliary encoder network with Gaussian invariant interventions and penalizing the dissimilarity of the distributions of the resulting generated images, the intervention term provides more informative gradient for the generator, significantly improving training stability and encouraging modecovering behaviour. We demonstrate the performance of our approach via solid theoretical analysis and thorough evaluation on standard real-world datasets as well as the stacked MNIST dataset.\n", "keywords": "", "primary_area": "", "supplementary_material": "", "author": "Jiadong Liang;Liangyu Zhang;Cheng Zhang;Zhihua Zhang", "authorids": "~Jiadong_Liang1;~Liangyu_Zhang2;~Cheng_Zhang3;~Zhihua_Zhang1", "gender": ";M;M;M", "homepage": ";https://zhangliangyu32.github.io/;https://zcrabbit.github.io;http://www.math.pku.edu.cn/teachers/zhzhang/", "dblp": "194/2730;123/7110;;52/5331", "google_scholar": ";rmjtiikAAAAJ;PddDrLgAAAAJ;", "orcid": ";;;", "linkedin": ";;;", "or_profile": "~Jiadong_Liang1;~Liangyu_Zhang2;~Cheng_Zhang3;~Zhihua_Zhang1", "aff": ";Peking University;Peking University;Peking University", "aff_domain": ";pku.edu.cn;pku.edu.cn;pku.edu.cn", "position": ";PhD student;Assistant Professor;Full Professor", "bibtex": "@misc{\nliang2021intervention,\ntitle={Intervention Generative Adversarial Nets},\nauthor={Jiadong Liang and Liangyu Zhang and Cheng Zhang and Zhihua Zhang},\nyear={2021},\nurl={https://openreview.net/forum?id=KTEde38blNB}\n}", "github": "", "project": "", "reviewers": "AnonReviewer2;AnonReviewer4;AnonReviewer1;AnonReviewer3", "site": "https://openreview.net/forum?id=KTEde38blNB", "pdf_size": 0, "rating": "2;3;6;7", "confidence": "5;5;3;4", "wc_review": "384;470;452;335", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "0;0;0;0", "reply_reviewers": "0;0;0;0", "reply_authors": "0;0;0;0", "rating_avg": [ 4.5, 2.0615528128088303 ], "confidence_avg": [ 4.25, 0.82915619758885 ], "wc_review_avg": [ 410.25, 54.001736083203845 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 0, 0 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 0, 0 ], "replies_avg": [ 5, 0 ], "authors#_avg": [ 4, 0 ], "corr_rating_confidence": -0.8043996665398437, "gs_citation": 1, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=2853945439126154447&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 3, "aff_unique_index": "0;0;0", "aff_unique_norm": "Peking University", "aff_unique_dep": "", "aff_unique_url": "http://www.pku.edu.cn", "aff_unique_abbr": "Peking U", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0;0", "aff_country_unique": "China" }, { "id": "KTS3QeWxRQq", "title": "Quantitative Understanding of VAE as a Non-linearly Scaled Isometric Embedding", "track": "main", "status": "Reject", "tldr": "", "abstract": "Variational autoencoder (VAE) estimates the posterior parameters (mean and variance) of latent variables corresponding to each input data. While it is used for many tasks, the transparency of the model is still an underlying issue. This paper provides a quantitative understanding of VAE property by interpreting VAE as a non-linearly scaled isometric embedding. According to the Rate-distortion theory, the optimal transform coding is achieved by using a PCA-like orthonormal transform where the transform space is isometric to the input. 
From this analogy, we show theoretically and experimentally that VAE can be mapped to an implicit isometric embedding with a scale factor derived from the posterior parameter. As a result, we can estimate the data probabilities in the input space from the prior, loss metrics, and corresponding posterior parameters. In addition, the quantitative importance of each latent variable can be evaluated like the eigenvalue of PCA.\n", "keywords": "unsupervised representation learning;deep image compression", "primary_area": "", "supplementary_material": "/attachment/87fb4ff7e1783bf646e0fd3d1d8dee80e814c789.zip", "author": "Akira Nakagawa;Keizo Kato", "authorids": "~Akira_Nakagawa1;~Keizo_Kato1", "gender": "M;M", "homepage": ";", "dblp": ";", "google_scholar": ";", "orcid": ";", "linkedin": "akira-nakagawa-29788128/;https://linkedin.com/in/keizo-kato-aimp-vlas", "or_profile": "~Akira_Nakagawa1;~Keizo_Kato1", "aff": ";FUJITSU LABORATORIES LTD.", "aff_domain": ";fujitsu.com", "position": ";Researcher", "bibtex": "@misc{\nnakagawa2021quantitative,\ntitle={Quantitative Understanding of {\\{}VAE{\\}} as a Non-linearly Scaled Isometric Embedding},\nauthor={Akira Nakagawa and Keizo Kato},\nyear={2021},\nurl={https://openreview.net/forum?id=KTS3QeWxRQq}\n}", "github": "", "project": "", "reviewers": "AnonReviewer4;AnonReviewer2;AnonReviewer3;AnonReviewer1", "site": "https://openreview.net/forum?id=KTS3QeWxRQq", "pdf_size": 0, "rating": "4;4;5;5", "confidence": "3;4;2;1", "wc_review": "399;249;460;199", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "146;264;287;741", "reply_reviewers": "0;0;0;0", "reply_authors": "1;1;1;1", "rating_avg": [ 4.5, 0.5 ], "confidence_avg": [ 2.5, 1.118033988749895 ], "wc_review_avg": [ 326.75, 106.4668375598712 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 359.5, 226.6610906176885 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 10, 0 ], "authors#_avg": [ 2, 0 ], "corr_rating_confidence": -0.8944271909999159, "gs_citation": 12, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=10335805510258953267&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 6, "aff_unique_index": "0", "aff_unique_norm": "Fujitsu Laboratories Ltd.", "aff_unique_dep": "", "aff_unique_url": "https://www.fujitsu.com/global/labs/", "aff_unique_abbr": "Fujitsu Labs", "aff_country_unique_index": "0", "aff_country_unique": "Japan" }, { "title": "Initialization and Regularization of Factorized Neural Layers", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/2913", "id": "KTlJT1nof6d", "poster": "", "openreview": "https://openreview.net/forum?id=KTlJT1nof6d", "slides": "https://iclr.cc/virtual/2021/poster/2913", "video": "https://iclr.cc/virtual/2021/poster/2913", "author_site": "Mikhail Khodak, Neil Tenenholtz, Lester Mackey, Nicolo Fusi", "tldr": "", "abstract": "Factorized layers\u2014operations parameterized by products of two or more matrices\u2014occur in a variety of deep learning contexts, including compressed model training, certain types of knowledge distillation, and multi-head self-attention architectures. We study how to initialize and regularize deep nets containing such layers, examining two simple, understudied schemes, spectral initialization and Frobenius decay, for improving their performance. 
The guiding insight is to design optimization routines for these networks that are as close as possible to that of their well-tuned, non-decomposed counterparts; we back this intuition with an analysis of how the initialization and regularization schemes impact training with gradient descent, drawing on modern attempts to understand the interplay of weight-decay and batch-normalization. Empirically, we highlight the benefits of spectral initialization and Frobenius decay across a variety of settings. In model compression, we show that they enable low-rank methods to significantly outperform both unstructured sparsity and tensor methods on the task of training low-memory residual networks; analogs of the schemes also improve the performance of tensor decomposition techniques. For knowledge distillation, Frobenius decay enables a simple, overcomplete baseline that yields a compact model from over-parameterized training without requiring retraining with or pruning a teacher network. Finally, we show how both schemes applied to multi-head attention lead to improved performance on both translation and unsupervised pre-training.", "keywords": "model compression;knowledge distillation;multi-head attention;matrix factorization", "primary_area": "", "supplementary_material": "/attachment/7b78a17804d358061dc2ca9866aa50caac6bba5f.zip", "author": "Mikhail Khodak;Neil A. Tenenholtz;Lester Mackey;Nicolo Fusi", "authorids": "~Mikhail_Khodak1;~Neil_A._Tenenholtz1;~Lester_Mackey1;~Nicolo_Fusi1", "gender": ";;M;M", "homepage": ";;https://stanford.edu/~lmackey;", "dblp": ";;05/2961;86/10995", "google_scholar": ";;erv7TP0AAAAJ;GldD-lwAAAAJ", "orcid": ";;0000-0002-1102-0387;", "linkedin": ";;lester-mackey-5902909;", "or_profile": "~Mikhail_Khodak1;~Neil_A._Tenenholtz1;~Lester_Mackey1;~Nicolo_Fusi1", "aff": ";;Microsoft Research New England;Microsoft", "aff_domain": ";;microsoft.com;microsoft.com", "position": ";;Principal Researcher;Researcher", "bibtex": "@inproceedings{\nkhodak2021initialization,\ntitle={Initialization and Regularization of Factorized Neural Layers},\nauthor={Mikhail Khodak and Neil A. 
Tenenholtz and Lester Mackey and Nicolo Fusi},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=KTlJT1nof6d}\n}", "github": "[![github](/images/github_icon.svg) microsoft/fnl_paper](https://github.com/microsoft/fnl_paper)", "project": "", "reviewers": "AnonReviewer1;AnonReviewer4;AnonReviewer3;AnonReviewer2", "pdf_size": 0, "rating": "6;6;6;6", "confidence": "3;3;3;3", "wc_review": "338;203;422;137", "wc_reply_reviewers": "0;84;60;123", "wc_reply_authors": "585;690;678;479", "reply_reviewers": "0;2;1;2", "reply_authors": "1;2;2;2", "rating_avg": [ 6.0, 0.0 ], "confidence_avg": [ 3.0, 0.0 ], "wc_review_avg": [ 275.0, 111.58628948038374 ], "wc_reply_reviewers_avg": [ 66.75, 44.61712115320754 ], "wc_reply_authors_avg": [ 608.0, 84.84397444721694 ], "reply_reviewers_avg": [ 1.25, 0.82915619758885 ], "reply_authors_avg": [ 1.75, 0.4330127018922193 ], "replies_avg": [ 19, 0 ], "authors#_avg": [ 4, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 65, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=15693677234095389612&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 3, "pdf": "https://openreview.net/pdf?id=KTlJT1nof6d", "email": ";;microsoft.com;microsoft.com", "author_num": 4, "aff_unique_index": "0;0", "aff_unique_norm": "Microsoft", "aff_unique_dep": "Microsoft Research", "aff_unique_url": "https://www.microsoft.com/en-us/research/group/microsoft-research-new-england", "aff_unique_abbr": "MSR NE", "aff_campus_unique_index": "0", "aff_campus_unique": "New England;", "aff_country_unique_index": "0;0", "aff_country_unique": "United States" }, { "title": "Learning Incompressible Fluid Dynamics from Scratch - Towards Fast, Differentiable Fluid Models that Generalize", "status": "Spotlight", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/3004", "id": "KUDUoRsEphu", "poster": "", "openreview": "https://openreview.net/forum?id=KUDUoRsEphu", "slides": "https://iclr.cc/virtual/2021/poster/3004", "video": "https://iclr.cc/virtual/2021/poster/3004", "author_site": "Nils Wandel, Michael Weinmann, Reinhard Klein", "tldr": "", "abstract": "Fast and stable fluid simulations are an essential prerequisite for applications ranging from computer-generated imagery to computer-aided design in research and development. However, solving the partial differential equations of incompressible fluids is a challenging task and traditional numerical approximation schemes come at high computational costs. Recent deep learning based approaches promise vast speed-ups but do not generalize to new fluid domains, require fluid simulation data for training, or rely on complex pipelines that outsource major parts of the fluid simulation to traditional methods.\n\nIn this work, we propose a novel physics-constrained training approach that generalizes to new fluid domains, requires no fluid simulation data, and allows convolutional neural networks to map a fluid state from time-point t to a subsequent state at time t+dt in a single forward pass. This simplifies the pipeline to train and evaluate neural fluid models. After training, the framework yields models that are capable of fast fluid simulations and can handle various fluid phenomena including the Magnus effect and K\u00e1rm\u00e1n vortex streets. We present an interactive real-time demo to show the speed and generalization capabilities of our trained models. 
Moreover, the trained neural networks are efficient differentiable fluid solvers as they offer a differentiable update step to advance the fluid simulation in time. We exploit this fact in a proof-of-concept optimal control experiment. Our models significantly outperform a recent differentiable fluid solver in terms of computational speed and accuracy.", "keywords": "Unsupervised Learning;Fluid Dynamics;U-Net", "primary_area": "", "supplementary_material": "/attachment/f5f0ce921d6cbfbfbc4c7e410fa7c866d979d8c6.zip", "author": "Nils Wandel;Michael Weinmann;Reinhard Klein", "authorids": "~Nils_Wandel2;~Michael_Weinmann1;~Reinhard_Klein1", "gender": ";M;M", "homepage": "https://cg.cs.uni-bonn.de/de/mitarbeiter/msc-nils-wandel/;https://graphics.tudelft.nl/michael-weinmann/;https://cg.cs.uni-bonn.de/", "dblp": ";79/9941;28/4015", "google_scholar": "https://scholar.google.de/citations?user=UbKw5nEAAAAJ;https://scholar.google.de/citations?user=NexBuB8AAAAJ;y31iZ3IAAAAJ", "orcid": ";0000-0003-3634-0093;0000-0002-5505-9347", "linkedin": ";;reinhard-klein-90053560/", "or_profile": "~Nils_Wandel2;~Michael_Weinmann1;~Reinhard_Klein1", "aff": "Rheinische Friedrich-Wilhelms Universit\u00e4t Bonn;University of Bonn;Rheinische Friedrich-Wilhelms Universit\u00e4t Bonn", "aff_domain": "uni-bonn.de;uni-bonn.de;uni-bonn.de", "position": "PhD student;Postdoc;Full Professor", "bibtex": "@inproceedings{\nwandel2021learning,\ntitle={Learning Incompressible Fluid Dynamics from Scratch - Towards Fast, Differentiable Fluid Models that Generalize},\nauthor={Nils Wandel and Michael Weinmann and Reinhard Klein},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=KUDUoRsEphu}\n}", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer3;AnonReviewer4;AnonReviewer2", "pdf_size": 0, "rating": "7;7;7;7", "confidence": "3;5;5;3", "wc_review": "319;440;629;317", "wc_reply_reviewers": "0;0;83;76", "wc_reply_authors": "487;462;725;336", "reply_reviewers": "0;0;1;1", "reply_authors": "1;1;1;1", "rating_avg": [ 7.0, 0.0 ], "confidence_avg": [ 4.0, 1.0 ], "wc_review_avg": [ 426.25, 127.21512292176587 ], "wc_reply_reviewers_avg": [ 39.75, 39.8269695056001 ], "wc_reply_authors_avg": [ 502.5, 140.6316109557165 ], "reply_reviewers_avg": [ 0.5, 0.5 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 12, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 99, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=15051003364237552028&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 5, "pdf": "https://openreview.net/pdf?id=KUDUoRsEphu", "email": "uni-bonn.de;uni-bonn.de;uni-bonn.de", "author_num": 3, "aff_unique_index": "0;1;0", "aff_unique_norm": "Rheinische Friedrich-Wilhelms Universit\u00e4t Bonn;University of Bonn", "aff_unique_dep": ";", "aff_unique_url": "https://www.uni-bonn.de/;https://www.uni-bonn.de/", "aff_unique_abbr": "Uni Bonn;UBonn", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0;0", "aff_country_unique": "Germany" }, { "id": "KVTkzgz3g8O", "title": "TraDE: A Simple Self-Attention-Based Density Estimator", "track": "main", "status": "Reject", "tldr": "", "abstract": "We present TraDE, a self-attention-based architecture for auto-regressive density estimation with continuous and discrete valued data. Our model is trained using a penalized maximum likelihood objective, which ensures that samples from the density estimate resemble the training data distribution. 
The use of self-attention means that the model need not retain conditional sufficient statistics during the auto-regressive process beyond what is needed for each covariate. On standard tabular and image data benchmarks, TraDE produces significantly better density estimates than existing approaches such as normalizing flow estimators and recurrent auto-regressive models. However log-likelihood on held-out data only partially reflects how useful these estimates are in real-world applications. In order to systematically evaluate density estimators, we present a suite of tasks such as regression using generated samples, out-of-distribution detection, and robustness to noise in the training data and demonstrate that TraDE works well in these scenarios. ", "keywords": "density estimation;self-attention", "primary_area": "", "supplementary_material": "", "author": "Rasool Fakoor;Pratik Anil Chaudhari;Jonas Mueller;Alex Smola", "authorids": "~Rasool_Fakoor1;~Pratik_Anil_Chaudhari1;~Jonas_Mueller1;~Alex_Smola1", "gender": "M;M;M;M", "homepage": "http://rasoolfa.github.io;;http://alex.smola.org;https://pratikac.github.io/", "dblp": "123/2447;178/3250;s/AlexanderJSmola;", "google_scholar": "nVsOPtQAAAAJ;HeVcLzAAAAAJ;Tb0ZrYwAAAAJ;c_z5hWEAAAAJ", "orcid": ";;;", "linkedin": "rasool-fakoor-695b5845/;;smola;pratik-chaudhari-59508765", "or_profile": "~Rasool_Fakoor1;~Jonas_Mueller1;~Alex_Smola1;~Pratik_Chaudhari1", "aff": "Amazon Web Services;Amazon;Amazon;School of Engineering and Applied Science, University of Pennsylvania", "aff_domain": "amazon.com;amazon.com;amazon.com;seas.upenn.edu", "position": "Researcher;Scientist;Distinguished Scientist;Assistant Professor", "bibtex": "@misc{\nfakoor2021trade,\ntitle={Tra{\\{}DE{\\}}: A Simple Self-Attention-Based Density Estimator},\nauthor={Rasool Fakoor and Pratik Anil Chaudhari and Jonas Mueller and Alex Smola},\nyear={2021},\nurl={https://openreview.net/forum?id=KVTkzgz3g8O}\n}", "github": "", "project": "", "reviewers": "AnonReviewer2;AnonReviewer4;AnonReviewer1", "site": "https://openreview.net/forum?id=KVTkzgz3g8O", "pdf_size": 0, "rating": "3;4;5", "confidence": "5;4;4", "wc_review": "638;442;376", "wc_reply_reviewers": "0;0;0", "wc_reply_authors": "774;1096;270", "reply_reviewers": "0;0;0", "reply_authors": "1;2;1", "rating_avg": [ 4.0, 0.816496580927726 ], "confidence_avg": [ 4.333333333333333, 0.4714045207910317 ], "wc_review_avg": [ 485.3333333333333, 111.26345112190057 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 713.3333333333334, 339.93071189418714 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.3333333333333333, 0.4714045207910317 ], "replies_avg": [ 9, 0 ], "authors#_avg": [ 4, 0 ], "corr_rating_confidence": -0.8660254037844385, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:JpUL7laWFLYJ:scholar.google.com/&scioq=TraDE:+A+Simple+Self-Attention-Based+Density+Estimator&hl=en&as_sdt=0,33", "gs_version_total": 0, "aff_unique_index": "0;0;0;1", "aff_unique_norm": "Amazon;University of Pennsylvania", "aff_unique_dep": "Amazon Web Services;School of Engineering and Applied Science", "aff_unique_url": "https://aws.amazon.com;https://www.upenn.edu", "aff_unique_abbr": "AWS;UPenn", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0;0;0", "aff_country_unique": "United States" }, { "id": "KWToR-Phbrz", "title": "Beyond Trivial Counterfactual Generations with Diverse Valuable Explanations", "track": "main", "status": "Reject", "tldr": "", "abstract": 
"Explainability of machine learning models has gained considerable attention within our research community given the importance of deploying more reliable machine-learning systems. Explanability can also be helpful for model debugging. In computer vision applications, most methods explain models by displaying the regions in the input image that they focus on for their prediction, but it is difficult to improve models based on these explanations since they do not indicate why the model fail. Counterfactual methods, on the other hand, indicate how to perturb the input to change the model prediction, providing details about the model's decision-making. Unfortunately, current counterfactual methods make ambiguous interpretations as they combine multiple biases of the model and the data in a single counterfactual interpretation of the model's decision. Moreover, these methods tend to generate trivial counterfactuals about the model's decision, as they often suggest to exaggerate or remove the presence of the attribute being classified. Trivial counterfactuals are usually not valuable, since the information they provide is often already known to the system's designer. In this work, we propose a counterfactual method that learns a perturbation in a disentangled latent space that is constrained using a diversity-enforcing loss to uncover multiple valuable explanations about the model's prediction. Further, we introduce a mechanism to prevent the model from producing trivial explanations. Experiments on CelebA and Synbols demonstrate that our model improves the success rate of producing high-quality valuable explanations when compared to previous state-of-the-art methods. We will make the code public.", "keywords": "Interpretability;Counterfactual;Explanations;Black-Box", "primary_area": "", "supplementary_material": "", "author": "Pau Rodriguez;Massimo Caccia;Alexandre Lacoste;Lee Zamparo;Issam H. Laradji;Laurent Charlin;David Vazquez", "authorids": "~Pau_Rodriguez2;~Massimo_Caccia1;~Alexandre_Lacoste1;~Lee_Zamparo1;~Issam_H._Laradji1;~Laurent_Charlin1;~David_Vazquez1", "gender": ";M;M;M;M;M;", "homepage": ";;http://lzamparo.github.io;https://issamlaradji.github.io/;http://www.cs.toronto.edu/~lcharlin/;http://www.david-vazquez.com;https://prlz77.github.io", "dblp": "43/6338.html;59/6239.html;https://dblp.org/search?q=Lee+Zamparo;142/0043;48/5717;94/8653;190/7735", "google_scholar": "WaE4GicAAAAJ;;https://scholar.google.ca/citations?user=UtAt8MoAAAAJ;https://scholar.google.ca/citations?user=8vRS7F0AAAAJ;Cul0g2YAAAAJ;1jHvtfsAAAAJ;https://scholar.google.es/citations?user=IwBx73wAAAAJ", "orcid": ";;0000-0002-8443-7079;;0000-0002-6545-9459;0000-0002-2845-8158;0000-0002-1689-8084", "linkedin": ";;lee-zamparo/;issam-laradji-67ba1a99/;;https://www.linkedin.com/company/david-vazquez/;", "or_profile": "~Massimo_Caccia1;~Alexandre_Lacoste1;~Lee_Zamparo1;~Issam_H._Laradji1;~Laurent_Charlin1;~David_Vazquez1;~Pau_Rodriguez_Lopez1", "aff": "University of Montreal;Element AI;ServiceNow Research;Element AI;HEC Montreal;ServiceNow research;Element AI", "aff_domain": "umontreal.ca;elementai.com;elementai.com;elementai.com;hec.ca;servicenow.com;elementai.com", "position": "PhD student;Research Scientist;Applied Research Scientist;Researcher;Assistant Professor;Researcher;Researcher", "bibtex": "@misc{\nrodriguez2021beyond,\ntitle={Beyond Trivial Counterfactual Generations with Diverse Valuable Explanations},\nauthor={Pau Rodriguez and Massimo Caccia and Alexandre Lacoste and Lee Zamparo and Issam H. 
Laradji and Laurent Charlin and David Vazquez},\nyear={2021},\nurl={https://openreview.net/forum?id=KWToR-Phbrz}\n}", "github": "", "project": "", "reviewers": "AnonReviewer3;AnonReviewer2;AnonReviewer1;AnonReviewer4", "site": "https://openreview.net/forum?id=KWToR-Phbrz", "pdf_size": 0, "rating": "4;4;6;7", "confidence": "4;4;5;3", "wc_review": "488;1511;845;198", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "457;1491;868;214", "reply_reviewers": "0;0;0;0", "reply_authors": "1;2;2;1", "rating_avg": [ 5.25, 1.299038105676658 ], "confidence_avg": [ 4.0, 0.7071067811865476 ], "wc_review_avg": [ 760.5, 490.1665533265198 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 757.5, 483.71608408238814 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.5, 0.5 ], "replies_avg": [ 13, 0 ], "authors#_avg": [ 7, 0 ], "corr_rating_confidence": -0.2721655269759087, "gs_citation": 2, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=746165586739007895&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 0, "aff_unique_index": "0;1;2;1;3;2;1", "aff_unique_norm": "University of Montreal;Element AI;ServiceNow;HEC Montreal", "aff_unique_dep": ";;Research;", "aff_unique_url": "https://wwwumontreal.ca;https://www.elementai.com;https://www.servicenow.com;https://www.hec.ca", "aff_unique_abbr": "UM;Element AI;ServiceNow;HEC", "aff_campus_unique_index": "1", "aff_campus_unique": ";Montreal", "aff_country_unique_index": "0;0;1;0;0;1;0", "aff_country_unique": "Canada;United States" }, { "title": "Inductive Representation Learning in Temporal Networks via Causal Anonymous Walks", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/2651", "id": "KYPz4YsCPj", "poster": "", "openreview": "https://openreview.net/forum?id=KYPz4YsCPj", "slides": "https://iclr.cc/virtual/2021/poster/2651", "video": "https://iclr.cc/virtual/2021/poster/2651", "author_site": "Yanbang Wang, Yen-Yu Chang, Yunyu Liu, Jure Leskovec, Pan Li", "tldr": "", "abstract": "Temporal networks serve as abstractions of many real-world dynamic systems. These networks typically evolve according to certain laws, such as the law of triadic closure, which is universal in social networks. Inductive representation learning of temporal networks should be able to capture such laws and further be applied to systems that follow the same laws but have not been unseen during the training stage. Previous works in this area depend on either network node identities or rich edge attributes and typically fail to extract these laws. Here, we propose {\\em Causal Anonymous Walks (CAWs)} to inductively represent a temporal network. CAWs are extracted by temporal random walks and work as automatic retrieval of temporal network motifs to represent network dynamics while avoiding the time-consuming selection and counting of those motifs. CAWs adopt a novel anonymization strategy that replaces node identities with the hitting counts of the nodes based on a set of sampled walks to keep the method inductive, and simultaneously establish the correlation between motifs. We further propose a neural-network model CAW-N to encode CAWs, and pair it with a CAW sampling strategy with constant memory and time cost to support online training and inference. CAW-N is evaluated to predict links over 6 real temporal networks and uniformly outperforms previous SOTA methods by averaged 15\\% AUC gain in the inductive setting. 
CAW-N also outperforms previous methods in 5 out of the 6 networks in the transductive setting.", "keywords": "temporal networks;inductive representation learning;anonymous walk;network motif", "primary_area": "", "supplementary_material": "/attachment/fdfba4fc19a5727b92f95d64db4b7198c9c1c9b5.zip", "author": "Yanbang Wang;Yen-Yu Chang;Yunyu Liu;Jure Leskovec;Pan Li", "authorids": "~Yanbang_Wang1;yenyu@stanford.edu;~Yunyu_Liu1;~Jure_Leskovec1;~Pan_Li2", "gender": ";;M;;", "homepage": ";;https://wenwen0319.github.io/;http://cs.stanford.edu/~jure/;", "dblp": "232/1994;;;l/JureLeskovec;https://dblp.org/pers/hd/l/Li_0005:Pan", "google_scholar": "Ch3YUgsAAAAJ;;;Q_kKkIUAAAAJ;IroP0EwAAAAJ", "orcid": ";;;0000-0002-5411-923X;", "linkedin": ";;;leskovec/;pan-li-b951105a/", "or_profile": "~Yanbang_Wang1;yenyu@stanford.edu;~Yunyu_Liu1;~Jure_Leskovec1;~Pan_Li2", "aff": "Computer Science Department, Stanford University;;Purdue University;;Purdue University", "aff_domain": "cs.stanford.edu;;purdue.edu;;purdue.edu", "position": "MS student;;PhD student;;Assistant Professor", "bibtex": "@inproceedings{\nwang2021inductive,\ntitle={Inductive Representation Learning in Temporal Networks via Causal Anonymous Walks},\nauthor={Yanbang Wang and Yen-Yu Chang and Yunyu Liu and Jure Leskovec and Pan Li},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=KYPz4YsCPj}\n}", "github": "", "project": "", "reviewers": "AnonReviewer3;AnonReviewer1;AnonReviewer4;AnonReviewer2", "pdf_size": 0, "rating": "5;6;6;7", "confidence": "4;3;4;4", "wc_review": "211;342;231;506", "wc_reply_reviewers": "0;0;0;28", "wc_reply_authors": "823;697;331;609", "reply_reviewers": "0;0;0;1", "reply_authors": "1;1;1;2", "rating_avg": [ 6.0, 0.7071067811865476 ], "confidence_avg": [ 3.75, 0.4330127018922193 ], "wc_review_avg": [ 322.5, 117.10785626933831 ], "wc_reply_reviewers_avg": [ 7.0, 12.12435565298214 ], "wc_reply_authors_avg": [ 615.0, 180.7484439767048 ], "reply_reviewers_avg": [ 0.25, 0.4330127018922193 ], "reply_authors_avg": [ 1.25, 0.4330127018922193 ], "replies_avg": [ 12, 0 ], "authors#_avg": [ 5, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 327, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=15166917654599098061&as_sdt=400005&sciodt=0,14&hl=en", "gs_version_total": 8, "pdf": "https://openreview.net/pdf?id=KYPz4YsCPj", "email": "cs.stanford.edu;;purdue.edu;;purdue.edu", "author_num": 5, "aff_unique_index": "0;1;1", "aff_unique_norm": "Stanford University;Purdue University", "aff_unique_dep": "Computer Science Department;", "aff_unique_url": "https://www.stanford.edu;https://www.purdue.edu", "aff_unique_abbr": "Stanford;Purdue", "aff_campus_unique_index": "0", "aff_campus_unique": "Stanford;", "aff_country_unique_index": "0;0;0", "aff_country_unique": "United States" }, { "id": "K_ETaDx3Iv", "title": "FLAGNet : Feature Label based Automatic Generation Network for symbolic music", "track": "main", "status": "Reject", "tldr": "", "abstract": "The technology for automatic music generation has been very actively studied in recent years. However, almost in these studies, handling domain knowledge of music was omitted or considered a difficult task. In particular, research that analyzes and utilizes the characteristics of each bar of music is very rare, even though it is essential in the human composition. 
We propose a model that generates music with musical characteristics of bars by a conditional generative adversarial network, and analyze which sequences of such characterized bars combine well for symbolic-domain music generation by a Recurrent Neural Network with a Long Short-Term Memory layer. Also, by analyzing symbolic music data as image-like based on a relational pitch approach, it increases the utilization of the data set with arbitrary chord scales and enables the use of generational results extensively. The resulting model FLAGNet generates music with the understanding of musical domain knowledge while handling inputs like minimum unit of note, length of music, chart scales, and chord condition.", "keywords": "cGAN;RNN;MIDI generation;music", "primary_area": "", "supplementary_material": "/attachment/055ee6ab56a3ade6396503d5c037f3b4c1c2ce10.zip", "author": "SeongHyeon Go", "authorids": "~SeongHyeon_Go2", "gender": "M", "homepage": "https://github.com/slslslrhfem", "dblp": "", "google_scholar": "", "orcid": "", "linkedin": "", "or_profile": "~SeongHyeon_Go2", "aff": "SungKyunKwan University", "aff_domain": "skku.edu", "position": "Undergrad student", "bibtex": "@misc{\ngo2021flagnet,\ntitle={{\\{}FLAGN{\\}}et : Feature Label based Automatic Generation Network for symbolic music},\nauthor={SeongHyeon Go},\nyear={2021},\nurl={https://openreview.net/forum?id=K_ETaDx3Iv}\n}", "github": "", "project": "", "reviewers": "AnonReviewer3;AnonReviewer1;AnonReviewer4;AnonReviewer2", "site": "https://openreview.net/forum?id=K_ETaDx3Iv", "pdf_size": 0, "rating": "2;2;3;3", "confidence": "5;5;5;5", "wc_review": "393;548;1263;330", "wc_reply_reviewers": "114;31;112;0", "wc_reply_authors": "387;265;141;143", "reply_reviewers": "1;1;1;0", "reply_authors": "1;1;1;1", "rating_avg": [ 2.5, 0.5 ], "confidence_avg": [ 5.0, 0.0 ], "wc_review_avg": [ 633.5, 371.998991934118 ], "wc_reply_reviewers_avg": [ 64.25, 49.97186708539115 ], "wc_reply_authors_avg": [ 234.0, 101.61200716450787 ], "reply_reviewers_avg": [ 0.75, 0.4330127018922193 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 12, 0 ], "authors#_avg": [ 1, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:QeSNUrbVyR8J:scholar.google.com/&scioq=FLAGNet+:+Feature+Label+based+Automatic+Generation+Network+for+symbolic+music&hl=en&as_sdt=0,5", "gs_version_total": 0, "aff_unique_index": "0", "aff_unique_norm": "Sungkyunkwan University", "aff_unique_dep": "", "aff_unique_url": "https://www.skku.edu", "aff_unique_abbr": "SKKU", "aff_country_unique_index": "0", "aff_country_unique": "South Korea" }, { "id": "Kao09W-oe8", "title": "Channel-Directed Gradients for Optimization of Convolutional Neural Networks", "track": "main", "status": "Reject", "tldr": "", "abstract": "We introduce optimization methods for convolutional neural networks that can be used to improve existing gradient-based optimization in terms of generalization error. The method requires only simple processing of existing stochastic gradients, can be used in conjunction with any optimizer, and has only a linear overhead (in the number of parameters) compared to computation of the stochastic gradient. The method works by computing the gradient of the loss function with respect to output-channel directed re-weighted L2 or Sobolev metrics, which has the effect of smoothing components of the gradient across a certain direction of the parameter tensor. 
We show that defining the gradients along the output channel direction leads to a performance boost, while other directions can be detrimental. We present the continuum theory of such gradients, its discretization, and application to deep networks. Experiments on benchmark datasets, several networks, and baseline optimizers show that optimizers can be improved in generalization error by simply computing the stochastic gradient with respect to output-channel directed metrics.", "keywords": "stochastic optimization;Riemannian geometry;Riemannian gradient flows;convolutional neural nets", "primary_area": "", "supplementary_material": "", "author": "Dong Lao;Peihao Zhu;Peter Wonka;Ganesh Sundaramoorthi", "authorids": "~Dong_Lao1;~Peihao_Zhu1;~Peter_Wonka2;~Ganesh_Sundaramoorthi1", "gender": "M;M;;", "homepage": ";;;", "dblp": "180/5522;255/9066;;", "google_scholar": "dvQXYW0AAAAJ;Gn8URq0AAAAJ;;", "orcid": ";0000-0002-7122-1551;;", "linkedin": ";;;", "or_profile": "~Dong_Lao1;~Peihao_Zhu1;~Peter_Wonka2;~Ganesh_Sundaramoorthi1", "aff": "KAUST;KAUST;;", "aff_domain": "kaust.edu.sa;kaust.edu.sa;;", "position": "PhD;PhD student;;", "bibtex": "@misc{\nlao2021channeldirected,\ntitle={Channel-Directed Gradients for Optimization of Convolutional Neural Networks},\nauthor={Dong Lao and Peihao Zhu and Peter Wonka and Ganesh Sundaramoorthi},\nyear={2021},\nurl={https://openreview.net/forum?id=Kao09W-oe8}\n}", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer5;AnonReviewer4;AnonReviewer3;AnonReviewer2", "site": "https://openreview.net/forum?id=Kao09W-oe8", "pdf_size": 0, "rating": "4;5;6;6;6", "confidence": "3;5;3;1;2", "wc_review": "316;763;124;205;182", "wc_reply_reviewers": "0;0;0;0;0", "wc_reply_authors": "180;644;348;43;14", "reply_reviewers": "0;0;0;0;0", "reply_authors": "1;1;1;1;1", "rating_avg": [ 5.4, 0.7999999999999999 ], "confidence_avg": [ 2.8, 1.32664991614216 ], "wc_review_avg": [ 318.0, 231.05410621756974 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 245.8, 231.60172710927696 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 12, 0 ], "authors#_avg": [ 4, 0 ], "corr_rating_confidence": -0.48995593493886586, "gs_citation": 3, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=13154985769854647671&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 3, "aff_unique_index": "0;0", "aff_unique_norm": "King Abdullah University of Science and Technology", "aff_unique_dep": "", "aff_unique_url": "https://www.kaust.edu.sa", "aff_unique_abbr": "KAUST", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0", "aff_country_unique": "Saudi Arabia" }, { "id": "Kc6XtnDIZdI", "title": "Fewmatch: Dynamic Prototype Refinement for Semi-Supervised Few-Shot Learning", "track": "main", "status": "Withdraw", "tldr": "", "abstract": "Semi-Supervised Few-shot Learning (SS-FSL) investigates the benefit of incorporating unlabelled data in few-shot settings. Recent work has relied on the popular Semi-Supervised Learning (SSL) concept of iterative pseudo-labelling, yet often yield models that are susceptible to error propagation and are sensitive to initialisation. Alternative work utilises the concept of consistency regularisation (CR), a popular SSL state of the art technique where a student model is trained to consistently agree with teacher predictions under different input perturbations, without pseudo-label requirements. 
However, applications of CR to the SS-FSL set-up struggle to outperform pseudo-labelling approaches; limited available training data yields unreliable early stage predictions and requires fast convergence that is not amenable for, typically slower to converge, CR approaches. In this paper, we introduce a prototype-based approach for SS-FSL that exploits model consistency in a robust manner. Our Dynamic Prototype Refinement (DPR) approach is a novel training paradigm for few-shot model adaptation to new unseen classes, combining concepts from metric and meta-gradient based FSL methods. New class prototypes are alternatively refined 1) explicitly, using labelled and unlabelled data with high confidence class predictions and 2) implicitly, by model fine-tuning using a data selective CR loss. DPR affords CR convergence, with the explicit refinement providing an increasingly stronger initialisation. We demonstrate method efficacy and report extensive experiments on two competitive benchmarks; miniImageNet and tieredImageNet. The ability to effectively utilise and combine information from both labelled base-class and auxiliary unlabelled novel-class data results in significant accuracy improvements.", "keywords": "Few shot learning;Semi-supervised Learning", "primary_area": "", "supplementary_material": "", "author": "Xu Lan;Steven McDonagh;Shaogang Gong;Jiali Wang;Zhenguo Li;Sarah Parisot", "authorids": "~Xu_Lan2;~Steven_McDonagh1;~Shaogang_Gong2;jiali.wang@qmul.ac.uk;~Zhenguo_Li1;~Sarah_Parisot1", "gender": "M;;;;M;", "homepage": "http://www.eecs.qmul.ac.uk/~xl309/;https://smcdonagh.github.io/;;;http://www.ee.columbia.edu/~zgli/;https://parisots.github.io/", "dblp": ";159/2641;;;23/6479;20/10169", "google_scholar": "https://scholar.google.com.hk/citations?user=0fPB-K8AAAAJ;https://scholar.google.co.uk/citations?user=k8-q2AoAAAAJ;;;XboZC1AAAAAJ;https://scholar.google.co.uk/citations?user=N-AmfK4AAAAJ", "orcid": ";0000-0001-7025-5197;;;;", "linkedin": ";;;;;", "or_profile": "~Xu_Lan2;~Steven_McDonagh1;~Shaogang_Gong2;jiali.wang@qmul.ac.uk;~Zhenguo_Li1;~Sarah_Parisot1", "aff": ";Huawei Technologies Ltd.;;;Huawei Noah's Ark Lab;Huawei Technologies Ltd.", "aff_domain": ";huawei.com;;;huawei.com;huawei.com", "position": ";Senior Research Scientist;;;Principal Researcher;Senior research scientist", "bibtex": "", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer2;AnonReviewer4;AnonReviewer3", "site": "https://openreview.net/forum?id=Kc6XtnDIZdI", "pdf_size": 0, "rating": "3;4;5;5", "confidence": "4;4;4;4", "wc_review": "269;571;402;401", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "0;0;0;0", "reply_reviewers": "0;0;0;0", "reply_authors": "0;0;0;0", "rating_avg": [ 4.25, 0.82915619758885 ], "confidence_avg": [ 4.0, 0.0 ], "wc_review_avg": [ 410.75, 107.17363248486075 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 0, 0 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 0, 0 ], "replies_avg": [ 5, 0 ], "authors#_avg": [ 6, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:4BB_AtMBmTAJ:scholar.google.com/&scioq=Fewmatch:+Dynamic+Prototype+Refinement+for+Semi-Supervised+Few-Shot+Learning&hl=en&as_sdt=0,33", "gs_version_total": 2, "aff_unique_index": "0;0;0", "aff_unique_norm": "Huawei", "aff_unique_dep": "Huawei Technologies", "aff_unique_url": "https://www.huawei.com", "aff_unique_abbr": "Huawei", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": 
"0;0;0", "aff_country_unique": "China" }, { "id": "KcImcc3j-qS", "title": "Fast Predictive Uncertainty for Classification with Bayesian Deep Networks", "track": "main", "status": "Reject", "tldr": "", "abstract": "In Bayesian Deep Learning, distributions over the output of classification neural networks are approximated by first constructing a Gaussian distribution over the weights, then sampling from it to receive a distribution over the categorical output distribution. This is costly. We reconsider old work to construct a Dirichlet approximation of this output distribution, which yields an analytic map between Gaussian distributions in logit space and Dirichlet distributions (the conjugate prior to the categorical) in the output space. We argue that the resulting Dirichlet distribution has theoretical and practical advantages, in particular, more efficient computation of the uncertainty estimate, scaling to large datasets and networks like ImageNet and DenseNet. We demonstrate the use of this Dirichlet approximation by using it to construct a lightweight uncertainty-aware output ranking for the ImageNet setup.", "keywords": "Bayesian Deep Learning;Approximate Inference", "primary_area": "", "supplementary_material": "", "author": "Marius Hobbhahn;Agustinus Kristiadi;Philipp Hennig", "authorids": "~Marius_Hobbhahn1;~Agustinus_Kristiadi1;~Philipp_Hennig1", "gender": ";;M", "homepage": "http://www.mariushobbhahn.com;https://agustinus.kristia.de;http://mml.inf.uni-tuebingen.de", "dblp": "260/0039;215/3954;08/9077", "google_scholar": "SJ1y8o0AAAAJ;_1qe2mYAAAAJ;https://scholar.google.de/citations?user=UeG5w08AAAAJ", "orcid": ";0000-0003-1615-1121;0000-0001-7293-6092", "linkedin": ";agustinus-kristiadi/;", "or_profile": "~Marius_Hobbhahn1;~Agustinus_Kristiadi1;~Philipp_Hennig1", "aff": "Max Planck Institute for Intelligent Systems, Max-Planck Institute;University of Tuebingen;Max Planck Institute for Intelligent Systems, Max-Planck Institute", "aff_domain": "tue.mpg.de;uni-tuebingen.de;tuebingen.mpg.de", "position": "PhD student;PhD student;Adjunct Professor", "bibtex": "@misc{\nhobbhahn2021fast,\ntitle={Fast Predictive Uncertainty for Classification with Bayesian Deep Networks},\nauthor={Marius Hobbhahn and Agustinus Kristiadi and Philipp Hennig},\nyear={2021},\nurl={https://openreview.net/forum?id=KcImcc3j-qS}\n}", "github": "", "project": "", "reviewers": "AnonReviewer2;AnonReviewer3;AnonReviewer1;AnonReviewer4", "site": "https://openreview.net/forum?id=KcImcc3j-qS", "pdf_size": 0, "rating": "4;5;5;6", "confidence": "5;4;4;5", "wc_review": "570;271;1543;474", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "740;411;728;298", "reply_reviewers": "0;0;0;0", "reply_authors": "1;1;1;1", "rating_avg": [ 5.0, 0.7071067811865476 ], "confidence_avg": [ 4.5, 0.5 ], "wc_review_avg": [ 714.5, 490.36338566414196 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 544.25, 193.9566639741981 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 9, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 43, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=8925183995102430081&as_sdt=400005&sciodt=0,14&hl=en", "gs_version_total": 6, "aff_unique_index": "0;1;0", "aff_unique_norm": "Max Planck Institute for Intelligent Systems;University of Tuebingen", "aff_unique_dep": "Intelligent Systems;", "aff_unique_url": "https://www.mpi-is.mpg.de;https://www.uni-tuebingen.de/", "aff_unique_abbr": "MPI-IS;Uni T\u00fcbingen", 
"aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0;0", "aff_country_unique": "Germany" }, { "id": "KcLlh3Qe7KU", "title": "Ensembles of Generative Adversarial Networks for Disconnected Data", "track": "main", "status": "Reject", "tldr": "", "abstract": "Most computer vision datasets are composed of disconnected sets, such as images of different objects. We prove that distributions of this type of data cannot be represented with a continuous generative network without error, independent of the learning algorithm used. Disconnected datasets can be represented in two ways: with an ensemble of networks or with a single network using a truncated latent space. We show that ensembles are more desirable than truncated distributions for several theoretical and computational reasons. We construct a regularized optimization problem that rigorously establishes the relationships between a single continuous GAN, an ensemble of GANs, conditional GANs, and Gaussian Mixture GANs. The regularization can be computed efficiently, and we show empirically that our framework has a performance sweet spot that can be found via hyperparameter tuning. The ensemble framework provides better performance than a single continuous GAN or cGAN while maintaining fewer total parameters. ", "keywords": "GANs;ensembles;disconnected data", "primary_area": "", "supplementary_material": "", "author": "Lorenzo Luzi;Randall Balestriero;Richard Baraniuk", "authorids": "~Lorenzo_Luzi1;~Randall_Balestriero1;~Richard_Baraniuk1", "gender": "M;M;", "homepage": ";https://randallbalestriero.github.io/;http://richb.rice.edu/", "dblp": ";175/5364;32/2804", "google_scholar": "https://scholar.google.com/citations?hl=en;S1x_xqcAAAAJ;https://scholar.google.com.tw/citations?user=N-BBA20AAAAJ", "orcid": ";;", "linkedin": ";randallbalestriero/;richard-baraniuk", "or_profile": "~Lorenzo_Luzi1;~Randall_Balestriero1;~Richard_Baraniuk1", "aff": "Rice University;Rice University;William Marsh Rice University", "aff_domain": "rice.edu;rice.edu;rice.edu", "position": "PhD student;PhD student;C. 
Sidney Burrus Professor", "bibtex": "@misc{\nluzi2021ensembles,\ntitle={Ensembles of Generative Adversarial Networks for Disconnected Data},\nauthor={Lorenzo Luzi and Randall Balestriero and Richard Baraniuk},\nyear={2021},\nurl={https://openreview.net/forum?id=KcLlh3Qe7KU}\n}", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer2;AnonReviewer3;AnonReviewer4", "site": "https://openreview.net/forum?id=KcLlh3Qe7KU", "pdf_size": 0, "rating": "4;4;5;7", "confidence": "3;4;3;4", "wc_review": "932;446;462;428", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "0;0;0;0", "reply_reviewers": "0;0;0;0", "reply_authors": "0;0;0;0", "rating_avg": [ 5.0, 1.224744871391589 ], "confidence_avg": [ 3.5, 0.5 ], "wc_review_avg": [ 567.0, 211.0758157629623 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 0, 0 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 0, 0 ], "replies_avg": [ 5, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": 0.40824829046386296, "gs_citation": 4, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=16725503144243132673&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 3, "aff_unique_index": "0;0;0", "aff_unique_norm": "Rice University", "aff_unique_dep": "", "aff_unique_url": "https://www.rice.edu", "aff_unique_abbr": "Rice", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0;0", "aff_country_unique": "United States" }, { "id": "KcTBbZ1kM6K", "title": "Out-of-Distribution Generalization Analysis via Influence Function", "track": "main", "status": "Reject", "tldr": "", "abstract": "The mismatch between training dataset and target environment is one major challenge for current machine learning systems. When training data is collected from multiple environments and the evaluation is on any new environment, we are facing an Out-of-Distribution (OOD) generalization problem that aims to find a model with the best OOD accuracy, i.e. the best worst-environment accuracy. However, with limited access to environments, the worst environment may be unseen, and test accuracy is a biased estimate of OOD accuracy. In this paper, we show that test accuracy may dramatically fail to identify OOD accuracy and mislead the tuning procedure. To this end, we introduce Influence Function, a classical tool from robust statistics, into the OOD generalization problem and suggest the variance of influence function to measure the stability of a model on training environments. 
We show that the proposed index and test accuracy together can help us discern whether OOD algorithms are needed and whether a model achieves good OOD generalization.", "keywords": "", "primary_area": "", "supplementary_material": "", "author": "Haotian Ye;Chuanlong Xie;Yue Liu;Zhenguo Li", "authorids": "1800017704@pku.edu.cn;~Chuanlong_Xie1;liuyue52@huawei.com;~Zhenguo_Li1", "gender": ";M;;M", "homepage": ";;;http://www.ee.columbia.edu/~zgli/", "dblp": ";;;23/6479", "google_scholar": ";_fgE3u8AAAAJ;;XboZC1AAAAAJ", "orcid": ";;;", "linkedin": ";;;", "or_profile": "1800017704@pku.edu.cn;~Chuanlong_Xie1;liuyue52@huawei.com;~Zhenguo_Li1", "aff": ";Huawei Technologies Ltd.;;Huawei Noah's Ark Lab", "aff_domain": ";huawei.com;;huawei.com", "position": ";Researcher;;Principal Researcher", "bibtex": "@misc{\nye2021outofdistribution,\ntitle={Out-of-Distribution Generalization Analysis via Influence Function},\nauthor={Haotian Ye and Chuanlong Xie and Yue Liu and Zhenguo Li},\nyear={2021},\nurl={https://openreview.net/forum?id=KcTBbZ1kM6K}\n}", "github": "", "project": "", "reviewers": "AnonReviewer3;AnonReviewer4;AnonReviewer2;AnonReviewer1", "site": "https://openreview.net/forum?id=KcTBbZ1kM6K", "pdf_size": 0, "rating": "4;4;5;7", "confidence": "5;4;4;5", "wc_review": "1385;1109;1242;796", "wc_reply_reviewers": "410;313;342;238", "wc_reply_authors": "1477;1400;1432;998", "reply_reviewers": "1;1;1;1", "reply_authors": "2;2;2;2", "rating_avg": [ 5.0, 1.224744871391589 ], "confidence_avg": [ 4.5, 0.5 ], "wc_review_avg": [ 1133.0, 217.67521677949466 ], "wc_reply_reviewers_avg": [ 325.75, 61.69430686862444 ], "wc_reply_authors_avg": [ 1326.75, 191.76466697491486 ], "reply_reviewers_avg": [ 1.0, 0.0 ], "reply_authors_avg": [ 2.0, 0.0 ], "replies_avg": [ 17, 0 ], "authors#_avg": [ 4, 0 ], "corr_rating_confidence": 0.40824829046386296, "gs_citation": 15, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=17376721082699833455&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 3, "aff_unique_index": "0;0", "aff_unique_norm": "Huawei", "aff_unique_dep": "Huawei Technologies", "aff_unique_url": "https://www.huawei.com", "aff_unique_abbr": "Huawei", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0", "aff_country_unique": "China" }, { "id": "KfRtxjqU-Hd", "title": "NODE-SELECT: A FLEXIBLE GRAPH NEURAL NETWORK BASED ON REALISTIC PROPAGATION SCHEME", "track": "main", "status": "Withdraw", "tldr": "", "abstract": "While there exists a wide variety of graph neural networks (GNN) for node classification, only a minority of them adopt effective mechanisms to propagate the nodes' information with respect to these nodes' global importance. Additionally, two very important challenges that still significantly affect graph neural networks are the over-fitting and over-smoothing issues. Essentially, both issues cause poor generalization of the model and much poorer node classification performance. In this paper we propose the NODE-SELECT graph neural network (NSGNN): a novel and flexible graph neural network that uses subsetting filters to learn the contribution from the nodes selected to share their information. For the selected nodes, the way their learned information propagates resembles that of actual networks of the real world; where only a subset of nodes simultaneously share information. 
With the ability to manipulate the message passing operations through the use of numerous ensembled filters, our NODE-SELECT graph neural network is able to address the over-fitting problem and by-pass the over-smoothing challenge for graph neural networks. Furthermore, we also propose an efficient and informative measure named MICS to quantify the over-smoothing problem. Our NODE-SELECT achieved or matched state-of-the art results in a number of transductive experiments over different benchmark datasets.", "keywords": "Node selection;Realistic Propagation;Graph neural networks", "primary_area": "", "supplementary_material": "", "author": "Steph-Yves Louis;Alireza Nasiri;Fatima Christina Rolland;Cameron Mitro;Jianjun Hu", "authorids": "~Steph-Yves_Louis1;~Alireza_Nasiri1;fr92@drexel.edu;cameron.mitro@atriumhealth.org;~Jianjun_Hu1", "gender": "M;;;;M", "homepage": ";;;;http://www.cse.sc.edu/~jianjunh/", "dblp": ";;;;", "google_scholar": "TXFW-uUAAAAJ;;;;https://scholar.google.com.tw/citations?user=_iD3nyMAAAAJ", "orcid": ";;;;0000-0002-8725-6660", "linkedin": "steph-yves-louis-57656946/;;;;", "or_profile": "~Steph-Yves_Louis1;~Alireza_Nasiri1;fr92@drexel.edu;cameron.mitro@atriumhealth.org;~Jianjun_Hu1", "aff": "University of South Carolina-Columbia;;;;", "aff_domain": "email.sc.edu;;;;", "position": "PhD student;;;;", "bibtex": "", "github": "", "project": "", "reviewers": "AnonReviewer2;AnonReviewer1;AnonReviewer3", "site": "https://openreview.net/forum?id=KfRtxjqU-Hd", "pdf_size": 0, "rating": "3;4;4", "confidence": "4;5;3", "wc_review": "517;223;666", "wc_reply_reviewers": "0;0;0", "wc_reply_authors": "0;0;0", "reply_reviewers": "0;0;0", "reply_authors": "0;0;0", "rating_avg": [ 3.6666666666666665, 0.4714045207910317 ], "confidence_avg": [ 4.0, 0.816496580927726 ], "wc_review_avg": [ 468.6666666666667, 184.05494348759618 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 0, 0 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 0, 0 ], "replies_avg": [ 5, 0 ], "authors#_avg": [ 5, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:IiJOyx_riCEJ:scholar.google.com/&scioq=NODE-SELECT:+A+FLEXIBLE+GRAPH+NEURAL+NETWORK+BASED+ON+REALISTIC+PROPAGATION+SCHEME&hl=en&as_sdt=0,5", "gs_version_total": 0, "aff_unique_index": "0", "aff_unique_norm": "University of South Carolina", "aff_unique_dep": "", "aff_unique_url": "https://www.sc.edu", "aff_unique_abbr": "USC", "aff_campus_unique_index": "0", "aff_campus_unique": "Columbia", "aff_country_unique_index": "0", "aff_country_unique": "United States" }, { "id": "Ki5Mv0iY8C", "title": "On Flat Minima, Large Margins and Generalizability", "track": "main", "status": "Reject", "tldr": "", "abstract": "The intuitive connection to robustness and convincing empirical evidence have made the flatness of the loss surface an attractive measure of generalizability for neural networks. \n Yet it suffers from various problems such as computational difficulties, reparametrization issues, and a growing concern that it may only be an epiphenomenon of optimization methods. \n We provide empirical evidence that under the cross-entropy loss once a neural network reaches a non-trivial training error, the flatness correlates (via Pearson Correlation Coefficient) well to the classification margins, which allows us to better reason about the concerns surrounding flatness. 
\n Our results lead to the practical recommendation that when assessing generalizability one should consider a margin-based measure instead, as it is computationally more efficient, provides further insight, and is highly correlated to flatness. \n We also use our insight to replace the misleading folklore that small-batch methods generalize better because they are able to escape sharp minima. Instead, we argue that large-batch methods did not have enough time to maximize margins and hence generalize worse. ", "keywords": "", "primary_area": "", "supplementary_material": "", "author": "Daniel Lengyel;Nicholas Jennings;Panos Parpas;Nicholas Kantas", "authorids": "~Daniel_Lengyel1;~Nicholas_Jennings1;~Panos_Parpas1;n.kantas@imperial.ac.uk", "gender": ";M;M;", "homepage": ";http://www.imperial.ac.uk/people/n.jennings/;http://www.doc.ic.ac.uk/~pp500/;", "dblp": ";j/NicholasRJennings;;", "google_scholar": ";;https://scholar.google.com.tw/citations?user=yXcvHysAAAAJ;", "orcid": ";;;", "linkedin": ";;;", "or_profile": "~Daniel_Lengyel1;~Nicholas_Jennings1;~Panos_Parpas1;n.kantas@imperial.ac.uk", "aff": ";Imperial College London;Imperial College London, Imperial College London;", "aff_domain": ";;imperial.ac.uk;", "position": ";Full Professor;Associate Professor;", "bibtex": "@misc{\nlengyel2021on,\ntitle={On Flat Minima, Large Margins and Generalizability},\nauthor={Daniel Lengyel and Nicholas Jennings and Panos Parpas and Nicholas Kantas},\nyear={2021},\nurl={https://openreview.net/forum?id=Ki5Mv0iY8C}\n}", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer3;AnonReviewer2;AnonReviewer4", "site": "https://openreview.net/forum?id=Ki5Mv0iY8C", "pdf_size": 0, "rating": "3;4;4;4", "confidence": "5;4;3;4", "wc_review": "423;821;803;411", "wc_reply_reviewers": "0;14;0;0", "wc_reply_authors": "0;0;0;0", "reply_reviewers": "0;1;0;0", "reply_authors": "0;0;0;0", "rating_avg": [ 3.75, 0.4330127018922193 ], "confidence_avg": [ 4.0, 0.7071067811865476 ], "wc_review_avg": [ 614.5, 197.64804577834815 ], "wc_reply_reviewers_avg": [ 3.5, 6.06217782649107 ], "wc_reply_authors_avg": [ 0, 0 ], "reply_reviewers_avg": [ 0.25, 0.4330127018922193 ], "reply_authors_avg": [ 0, 0 ], "replies_avg": [ 8, 0 ], "authors#_avg": [ 4, 0 ], "corr_rating_confidence": -0.816496580927726, "gs_citation": 1, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=17652544874051148208&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 0, "aff_unique_index": "0;0", "aff_unique_norm": "Imperial College London", "aff_unique_dep": "", "aff_unique_url": "https://www.imperial.ac.uk", "aff_unique_abbr": "ICL", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0", "aff_country_unique": "United Kingdom" }, { "id": "KiFeuZu24k", "title": "Global Self-Attention Networks for Image Recognition", "track": "main", "status": "Reject", "tldr": "", "abstract": "Recently, a series of works in computer vision have shown promising results on various image and video understanding tasks using self-attention. However, due to the quadratic computational and memory complexities of self-attention, these works either apply attention only to low-resolution feature maps in later stages of a deep network or restrict the receptive field of attention in each layer to a small local region. To overcome these limitations, this work introduces a new global self-attention module, referred to as the GSA module, which is efficient enough to serve as the backbone component of a deep network. 
This module consists of two parallel layers: a content attention layer that attends to pixels based only on their content and a positional attention layer that attends to pixels based on their spatial locations. The output of this module is the sum of the outputs of the two layers. Based on the proposed GSA module, we introduce new standalone global attention-based deep networks that use GSA modules instead of convolutions to model pixel interactions. Due to the global extent of the proposed GSA module, a GSA network has the ability to model long-range pixel interactions throughout the network. Our experimental results show that GSA networks outperform the corresponding convolution-based networks significantly on the CIFAR-100 and ImageNet datasets while using less number of parameters and computations. The proposed GSA networks also outperform various existing attention-based networks on the ImageNet dataset.", "keywords": "self-attention;neural network architecture;image classification;semantic segmentation", "primary_area": "", "supplementary_material": "/attachment/a34bda658229e6c866b5d1f9939ba6ab8436e179.zip", "author": "Zhuoran Shen;Irwan Bello;Raviteja Vemulapalli;Xuhui Jia;Ching-Hui Chen", "authorids": "~Zhuoran_Shen1;~Irwan_Bello1;~Raviteja_Vemulapalli1;~Xuhui_Jia1;~Ching-Hui_Chen2", "gender": "M;M;M;M;", "homepage": "https://cmsflash.github.io/;;http://ravitejav.weebly.com/;https://scholar.google.com/citations?view_op=search_authors&mauthors=xuhui+jia&hl=en&oi=ao;", "dblp": "https://dblp.org/pers/s/Shen:Zhuoran.html;190/7529;135/4940;116/8360;87/8052", "google_scholar": "vrj_PtkAAAAJ;YjFF0KgAAAAJ;0OFqm7YAAAAJ;https://scholar.google.com/citations?view_op=search_authors;", "orcid": "0000-0002-3030-326X;;;;", "linkedin": "zhuoran-shen-206020120/;;raviteja-vemulapalli-85146113?utm_source=share&utm_campaign=share_via&utm_content=profile&utm_medium=ios_app;;", "or_profile": "~Zhuoran_Shen1;~Irwan_Bello1;~Raviteja_Vemulapalli1;~Xuhui_Jia1;~Ching-Hui_Chen2", "aff": "Google Research;Google;Google;Google;Google", "aff_domain": "research.google;google.com;google.com;google.com;google.com", "position": "AI Resident;Google Brain;Research Scientist;Researcher;Software Engineer", "bibtex": "@misc{\nshen2021global,\ntitle={Global Self-Attention Networks for Image Recognition},\nauthor={Zhuoran Shen and Irwan Bello and Raviteja Vemulapalli and Xuhui Jia and Ching-Hui Chen},\nyear={2021},\nurl={https://openreview.net/forum?id=KiFeuZu24k}\n}", "github": "", "project": "", "reviewers": "AnonReviewer2;AnonReviewer1;AnonReviewer4;AnonReviewer3", "site": "https://openreview.net/forum?id=KiFeuZu24k", "pdf_size": 0, "rating": "4;4;5;5", "confidence": "4;5;5;3", "wc_review": "1055;207;346;626", "wc_reply_reviewers": "125;0;0;0", "wc_reply_authors": "1800;973;489;603", "reply_reviewers": "1;0;0;0", "reply_authors": "7;2;3;2", "rating_avg": [ 4.5, 0.5 ], "confidence_avg": [ 4.25, 0.82915619758885 ], "wc_review_avg": [ 558.5, 323.9509993810792 ], "wc_reply_reviewers_avg": [ 31.25, 54.12658773652741 ], "wc_reply_authors_avg": [ 966.25, 513.5422937791978 ], "reply_reviewers_avg": [ 0.25, 0.4330127018922193 ], "reply_authors_avg": [ 3.5, 2.0615528128088303 ], "replies_avg": [ 20, 0 ], "authors#_avg": [ 5, 0 ], "corr_rating_confidence": -0.30151134457776363, "gs_citation": 42, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=345442999736188332&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 3, "aff_unique_index": "0;0;0;0;0", "aff_unique_norm": "Google", "aff_unique_dep": "Google Research", 
"aff_unique_url": "https://research.google", "aff_unique_abbr": "Google Research", "aff_campus_unique_index": "0;0;0;0;0", "aff_campus_unique": "Mountain View", "aff_country_unique_index": "0;0;0;0;0", "aff_country_unique": "United States" }, { "id": "KjeUNkU2d26", "title": "Rethinking Content and Style: Exploring Bias for Unsupervised Disentanglement", "track": "main", "status": "Reject", "tldr": "", "abstract": "Content and style (C-S) disentanglement intends to decompose the underlying explanatory factors of objects into two independent latent spaces. Aiming for unsupervised disentanglement, we introduce an inductive bias to our formulation by assigning different and independent roles to content and style when approximating the real data distributions. The content embeddings of individual images are forced to share a common distribution. The style embeddings encoding instance-specific features are used to customize the shared distribution. The experiments on several popular datasets demonstrate that our method achieves the state-of-the-art disentanglement compared to other unsupervised approaches and comparable or even better results than supervised methods. Furthermore, as a new application of C-S disentanglement, we propose to generate multi-view images from a single view image for 3D reconstruction.", "keywords": "Unsupervised Disentanglement;Content and Style Disentanglement;Inductive Bias;Representation Learning", "primary_area": "", "supplementary_material": "", "author": "Xuanchi Ren;Tao Yang;Wenjun Zeng;Yuwang Wang", "authorids": "~Xuanchi_Ren1;~Tao_Yang9;~Wenjun_Zeng3;~Yuwang_Wang3", "gender": "M;M;M;M", "homepage": "https://xuanchiren.com/;https://github.com/ThomasMrY;https://www.eias.ac.cn/h-col-187.html;", "dblp": "255/5432;;57/145;161/2633", "google_scholar": "fDHUk18AAAAJ;https://scholar.google.com.hk/citations?user=qT5psCEAAAAJ;_cUfvYQAAAAJ;", "orcid": ";;;", "linkedin": ";;;", "or_profile": "~Xuanchi_Ren1;~Tao_Yang9;~Wenjun_Zeng3;~Yuwang_Wang3", "aff": "Microsoft Research Asia;Xi'an Jiaotong University;Microsoft;Microsoft Research Asia", "aff_domain": "microsoft.com;xjtu.edu.cn;microsoft.com;microsoft.com", "position": "Research Intern;PhD student;Principal Researcher;Researcher", "bibtex": "@misc{\nren2021rethinking,\ntitle={Rethinking Content and Style: Exploring Bias for Unsupervised Disentanglement},\nauthor={Xuanchi Ren and Tao Yang and Wenjun Zeng and Yuwang Wang},\nyear={2021},\nurl={https://openreview.net/forum?id=KjeUNkU2d26}\n}", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer2;AnonReviewer4", "site": "https://openreview.net/forum?id=KjeUNkU2d26", "pdf_size": 0, "rating": "4;4;7", "confidence": "4;4;4", "wc_review": "672;332;417", "wc_reply_reviewers": "0;0;0", "wc_reply_authors": "563;446;663", "reply_reviewers": "0;0;0", "reply_authors": "1;1;1", "rating_avg": [ 5.0, 1.4142135623730951 ], "confidence_avg": [ 4.0, 0.0 ], "wc_review_avg": [ 473.6666666666667, 144.47221955179555 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 557.3333333333334, 88.68045005649341 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 8, 0 ], "authors#_avg": [ 4, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 10, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=17455921322838895106&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 8, "aff_unique_index": "0;1;0;0", "aff_unique_norm": "Microsoft;Xi'an Jiao Tong University", "aff_unique_dep": "Research;", "aff_unique_url": 
"https://www.microsoft.com/en-us/research/group/asia;https://www.xjtu.edu.cn", "aff_unique_abbr": "MSR Asia;XJTU", "aff_campus_unique_index": "0;0", "aff_campus_unique": "Asia;", "aff_country_unique_index": "0;0;1;0", "aff_country_unique": "China;United States" }, { "id": "Kkw3shxszSd", "title": "Improving Generalizability of Protein Sequence Models via Data Augmentations", "track": "main", "status": "Reject", "tldr": "", "abstract": "While protein sequence data is an emerging application domain for machine learning methods, small modifications to protein sequences can result in difficult-to-predict changes to the protein's function. Consequently, protein machine learning models typically do not use randomized data augmentation procedures analogous to those used in computer vision or natural language, e.g., cropping or synonym substitution. In this paper, we empirically explore a set of simple string manipulations, which we use to augment protein sequence data when fine-tuning semi-supervised protein models. We provide 276 different comparisons to the Tasks Assessing Protein Embeddings (TAPE) baseline models, with Transformer-based models and training datasets that vary from the baseline methods only in the data augmentations and representation learning procedure. For each TAPE validation task, we demonstrate improvements to the baseline scores when the learned protein representation is fixed between tasks. We also show that contrastive learning fine-tuning methods typically outperform masked-token prediction in these models, with increasing amounts of data augmentation generally improving performance for contrastive learning protein methods. We find the most consistent results across TAPE tasks when using domain-motivated transformations, such as amino acid replacement, as well as restricting the Transformer attention to randomly sampled sub-regions of the protein sequence. In rarer cases, we even find that information-destroying augmentations, such as randomly shuffling entire protein sequences, can improve downstream performance. ", "keywords": "", "primary_area": "", "supplementary_material": "", "author": "Hongyu Shen;Layne C. Price;Mohammad Taha Bahadori;Franziska Seeger", "authorids": "hongyus@amazon.com;prilayne@amazon.com;~Mohammad_Taha_Bahadori1;fseeger@amazon.com", "gender": ";;M;", "homepage": ";;http://faculty.washington.edu/bahadori/;", "dblp": ";;28/10813.html;", "google_scholar": ";;tlZvhyoAAAAJ;", "orcid": ";;;", "linkedin": ";;tahabahadori/;", "or_profile": "hongyus@amazon.com;prilayne@amazon.com;~Mohammad_Taha_Bahadori1;fseeger@amazon.com", "aff": ";;Amazon;", "aff_domain": ";;amazon.com;", "position": ";;Scientist;", "bibtex": "@misc{\nshen2021improving,\ntitle={Improving Generalizability of Protein Sequence Models via Data Augmentations},\nauthor={Hongyu Shen and Layne C. 
Price and Mohammad Taha Bahadori and Franziska Seeger},\nyear={2021},\nurl={https://openreview.net/forum?id=Kkw3shxszSd}\n}", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer3;AnonReviewer2;AnonReviewer4", "site": "https://openreview.net/forum?id=Kkw3shxszSd", "pdf_size": 0, "rating": "3;4;6;9", "confidence": "4;4;3;4", "wc_review": "510;741;480;577", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "1249;1417;607;102", "reply_reviewers": "0;0;0;0", "reply_authors": "2;2;1;1", "rating_avg": [ 5.5, 2.29128784747792 ], "confidence_avg": [ 3.75, 0.4330127018922193 ], "wc_review_avg": [ 577.0, 100.98762300400975 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 843.75, 524.1866914563932 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.5, 0.5 ], "replies_avg": [ 11, 0 ], "authors#_avg": [ 4, 0 ], "corr_rating_confidence": -0.1259881576697424, "gs_citation": 11, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=18264706991295585672&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 8, "aff_unique_index": "0", "aff_unique_norm": "Amazon", "aff_unique_dep": "Amazon.com, Inc.", "aff_unique_url": "https://www.amazon.com", "aff_unique_abbr": "Amazon", "aff_country_unique_index": "0", "aff_country_unique": "United States" }, { "id": "KmlvRQo3tC", "title": "MASP: Model-Agnostic Sample Propagation for Few-shot learning", "track": "main", "status": "Withdraw", "tldr": "", "abstract": "Few-shot learning aims to train a classifier given only a few samples per class that are highly insufficient to describe the whole data distribution. These few-shot samples not only introduce high variance to the training but also may include outliers near the class boundaries. Directly feeding these samples to training algorithms can lead to unstable optimization and even incorrect gradient descent direction. In this paper, we improve the robustness to ``outliers'' by learning to propagate and refine the representations of few-shot samples to form a more compact data distribution before using them to train a classifier. We develop a mutual calibration among few-shot samples' representations by graph propagation, for which we learn an attention mechanism to build the graph and determine the propagation weights. 
On both clean datasets and datasets containing noisy labels, we show that our sample propagation generally improves different types of existing few-shot learning methods in multiple few-shot learning settings.", "keywords": "few-shot learning;sample propagation;feature calibration;outlier removal;noisy label", "primary_area": "", "supplementary_material": "", "author": "Lu Liu;Tianyi Zhou;Guodong Long;Jing Jiang;Xuanyi Dong;Chengqi Zhang", "authorids": "~Lu_Liu7;~Tianyi_Zhou1;~Guodong_Long2;~Jing_Jiang6;~Xuanyi_Dong1;~Chengqi_Zhang1", "gender": "M;M;F;M;M;F", "homepage": "https://tianyizhou.github.io/;https://www.uts.edu.au/staff/guodong.long;https://www.uts.edu.au/staff/jing.jiang;https://xuanyidong.com/;https://research.polyu.edu.hk/en/persons/chengqi-zhang;https://liulu112601.github.io/", "dblp": "88/8205-1;34/10089;68/1974-2;198/1522;71/964;", "google_scholar": "OKvgizMAAAAJ;https://scholar.google.com.au/citations?user=Pl8m7hMAAAAJ;https://scholar.google.com.au/citations?hl=en;7zp9arUAAAAJ;https://scholar.google.com.au/citations?user=B6lBmqEAAAAJ;epMGJ28AAAAJ", "orcid": "0000-0001-5348-0632;0000-0003-3740-9515;;0000-0001-9272-1590;0000-0001-5715-7154;", "linkedin": "tianyizhou;;;;chengqi-zhang-55aa8910/;lu-liu-2b5b93187/", "or_profile": "~Tianyi_Zhou1;~Guodong_Long2;~Jing_Jiang6;~Xuanyi_Dong1;~Chengqi_Zhang1;~Lu_Liu4", "aff": "University of Washington, Seattle;University of Technology Sydney;University of Technology Sydney;University of Technology Sydney;University of Technology Sydney;University of Technology Sydney", "aff_domain": "uw.edu;uts.edu.au;uts.edu.au;uts.edu.au;uts.edu.au;uts.edu.au", "position": "PhD student;Associate Professor;Lecturer;PhD student;Full Professor;PhD student", "bibtex": "", "github": "", "project": "", "reviewers": "AnonReviewer2;AnonReviewer5;AnonReviewer1;AnonReviewer3", "site": "https://openreview.net/forum?id=KmlvRQo3tC", "pdf_size": 0, "rating": "3;3;4;5", "confidence": "5;4;5;4", "wc_review": "670;308;452;222", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "137;336;17;118", "reply_reviewers": "0;0;0;0", "reply_authors": "1;1;1;1", "rating_avg": [ 3.75, 0.82915619758885 ], "confidence_avg": [ 4.5, 0.5 ], "wc_review_avg": [ 413.0, 169.61426826773743 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 152.0, 115.60925568482828 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 9, 0 ], "authors#_avg": [ 6, 0 ], "corr_rating_confidence": -0.30151134457776363, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:g-yZMF_393wJ:scholar.google.com/&scioq=MASP:+Model-Agnostic+Sample+Propagation+for+Few-shot+learning&hl=en&as_sdt=0,5", "gs_version_total": 0, "aff_unique_index": "0;1;1;1;1;1", "aff_unique_norm": "University of Washington;University of Technology Sydney", "aff_unique_dep": ";", "aff_unique_url": "https://www.washington.edu;https://www.uts.edu.au", "aff_unique_abbr": "UW;UTS", "aff_campus_unique_index": "0", "aff_campus_unique": "Seattle;", "aff_country_unique_index": "0;1;1;1;1;1", "aff_country_unique": "United States;Australia" }, { "title": "Prototypical Contrastive Learning of Unsupervised Representations", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/3090", "id": "KmykpuSrjcq", "poster": "", "openreview": "https://openreview.net/forum?id=KmykpuSrjcq", "slides": "https://iclr.cc/virtual/2021/poster/3090", "video": "https://iclr.cc/virtual/2021/poster/3090", "author_site": "Junnan Li, Pan Zhou, Caiming Xiong, Steven Hoi", 
"tldr": "", "abstract": "This paper presents Prototypical Contrastive Learning (PCL), an unsupervised representation learning method that bridges contrastive learning with clustering. PCL not only learns low-level features for the task of instance discrimination, but more importantly, it implicitly encodes semantic structures of the data into the learned embedding space. Specifically, we introduce prototypes as latent variables to help find the maximum-likelihood estimation of the network parameters in an Expectation-Maximization framework. We iteratively perform E-step as finding the distribution of prototypes via clustering and M-step as optimizing the network via contrastive learning. We propose ProtoNCE loss, a generalized version of the InfoNCE loss for contrastive learning, which encourages representations to be closer to their assigned prototypes. PCL outperforms state-of-the-art instance-wise contrastive learning methods on multiple benchmarks with substantial improvement in low-resource transfer learning. Code and pretrained models are available at https://github.com/salesforce/PCL.", "keywords": "self-supervised learning;unsupervised learning;representation learning;contrastive learning", "primary_area": "", "supplementary_material": "/attachment/7fe37b283f92372770869f2705675a8d0fff8ec4.zip", "author": "Junnan Li;Pan Zhou;Caiming Xiong;Steven Hoi", "authorids": "~Junnan_Li2;~Pan_Zhou3;~Caiming_Xiong1;~Steven_Hoi2", "gender": ";;M;M", "homepage": ";;http://cmxiong.com/;http://stevenhoi.com", "dblp": ";;80/7282;", "google_scholar": ";;vaSdahkAAAAJ;JoLjflYAAAAJ", "orcid": ";;;", "linkedin": ";;caiming-xiong-150a1417;", "or_profile": "~Junnan_Li2;~Pan_Zhou3;~Caiming_Xiong1;~Steven_Hoi2", "aff": ";;Salesforce Research;Singapore Management University", "aff_domain": ";;salesforce.com;smu.edu.sg", "position": ";;Research Scientist;Associate Professor", "bibtex": "@inproceedings{\nli2021prototypical,\ntitle={Prototypical Contrastive Learning of Unsupervised Representations},\nauthor={Junnan Li and Pan Zhou and Caiming Xiong and Steven Hoi},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=KmykpuSrjcq}\n}", "github": "[![github](/images/github_icon.svg) salesforce/PCL](https://github.com/salesforce/PCL) + [![Papers with Code](/images/pwc_icon.svg) 1 community implementation](https://paperswithcode.com/paper/?openreview=KmykpuSrjcq)", "project": "", "reviewers": "AnonReviewer1;AnonReviewer3;AnonReviewer2;AnonReviewer4", "pdf_size": 0, "rating": "5;6;7;7", "confidence": "4;5;4;3", "wc_review": "549;382;401;419", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "287;201;248;171", "reply_reviewers": "0;0;0;0", "reply_authors": "1;1;1;1", "rating_avg": [ 6.25, 0.82915619758885 ], "confidence_avg": [ 4.0, 0.7071067811865476 ], "wc_review_avg": [ 437.75, 65.54912280114814 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 226.75, 44.30787176112163 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 9, 0 ], "authors#_avg": [ 4, 0 ], "corr_rating_confidence": -0.42640143271122083, "gs_citation": 1264, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=298080063887760247&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 7, "pdf": "https://openreview.net/pdf?id=KmykpuSrjcq", "email": ";;salesforce.com;smu.edu.sg", "author_num": 4, "aff_unique_index": "0;1", "aff_unique_norm": "Salesforce;Singapore Management University", "aff_unique_dep": "Salesforce Research;", 
"aff_unique_url": "https://research.salesforce.com;https://www.smu.edu.sg", "aff_unique_abbr": "Salesforce;SMU", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;1", "aff_country_unique": "United States;Singapore" }, { "title": "Deep Encoder, Shallow Decoder: Reevaluating Non-autoregressive Machine Translation", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/3096", "id": "KpfasTaLUpq", "poster": "", "openreview": "https://openreview.net/forum?id=KpfasTaLUpq", "slides": "https://iclr.cc/virtual/2021/poster/3096", "video": "https://iclr.cc/virtual/2021/poster/3096", "author_site": "Jungo Kasai, Nikolaos Pappas, Hao Peng, James Cross, Noah Smith", "tldr": "", "abstract": "Much recent effort has been invested in non-autoregressive neural machine translation, which appears to be an efficient alternative to state-of-the-art autoregressive machine translation on modern GPUs. In contrast to the latter, where generation is sequential, the former allows generation to be parallelized across target token positions. Some of the latest non-autoregressive models have achieved impressive translation quality-speed tradeoffs compared to autoregressive baselines. In this work, we reexamine this tradeoff and argue that autoregressive baselines can be substantially sped up without loss in accuracy. Specifically, we study autoregressive models with encoders and decoders of varied depths. Our extensive experiments show that given a sufficiently deep encoder, a single-layer autoregressive decoder can substantially outperform strong non-autoregressive models with comparable inference speed. We show that the speed disadvantage for autoregressive baselines compared to non-autoregressive methods has been overestimated in three aspects: suboptimal layer allocation, insufficient speed measurement, and lack of knowledge distillation. Our results establish a new protocol for future research toward fast, accurate machine translation. Our code is available at https://github.com/jungokasai/deep-shallow.", "keywords": "Machine Translation;Sequence Modeling;Natural Language Processing", "primary_area": "", "supplementary_material": "", "author": "Jungo Kasai;Nikolaos Pappas;Hao Peng;James Cross;Noah Smith", "authorids": "~Jungo_Kasai1;~Nikolaos_Pappas1;~Hao_Peng4;~James_Cross3;~Noah_Smith1", "gender": "M;M;M;M;M", "homepage": "https://homes.cs.washington.edu/~jkasai/;http://nik0spapp.github.io/;;https://homes.cs.washington.edu/~nasmith/;https://haopeng-nlp.github.io/", "dblp": "205/9020;36/8968-2.html;90/4769;90/5204.html;", "google_scholar": "nHCLoIwAAAAJ;https://scholar.google.ch/citations?user=daiFj_cAAAAJ;Oef7pDkAAAAJ;https://scholar.google.com/citations?hl=en;6Y37nm0AAAAJ", "orcid": ";0000-0002-2004-8111;;0000-0002-2310-6380;", "linkedin": ";nik0spapp/;;;", "or_profile": "~Jungo_Kasai1;~Nikolaos_Pappas1;~James_Cross3;~Noah_Smith1;~Hao_Peng1", "aff": "Paul G. 
Allen School of Computer Science & Engineering, University of Washington;University of Washington;Meta Facebook;Allen Institute for Artificial Intelligence;Department of Computer Science, University of Washington", "aff_domain": "cs.washington.edu;cs.washington.edu;fb.com;allenai.org;cs.washington.edu", "position": "PhD student;Postdoc;Research Scientist;Senior Director of NLP Research;PhD student", "bibtex": "@inproceedings{\nkasai2021deep,\ntitle={Deep Encoder, Shallow Decoder: Reevaluating Non-autoregressive Machine Translation},\nauthor={Jungo Kasai and Nikolaos Pappas and Hao Peng and James Cross and Noah Smith},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=KpfasTaLUpq}\n}", "github": "[![github](/images/github_icon.svg) jungokasai/deep-shallow](https://github.com/jungokasai/deep-shallow) + [![Papers with Code](/images/pwc_icon.svg) 1 community implementation](https://paperswithcode.com/paper/?openreview=KpfasTaLUpq)", "project": "", "reviewers": "AnonReviewer2;AnonReviewer3;AnonReviewer4;AnonReviewer1", "pdf_size": 0, "rating": "5;7;7;9", "confidence": "4;3;3;5", "wc_review": "341;487;358;314", "wc_reply_reviewers": "385;0;22;17", "wc_reply_authors": "917;285;434;156", "reply_reviewers": "1;0;1;1", "reply_authors": "2;1;1;1", "rating_avg": [ 7.0, 1.4142135623730951 ], "confidence_avg": [ 3.75, 0.82915619758885 ], "wc_review_avg": [ 375.0, 66.53946197558258 ], "wc_reply_reviewers_avg": [ 106.0, 161.28701125633148 ], "wc_reply_authors_avg": [ 448.0, 288.09286697174576 ], "reply_reviewers_avg": [ 0.75, 0.4330127018922193 ], "reply_authors_avg": [ 1.25, 0.4330127018922193 ], "replies_avg": [ 13, 0 ], "authors#_avg": [ 5, 0 ], "corr_rating_confidence": 0.4264014327112209, "gs_citation": 187, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=9322073775736159949&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 5, "pdf": "https://openreview.net/pdf?id=KpfasTaLUpq", "email": "cs.washington.edu;cs.washington.edu;fb.com;allenai.org;cs.washington.edu", "author_num": 5, "aff_unique_index": "0;0;1;2;0", "aff_unique_norm": "University of Washington;Meta;Allen Institute for Artificial Intelligence", "aff_unique_dep": "Paul G. Allen School of Computer Science & Engineering;Meta Platforms, Inc.;", "aff_unique_url": "https://www.washington.edu;https://meta.com;https://allenai.org", "aff_unique_abbr": "UW;Meta;AI2", "aff_campus_unique_index": "0;0", "aff_campus_unique": "Seattle;", "aff_country_unique_index": "0;0;0;0;0", "aff_country_unique": "United States" }, { "id": "Kr7CrZPPPo", "title": "Learning a Non-Redundant Collection of Classifiers", "track": "main", "status": "Reject", "tldr": "", "abstract": "Supervised learning models constructed under the i.i.d. assumption have often been shown to exploit spurious or brittle predictive signals instead of more robust ones present in the training data. Inspired by Quality-Diversity algorithms, in this work we train a collection of classifiers to learn distinct solutions to a classification problem, with the goal of learning to exploit a variety of predictive signals present in the training data. We propose an information-theoretic measure of model diversity based on minimizing an estimate of conditional total correlation of final layer representations across models given the label. 
We consider datasets with synthetically injected spurious correlations and evaluate our framework's ability to rapidly adapt to a change in distribution that destroys the spurious correlation. We compare our method to a variety of baselines under this evaluation protocol, showing that it is competitive with other approaches while being more successful at isolating distinct signals. We also show that our model is competitive with Invariant Risk Minimization under this evaluation protocol without requiring access to the environment information required by IRM to discriminate between spurious and robust signals.", "keywords": "", "primary_area": "", "supplementary_material": "", "author": "Daniel Pace;Alessandra Russo;Murray Shanahan", "authorids": "~Daniel_Pace1;~Alessandra_Russo1;~Murray_Shanahan1", "gender": ";F;M", "homepage": "https://www.doc.ic.ac.uk/~dp1218/;http://www.imperial.ac.uk/people/a.russo/;https://www.doc.ic.ac.uk/~mpsha/", "dblp": ";79/683;11/5268", "google_scholar": ";https://scholar.google.com.tw/citations?user=_6zceo4AAAAJ;https://scholar.google.co.uk/citations?user=00bnGpAAAAAJ", "orcid": ";0000-0002-3318-8711;0000-0001-5984-2964", "linkedin": ";alessandra-russo-422b6219/?originalSubdomain=uk;", "or_profile": "~Daniel_Pace1;~Alessandra_Russo1;~Murray_Shanahan1", "aff": "Imperial College London;Imperial College London;Imperial College London", "aff_domain": "ic.ac.uk;imperial.ac.uk;", "position": "PhD student;Full Professor;Full Professor", "bibtex": "@misc{\npace2021learning,\ntitle={Learning a Non-Redundant Collection of Classifiers},\nauthor={Daniel Pace and Alessandra Russo and Murray Shanahan},\nyear={2021},\nurl={https://openreview.net/forum?id=Kr7CrZPPPo}\n}", "github": "", "project": "", "reviewers": "AnonReviewer2;AnonReviewer1;AnonReviewer4;AnonReviewer3", "site": "https://openreview.net/forum?id=Kr7CrZPPPo", "pdf_size": 0, "rating": "4;4;5;6", "confidence": "3;4;4;3", "wc_review": "535;296;426;863", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "1243;686;683;1092", "reply_reviewers": "0;0;0;0", "reply_authors": "2;1;1;2", "rating_avg": [ 4.75, 0.82915619758885 ], "confidence_avg": [ 3.5, 0.5 ], "wc_review_avg": [ 530.0, 210.05118423850888 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 926.0, 247.332771787323 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.5, 0.5 ], "replies_avg": [ 12, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": -0.30151134457776363, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:alb_Ww2YcCUJ:scholar.google.com/&scioq=Learning+a+Non-Redundant+Collection+of+Classifiers&hl=en&as_sdt=0,5", "gs_version_total": 0, "aff_unique_index": "0;0;0", "aff_unique_norm": "Imperial College London", "aff_unique_dep": "", "aff_unique_url": "https://www.imperial.ac.uk", "aff_unique_abbr": "ICL", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0;0", "aff_country_unique": "United Kingdom" }, { "id": "KsN9p5qJN3", "title": "Energy-based Out-of-distribution Detection for Multi-label Classification", "track": "main", "status": "Reject", "tldr": "", "abstract": "Out-of-distribution (OOD) detection is essential to prevent anomalous inputs from causing a model to fail during deployment. Improved methods for OOD detection in multi-class classification have emerged, while OOD detection methods for multi-label classification remain underexplored and use rudimentary techniques. 
We propose SumEnergy, a simple and effective method, which estimates the OOD indicator scores by aggregating energy scores from multiple labels. We show that SumEnergy can be mathematically interpreted from a joint likelihood perspective. Our results show consistent improvement over previous methods that are based on the maximum-valued scores, which fail to capture joint information from multiple labels. We demonstrate the effectiveness of our method on three common multi-label classification benchmarks, including MS-COCO, PASCAL-VOC, and NUS-WIDE. We show that SumEnergy reduces the FPR95 by up to 10.05% compared to the previous best baseline, establishing state-of-the-art performance. ", "keywords": "", "primary_area": "", "supplementary_material": "/attachment/d2e9a1b3caf6ba8c997acda65f8ebe4aeae6516d.zip", "author": "Haoran Wang;Weitang Liu;Alex Bocchieri;Yixuan Li", "authorids": "~Haoran_Wang5;~Weitang_Liu1;~Alex_Bocchieri1;~Yixuan_Li1", "gender": ";M;M;F", "homepage": ";https://github.com/wetliu;https://bocchs.github.io/;http://pages.cs.wisc.edu/~sharonli/", "dblp": ";194/3059;;144/6087-1", "google_scholar": ";lm55cKIAAAAJ;;https://scholar.google.com/citations?hl=en", "orcid": ";;;", "linkedin": "haoran-wang-581036162/;;alex-bocchieri-9b5160168/;liyixuan", "or_profile": "~Haoran_Wang5;~Weitang_Liu1;~Alex_Bocchieri1;~Yixuan_Li1", "aff": "Fudan University;University of California, San Diego;University of Wisconsin, Madison;Cornell University", "aff_domain": "fudan.edu.cn;ucsd.edu;wisc.edu;cornell.edu", "position": "Undergrad student;PhD student;PhD student;Graduate Student", "bibtex": "@misc{\nwang2021energybased,\ntitle={Energy-based Out-of-distribution Detection for Multi-label Classification},\nauthor={Haoran Wang and Weitang Liu and Alex Bocchieri and Yixuan Li},\nyear={2021},\nurl={https://openreview.net/forum?id=KsN9p5qJN3}\n}", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer3;AnonReviewer2;AnonReviewer4", "site": "https://openreview.net/forum?id=KsN9p5qJN3", "pdf_size": 0, "rating": "4;6;6;7", "confidence": "4;3;3;5", "wc_review": "192;197;346;208", "wc_reply_reviewers": "292;0;0;112", "wc_reply_authors": "974;471;395;294", "reply_reviewers": "1;0;0;2", "reply_authors": "3;1;1;2", "rating_avg": [ 5.75, 1.0897247358851685 ], "confidence_avg": [ 3.75, 0.82915619758885 ], "wc_review_avg": [ 235.75, 63.91547152294192 ], "wc_reply_reviewers_avg": [ 101.0, 119.37755232873558 ], "wc_reply_authors_avg": [ 533.5, 261.95848907794533 ], "reply_reviewers_avg": [ 0.75, 0.82915619758885 ], "reply_authors_avg": [ 1.75, 0.82915619758885 ], "replies_avg": [ 16, 0 ], "authors#_avg": [ 4, 0 ], "corr_rating_confidence": 0.20751433915982243, "gs_citation": 3, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=3153049894418966608&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 0, "aff_unique_index": "0;1;2;3", "aff_unique_norm": "Fudan University;University of California, San Diego;University of Wisconsin;Cornell University", "aff_unique_dep": ";;;", "aff_unique_url": "https://www.fudan.edu.cn;https://www.ucsd.edu;https://www.wisc.edu;https://www.cornell.edu", "aff_unique_abbr": "Fudan;UCSD;UW;Cornell", "aff_campus_unique_index": "1;2", "aff_campus_unique": ";San Diego;Madison", "aff_country_unique_index": "0;1;1;1", "aff_country_unique": "China;United States" }, { "title": "Multi-resolution modeling of a discrete stochastic process identifies causes of cancer", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/3170", "id": 
"KtH8W3S_RE", "poster": "", "openreview": "https://openreview.net/forum?id=KtH8W3S_RE", "slides": "https://iclr.cc/virtual/2021/poster/3170", "video": "https://iclr.cc/virtual/2021/poster/3170", "author_site": "Adam Yaari, Maxwell Sherman, Oliver C Priebe, Po-Ru Loh, Boris Katz, Andrei Barbu, Bonnie Berger", "tldr": "", "abstract": "Detection of cancer-causing mutations within the vast and mostly unexplored human genome is a major challenge. Doing so requires modeling the background mutation rate, a highly non-stationary stochastic process, across regions of interest varying in size from one to millions of positions. Here, we present the split-Poisson-Gamma (SPG) distribution, an extension of the classical Poisson-Gamma formulation, to model a discrete stochastic process at multiple resolutions. We demonstrate that the probability model has a closed-form posterior, enabling efficient and accurate linear-time prediction over any length scale after the parameters of the model have been inferred a single time. We apply our framework to model mutation rates in tumors and show that model parameters can be accurately inferred from high-dimensional epigenetic data using a convolutional neural network, Gaussian process, and maximum-likelihood estimation. Our method is both more accurate and more efficient than existing models over a large range of length scales. We demonstrate the usefulness of multi-resolution modeling by detecting genomic elements that drive tumor emergence and are of vastly differing sizes.", "keywords": "Computational Biology;non-stationary stochastic processes;cancer research;deep learning;probabelistic models;graphical models", "primary_area": "", "supplementary_material": "/attachment/29bc2b76ff80115b7585836c5011ec963e51be5d.zip", "author": "Adam Uri Yaari;Maxwell Sherman;Oliver Clarke Priebe;Po-Ru Loh;Boris Katz;Andrei Barbu;Bonnie Berger", "authorids": "~Adam_Uri_Yaari1;~Maxwell_Sherman1;~Oliver_Clarke_Priebe1;~Po-Ru_Loh1;~Boris_Katz1;~Andrei_Barbu3;~Bonnie_Berger1", "gender": "M;M;M;M;M;M;F", "homepage": ";http://www.mit.edu/~maxas/profile.html;http://olivercpriebe.com/;https://statgen.hms.harvard.edu/;http://people.csail.mit.edu/boris/boris.html;https://0xab.com;https://people.csail.mit.edu/bab/", "dblp": "292/7968;;;;k/BorisKatz;58/8365;b/BonnieBerger", "google_scholar": "https://scholar.google.co.il/citations?user=s28yMP0AAAAJ;;;https://scholar.google.com/citations?hl=en;FdNuUb8AAAAJ;t1rjgHgAAAAJ;bYjKaowAAAAJ", "orcid": "0000-0002-1703-9097;0000-0002-5297-9252;;0000-0001-5542-9064;;;", "linkedin": "adam-yaari-b0192ab4/;;;;;andrei-barbu-1166131;", "or_profile": "~Adam_Uri_Yaari1;~Maxwell_Sherman1;~Oliver_Clarke_Priebe1;~Po-Ru_Loh1;~Boris_Katz1;~Andrei_Barbu3;~Bonnie_Berger1", "aff": ";Massachusetts Institute of Technology;University of Pennsylvania;Harvard University;Massachusetts Institute of Technology;Massachusetts Institute of Technology;Massachusetts Institute of Technology", "aff_domain": ";mit.edu;upenn.edu;harvard.edu;mit.edu;mit.edu;mit.edu", "position": ";PhD student;Undergrad student;Assistant Professor;Principal Research Scientist;Researcher;Full Professor", "bibtex": "@inproceedings{\nyaari2021multiresolution,\ntitle={Multi-resolution modeling of a discrete stochastic process identifies causes of cancer},\nauthor={Adam Uri Yaari and Maxwell Sherman and Oliver Clarke Priebe and Po-Ru Loh and Boris Katz and Andrei Barbu and Bonnie Berger},\nbooktitle={International Conference on Learning 
Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=KtH8W3S_RE}\n}", "github": "", "project": "", "reviewers": "AnonReviewer3;AnonReviewer2;AnonReviewer1", "pdf_size": 0, "rating": "6;6;7", "confidence": "1;3;3", "wc_review": "325;467;237", "wc_reply_reviewers": "0;143;0", "wc_reply_authors": "720;737;627", "reply_reviewers": "0;1;0", "reply_authors": "1;2;1", "rating_avg": [ 6.333333333333333, 0.4714045207910317 ], "confidence_avg": [ 2.3333333333333335, 0.9428090415820634 ], "wc_review_avg": [ 343.0, 94.75582655787804 ], "wc_reply_reviewers_avg": [ 47.666666666666664, 67.41084647311754 ], "wc_reply_authors_avg": [ 694.6666666666666, 48.34827355299831 ], "reply_reviewers_avg": [ 0.3333333333333333, 0.4714045207910317 ], "reply_authors_avg": [ 1.3333333333333333, 0.4714045207910317 ], "replies_avg": [ 9, 0 ], "authors#_avg": [ 7, 0 ], "corr_rating_confidence": 0.4999999999999999, "gs_citation": 3, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=17023917931303445407&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 3, "pdf": "https://openreview.net/pdf?id=KtH8W3S_RE", "email": ";mit.edu;upenn.edu;harvard.edu;mit.edu;mit.edu;mit.edu", "author_num": 7, "aff_unique_index": "0;1;2;0;0;0", "aff_unique_norm": "Massachusetts Institute of Technology;University of Pennsylvania;Harvard University", "aff_unique_dep": ";;", "aff_unique_url": "https://web.mit.edu;https://www.upenn.edu;https://www.harvard.edu", "aff_unique_abbr": "MIT;UPenn;Harvard", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0;0;0;0;0", "aff_country_unique": "United States" }, { "id": "KubHAaKdSr7", "title": "Modifying Memories in Transformer Models", "track": "main", "status": "Reject", "tldr": "", "abstract": "Large Transformer models have achieved impressive performance in many natural language tasks. In particular, Transformer based language models have been shown to have great capabilities in encoding factual knowledge in their vast amount of parameters. While the tasks of improving the memorization and generalization of Transformers have been widely studied, it is not well known how to make transformers forget specific old facts and memorize new ones. In this paper, we propose a new task of explicitly modifying specific factual knowledge in Transformer models while ensuring the model performance does not degrade on the unmodified facts. This task is useful in many scenarios, such as updating stale knowledge, protecting privacy, and eliminating unintended biases stored in the models. We benchmarked several approaches that provide natural baseline performances on this task. This leads to the discovery of key components of a Transformer model that are especially effective for knowledge modifications. 
The work also provides insights into the role that different training phases (such as pretraining and fine-tuning) play towards memorization and knowledge modification.", "keywords": "Transformers;memorization;question answering", "primary_area": "", "supplementary_material": "", "author": "Chen Zhu;Ankit Singh Rawat;Manzil Zaheer;Srinadh Bhojanapalli;Daliang Li;Felix Yu;Sanjiv Kumar", "authorids": "~Chen_Zhu2;~Ankit_Singh_Rawat1;~Manzil_Zaheer1;~Srinadh_Bhojanapalli1;~Daliang_Li1;~Felix_Yu1;~Sanjiv_Kumar1", "gender": "M;M;M;M;M;M;", "homepage": "http://www.cs.umd.edu/~chenzhu/;https://ankitsrawat.github.io/home/;https://www.aclweb.org/anthology/people/m/manzil-zaheer/;https://bsrinadh.github.io/;;http://felixyu.org;http://www.sanjivk.com/", "dblp": "59/10522-1.html;https://dblp.org/pers/hd/r/Rawat:Ankit_Singh;40/10701;131/6700;;23/10574;", "google_scholar": "m-om5O8AAAAJ;http://scholar.google.com/citations?user=U0_ab4cAAAAJ;A33FhJMAAAAJ;bpSF_9EAAAAJ;Am6f2DsAAAAJ;lYvF6cUAAAAJ;https://scholar.google.com/citations?hl=en", "orcid": ";;;;;;", "linkedin": ";;;;daliangli/;;", "or_profile": "~Chen_Zhu2;~Ankit_Singh_Rawat1;~Manzil_Zaheer1;~Srinadh_Bhojanapalli1;~Daliang_Li1;~Felix_Yu1;~Sanjiv_Kumar1", "aff": "Department of Computer Science, University of Maryland, College Park;Google;Google DeepMind;Google;Google;Google;Google", "aff_domain": "cs.umd.edu;google.com;deepmind.com;google.com;google.com;google.com;google.com", "position": "PhD student;Research Scientist;Researcher;Research Scientist;Researcher;Research Scientist;Research Scientist", "bibtex": "@misc{\nzhu2021modifying,\ntitle={Modifying Memories in Transformer Models},\nauthor={Chen Zhu and Ankit Singh Rawat and Manzil Zaheer and Srinadh Bhojanapalli and Daliang Li and Felix Yu and Sanjiv Kumar},\nyear={2021},\nurl={https://openreview.net/forum?id=KubHAaKdSr7}\n}", "github": "", "project": "", "reviewers": "AnonReviewer4;AnonReviewer1;AnonReviewer2;AnonReviewer3", "site": "https://openreview.net/forum?id=KubHAaKdSr7", "pdf_size": 0, "rating": "5;5;6;6", "confidence": "4;4;4;4", "wc_review": "335;410;543;454", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "643;1017;946;915", "reply_reviewers": "0;0;0;0", "reply_authors": "1;2;2;2", "rating_avg": [ 5.5, 0.5 ], "confidence_avg": [ 4.0, 0.0 ], "wc_review_avg": [ 435.5, 75.24792355939132 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 880.25, 141.87912989583774 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.75, 0.4330127018922193 ], "replies_avg": [ 12, 0 ], "authors#_avg": [ 7, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 195, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=1445640194263756161&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 3, "aff_unique_index": "0;1;1;1;1;1;1", "aff_unique_norm": "University of Maryland, College Park;Google", "aff_unique_dep": "Department of Computer Science;Google", "aff_unique_url": "https://www/umd.edu;https://www.google.com", "aff_unique_abbr": "UMD;Google", "aff_campus_unique_index": "0;1;1;1;1;1", "aff_campus_unique": "College Park;Mountain View;", "aff_country_unique_index": "0;0;1;0;0;0;0", "aff_country_unique": "United States;United Kingdom" }, { "title": "Global Convergence of Three-layer Neural Networks in the Mean Field Regime", "status": "Oral", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/2699", "id": "KvyxFqZS_D", "poster": "", "openreview": "https://openreview.net/forum?id=KvyxFqZS_D", "slides": "https://iclr.cc/virtual/2021/poster/2699", "video": 
"https://iclr.cc/virtual/2021/poster/2699", "author_site": "Huy Tuan Pham, Phan-Minh Nguyen", "tldr": "", "abstract": "In the mean field regime, neural networks are appropriately scaled so that as the width tends to infinity, the learning dynamics tends to a nonlinear and nontrivial dynamical limit, known as the mean field limit. This lends a way to study large-width neural networks via analyzing the mean field limit. Recent works have successfully applied such analysis to two-layer networks and provided global convergence guarantees. The extension to multilayer ones however has been a highly challenging puzzle, and little is known about the optimization efficiency in the mean field regime when there are more than two layers.\n\nIn this work, we prove a global convergence result for unregularized feedforward three-layer networks in the mean field regime. We first develop a rigorous framework to establish the mean field limit of three-layer networks under stochastic gradient descent training. To that end, we propose the idea of a neuronal embedding, which comprises of a fixed probability space that encapsulates neural networks of arbitrary sizes. The identified mean field limit is then used to prove a global convergence guarantee under suitable regularity and convergence mode assumptions, which \u2013 unlike previous works on two-layer networks \u2013 does not rely critically on convexity. Underlying the result is a universal approximation property, natural of neural networks, which importantly is shown to hold at any finite training time (not necessarily at convergence) via an algebraic topology argument.", "keywords": "deep learning theory", "primary_area": "", "supplementary_material": "", "author": "Huy Tuan Pham;Phan-Minh Nguyen", "authorids": "huypham@stanford.edu;~Phan-Minh_Nguyen1", "gender": ";", "homepage": ";https://npminh12.github.io/", "dblp": ";139/9727", "google_scholar": ";lPG5fAIAAAAJ", "orcid": ";", "linkedin": ";", "or_profile": "huypham@stanford.edu;~Phan-Minh_Nguyen1", "aff": ";The Voleon Group", "aff_domain": ";voleon.com", "position": ";Researcher", "bibtex": "@inproceedings{\npham2021global,\ntitle={Global Convergence of Three-layer Neural Networks in the Mean Field Regime},\nauthor={Huy Tuan Pham and Phan-Minh Nguyen},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=KvyxFqZS_D}\n}", "github": "", "project": "", "reviewers": "AnonReviewer4;AnonReviewer2;AnonReviewer1;AnonReviewer3", "pdf_size": 0, "rating": "7;7;7;9", "confidence": "3;2;3;2", "wc_review": "325;335;316;341", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "304;341;223;615", "reply_reviewers": "0;0;0;0", "reply_authors": "1;1;1;1", "rating_avg": [ 7.5, 0.8660254037844386 ], "confidence_avg": [ 2.5, 0.5 ], "wc_review_avg": [ 329.25, 9.54921462739214 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 370.75, 147.33359257141598 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 9, 0 ], "authors#_avg": [ 2, 0 ], "corr_rating_confidence": -0.5773502691896257, "gs_citation": 31, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=15120639763964154455&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 3, "pdf": "https://openreview.net/pdf?id=KvyxFqZS_D", "email": ";voleon.com", "author_num": 2, "aff_unique_index": "0", "aff_unique_norm": "Voleon Group", "aff_unique_dep": "", "aff_unique_url": "", "aff_unique_abbr": "", "aff_country_unique_index": "0", "aff_country_unique": 
"United States" }, { "id": "KwgQn_Aws3_", "title": "Interpretable Sequence Classification Via Prototype Trajectory", "track": "main", "status": "Reject", "tldr": "", "abstract": "We propose a novel interpretable recurrent neural network (RNN) model, called ProtoryNet, in which we introduce a new concept of prototype trajectories. Motivated by the prototype theory in modern linguistics, ProtoryNet makes a prediction by finding the most similar prototype for each sentence in a text sequence and feeding an RNN backbone with the proximity of each of the sentences to the prototypes. The RNN backbone then captures the temporal pattern of the prototypes, to which we refer as prototype trajectories. The prototype trajectories enable intuitive, fine-grained interpretation of how the model reached to the final prediction, resembling the process of how humans analyze paragraphs. Experiments conducted on multiple public data sets reveal that the proposed method not only is more interpretable but also is more accurate than the current state-of-the-art prototype-based method. Furthermore, we report a survey result indicating that human users find ProtoryNet more intuitive and easier to understand, compared to the other prototype-based methods.", "keywords": "interpretable;RNN;prototypes", "primary_area": "", "supplementary_material": "", "author": "Dat Hong;Stephen Baek;Tong Wang", "authorids": "dat-hong@uiowa.edu;~Stephen_Baek1;~Tong_Wang4", "gender": ";;F", "homepage": ";http://www.stephenbaek.com;https://tongwang-ai.github.io/", "dblp": ";;https://dblp.uni-trier.de/pid/51/6856-11", "google_scholar": ";;KB6A0esAAAAJ", "orcid": ";;0000-0001-8687-4208", "linkedin": ";;", "or_profile": "dat-hong@uiowa.edu;~Stephen_Baek1;~Tong_Wang4", "aff": ";University of Iowa;University of Iowa", "aff_domain": ";uiowa.edu;iowa.edu", "position": ";Assistant Professor;Assistant Professor", "bibtex": "@misc{\nhong2021interpretable,\ntitle={Interpretable Sequence Classification Via Prototype Trajectory},\nauthor={Dat Hong and Stephen Baek and Tong Wang},\nyear={2021},\nurl={https://openreview.net/forum?id=KwgQn_Aws3_}\n}", "github": "", "project": "", "reviewers": "AnonReviewer2;AnonReviewer3;AnonReviewer1;AnonReviewer4", "site": "https://openreview.net/forum?id=KwgQn_Aws3_", "pdf_size": 0, "rating": "4;5;6;7", "confidence": "5;3;3;4", "wc_review": "643;620;188;707", "wc_reply_reviewers": "1333;0;0;0", "wc_reply_authors": "3415;876;343;618", "reply_reviewers": "6;0;0;0", "reply_authors": "12;2;1;1", "rating_avg": [ 5.5, 1.118033988749895 ], "confidence_avg": [ 3.75, 0.82915619758885 ], "wc_review_avg": [ 539.5, 205.42699433131958 ], "wc_reply_reviewers_avg": [ 333.25, 577.2059316223283 ], "wc_reply_authors_avg": [ 1313.0, 1228.1386322398623 ], "reply_reviewers_avg": [ 1.5, 2.598076211353316 ], "reply_authors_avg": [ 4.0, 4.636809247747852 ], "replies_avg": [ 31, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": -0.40451991747794525, "gs_citation": 15, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=2241044075489915110&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 0, "aff_unique_index": "0;0", "aff_unique_norm": "University of Iowa", "aff_unique_dep": "", "aff_unique_url": "https://www.uiowa.edu", "aff_unique_abbr": "UIowa", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0", "aff_country_unique": "United States" }, { "id": "KxUlUb26-P3", "title": "PABI: A Unified PAC-Bayesian Informativeness Measure for Incidental Supervision Signals", "track": "main", 
"status": "Reject", "tldr": "", "abstract": "Real-world applications often require making use of {\\em a range of incidental supervision signals}. However, we currently lack a principled way to measure the benefit an incidental training dataset can bring, and the common practice of using indirect, weaker signals is through exhaustive experiments with various models and hyper-parameters. This paper studies whether we can, {\\em in a single framework, quantify the benefit of various types of incidental signals for one's target task without going through combinatorial experiments}. We propose PABI, a unified informativeness measure motivated by PAC-Bayesian theory, characterizing the reduction in uncertainty that indirect, weak signals provide. We demonstrate PABI's use in quantifying various types of incidental signals including partial labels, noisy labels, constraints, cross-domain signals, and combinations of these. Experiments with various setups on two natural language processing (NLP) tasks, named entity recognition (NER) and question answering (QA), show that PABI correlates well with learning performance, providing a promising way to determine, ahead of learning, which supervision signals would be beneficial.", "keywords": "informativeness measure;incidental supervision;natural language processing", "primary_area": "", "supplementary_material": "/attachment/a41368ab5e5c85eaab12b6f1210447bd7ab3811d.zip", "author": "Hangfeng He;Mingyuan Zhang;Qiang Ning;Dan Roth", "authorids": "~Hangfeng_He3;myz@seas.upenn.edu;qning@amazon.com;~Dan_Roth3", "gender": "M;;;M", "homepage": "https://hornhehhf.github.io;;;https://www.cis.upenn.edu/~danroth/", "dblp": "190/7762-1.html;;;r/DanRoth", "google_scholar": "BbpI6QoAAAAJ;;;E-bpPWgAAAAJ", "orcid": "0000-0001-5136-1218;;;", "linkedin": ";;;dan-roth-8667361/", "or_profile": "~Hangfeng_He3;myz@seas.upenn.edu;qning@amazon.com;~Dan_Roth3", "aff": "University of Pennsylvania;;;University of Pennsylvania", "aff_domain": "upenn.edu;;;upenn.edu", "position": "PhD student;;;Eduardo D. 
Glandt Distinguished Professor", "bibtex": "@misc{\nhe2021pabi,\ntitle={{\\{}PABI{\\}}: A Unified {\\{}PAC{\\}}-Bayesian Informativeness Measure for Incidental Supervision Signals},\nauthor={Hangfeng He and Mingyuan Zhang and Qiang Ning and Dan Roth},\nyear={2021},\nurl={https://openreview.net/forum?id=KxUlUb26-P3}\n}", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer2;AnonReviewer3;AnonReviewer4", "site": "https://openreview.net/forum?id=KxUlUb26-P3", "pdf_size": 0, "rating": "5;5;7;8", "confidence": "3;3;3;3", "wc_review": "383;437;537;1061", "wc_reply_reviewers": "0;0;0;612", "wc_reply_authors": "362;750;471;1046", "reply_reviewers": "0;0;0;3", "reply_authors": "1;1;1;4", "rating_avg": [ 6.25, 1.299038105676658 ], "confidence_avg": [ 3.0, 0.0 ], "wc_review_avg": [ 604.5, 269.28934252955503 ], "wc_reply_reviewers_avg": [ 153.0, 265.00377355803823 ], "wc_reply_authors_avg": [ 657.25, 265.3256254114932 ], "reply_reviewers_avg": [ 0.75, 1.299038105676658 ], "reply_authors_avg": [ 1.75, 1.299038105676658 ], "replies_avg": [ 16, 0 ], "authors#_avg": [ 4, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:k2KWOr8mRgcJ:scholar.google.com/&scioq=PABI:+A+Unified+PAC-Bayesian+Informativeness+Measure+for+Incidental+Supervision+Signals&hl=en&as_sdt=0,5", "gs_version_total": 0, "aff_unique_index": "0;0", "aff_unique_norm": "University of Pennsylvania", "aff_unique_dep": "", "aff_unique_url": "https://www.upenn.edu", "aff_unique_abbr": "UPenn", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0", "aff_country_unique": "United States" }, { "id": "Kz42iQirPJI", "title": "Towards Learning to Remember in Meta Learning of Sequential Domains", "track": "main", "status": "Reject", "tldr": "", "abstract": "Meta-learning has made rapid progress in past years, with recent extensions made to avoid catastrophic forgetting in the learning process, namely continual meta learning. It is desirable to generalize the meta learner\u2019s ability to continuously learn in sequential domains, which is largely unexplored to-date. We found through extensive empirical verification that significant improvement\nis needed for current continual learning techniques to be applied in the sequential domain meta learning setting. To tackle the problem, we adapt existing dynamic learning rate adaptation techniques to meta learn both model parameters and learning rates. Adaptation on parameters ensures good generalization performance, while adaptation on learning rates is made to avoid\ncatastrophic forgetting of past domains. Extensive experiments on a sequence of commonly used real-domain data demonstrate the effectiveness of our proposed method, outperforming current strong baselines in continual learning. 
Our code is made publicly available online (anonymous)", "keywords": "Meta learning;Continual Learning;Sequential Domain Learning", "primary_area": "", "supplementary_material": "", "author": "Zhenyi Wang;Tiehang Duan;Donglin Zhan;Changyou Chen", "authorids": "~Zhenyi_Wang1;~Tiehang_Duan1;~Donglin_Zhan1;~Changyou_Chen1", "gender": ";;M;M", "homepage": ";https://sites.google.com/view/icarusjanestephen;https://www.cse.buffalo.edu/~changyou/;https://joey-wang123.github.io/", "dblp": "184/7734;235/6846.html;65/2802;10/10222-1", "google_scholar": "gemTJXgAAAAJ;;LtEcKBcAAAAJ;F4uLsroAAAAJ", "orcid": "0000-0003-4323-642X;;;", "linkedin": ";;;", "or_profile": "~Tiehang_Duan1;~Donglin_Zhan1;~Changyou_Chen1;~Zhenyi_Wang8", "aff": "Meta Platforms, Inc.;Columbia University;State University of New York, Buffalo;State University of New York, Buffalo", "aff_domain": "fb.com;columbia.edu;buffalo.edu;buffalo.edu", "position": "Research Scientist;PhD student;Assistant Professor;PhD student", "bibtex": "@misc{\nwang2021towards,\ntitle={Towards Learning to Remember in Meta Learning of Sequential Domains},\nauthor={Zhenyi Wang and Tiehang Duan and Donglin Zhan and Changyou Chen},\nyear={2021},\nurl={https://openreview.net/forum?id=Kz42iQirPJI}\n}", "github": "", "project": "", "reviewers": "AnonReviewer2;AnonReviewer1;AnonReviewer3;AnonReviewer4", "site": "https://openreview.net/forum?id=Kz42iQirPJI", "pdf_size": 0, "rating": "4;5;5;6", "confidence": "5;5;3;5", "wc_review": "358;815;303;1039", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "516;828;472;453", "reply_reviewers": "0;0;0;0", "reply_authors": "1;2;1;1", "rating_avg": [ 5.0, 0.7071067811865476 ], "confidence_avg": [ 4.5, 0.8660254037844386 ], "wc_review_avg": [ 628.75, 309.1976511877152 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 567.25, 152.26847178585592 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.25, 0.4330127018922193 ], "replies_avg": [ 11, 0 ], "authors#_avg": [ 4, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:1LVirO81bVYJ:scholar.google.com/&scioq=Towards+Learning+to+Remember+in+Meta+Learning+of+Sequential+Domains&hl=en&as_sdt=0,14", "gs_version_total": 0, "aff_unique_index": "0;1;2;2", "aff_unique_norm": "Meta;Columbia University;State University of New York at Buffalo", "aff_unique_dep": "Meta Platforms, Inc.;;", "aff_unique_url": "https://www.meta.com;https://www.columbia.edu;https://www.buffalo.edu", "aff_unique_abbr": "Meta;Columbia;SUNY Buffalo", "aff_campus_unique_index": "1;1", "aff_campus_unique": ";Buffalo", "aff_country_unique_index": "0;0;0;0", "aff_country_unique": "United States" }, { "id": "Kzg0XmE6mxu", "title": "Adversarial Deep Metric Learning", "track": "main", "status": "Reject", "tldr": "", "abstract": "Learning a distance metric between pairs of examples is widely important for various tasks. Deep Metric Learning (DML) utilizes deep neural network architectures to learn semantic feature embeddings where the distance between similar examples is close and dissimilar examples are far. While the underlying neural networks produce good accuracy on naturally occurring samples, they are vulnerable to adversarially-perturbed samples that can reduce their accuracy. To create robust versions of DML models, we introduce a robust training approach. A key challenge is that metric losses are not independent --- they depend on all samples in a mini-batch. 
This sensitivity to samples, if not accounted for, can lead to incorrect robust training. To the best of our knowledge, we are the first to systematically analyze this dependence effect and propose a principled approach for robust training of deep metric learning networks that accounts for the nuances of metric losses. Using experiments on three popular datasets in metric learning, we demonstrate the DML models trained using our techniques display robustness against strong iterative attacks while their performance on unperturbed (natural) samples remains largely unaffected. ", "keywords": "Deep metric learning;adversarial robustness;adversarial examples;adversarial perturbations;adversarial training", "primary_area": "", "supplementary_material": "", "author": "Thomas Kobber Panum;Zi Wang;Pengyu Kan;Earlence Fernandes;Somesh Jha", "authorids": "~Thomas_Kobber_Panum1;~Zi_Wang3;pkan2@cs.wisc.edu;earlence@cs.wisc.edu;~Somesh_Jha1", "gender": "M;M;;;M", "homepage": "https://panum.dk;https://z1w.github.io/;;;", "dblp": "246/5864;;;;j/SomeshJha", "google_scholar": ";https://scholar.google.com/citations?hl=en;;;BaI7l8QAAAAJ", "orcid": ";0000-0002-0815-1343;;;", "linkedin": ";zi-wang-53221139/;;;", "or_profile": "~Thomas_Kobber_Panum1;~Zi_Wang3;pkan2@cs.wisc.edu;earlence@cs.wisc.edu;~Somesh_Jha1", "aff": "Department of Electronic Systems, Aalborg University;University of Wisconsin, Madison;;;Department of Computer Science, University of Wisconsin, Madison", "aff_domain": "es.aau.dk;wisc.edu;;;cs.wisc.edu", "position": "PhD student;PhD student;;;Full Professor", "bibtex": "@misc{\npanum2021adversarial,\ntitle={Adversarial Deep Metric Learning},\nauthor={Thomas Kobber Panum and Zi Wang and Pengyu Kan and Earlence Fernandes and Somesh Jha},\nyear={2021},\nurl={https://openreview.net/forum?id=Kzg0XmE6mxu}\n}", "github": "", "project": "", "reviewers": "AnonReviewer3;AnonReviewer2;AnonReviewer1;AnonReviewer4", "site": "https://openreview.net/forum?id=Kzg0XmE6mxu", "pdf_size": 0, "rating": "4;5;6;6", "confidence": "4;3;4;4", "wc_review": "244;545;555;323", "wc_reply_reviewers": "0;66;0;53", "wc_reply_authors": "395;780;469;345", "reply_reviewers": "0;1;0;1", "reply_authors": "2;3;2;3", "rating_avg": [ 5.25, 0.82915619758885 ], "confidence_avg": [ 3.75, 0.4330127018922193 ], "wc_review_avg": [ 416.75, 136.19173065939063 ], "wc_reply_reviewers_avg": [ 29.75, 30.102948360584218 ], "wc_reply_authors_avg": [ 497.25, 169.101116199746 ], "reply_reviewers_avg": [ 0.5, 0.5 ], "reply_authors_avg": [ 2.5, 0.5 ], "replies_avg": [ 17, 0 ], "authors#_avg": [ 5, 0 ], "corr_rating_confidence": 0.17407765595569782, "gs_citation": 1, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=5040592919785972473&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 0, "aff_unique_index": "0;1;2", "aff_unique_norm": "Aalborg University;University of Wisconsin;University of Wisconsin-Madison", "aff_unique_dep": "Department of Electronic Systems;;Department of Computer Science", "aff_unique_url": "https://www.aau.dk;https://www.wisc.edu;https://www.wisc.edu", "aff_unique_abbr": "AAU;UW;UW-Madison", "aff_campus_unique_index": "1;1", "aff_campus_unique": ";Madison", "aff_country_unique_index": "0;1;1", "aff_country_unique": "Denmark;United States" }, { "id": "L-88RyVtXGr", "title": "Learning Deeply Shared Filter Bases for Efficient ConvNets", "track": "main", "status": "Reject", "tldr": "", "abstract": "Recently, inspired by repetitive block structure of modern ConvNets, such as ResNets, parameter-sharing among repetitive 
convolution layers has been proposed to reduce the size of parameters. However, naive sharing of convolution filters poses many challenges such as overfitting and vanishing/exploding gradients, resulting in worse performance than non-shared counterpart models. Furthermore, sharing parameters often increases computational complexity due to additional operations for re-parameterization. In this work, we propose an efficient parameter-sharing structure and an effective training mechanism of deeply shared parameters. In the proposed ConvNet architecture, convolution layers are decomposed into a filter basis, that can be shared recursively, and layer-specific parts. We conjecture that a shared filter basis combined with a small amount of layer-specific parameters can retain, or further enhance, the representation power of individual layers, if a proper training method is applied. We show both theoretically and empirically that potential vanishing/exploding gradients problems can be mitigated by enforcing orthogonality to the shared filter bases. Experimental results demonstrate that our scheme effectively reduces redundancy by saving up to 63.8% of parameters while consistently outperforming non-shared counterpart networks even when a filter basis is deeply shared by up to 10 repetitive convolution layers.", "keywords": "Deep learning;ConvNets;parameter sharing;model compression;convolutional neural networks;recursive networks", "primary_area": "", "supplementary_material": "/attachment/ab2099bc83af6ee8418569095403d9b023ad6ac0.zip", "author": "Woochul Kang;Daeyeon Kim", "authorids": "~Woochul_Kang1;ssregibility@gmail.com", "gender": "M;", "homepage": "https://sites.google.com/site/woochulkang/;", "dblp": "20/3530;", "google_scholar": "pgo9aYAAAAAJ;", "orcid": "0000-0002-4757-8999;", "linkedin": ";", "or_profile": "~Woochul_Kang1;ssregibility@gmail.com", "aff": "Incheon National University, South Korea;", "aff_domain": "inu.ac.kr;", "position": "Associate Professor;", "bibtex": "@misc{\nkang2021learning,\ntitle={Learning Deeply Shared Filter Bases for Efficient ConvNets},\nauthor={Woochul Kang and Daeyeon Kim},\nyear={2021},\nurl={https://openreview.net/forum?id=L-88RyVtXGr}\n}", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer4;AnonReviewer3;AnonReviewer2", "site": "https://openreview.net/forum?id=L-88RyVtXGr", "pdf_size": 0, "rating": "4;5;5;6", "confidence": "4;3;4;3", "wc_review": "474;253;215;379", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "1047;452;533;744", "reply_reviewers": "0;0;0;0", "reply_authors": "2;1;1;1", "rating_avg": [ 5.0, 0.7071067811865476 ], "confidence_avg": [ 3.5, 0.5 ], "wc_review_avg": [ 330.25, 102.8235746315017 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 694.0, 229.99673910731866 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.25, 0.4330127018922193 ], "replies_avg": [ 11, 0 ], "authors#_avg": [ 2, 0 ], "corr_rating_confidence": -0.7071067811865476, "gs_citation": 1, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=4058924370254057059&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 0, "aff_unique_index": "0", "aff_unique_norm": "Incheon National University", "aff_unique_dep": "", "aff_unique_url": "https://www.inu.ac.kr", "aff_unique_abbr": "INU", "aff_country_unique_index": "0", "aff_country_unique": "South Korea" }, { "id": "L2LEB4vd9Qw", "title": "Multimodal Attention for Layout Synthesis in Diverse Domains", "track": "main", "status": "Reject", "tldr": "", "abstract": "We address the 
problem of scene layout generation for diverse domains such as images, mobile applications, documents and 3D objects. Most complex scenes, natural or human-designed, can be expressed as a meaningful arrangement of simpler compositional graphical primitives. Generating a new layout or extending an existing layout requires understanding the relationships between these primitives. To do this, we propose a multimodal attention framework, MMA, that leverages self-attention to learn contextual relationships between layout elements and generate novel layouts in a given domain. Our framework allows us to generate a new layout either from an empty set or from an initial seed set of primitives, and can easily scale to support an arbitrary of primitives per layout. Further, our analyses show that the model is able to automatically capture the semantic properties of the primitives. We propose simple improvements in both representation of layout primitives, as well as training methods to demonstrate competitive performance in very diverse data domains such as object bounding boxes in natural images (COCO bounding boxes), documents (PubLayNet), mobile applications (RICO dataset) as well as 3D shapes (PartNet).", "keywords": "layout generation;layout synthesis;multimodal attention;transformers;document layouts;generative model;3D", "primary_area": "", "supplementary_material": "", "author": "Kamal Gupta;Vijay Mahadevan;Alessandro Achille;Justin Lazarow;Larry S. Davis;Abhinav Shrivastava", "authorids": "~Kamal_Gupta1;~Vijay_Mahadevan1;~Alessandro_Achille1;~Justin_Lazarow1;~Larry_S._Davis1;~Abhinav_Shrivastava2", "gender": ";M;M;M;M;M", "homepage": "https://kampta.github.io;;;;http://www.umiacs.umd.edu/~lsd/;http://abhinavsh.info", "dblp": ";;190/7328;127/3611;d/LarrySDavis;65/10572", "google_scholar": "tC3td8cAAAAJ;n9fRgvkAAAAJ;;PASh6VEAAAAJ;https://scholar.google.com.tw/citations?user=lc0ARagAAAAJ;mIF9BowAAAAJ", "orcid": ";;;;;0000-0001-8928-8554", "linkedin": "kamalgupta09;;;;;", "or_profile": "~Kamal_Gupta1;~Vijay_Mahadevan1;~Alessandro_Achille1;~Justin_Lazarow1;~Larry_S._Davis1;~Abhinav_Shrivastava2", "aff": "NVIDIA;Amazon;California Institute of Technology;University of California, San Diego;Amazon;Department of Computer Science, University of Maryland, College Park", "aff_domain": "nvidia.com;amazon.com;caltech.edu;ucsd.edu;amazon.com;cs.umd.edu", "position": "Researcher;Researcher;Postdoc;PhD student;Amazon Sr. Principal Scientist;Assistant Professor", "bibtex": "@misc{\ngupta2021multimodal,\ntitle={Multimodal Attention for Layout Synthesis in Diverse Domains},\nauthor={Kamal Gupta and Vijay Mahadevan and Alessandro Achille and Justin Lazarow and Larry S. 
Davis and Abhinav Shrivastava},\nyear={2021},\nurl={https://openreview.net/forum?id=L2LEB4vd9Qw}\n}", "github": "", "project": "", "reviewers": "AnonReviewer4;AnonReviewer2;AnonReviewer1;AnonReviewer3", "site": "https://openreview.net/forum?id=L2LEB4vd9Qw", "pdf_size": 0, "rating": "5;5;6;7", "confidence": "4;4;4;3", "wc_review": "373;1235;1067;347", "wc_reply_reviewers": "0;161;0;0", "wc_reply_authors": "408;1384;714;569", "reply_reviewers": "0;1;0;0", "reply_authors": "1;2;1;1", "rating_avg": [ 5.75, 0.82915619758885 ], "confidence_avg": [ 3.75, 0.4330127018922193 ], "wc_review_avg": [ 755.5, 400.04093540536576 ], "wc_reply_reviewers_avg": [ 40.25, 69.71504500464731 ], "wc_reply_authors_avg": [ 768.75, 371.339046559879 ], "reply_reviewers_avg": [ 0.25, 0.4330127018922193 ], "reply_authors_avg": [ 1.25, 0.4330127018922193 ], "replies_avg": [ 12, 0 ], "authors#_avg": [ 6, 0 ], "corr_rating_confidence": -0.8703882797784891, "gs_citation": 1, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=6800538860574338446&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 0, "aff_unique_index": "0;1;2;3;1;4", "aff_unique_norm": "NVIDIA;Amazon;California Institute of Technology;University of California, San Diego;University of Maryland, College Park", "aff_unique_dep": "NVIDIA Corporation;Amazon.com, Inc.;;;Department of Computer Science", "aff_unique_url": "https://www.nvidia.com;https://www.amazon.com;https://www.caltech.edu;https://www.ucsd.edu;https://www/umd.edu", "aff_unique_abbr": "NVIDIA;Amazon;Caltech;UCSD;UMD", "aff_campus_unique_index": "1;2;3", "aff_campus_unique": ";Pasadena;San Diego;College Park", "aff_country_unique_index": "0;0;0;0;0;0", "aff_country_unique": "United States" }, { "id": "L3iGqaCTWS9", "title": "Hybrid and Non-Uniform DNN quantization methods using Retro Synthesis data for efficient inference", "track": "main", "status": "Reject", "tldr": "", "abstract": "Existing post-training quantization methods attempt to compensate for the quantization loss by determining the quantized weights and activation ranges with the help of training data. Quantization aware training methods, on the other hand, achieve accuracy near to FP32 models by training the quantized model which consume more time. Both these methods are not effective for privacy constraint applications as they are tightly coupled with training data. In contrast, this paper proposes a data-independent post-training quantization scheme that eliminates the need for training data. This is achieved by generating a faux dataset hereafter called as $\\textit{\u2018Retro-Synthesis Data\u2019}$ from the FP32 model layer statistics and further using it for quantization. This approach outperformed state-of-the-art methods including, but not limited to, ZeroQ and DFQ on models with and without batch-normalization layers for 8, 6 and 4 bit precisions. We also introduced two futuristic variants of post-training quantization methods namely $\\textit{\u2018Hybrid-Quantization\u2019}$ and $\\textit{\u2018Non-Uniform Quantization\u2019}$. The Hybrid-Quantization scheme determines the sensitivity of each layer for per-tensor and per-channel quantization, and thereby generates hybrid quantized models that are $10 - 20\\%$ efficient in inference time while achieving same or better accuracy as compared to per-channel quantization. Also this method outperformed FP32 accuracy when applied for models such as ResNet-18, and ResNet-50 onImageNet dataset. 
In the proposed Non-Uniform quantization scheme, the weights are grouped into different clusters and these clusters are assigned with a varied number of quantization steps depending on the number of weights and their ranges in respective cluster. This method resulted in an accuracy improvement of $1\\%$ against state-of-the-art quantization methods on ImageNet dataset.", "keywords": "quantization;dnn inference;data free quantization;synthetic data;model compression", "primary_area": "", "supplementary_material": "", "author": "TEJPRATAP GVSL;Raja Kumar;Pradeep NS", "authorids": "~TEJPRATAP_GVSL1;~Raja_Kumar2;pradeep.ns@samsung.com", "gender": "M;M;", "homepage": ";https://raja-kumar.github.io/;", "dblp": ";;", "google_scholar": ";wlU2x_kAAAAJ;", "orcid": "0000-0003-1318-776X;;", "linkedin": ";raja-kumar-58971010a/;", "or_profile": "~TEJPRATAP_GVSL1;~Raja_Kumar2;pradeep.ns@samsung.com", "aff": ";;", "aff_domain": ";;", "position": ";;", "bibtex": "@misc{\ngvsl2021hybrid,\ntitle={Hybrid and Non-Uniform {\\{}DNN{\\}} quantization methods using Retro Synthesis data for efficient inference},\nauthor={TEJPRATAP GVSL and Raja Kumar and Pradeep NS},\nyear={2021},\nurl={https://openreview.net/forum?id=L3iGqaCTWS9}\n}", "github": "", "project": "", "reviewers": "AnonReviewer4;AnonReviewer3;AnonReviewer1;AnonReviewer2", "site": "https://openreview.net/forum?id=L3iGqaCTWS9", "pdf_size": 0, "rating": "4;4;4;6", "confidence": "5;4;4;5", "wc_review": "785;643;379;368", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "695;973;773;394", "reply_reviewers": "0;0;0;0", "reply_authors": "2;2;2;1", "rating_avg": [ 4.5, 0.8660254037844386 ], "confidence_avg": [ 4.5, 0.5 ], "wc_review_avg": [ 543.75, 177.5406643560849 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 708.75, 208.09417939961705 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.75, 0.4330127018922193 ], "replies_avg": [ 13, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": 0.5773502691896257, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:gfQ2AgpHLM8J:scholar.google.com/&scioq=Hybrid+and+Non-Uniform+DNN+quantization+methods+using+Retro+Synthesis+data+for+efficient+inference&hl=en&as_sdt=0,5", "gs_version_total": 2 }, { "id": "L4n9FPoQL1", "title": "Classify and Generate Reciprocally: Simultaneous Positive-Unlabelled Learning and Conditional Generation with Extra Data", "track": "main", "status": "Reject", "tldr": "", "abstract": "The scarcity of class-labeled data is a ubiquitous bottleneck in a wide range of machine learning problems. While abundant unlabeled data normally exist and provide a potential solution, it is extremely challenging to exploit them. In this paper, we address this problem by leveraging Positive-Unlabeled~(PU) classification and conditional generation with extra unlabeled data \\emph{simultaneously}, both of which aim to make full use of agnostic unlabeled data to improve classification and generation performances. In particular, we present a novel training framework to jointly target both PU classification and conditional generation when exposing to extra data, especially out-of-distribution unlabeled data, by exploring the interplay between them: 1) enhancing the performance of PU classifiers with the assistance of a novel Conditional Generative Adversarial Network~(CGAN) that is robust to noisy labels, 2) leveraging extra data with predicted labels from a PU classifier to help the generation. 
Our key contribution is a Classifier-Noise-Invariant Conditional GAN~(CNI-CGAN) that can learn the clean data distribution from noisy labels predicted by a PU classifier. Theoretically, we proved the optimal condition of CNI-CGAN and experimentally, we conducted extensive evaluations on diverse datasets, verifying the simultaneous improvements on both classification and generation.", "keywords": "", "primary_area": "", "supplementary_material": "/attachment/e6e64475850ca5ceca63b7eb261920622a638a04.zip", "author": "Bing Yu;Ke Sun;He Wang;Zhouchen Lin;Zhanxing Zhu", "authorids": "~Bing_Yu1;~Ke_Sun3;~He_Wang6;~Zhouchen_Lin1;~Zhanxing_Zhu1", "gender": ";M;M;M;M", "homepage": ";http://drhewang.com/;https://zhouchenlin.github.io;https://zhanxingzhu.github.io/;https://sites.google.com/view/kesun", "dblp": "47/2129;01/6368-2;l/ZhouchenLin;87/7756.html;69/476-13", "google_scholar": ";https://scholar.google.co.jp/citations?user=BaaPAVYAAAAJ;https://scholar.google.com.tw/citations?user=TanjFwoAAAAJ;a2sHceIAAAAJ;lYdNhFQAAAAJ", "orcid": ";0000-0002-2281-5679;0000-0003-1493-7569;;", "linkedin": ";;;;", "or_profile": "~Bing_Yu1;~He_Wang6;~Zhouchen_Lin1;~Zhanxing_Zhu1;~Ke_Sun6", "aff": "Peking University;University of Leeds;Peking University;Peking University;University of Alberta", "aff_domain": "pku.edu.cn;leeds.ac.uk;pku.edu.cn;pku.edu.cn;ualberta.ca", "position": "PhD student;Associate Professor;Professor;Assistant Professor;PhD student", "bibtex": "@misc{\nyu2021classify,\ntitle={Classify and Generate Reciprocally: Simultaneous Positive-Unlabelled Learning and Conditional Generation with Extra Data},\nauthor={Bing Yu and Ke Sun and He Wang and Zhouchen Lin and Zhanxing Zhu},\nyear={2021},\nurl={https://openreview.net/forum?id=L4n9FPoQL1}\n}", "github": "", "project": "", "reviewers": "AnonReviewer5;AnonReviewer4;AnonReviewer1", "site": "https://openreview.net/forum?id=L4n9FPoQL1", "pdf_size": 0, "rating": "5;6;6", "confidence": "3;4;4", "wc_review": "361;408;273", "wc_reply_reviewers": "0;0;0", "wc_reply_authors": "771;362;245", "reply_reviewers": "0;0;0", "reply_authors": "1;1;1", "rating_avg": [ 5.666666666666667, 0.4714045207910317 ], "confidence_avg": [ 3.6666666666666665, 0.4714045207910317 ], "wc_review_avg": [ 347.3333333333333, 55.954346470036526 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 459.3333333333333, 225.49846020661772 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 7, 0 ], "authors#_avg": [ 5, 0 ], "corr_rating_confidence": 0.9999999999999997, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:zTJ7IRRe_iUJ:scholar.google.com/&scioq=Classify+and+Generate+Reciprocally:+Simultaneous+Positive-Unlabelled+Learning+and+Conditional+Generation+with+Extra+Data&hl=en&as_sdt=0,33", "gs_version_total": 6, "aff_unique_index": "0;1;0;0;2", "aff_unique_norm": "Peking University;University of Leeds;University of Alberta", "aff_unique_dep": ";;", "aff_unique_url": "http://www.pku.edu.cn;https://www.leeds.ac.uk;https://www.ualberta.ca", "aff_unique_abbr": "Peking U;Leeds;UAlberta", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;1;0;0;2", "aff_country_unique": "China;United Kingdom;Canada" }, { "id": "L4v_5Qtshj7", "title": "Goal-Driven Imitation Learning from Observation by Inferring Goal Proximity", "track": "main", "status": "Reject", "tldr": "", "abstract": "Humans can effectively learn to estimate how close they are to completing a desired task simply by watching 
others fulfill the task. To solve the task, they can then take actions towards states with higher estimated proximity to the goal. From this intuition, we propose a simple yet effective method for imitation learning that learns a goal proximity function from expert demonstrations and online agent experience, and then uses the learned proximity to provide a dense reward signal for training a policy to solve the task. By predicting task progress as the temporal distance to the goal, the goal proximity function improves generalization to unseen states over methods that aim to directly imitate expert behaviors. We demonstrate that our proposed method efficiently learns a set of goal-driven tasks from state-only demonstrations in navigation, robotic arm manipulation, and locomotion tasks.", "keywords": "Imitation Learning;Learning from Observation", "primary_area": "", "supplementary_material": "/attachment/2a47ca7fc4a70bb381b6ef91c3c5eae970be4a81.zip", "author": "Andrew Szot;Youngwoon Lee;Shao-Hua Sun;Joseph J Lim", "authorids": "~Andrew_Szot1;~Youngwoon_Lee1;~Shao-Hua_Sun1;~Joseph_J_Lim1", "gender": "M;M;M;M", "homepage": "https://www.andrewszot.com;https://youngwoon.github.io;http://shaohua0116.github.io;http://people.csail.mit.edu/lim/", "dblp": ";117/4767;158/9680;08/3086", "google_scholar": "IwIWKPYAAAAJ;CDPa3AgAAAAJ;uXsfnaQAAAAJ;jTnQTBoAAAAJ", "orcid": ";0000-0001-9918-1056;0000-0001-7579-6734;", "linkedin": ";;shaohua0116/;", "or_profile": "~Andrew_Szot1;~Youngwoon_Lee1;~Shao-Hua_Sun1;~Joseph_J_Lim1", "aff": "Georgia Institute of Technology;University of Southern California;University of Southern California;University of Southern California", "aff_domain": "gatech.edu;usc.edu;usc.edu;usc.edu", "position": "PhD student;PhD student;PhD student;Assistant Professor", "bibtex": "@misc{\nszot2021goaldriven,\ntitle={Goal-Driven Imitation Learning from Observation by Inferring Goal Proximity},\nauthor={Andrew Szot and Youngwoon Lee and Shao-Hua Sun and Joseph J Lim},\nyear={2021},\nurl={https://openreview.net/forum?id=L4v_5Qtshj7}\n}", "github": "", "project": "", "reviewers": "AnonReviewer3;AnonReviewer5;AnonReviewer4;AnonReviewer2;AnonReviewer1", "site": "https://openreview.net/forum?id=L4v_5Qtshj7", "pdf_size": 0, "rating": "5;5;6;6;7", "confidence": "3;5;3;5;4", "wc_review": "256;791;459;546;506", "wc_reply_reviewers": "0;0;0;354;167", "wc_reply_authors": "442;1052;320;1082;586", "reply_reviewers": "0;0;0;2;2", "reply_authors": "1;2;1;2;3", "rating_avg": [ 5.8, 0.7483314773547882 ], "confidence_avg": [ 4.0, 0.8944271909999159 ], "wc_review_avg": [ 511.6, 171.68412856172813 ], "wc_reply_reviewers_avg": [ 104.2, 140.6533327013619 ], "wc_reply_authors_avg": [ 696.4, 314.2365987595971 ], "reply_reviewers_avg": [ 0.8, 0.9797958971132713 ], "reply_authors_avg": [ 1.8, 0.7483314773547883 ], "replies_avg": [ 20, 0 ], "authors#_avg": [ 4, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:U_L8uoTdSpkJ:scholar.google.com/&scioq=Goal-Driven+Imitation+Learning+from+Observation+by+Inferring+Goal+Proximity&hl=en&as_sdt=0,33", "gs_version_total": 0, "aff_unique_index": "0;1;1;1", "aff_unique_norm": "Georgia Institute of Technology;University of Southern California", "aff_unique_dep": ";", "aff_unique_url": "https://www.gatech.edu;https://www.usc.edu", "aff_unique_abbr": "Georgia Tech;USC", "aff_campus_unique_index": "1;1;1", "aff_campus_unique": ";Los Angeles", "aff_country_unique_index": "0;0;0;0", "aff_country_unique": "United States" 
}, { "id": "L5b6jUonKFB", "title": "Deep Continuous Networks", "track": "main", "status": "Reject", "tldr": "", "abstract": "CNNs and computational models of biological vision share some fundamental principles, which, combined with recent developments in deep learning, have opened up new avenues of research in neuroscience. However, in contrast to biological models, conventional CNN architectures are based on spatio-temporally discrete representations, and thus cannot accommodate certain aspects of biological complexity such as continuously varying receptive field sizes and temporal dynamics of neuronal responses. Here we propose deep continuous networks (DCNs), which combine spatially continuous convolutional filter representations, with the continuous time framework of neural ODEs. This allows us to learn the spatial support of the filters during training, as well as model the temporal evolution of feature maps, linking DCNs closely to biological models. We show that DCNs are versatile. Experimentally, we demonstrate their applicability to a standard classification problem, where they allow for parameter reductions and meta-parametrization. We illustrate the biological plausibility of the scale distributions learned by DCNs and explore their performance in a pattern completion task, which is inspired by models from computational neuroscience. Finally, we suggest that the continuous representations learned by DCNs may enable computationally efficient implementations.", "keywords": "continuous representations;neuroscience;convolutional neural networks;gaussian scale-space;learnable scale;receptive field size;neural ODEs;pattern completion", "primary_area": "", "supplementary_material": "/attachment/d17f44cb1d224f914513e6f801f86646377a3bb9.zip", "author": "Nergis Tomen;Silvia Laura Pintea;Jan van Gemert", "authorids": "~Nergis_Tomen1;~Silvia_Laura_Pintea1;~Jan_van_Gemert1", "gender": ";Not Specified;M", "homepage": "https://www.tudelft.nl/ewi/over-de-faculteit/afdelingen/intelligent-systems/pattern-recognition-bioinformatics/computer-vision-lab/people/nergis-toemen/;https://silvialaurapintea.github.io/;https://jvgemert.github.io/", "dblp": "159/9895;150/4232;25/3153", "google_scholar": "6vcKI6MAAAAJ;shTkx9EAAAAJ;JUdMRGcAAAAJ", "orcid": "0000-0003-3916-1859;;0000-0002-3913-2786", "linkedin": ";;jan-van-gemert-1628b94/", "or_profile": "~Nergis_Tomen1;~Silvia_Laura_Pintea1;~Jan_C_van_Gemert1", "aff": "Delft University of Technology;Delft University of Technology;Delft University of Technology", "aff_domain": "tudelft.nl;tudelft.nl;tudelft.nl", "position": "Postdoc;Researcher;Associate Professor", "bibtex": "@misc{\ntomen2021deep,\ntitle={Deep Continuous Networks},\nauthor={Nergis Tomen and Silvia Laura Pintea and Jan van Gemert},\nyear={2021},\nurl={https://openreview.net/forum?id=L5b6jUonKFB}\n}", "github": "", "project": "", "reviewers": "AnonReviewer2;AnonReviewer4;AnonReviewer3", "site": "https://openreview.net/forum?id=L5b6jUonKFB", "pdf_size": 0, "rating": "5;6;7", "confidence": "4;3;4", "wc_review": "235;692;367", "wc_reply_reviewers": "271;0;0", "wc_reply_authors": "1110;1113;950", "reply_reviewers": "1;0;0", "reply_authors": "2;2;2", "rating_avg": [ 6.0, 0.816496580927726 ], "confidence_avg": [ 3.6666666666666665, 0.4714045207910317 ], "wc_review_avg": [ 431.3333333333333, 192.0352976813262 ], "wc_reply_reviewers_avg": [ 90.33333333333333, 127.7506251343696 ], "wc_reply_authors_avg": [ 1057.6666666666667, 76.14168080332581 ], "reply_reviewers_avg": [ 0.3333333333333333, 
0.4714045207910317 ], "reply_authors_avg": [ 2.0, 0.0 ], "replies_avg": [ 13, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 15, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=2388810081875178780&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 11, "aff_unique_index": "0;0;0", "aff_unique_norm": "Delft University of Technology", "aff_unique_dep": "", "aff_unique_url": "https://www.tudelft.nl", "aff_unique_abbr": "TU Delft", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0;0", "aff_country_unique": "Netherlands" }, { "id": "L7Irrt5sMQa", "title": "The Surprising Power of Graph Neural Networks with Random Node Initialization", "track": "main", "status": "Reject", "tldr": "", "abstract": "Graph neural networks (GNNs) are effective models for representation learning on graph-structured data. However, standard GNNs are limited in their expressive power, as they cannot distinguish graphs beyond the capability of the Weisfeiler-Leman (1-WL) graph isomorphism heuristic. This limitation motivated a large body of work, including higher-order GNNs, which are provably more powerful models. To date, higher-order invariant and equivariant networks are the only models with known universality results, but these results are practically hindered by prohibitive computational complexity. Thus, despite their limitations, standard GNNs are commonly used, due to their strong practical performance. In practice, GNNs have shown a promising performance when enhanced with random node initialization (RNI), where the idea is to train and run the models with randomized initial node features. In this paper, we analyze the expressive power of GNNs with RNI, and pose the following question: are GNNs with RNI more expressive than GNNs? We prove that this is indeed the case, by showing that GNNs with RNI are universal, a first such result for GNNs not relying on computationally demanding higher-order properties. We then empirically analyze the effect of RNI on GNNs, based on carefully constructed datasets. Our empirical findings support the superior performance of GNNs with RNI over standard GNNs. In fact, we demonstrate that the performance of GNNs with RNI is often comparable with or better than that of higher-order GNNs, while keeping the much lower memory requirements of standard GNNs. However, this improvement typically comes at the cost of slower model convergence. 
Somewhat surprisingly, we found that the convergence rate and the accuracy of the models can be improved by using only a partial random initialization regime.", "keywords": "graph representation learning;graph neural networks;expressiveness;universality;random node initialization;Weisfeiler-Lehman heuristic;higher-order graph neural networks", "primary_area": "", "supplementary_material": "/attachment/d9b76656d4a6b9e50bae603887c18226047ab4c8.zip", "author": "Ralph Abboud;Ismail Ilkan Ceylan;Martin Grohe;Thomas Lukasiewicz", "authorids": "ralph.abboud@cs.ox.ac.uk;~Ismail_Ilkan_Ceylan2;~Martin_Grohe1;~Thomas_Lukasiewicz2", "gender": ";;M;", "homepage": ";https://www.cs.ox.ac.uk/people/ismaililkan.ceylan/;http://www.lics.rwth-aachen.de/~grohe;https://www.cs.ox.ac.uk/people/thomas.lukasiewicz/", "dblp": ";147/6111;g/MGrohe;l/ThomasLukasiewicz", "google_scholar": ";avJ5kQcAAAAJ;https://scholar.google.com.tw/citations?user=Sou5ih0AAAAJ;arjucpEAAAAJ", "orcid": ";0000-0003-4118-4689;0000-0002-0292-9142;", "linkedin": ";;;", "or_profile": "ralph.abboud@cs.ox.ac.uk;~Ismail_Ilkan_Ceylan2;~Martin_Grohe1;~Thomas_Lukasiewicz2", "aff": ";University of Oxford;RWTH Aachen University;Department of Computer Science, University of Oxford", "aff_domain": ";oxford.ac.uk;rwth-aachen.de;cs.ox.ac.uk", "position": ";Assistant Professor;Full Professor;Full Professor", "bibtex": "@misc{\nabboud2021the,\ntitle={The Surprising Power of Graph Neural Networks with Random Node Initialization},\nauthor={Ralph Abboud and Ismail Ilkan Ceylan and Martin Grohe and Thomas Lukasiewicz},\nyear={2021},\nurl={https://openreview.net/forum?id=L7Irrt5sMQa}\n}", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer3;AnonReviewer2;AnonReviewer4", "site": "https://openreview.net/forum?id=L7Irrt5sMQa", "pdf_size": 0, "rating": "5;5;7;7", "confidence": "4;4;3;3", "wc_review": "345;504;363;554", "wc_reply_reviewers": "0;348;0;0", "wc_reply_authors": "481;1883;589;819", "reply_reviewers": "0;1;0;0", "reply_authors": "1;3;1;1", "rating_avg": [ 6.0, 1.0 ], "confidence_avg": [ 3.5, 0.5 ], "wc_review_avg": [ 441.5, 89.49441323345273 ], "wc_reply_reviewers_avg": [ 87.0, 150.68842025849233 ], "wc_reply_authors_avg": [ 943.0, 556.2679210596275 ], "reply_reviewers_avg": [ 0.25, 0.4330127018922193 ], "reply_authors_avg": [ 1.5, 0.8660254037844386 ], "replies_avg": [ 24, 0 ], "authors#_avg": [ 4, 0 ], "corr_rating_confidence": -1.0, "gs_citation": 292, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=5840493289924648102&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 8, "aff_unique_index": "0;1;0", "aff_unique_norm": "University of Oxford;RWTH Aachen University", "aff_unique_dep": ";", "aff_unique_url": "https://www.ox.ac.uk;https://www.rwth-aachen.de", "aff_unique_abbr": "Oxford;RWTH", "aff_campus_unique_index": "1;2", "aff_campus_unique": ";Aachen;Oxford", "aff_country_unique_index": "0;1;0", "aff_country_unique": "United Kingdom;Germany" }, { "title": "The Role of Momentum Parameters in the Optimal Convergence of Adaptive Polyak's Heavy-ball Methods", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/2771", "id": "L7WD8ZdscQ5", "poster": "", "openreview": "https://openreview.net/forum?id=L7WD8ZdscQ5", "slides": "https://iclr.cc/virtual/2021/poster/2771", "video": "https://iclr.cc/virtual/2021/poster/2771", "author_site": "Wei Tao, sheng long, Gaowei Wu, Qing Tao", "tldr": "", "abstract": "The adaptive stochastic gradient descent (SGD) with momentum has been widely adopted in deep 
learning as well as convex optimization. In practice, the last iterate is commonly used as the final solution. However, the available regret analysis and the setting of constant momentum parameters only guarantee the optimal convergence of the averaged solution. In this paper, we fill this theory-practice gap by investigating the convergence of the last iterate (referred to as {\\it individual convergence}), which is a more difficult task than convergence analysis of the averaged solution. Specifically, in the constrained convex cases, we prove that the adaptive Polyak's Heavy-ball (HB) method, in which the step size is only updated using the exponential moving average strategy, attains an individual convergence rate of $O(\\frac{1}{\\sqrt{t}})$, as opposed to that of $O(\\frac{\\log t}{\\sqrt {t}})$ of SGD, where $t$ is the number of iterations. Our new analysis not only shows how the HB momentum and its time-varying weight help us to achieve the acceleration in convex optimization but also gives valuable hints how the momentum parameters should be scheduled in deep learning. Empirical results validate the correctness of our convergence analysis in optimizing convex functions and demonstrate the improved performance of the adaptive HB methods in training deep networks.", "keywords": "Deep learning;convex optimization;momentum methods;adaptive heavy-ball methods;optimal convergence", "primary_area": "", "supplementary_material": "/attachment/d6b8197575da8df8b53edccb93c71a89d4c0b9b0.zip", "author": "Wei Tao;Sheng Long;Gaowei Wu;Qing Tao", "authorids": "~Wei_Tao3;ls15186322349@163.com;gaowei.wu@ia.ac.cn;qing.tao@ia.ac.cn", "gender": "M;;;", "homepage": ";;;", "dblp": "https://dblp.uni-trier.de/pid/17/6159.html;;;", "google_scholar": "M-pMjh0AAAAJ;;;", "orcid": ";;;", "linkedin": ";;;", "or_profile": "~Wei_Tao3;ls15186322349@163.com;gaowei.wu@ia.ac.cn;qing.tao@ia.ac.cn", "aff": "Academy of Military Science;;;", "aff_domain": "ams.edu;;;", "position": "Assistant Professor;;;", "bibtex": "@inproceedings{\ntao2021the,\ntitle={The Role of Momentum Parameters in the Optimal Convergence of Adaptive Polyak's Heavy-ball Methods},\nauthor={Wei Tao and Sheng Long and Gaowei Wu and Qing Tao},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=L7WD8ZdscQ5}\n}", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer2;AnonReviewer4;AnonReviewer3", "pdf_size": 0, "rating": "5;6;6;6", "confidence": "4;3;4;4", "wc_review": "379;166;308;499", "wc_reply_reviewers": "0;0;0;152", "wc_reply_authors": "455;79;312;760", "reply_reviewers": "0;0;0;1", "reply_authors": "1;1;1;1", "rating_avg": [ 5.75, 0.4330127018922193 ], "confidence_avg": [ 3.75, 0.4330127018922193 ], "wc_review_avg": [ 338.0, 120.50518661036959 ], "wc_reply_reviewers_avg": [ 38.0, 65.81793068761733 ], "wc_reply_authors_avg": [ 401.5, 246.67843440398272 ], "reply_reviewers_avg": [ 0.25, 0.4330127018922193 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 11, 0 ], "authors#_avg": [ 4, 0 ], "corr_rating_confidence": -0.3333333333333333, "gs_citation": 17, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=2736208670765643400&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 3, "pdf": "https://openreview.net/pdf?id=L7WD8ZdscQ5", "email": "ams.edu;;;", "author_num": 4, "aff_unique_index": "0", "aff_unique_norm": "Academy of Military Science", "aff_unique_dep": "", "aff_unique_url": "", "aff_unique_abbr": "", "aff_country_unique_index": "0", 
"aff_country_unique": "China" }, { "id": "L8BElg6Qldb", "title": "Nonvacuous Loss Bounds with Fast Rates for Neural Networks via Conditional Information Measures", "track": "main", "status": "Reject", "tldr": "", "abstract": "We present a framework to derive bounds on the test loss of randomized learning algorithms for the case of bounded loss functions. This framework leads to bounds that depend on the conditional information density between the the output hypothesis and the choice of the training set, given a larger set of data samples from which the training set is formed. Furthermore, the bounds pertain to the average test loss as well as to its tail probability, both for the PAC-Bayesian and the single-draw settings. If the conditional information density is bounded uniformly in the size $n$ of the training set, our bounds decay as $1/n$, which is referred to as a fast rate. This is in contrast with the tail bounds involving conditional information measures available in the literature, which have a less benign $1/\\sqrt{n}$ dependence. We demonstrate the usefulness of our tail bounds by showing that they lead to estimates of the test loss achievable with several neural network architectures trained on MNIST and Fashion-MNIST that match the state-of-the-art bounds available in the literature.", "keywords": "", "primary_area": "", "supplementary_material": "/attachment/327d57b53a4479f25dcf9d759377389a85447e9c.zip", "author": "Fredrik Hellstr\u00f6m;Giuseppe Durisi", "authorids": "~Fredrik_Hellstr\u00f6m1;~Giuseppe_Durisi1", "gender": ";M", "homepage": "https://fredrikhellstrom.github.io/;https://gdurisi.github.io/", "dblp": "167/6308;", "google_scholar": "zTJcV04AAAAJ;A9_oZxwAAAAJ", "orcid": ";", "linkedin": ";", "or_profile": "~Fredrik_Hellstr\u00f6m1;~Giuseppe_Durisi1", "aff": "Chalmers University;ETH Zurich", "aff_domain": "chalmers.se;eth.ch", "position": "PhD student;Postdoc", "bibtex": "@misc{\nhellstr{\\\"o}m2021nonvacuous,\ntitle={Nonvacuous Loss Bounds with Fast Rates for Neural Networks via Conditional Information Measures},\nauthor={Fredrik Hellstr{\\\"o}m and Giuseppe Durisi},\nyear={2021},\nurl={https://openreview.net/forum?id=L8BElg6Qldb}\n}", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer2;AnonReviewer4", "site": "https://openreview.net/forum?id=L8BElg6Qldb", "pdf_size": 0, "rating": "6;6;7", "confidence": "3;4;4", "wc_review": "252;418;153", "wc_reply_reviewers": "0;138;0", "wc_reply_authors": "579;1763;209", "reply_reviewers": "0;2;0", "reply_authors": "1;5;1", "rating_avg": [ 6.333333333333333, 0.4714045207910317 ], "confidence_avg": [ 3.6666666666666665, 0.4714045207910317 ], "wc_review_avg": [ 274.3333333333333, 109.3323170684476 ], "wc_reply_reviewers_avg": [ 46.0, 65.05382386916237 ], "wc_reply_authors_avg": [ 850.3333333333334, 662.7947562321905 ], "reply_reviewers_avg": [ 0.6666666666666666, 0.9428090415820634 ], "reply_authors_avg": [ 2.3333333333333335, 1.8856180831641267 ], "replies_avg": [ 17, 0 ], "authors#_avg": [ 2, 0 ], "corr_rating_confidence": 0.4999999999999999, "gs_citation": 3, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=3287390286171391660&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 0, "aff_unique_index": "0;1", "aff_unique_norm": "Chalmers University of Technology;ETH Zurich", "aff_unique_dep": ";", "aff_unique_url": "https://www.chalmers.se;https://www.ethz.ch", "aff_unique_abbr": "Chalmers;ETHZ", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;1", 
"aff_country_unique": "Sweden;Switzerland" }, { "id": "LDSeViRs4-Q", "title": "Increasing-Margin Adversarial (IMA) training to Improve Adversarial Robustness of Neural Networks", "track": "main", "status": "Reject", "tldr": "", "abstract": "Deep neural networks (DNNs), including convolutional neural networks, are known to be vulnerable to adversarial attacks, which may lead to disastrous consequences in life-critical applications. Adversarial samples are usually generated by attack algorithms and can also be induced by white noises, and therefore the threats are real. In this study, we propose a novel training method, named Increasing Margin Adversarial (IMA) Training, to improve DNN robustness against adversarial noises. During training, the IMA method increases the margins of training samples by moving the decision boundaries of the DNN model far away from the training samples to improve robustness. The IMA method is evaluated on six publicly available datasets (including a COVID-19 CT image dataset) under strong 100-PGD white-box adversarial attacks, and the results show that the proposed method significantly improved classification accuracy on noisy data while keeping a relatively high accuracy on clean data. We hope our approach may facilitate the development of robust DNN applications, especially for COVID-19 diagnosis using CT images.", "keywords": "Robustness;CNN;Medical image classification", "primary_area": "", "supplementary_material": "", "author": "Linhai Ma;Liang Liang", "authorids": "~Linhai_Ma1;~Liang_Liang2", "gender": "M;", "homepage": "https://sarielma.github.io/;", "dblp": "226/9775;", "google_scholar": "https://scholar.google.com.hk/citations?view_op=list_works;", "orcid": ";", "linkedin": ";", "or_profile": "~Linhai_Ma1;~Liang_Liang2", "aff": "University of Miami;University of Miami", "aff_domain": "miami.edu;miami.edu", "position": "PhD student;Assistant Professor", "bibtex": "@misc{\nma2021increasingmargin,\ntitle={Increasing-Margin Adversarial ({\\{}IMA{\\}}) training to Improve Adversarial Robustness of Neural Networks},\nauthor={Linhai Ma and Liang Liang},\nyear={2021},\nurl={https://openreview.net/forum?id=LDSeViRs4-Q}\n}", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer2;AnonReviewer4;AnonReviewer3", "site": "https://openreview.net/forum?id=LDSeViRs4-Q", "pdf_size": 0, "rating": "4;4;4;6", "confidence": "1;4;4;3", "wc_review": "342;345;205;270", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "1135;1270;296;327", "reply_reviewers": "0;0;0;0", "reply_authors": "2;2;1;1", "rating_avg": [ 4.5, 0.8660254037844386 ], "confidence_avg": [ 3.0, 1.224744871391589 ], "wc_review_avg": [ 290.5, 57.7775908116633 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 757.0, 448.1835561463629 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.5, 0.5 ], "replies_avg": [ 13, 0 ], "authors#_avg": [ 2, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 24, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=2664962538102936012&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 11, "aff_unique_index": "0;0", "aff_unique_norm": "University of Miami", "aff_unique_dep": "", "aff_unique_url": "https://www.miami.edu", "aff_unique_abbr": "UM", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0", "aff_country_unique": "United States" }, { "id": "LFjnKhTNNQD", "title": "Prepare for the Worst: Generalizing across Domain Shifts with Adversarial Batch Normalization", "track": "main", "status": 
"Reject", "tldr": "", "abstract": "Adversarial training is the industry standard for producing models that are robust to small adversarial perturbations. However, machine learning practitioners need models that are robust to other kinds of changes that occur naturally, such as changes in the style or illumination of input images. Such changes in input distribution have been effectively modeled as shifts in the mean and variance of deep image features. We adapt adversarial training by adversarially perturbing these feature statistics, rather than image pixels, to produce models that are robust to distributional shifts. We also visualize images from adversarially crafted distributions. Our method, Adversarial Batch Normalization (AdvBN), significantly improves the performance of ResNet-50 on ImageNet-C (+8.1%), Stylized-ImageNet (+6.7%), and ImageNet-Instagram (+3.9%) over standard training practices. In addition, we demonstrate that AdvBN can also improve generalization on semantic segmentation.", "keywords": "adversarial training;distributional shifts", "primary_area": "", "supplementary_material": "", "author": "Manli Shu;Zuxuan Wu;Micah Goldblum;Tom Goldstein", "authorids": "~Manli_Shu1;~Zuxuan_Wu1;~Micah_Goldblum1;~Tom_Goldstein1", "gender": "F;M;;M", "homepage": "https://azshue.github.io/;https://zxwu.azurewebsites.net/;;https://www.cs.umd.edu/~tomg/", "dblp": "263/3503;150/8447;241/7231;25/8184", "google_scholar": "https://scholar.google.com/citations?hl=en;7t12hVkAAAAJ;pGDKzuUAAAAJ;KmSuVtgAAAAJ", "orcid": ";;;", "linkedin": "manli-shu-a804a8164/;;;", "or_profile": "~Manli_Shu1;~Zuxuan_Wu1;~Micah_Goldblum1;~Tom_Goldstein1", "aff": "Department of Computer Science, University of Maryland, College Park;Fudan University;University of Maryland, College Park;University of Maryland, College Park", "aff_domain": "cs.umd.edu;fudan.edu;umd.edu;umd.edu", "position": "PhD student;Associate Professor;Postdoc;Associate Professor", "bibtex": "@misc{\nshu2021prepare,\ntitle={Prepare for the Worst: Generalizing across Domain Shifts with Adversarial Batch Normalization},\nauthor={Manli Shu and Zuxuan Wu and Micah Goldblum and Tom Goldstein},\nyear={2021},\nurl={https://openreview.net/forum?id=LFjnKhTNNQD}\n}", "github": "", "project": "", "reviewers": "AnonReviewer4;AnonReviewer2;AnonReviewer3;AnonReviewer5;AnonReviewer1", "site": "https://openreview.net/forum?id=LFjnKhTNNQD", "pdf_size": 0, "rating": "3;5;5;5;6", "confidence": "5;4;4;4;4", "wc_review": "253;432;507;970;401", "wc_reply_reviewers": "0;0;0;550;0", "wc_reply_authors": "660;265;678;1104;287", "reply_reviewers": "0;0;0;2;0", "reply_authors": "1;1;1;3;1", "rating_avg": [ 4.8, 0.9797958971132712 ], "confidence_avg": [ 4.2, 0.39999999999999997 ], "wc_review_avg": [ 512.6, 243.1416048314233 ], "wc_reply_reviewers_avg": [ 110.0, 220.0 ], "wc_reply_authors_avg": [ 598.8, 307.8593185206516 ], "reply_reviewers_avg": [ 0.4, 0.8 ], "reply_authors_avg": [ 1.4, 0.8 ], "replies_avg": [ 16, 0 ], "authors#_avg": [ 4, 0 ], "corr_rating_confidence": -0.9185586535436918, "gs_citation": 5, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=12768862785024127649&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 0, "aff_unique_index": "0;1;2;2", "aff_unique_norm": "University of Maryland, College Park;Fudan University;University of Maryland", "aff_unique_dep": "Department of Computer Science;;", "aff_unique_url": "https://www/umd.edu;https://www.fudan.edu.cn;https://www/umd.edu", "aff_unique_abbr": "UMD;Fudan;UMD", "aff_campus_unique_index": "0;0;0", 
"aff_campus_unique": "College Park;", "aff_country_unique_index": "0;1;0;0", "aff_country_unique": "United States;China" }, { "id": "LFs3CnHwfM", "title": "A Robust Fuel Optimization Strategy For Hybrid Electric Vehicles: A Deep Reinforcement Learning Based Continuous Time Design Approach", "track": "main", "status": "Reject", "tldr": "", "abstract": "This paper deals with the fuel optimization problem for hybrid electric vehicles in reinforcement learning framework. Firstly, considering the hybrid electric vehicle as a completely observable non-linear system with uncertain dynamics, we solve an open-loop deterministic optimization problem. This is followed by the design of a deep reinforcement learning based optimal controller for the non-linear system using concurrent learning based system identifier such that the actual states and the control policy are able to track the optimal trajectory and optimal policy, autonomously even in the presence of external disturbances, modeling errors, uncertainties and noise and signigicantly reducing the computational complexity at the same time, which is in sharp contrast to the conventional methods like PID and Model Predictive Control (MPC) as well as traditional RL approaches like ADP, DDP and DQN that mostly depend on a set of pre-defined rules and provide sub-optimal solutions under similar conditions. The low value of the H-infinity ($H_{\\infty})$ performance index of the proposed optimization algorithm addresses the robustness issue. The optimization technique thus proposed is compared with the traditional fuel optimization strategies for hybrid electric vehicles to illustate the efficacy of the proposed method.", "keywords": "Deep Reinforcement Learning;Optimal Control;Fuel Management System;Hybrid Electric vehicles;H\u221e Performance Index", "primary_area": "", "supplementary_material": "", "author": "Nilanjan Mukherjee;Sudeshna Sarkar", "authorids": "~Nilanjan_Mukherjee1;~Sudeshna_Sarkar1", "gender": "M;F", "homepage": ";http://cse.iitkgp.ac.in/~sudeshna/", "dblp": ";61/3197", "google_scholar": ";https://scholar.google.com.tw/citations?user=AwP_bbsAAAAJ", "orcid": ";0000-0003-3439-4282", "linkedin": "nilanjan-mukherjee-98805a1b8;", "or_profile": "~Nilanjan_Mukherjee1;~Sudeshna_Sarkar1", "aff": "Indian Institute of Technology Kharagpur;Indian Institute of Technology Kharagpur, Dhirubhai Ambani Institute Of Information and Communication Technology", "aff_domain": "iitkgp.ac.in;iitkgp.ac.in", "position": "PhD student;Full Professor", "bibtex": "@misc{\nmukherjee2021a,\ntitle={A Robust Fuel Optimization Strategy For Hybrid Electric Vehicles: A Deep Reinforcement Learning Based Continuous Time Design Approach},\nauthor={Nilanjan Mukherjee and Sudeshna Sarkar},\nyear={2021},\nurl={https://openreview.net/forum?id=LFs3CnHwfM}\n}", "github": "", "project": "", "reviewers": "AnonReviewer2;AnonReviewer1;AnonReviewer3;AnonReviewer4", "site": "https://openreview.net/forum?id=LFs3CnHwfM", "pdf_size": 0, "rating": "2;3;4;5", "confidence": "3;4;2;3", "wc_review": "529;392;258;221", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "787;816;709;509", "reply_reviewers": "0;0;0;0", "reply_authors": "1;1;1;1", "rating_avg": [ 3.5, 1.118033988749895 ], "confidence_avg": [ 3.0, 0.7071067811865476 ], "wc_review_avg": [ 350.0, 121.35691162846886 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 705.25, 119.87154583136066 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 10, 0 ], "authors#_avg": [ 2, 0 ], 
"corr_rating_confidence": -0.3162277660168379, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:TyLwN453O7oJ:scholar.google.com/&scioq=A+Robust+Fuel+Optimization+Strategy+For+Hybrid+Electric+Vehicles:+A+Deep+Reinforcement+Learning+Based+Continuous+Time+Design+Approach&hl=en&as_sdt=0,5", "gs_version_total": 0, "aff_unique_index": "0;0", "aff_unique_norm": "Indian Institute of Technology Kharagpur", "aff_unique_dep": "", "aff_unique_url": "https://www.iitkgp.ac.in", "aff_unique_abbr": "IIT Kharagpur", "aff_campus_unique_index": "0;0", "aff_campus_unique": "Kharagpur", "aff_country_unique_index": "0;0", "aff_country_unique": "India" }, { "title": "Topology-Aware Segmentation Using Discrete Morse Theory", "status": "Spotlight", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/2910", "id": "LGgdb4TS4Z", "poster": "", "openreview": "https://openreview.net/forum?id=LGgdb4TS4Z", "slides": "https://iclr.cc/virtual/2021/poster/2910", "video": "https://iclr.cc/virtual/2021/poster/2910", "author_site": "Xiaoling Hu, Yusu Wang, Li Fuxin, Dimitris Samaras, Chao Chen", "tldr": "", "abstract": "In the segmentation of fine-scale structures from natural and biomedical images, per-pixel accuracy is not the only metric of concern. Topological correctness, such as vessel connectivity and membrane closure, is crucial for downstream analysis tasks. In this paper, we propose a new approach to train deep image segmentation networks for better topological accuracy. In particular, leveraging the power of discrete Morse theory (DMT), we identify global structures, including 1D skeletons and 2D patches, which are important for topological accuracy. Trained with a novel loss based on these global structures, the network performance is significantly improved especially near topologically challenging locations (such as weak spots of connections and membranes). 
On diverse datasets, our method achieves superior performance on both the DICE score and topological metrics.", "keywords": "Topology;Morse theory;Image segmentation", "primary_area": "", "supplementary_material": "", "author": "Xiaoling Hu;Yusu Wang;Li Fuxin;Dimitris Samaras;Chao Chen", "authorids": "~Xiaoling_Hu1;~Yusu_Wang1;~Li_Fuxin1;~Dimitris_Samaras3;~Chao_Chen1", "gender": "M;;;M;M", "homepage": "https://huxiaoling.github.io/;;;https://www.cs.stonybrook.edu/~samaras/;https://chaochen.github.io/", "dblp": "59/11113-2;;;s/DimitrisSamaras;66/3019-12", "google_scholar": "6MfwhCAAAAAJ;;;https://scholar.google.com/citations?hl=en;J-iIIFAAAAAJ", "orcid": ";;;0000-0002-1373-0294;0000-0003-1703-6483", "linkedin": "xiaoling-hu-1329337b/;;;;", "or_profile": "~Xiaoling_Hu1;~Yusu_Wang1;~Li_Fuxin1;~Dimitris_Samaras3;~Chao_Chen1", "aff": "Stony Brook University;;;Stony Brook University;State University of New York, Stony Brook", "aff_domain": "stonybrook.edu;;;cs.stonybrook.edu;stonybrook.edu", "position": "PhD student;;;Full Professor;Assistant Professor", "bibtex": "@inproceedings{\nhu2021topologyaware,\ntitle={Topology-Aware Segmentation Using Discrete Morse Theory},\nauthor={Xiaoling Hu and Yusu Wang and Li Fuxin and Dimitris Samaras and Chao Chen},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=LGgdb4TS4Z}\n}", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer3;AnonReviewer2;AnonReviewer4", "pdf_size": 0, "rating": "5;6;7;8", "confidence": "3;4;2;3", "wc_review": "940;337;194;198", "wc_reply_reviewers": "0;0;0;92", "wc_reply_authors": "1231;600;41;507", "reply_reviewers": "0;0;0;1", "reply_authors": "2;1;1;2", "rating_avg": [ 6.5, 1.118033988749895 ], "confidence_avg": [ 3.0, 0.7071067811865476 ], "wc_review_avg": [ 417.25, 307.2534580765528 ], "wc_reply_reviewers_avg": [ 23.0, 39.83716857408418 ], "wc_reply_authors_avg": [ 594.75, 424.02262616516117 ], "reply_reviewers_avg": [ 0.25, 0.4330127018922193 ], "reply_authors_avg": [ 1.5, 0.5 ], "replies_avg": [ 14, 0 ], "authors#_avg": [ 5, 0 ], "corr_rating_confidence": -0.3162277660168379, "gs_citation": 112, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=14034437358191020810&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 6, "pdf": "https://openreview.net/pdf?id=LGgdb4TS4Z", "email": "stonybrook.edu;;;cs.stonybrook.edu;stonybrook.edu", "author_num": 5, "aff_unique_index": "0;0;1", "aff_unique_norm": "Stony Brook University;State University of New York", "aff_unique_dep": ";", "aff_unique_url": "https://www.stonybrook.edu;https://www.stonybrook.edu", "aff_unique_abbr": "SBU;SUNY Stony Brook", "aff_campus_unique_index": "1", "aff_campus_unique": ";Stony Brook", "aff_country_unique_index": "0;0;0", "aff_country_unique": "United States" }, { "id": "LIOgGKRCYkG", "title": "Target Training: Tricking Adversarial Attacks to Fail", "track": "main", "status": "Reject", "tldr": "", "abstract": "Recent adversarial defense approaches have failed. Untargeted gradient-based attacks cause classifiers to choose any wrong class. Our novel white-box defense tricks untargeted attacks into becoming attacks targeted at designated target classes. From these target classes, we derive the real classes. The Target Training defense tricks the minimization at the core of untargeted, gradient-based adversarial attacks: minimize the sum of (1) perturbation and (2) classifier adversarial loss. 
Target Training changes the classifier minimally, and trains it with additional duplicated points (at 0 distance) labeled with designated classes. These differently-labeled duplicated samples minimize both terms (1) and (2) of the minimization, steering attack convergence to samples of designated classes, from which correct classification is derived. Importantly, Target Training eliminates the need to know the attack and the overhead of generating adversarial samples of attacks that minimize perturbations. Without using adversarial samples and against an adaptive attack aware of our defense, Target Training exceeds even default, unsecured classifier accuracy of 84.3% for CIFAR10 with 86.6% against DeepFool attack; and achieves 83.2% against CW-$L_2$ (\u03ba=0) attack. Using adversarial samples, we achieve 75.6% against CW-$L_2$ (\u03ba=40). Due to our deliberate choice of low-capacity classifiers, Target Training does not withstand $L_\\infty$ adaptive attacks in CIFAR10 but withstands CW-$L_\\infty$ (\u03ba=0) in MNIST. Target Training presents a fundamental change in adversarial defense strategy.", "keywords": "adversarial machine learning", "primary_area": "", "supplementary_material": "", "author": "Blerta Lindqvist", "authorids": "~Blerta_Lindqvist1", "gender": "", "homepage": "https://www.linkedin.com/in/blertalindqvist/", "dblp": "197/4815", "google_scholar": "https://scholar.google.com/citations?hl=en", "orcid": "0000-0002-4950-2250", "linkedin": "blertalindqvist/", "or_profile": "~Blerta_Lindqvist1", "aff": "Aalto University", "aff_domain": "aalto.fi", "position": "PhD student", "bibtex": "@misc{\nlindqvist2021target,\ntitle={Target Training: Tricking Adversarial Attacks to Fail},\nauthor={Blerta Lindqvist},\nyear={2021},\nurl={https://openreview.net/forum?id=LIOgGKRCYkG}\n}", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer2;AnonReviewer3;AnonReviewer4", "site": "https://openreview.net/forum?id=LIOgGKRCYkG", "pdf_size": 0, "rating": "5;5;5;7", "confidence": "5;3;3;2", "wc_review": "316;288;299;193", "wc_reply_reviewers": "0;0;82;0", "wc_reply_authors": "794;989;619;297", "reply_reviewers": "0;0;1;0", "reply_authors": "2;2;3;2", "rating_avg": [ 5.5, 0.8660254037844386 ], "confidence_avg": [ 3.25, 1.0897247358851685 ], "wc_review_avg": [ 274.0, 47.81736086402093 ], "wc_reply_reviewers_avg": [ 20.5, 35.50704155516198 ], "wc_reply_authors_avg": [ 674.75, 254.35052093518505 ], "reply_reviewers_avg": [ 0.25, 0.4330127018922193 ], "reply_authors_avg": [ 2.25, 0.4330127018922193 ], "replies_avg": [ 16, 0 ], "authors#_avg": [ 1, 0 ], "corr_rating_confidence": -0.6622661785325219, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:8BB1THXoQKMJ:scholar.google.com/&scioq=Target+Training:+Tricking+Adversarial+Attacks+to+Fail&hl=en&as_sdt=0,33", "gs_version_total": 0, "aff_unique_index": "0", "aff_unique_norm": "Aalto University", "aff_unique_dep": "", "aff_unique_url": "https://www.aalto.fi", "aff_unique_abbr": "Aalto", "aff_country_unique_index": "0", "aff_country_unique": "Finland" }, { "id": "LIR3aVGIlln", "title": "Equivariant Normalizing Flows for Point Processes and Sets", "track": "main", "status": "Reject", "tldr": "", "abstract": "A point process describes how random sets of exchangeable points are generated. The points usually influence the positions of each other via attractive and repulsive forces. 
To model this behavior, it is enough to transform the samples from the uniform process with a sufficiently complex equivariant function. However, learning the parameters of the resulting process is challenging since the likelihood is hard to estimate and often intractable. This leads us to our proposed model - CONFET. Based on continuous normalizing flows, it allows arbitrary interactions between points while having tractable likelihood. Experiments on various real and synthetic datasets show the improved performance of our new scalable approach.", "keywords": "point process;set;normalizing flow;equivariance", "primary_area": "", "supplementary_material": "/attachment/e9acfcaa0478cc4c81a611a753f7a3f320b4f324.zip", "author": "Marin Bilo\u0161;Stephan G\u00fcnnemann", "authorids": "~Marin_Bilo\u01611;~Stephan_G\u00fcnnemann1", "gender": ";M", "homepage": ";http://www.daml.in.tum.de", "dblp": ";43/3011", "google_scholar": ";", "orcid": ";", "linkedin": ";", "or_profile": "~Marin_Bilo\u01611;~Stephan_G\u00fcnnemann1", "aff": ";Technical University Munich", "aff_domain": ";tum.de", "position": ";Professor", "bibtex": "@misc{\nbilo{\\v{s}}2021equivariant,\ntitle={Equivariant Normalizing Flows for Point Processes and Sets},\nauthor={Marin Bilo{\\v{s}} and Stephan G{\\\"u}nnemann},\nyear={2021},\nurl={https://openreview.net/forum?id=LIR3aVGIlln}\n}", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer2;AnonReviewer3;AnonReviewer4", "site": "https://openreview.net/forum?id=LIR3aVGIlln", "pdf_size": 0, "rating": "5;5;6;8", "confidence": "4;3;3;3", "wc_review": "525;601;334;252", "wc_reply_reviewers": "102;96;165;144", "wc_reply_authors": "451;684;527;42", "reply_reviewers": "1;1;1;1", "reply_authors": "1;1;2;1", "rating_avg": [ 6.0, 1.224744871391589 ], "confidence_avg": [ 3.25, 0.4330127018922193 ], "wc_review_avg": [ 428.0, 140.668048966352 ], "wc_reply_reviewers_avg": [ 126.75, 28.80429655450728 ], "wc_reply_authors_avg": [ 426.0, 237.08964549300757 ], "reply_reviewers_avg": [ 1.0, 0.0 ], "reply_authors_avg": [ 1.25, 0.4330127018922193 ], "replies_avg": [ 15, 0 ], "authors#_avg": [ 2, 0 ], "corr_rating_confidence": -0.4714045207910316, "gs_citation": 5, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=2347120085648408116&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 0, "aff_unique_index": "0", "aff_unique_norm": "Technical University of Munich", "aff_unique_dep": "", "aff_unique_url": "https://www.tum.de", "aff_unique_abbr": "TUM", "aff_country_unique_index": "0", "aff_country_unique": "Germany" }, { "id": "LJRsOvDJ4gP", "title": "Asymptotic Optimality of Self-Representative Low-Rank Approximation and Its Applications", "track": "main", "status": "Withdraw", "tldr": "", "abstract": "We propose a novel technique for sampling representatives from a large, unsupervised dataset. The approach is based on the concept of {\\em self-rank}, defined as the minimum number of samples needed to reconstruct all samples with an accuracy proportional to the rank-$K$ approximation. As the exact computation of self-rank requires a computationally expensive combinatorial search, we propose an efficient algorithm that jointly estimates self-rank and selects the optimal samples with high accuracy. A theoretical upper bound is derived that reaches the tightest bound for two asymptotic cases. 
The best approximation ratio for self-representative low-rank approximation was presented in ICML 2017~\\cite{Chierichetti-icml-2017}, which was further improved by the bound $\\sqrt{1+K}$ reported in~NeurIPS 2019~\\cite{dan2019optimal}. Both of these bounds depend solely on the number of selected samples. In this paper, for the first time, we present an adaptive approximation ratio depending on spectral properties of the original dataset, $\\small{\\boldsymbol{A}\\in \\mathbb{R}^{N\\times M}}$. In particular, our performance bound is proportional to the condition number $\\kappa(\\boldsymbol{A})$. Our derived approximation ratio is expressed as $1+(\\kappa(\\boldsymbol{A})^2-1)/(N-K)$ which approaches $1$ in two asymptotic cases.\nIn addition to evaluating the proposed algorithm on a synthetic dataset, we show that the proposed sampling scheme can be utilized in real-world applications such as graph node sampling for optimizing the shortest path criterion, and learning a classifier with sampled data.", "keywords": "data selection;low rank approximation;column subset selection", "primary_area": "", "supplementary_material": "/attachment/5e9c457ad328d15ae6438b530a25995c62a4d823.zip", "author": "Saeed Vahidian;mohsen Joneidi;Ashkan Esmaeili;Siavash Khodadadeh;Sharare zehtabian;Ladislau Boloni;Nazanin Rahnavard;Bill Lin;Mubarak Shah", "authorids": "~Saeed_Vahidian1;joneidi@knights.ucf.edu;ashkan.esmaeili@ucf.edu;~Siavash_Khodadadeh1;sharare.zehtabian@knights.ucf.edu;~Ladislau_Boloni1;~Nazanin_Rahnavard1;~Bill_Lin1;~Mubarak_Shah3", "gender": "male;;;M;;M;F;M;M", "homepage": "https://scholar.google.com/citations?user=8Jd1aUEAAAAJ&hl=en;;;http://siavashkh.com/;;http://www.cs.ucf.edu/~lboloni/;http://lcwnlab.eecs.ucf.edu/;;https://www.crcv.ucf.edu/person/mubarak-shah/", "dblp": "165/0383;;;;;b/LadislauBoloni;;l/BillLin.html;s/MubarakShah", "google_scholar": "8Jd1aUEAAAAJ;;;https://scholar.google.com/citations?hl=en;;drG1_tsAAAAJ;https://scholar.google.com.tw/citations?user=PzgFISkAAAAJ;j3geh3QAAAAJ;https://scholar.google.com.tw/citations?user=p8gsO3gAAAAJ", "orcid": ";;;;;0000-0001-5336-9651;;;0000-0002-8216-1128", "linkedin": ";;;siavash-khodadadeh/;;lotzi-b%C3%B6l%C3%B6ni-4a3b79/;;;mubarak-shah-b6aa68213/", "or_profile": "~Saeed_Vahidian1;joneidi@knights.ucf.edu;ashkan.esmaeili@ucf.edu;~Siavash_Khodadadeh1;sharare.zehtabian@knights.ucf.edu;~Ladislau_Boloni1;~Nazanin_Rahnavard1;~Bill_Lin1;~Mubarak_Shah3", "aff": "University of California, San Diego;;;University of Central Florida;;University of Central Florida;University of Central Florida;University of California, San Diego;University of Central Florida", "aff_domain": "ucsd.edu;;;ucf.edu;;ucf.edu;ucf.edu;ucsd.edu;ucf.edu", "position": "PhD student;;;PhD student;;Full Professor;Associate Professor;Full Professor;Full Professor", "bibtex": "", "github": "", "project": "", "reviewers": "AnonReviewer3;AnonReviewer4;AnonReviewer1;AnonReviewer2", "site": "https://openreview.net/forum?id=LJRsOvDJ4gP", "pdf_size": 0, "rating": "3;4;4;4", "confidence": "3;3;4;4", "wc_review": "316;600;922;477", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "0;0;0;0", "reply_reviewers": "0;0;0;0", "reply_authors": "0;0;0;0", "rating_avg": [ 3.75, 0.4330127018922193 ], "confidence_avg": [ 3.5, 0.5 ], "wc_review_avg": [ 578.75, 222.29639560730624 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 0, 0 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 0, 0 ], "replies_avg": [ 5, 0 ], "authors#_avg": [ 9, 0 ], "corr_rating_confidence": 
0.5773502691896257, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:hOpU9xmFlngJ:scholar.google.com/&scioq=Asymptotic+Optimality+of+Self-Representative+Low-Rank+Approximation+and+Its+Applications&hl=en&as_sdt=0,31", "gs_version_total": 0, "aff_unique_index": "0;1;1;1;0;1", "aff_unique_norm": "University of California, San Diego;University of Central Florida", "aff_unique_dep": ";", "aff_unique_url": "https://www.ucsd.edu;https://www.ucf.edu", "aff_unique_abbr": "UCSD;UCF", "aff_campus_unique_index": "0;0", "aff_campus_unique": "San Diego;", "aff_country_unique_index": "0;0;0;0;0;0", "aff_country_unique": "United States" }, { "id": "LLoe0U9ShkN", "title": "Global inducing point variational posteriors for Bayesian neural networks and deep Gaussian processes", "track": "main", "status": "Reject", "tldr": "", "abstract": "We derive the optimal approximate posterior over the top-layer weights in a Bayesian neural network for regression, and show that it exhibits strong dependencies on the lower-layer weights. We adapt this result to develop a correlated approximate posterior over the weights at all layers in a Bayesian neural network. We extend this approach to deep Gaussian processes, unifying inference in the two model classes. Our approximate posterior uses learned \"global\" inducing points, which are defined only at the input layer and propagated through the network to obtain inducing inputs at subsequent layers. By contrast, standard, \"local\", inducing point methods from the deep Gaussian process literature optimise a separate set of inducing inputs at every layer, and thus do not model correlations across layers. Our method gives state-of-the-art performance for a variational Bayesian method, without data augmentation or tempering, on CIFAR-10 of $86.7\\%$.", "keywords": "Bayesian neural networks;deep Gaussian processes;variational inference;inducing points", "primary_area": "", "supplementary_material": "/attachment/9ce4d7d3ec911e02b2e30aeeec99917ccd504d0e.zip", "author": "Sebastian W. Ober;Laurence Aitchison", "authorids": "~Sebastian_W._Ober1;~Laurence_Aitchison1", "gender": ";", "homepage": ";http://www.gatsby.ucl.ac.uk/~laurence/", "dblp": ";155/1918.html", "google_scholar": ";", "orcid": ";", "linkedin": ";", "or_profile": "~Sebastian_W._Ober1;~Laurence_Aitchison1", "aff": ";University of Bristol", "aff_domain": ";bristol.ac.uk", "position": ";Assistant Professor", "bibtex": "@misc{\nober2021global,\ntitle={Global inducing point variational posteriors for Bayesian neural networks and deep Gaussian processes},\nauthor={Sebastian W. 
Ober and Laurence Aitchison},\nyear={2021},\nurl={https://openreview.net/forum?id=LLoe0U9ShkN}\n}", "github": "", "project": "", "reviewers": "AnonReviewer6;AnonReviewer2;AnonReviewer1", "site": "https://openreview.net/forum?id=LLoe0U9ShkN", "pdf_size": 0, "rating": "6;7;7", "confidence": "3;3;3", "wc_review": "332;328;369", "wc_reply_reviewers": "0;0;0", "wc_reply_authors": "237;230;346", "reply_reviewers": "0;0;0", "reply_authors": "1;1;1", "rating_avg": [ 6.666666666666667, 0.4714045207910317 ], "confidence_avg": [ 3.0, 0.0 ], "wc_review_avg": [ 343.0, 18.457157599876172 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 271.0, 53.10994884827763 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 8, 0 ], "authors#_avg": [ 2, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 64, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=8024621603786330099&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 5, "aff_unique_index": "0", "aff_unique_norm": "University of Bristol", "aff_unique_dep": "", "aff_unique_url": "https://www.bristol.ac.uk", "aff_unique_abbr": "Bristol", "aff_country_unique_index": "0", "aff_country_unique": "United Kingdom" }, { "id": "LMslR3CTzE_", "title": "Neural Subgraph Matching", "track": "main", "status": "Reject", "tldr": "", "abstract": "Subgraph matching is the problem of determining the presence and location(s) of a given query graph in a large target graph. \nDespite being an NP-complete problem, the subgraph matching problem is crucial in domains ranging from network science and database systems to biochemistry and cognitive science. \nHowever, existing techniques based on combinatorial matching and integer programming cannot handle matching problems with both large target and query graphs.\nHere we propose NeuroMatch, an accurate, efficient, and robust neural approach to subgraph matching. NeuroMatch decomposes query and target graphs into small subgraphs and embeds them using graph neural networks. Trained to capture geometric constraints corresponding to subgraph relations, NeuroMatch then efficiently performs subgraph matching directly in the embedding space. 
Experiments demonstrate NeuroMatch is 100x faster than existing combinatorial approaches and 18% more accurate than existing approximate subgraph matching methods.", "keywords": "Graph neural networks;Subgraph matching;Order Embedding", "primary_area": "", "supplementary_material": "/attachment/6346f690d231b9c83474f8c0b212b097f9fb6012.zip", "author": "Zhitao Ying;Andrew Wang;Jiaxuan You;Chengtao Wen;Arquimedes Canedo;Jure Leskovec", "authorids": "~Zhitao_Ying1;anwang@cs.stanford.edu;~Jiaxuan_You2;chengtao.wen@siemens.com;arquimedes.canedo@siemens.com;~Jure_Leskovec1", "gender": "M;;;;;", "homepage": "https://www.cs.yale.edu/homes/ying-rex;;;;;http://cs.stanford.edu/~jure/", "dblp": "209/4936;;;;;l/JureLeskovec", "google_scholar": "6fqNXooAAAAJ;;;;;Q_kKkIUAAAAJ", "orcid": ";;;;;0000-0002-5411-923X", "linkedin": "rex-ying-92770148/;;;;;leskovec/", "or_profile": "~Zhitao_Ying1;anwang@cs.stanford.edu;~Jiaxuan_You2;chengtao.wen@siemens.com;arquimedes.canedo@siemens.com;~Jure_Leskovec1", "aff": "Stanford University;;;;;", "aff_domain": "stanford.edu;;;;;", "position": "PhD student;;;;;", "bibtex": "@misc{\nying2021neural,\ntitle={Neural Subgraph Matching},\nauthor={Zhitao Ying and Andrew Wang and Jiaxuan You and Chengtao Wen and Arquimedes Canedo and Jure Leskovec},\nyear={2021},\nurl={https://openreview.net/forum?id=LMslR3CTzE_}\n}", "github": "", "project": "", "reviewers": "AnonReviewer4;AnonReviewer1;AnonReviewer3;AnonReviewer2", "site": "https://openreview.net/forum?id=LMslR3CTzE_", "pdf_size": 0, "rating": "3;5;5;6", "confidence": "5;3;3;5", "wc_review": "521;232;622;399", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "746;312;743;617", "reply_reviewers": "0;0;0;0", "reply_authors": "1;1;1;1", "rating_avg": [ 4.75, 1.0897247358851685 ], "confidence_avg": [ 4.0, 1.0 ], "wc_review_avg": [ 443.5, 145.4140639690673 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 604.5, 176.7179956880453 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 9, 0 ], "authors#_avg": [ 6, 0 ], "corr_rating_confidence": -0.2294157338705618, "gs_citation": 88, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=2338443380827600210&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 2, "aff_unique_index": "0", "aff_unique_norm": "Stanford University", "aff_unique_dep": "", "aff_unique_url": "https://www.stanford.edu", "aff_unique_abbr": "Stanford", "aff_campus_unique_index": "0", "aff_campus_unique": "Stanford", "aff_country_unique_index": "0", "aff_country_unique": "United States" }, { "id": "LNtTXJ9XXr", "title": "Adversarial Masking: Towards Understanding Robustness Trade-off for Generalization", "track": "main", "status": "Reject", "tldr": "", "abstract": "Adversarial training is a commonly used technique to improve model robustness against adversarial examples. Despite its success as a defense mechanism, adversarial training often fails to generalize well to unperturbed test data. While previous work assumes it is caused by the discrepancy between robust and non-robust features, in this paper, we introduce \\emph{Adversarial Masking}, a new hypothesis that this trade-off is caused by different feature maskings applied. Specifically, the rescaling operation in the batch normalization layer, when combined together with ReLU activation, serves as a feature masking layer to select different features for model training. 
By carefully manipulating different maskings, a well-balanced trade-off can be achieved between model performance on unperturbed and perturbed data. Built upon this hypothesis, we further propose Robust Masking (RobMask), which constructs unique masking for every specific attack perturbation by learning a set of primary adversarial feature maskings. By incorporating different feature maps after the masking, we can distill better features to help model generalization. Sufficiently, adversarial training can be treated as an effective regularizer to achieve better generalization. Experiments on multiple benchmarks demonstrate that RobMask achieves significant improvement on clean test accuracy compared to strong state-of-the-art baselines.", "keywords": "Adversarial Machine Learning;Adversarial Robustness;Adversarial Training;Generalization", "primary_area": "", "supplementary_material": "/attachment/708c4a3e28f25a4548693e7c8305cfdb7e51fcce.zip", "author": "Minhao Cheng;Zhe Gan;Yu Cheng;Shuohang Wang;Cho-Jui Hsieh;Jingjing Liu", "authorids": "~Minhao_Cheng1;~Zhe_Gan1;~Yu_Cheng1;~Shuohang_Wang1;~Cho-Jui_Hsieh1;~Jingjing_Liu2", "gender": "M;M;M;M;M;", "homepage": "https://cmhcbb.github.io/;http://zhegan27.github.io/;https://ych133.github.io;;http://web.cs.ucla.edu/~chohsieh/index.html;https://air.tsinghua.edu.cn/en/info/1046/1194.htm#:~:text=Jingjing%20Liu%20is%20Professor%2C%20Principal,CVPR%2C%20ACL%2C%20etc.)", "dblp": "174/1717;41/7845;96/3060-1.html;173/5469.html;14/2770;30/3008-1", "google_scholar": "_LkC1yoAAAAJ;E64XWyMAAAAJ;https://scholar.google.com/citations?hl=en;mN-IO6wAAAAJ;Wy89g4IAAAAJ;BzJ_GboAAAAJ", "orcid": "0000-0003-3965-4215;;;;;", "linkedin": ";zhe-gan-a2229a78/;chengyu05/;;;jingjing-liu-65703431/", "or_profile": "~Minhao_Cheng1;~Zhe_Gan1;~Yu_Cheng1;~Shuohang_Wang1;~Cho-Jui_Hsieh1;~Jingjing_Liu2", "aff": "University of California, Los Angeles;Microsoft;Microsoft Research;Microsoft;University of California, Los Angeles;Microsoft", "aff_domain": "ucla.edu;microsoft.com;microsoft.com;microsoft.com;ucla.edu;microsoft.com", "position": "PhD student;Principal Researcher;Principal Researcher;Researcher;Assistant Professor;Sr Principal Research Manager", "bibtex": "@misc{\ncheng2021adversarial,\ntitle={Adversarial Masking: Towards Understanding Robustness Trade-off for Generalization},\nauthor={Minhao Cheng and Zhe Gan and Yu Cheng and Shuohang Wang and Cho-Jui Hsieh and Jingjing Liu},\nyear={2021},\nurl={https://openreview.net/forum?id=LNtTXJ9XXr}\n}", "github": "", "project": "", "reviewers": "AnonReviewer2;AnonReviewer3;AnonReviewer4;AnonReviewer1", "site": "https://openreview.net/forum?id=LNtTXJ9XXr", "pdf_size": 0, "rating": "5;6;7;7", "confidence": "5;4;2;4", "wc_review": "1333;728;189;513", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "1657;318;377;491", "reply_reviewers": "0;0;0;0", "reply_authors": "3;1;1;1", "rating_avg": [ 6.25, 0.82915619758885 ], "confidence_avg": [ 3.75, 1.0897247358851685 ], "wc_review_avg": [ 690.75, 417.4987275429711 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 710.75, 549.845603328789 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.5, 0.8660254037844386 ], "replies_avg": [ 13, 0 ], "authors#_avg": [ 6, 0 ], "corr_rating_confidence": -0.7608859102526822, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:jKG-pgVxfmoJ:scholar.google.com/&scioq=Adversarial+Masking:+Towards+Understanding+Robustness+Trade-off+for+Generalization&hl=en&as_sdt=0,33", "gs_version_total": 0, 
"aff_unique_index": "0;1;1;1;0;1", "aff_unique_norm": "University of California, Los Angeles;Microsoft", "aff_unique_dep": ";Microsoft Corporation", "aff_unique_url": "https://www.ucla.edu;https://www.microsoft.com", "aff_unique_abbr": "UCLA;Microsoft", "aff_campus_unique_index": "0;0", "aff_campus_unique": "Los Angeles;", "aff_country_unique_index": "0;0;0;0;0;0", "aff_country_unique": "United States" }, { "id": "LO7tSIUIub", "title": "Correspondence between neuroevolution and gradient descent", "track": "main", "status": "Desk Reject", "tldr": "", "abstract": "We show analytically that training a neural network by stochastic mutation or neuroevolution of its weights is equivalent, in the limit of small mutations, to gradient descent on the loss function in the presence of Gaussian white noise. Averaged over independent realizations of the learning process, neuroevolution is equivalent to gradient descent on the loss function. We use numerical simulation to show that this correspondence can be observed for finite mutations. Our results provide a connection between two distinct types of neural-network training, and provide justification for the empirical success of neuroevolution.", "keywords": "neuroevolution;gradient descent;theoretical description of learning", "primary_area": "", "supplementary_material": "", "author": "Anonymous", "authorids": "ICLR.cc/2021/Conference/Paper2025/Authors", "gender": "", "homepage": "", "dblp": "", "google_scholar": "", "orcid": "", "linkedin": "", "or_profile": "", "aff": "", "aff_domain": "", "position": "", "bibtex": "@inproceedings{\nanonymous2021correspondence,\ntitle={Correspondence between neuroevolution and gradient descent},\nauthor={Anonymous},\nbooktitle={Submitted to International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=LO7tSIUIub},\nnote={under review}\n}", "github": "", "project": "", "reviewers": "", "site": "https://openreview.net/forum?id=LO7tSIUIub", "pdf_size": 0, "rating": "", "confidence": "", "wc_review": "", "wc_reply_reviewers": "", "wc_reply_authors": "", "reply_reviewers": "", "reply_authors": "", "rating_avg": [ 0, 0 ], "confidence_avg": [ 0, 0 ], "wc_review_avg": [ 0, 0 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 0, 0 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 0, 0 ], "replies_avg": [ 1, 0 ], "authors#_avg": [ 1, 0 ], "corr_rating_confidence": 0, "gs_citation": 28, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=5110975217820202068&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 14 }, { "id": "LQTsVy-Xli", "title": "Auto-view contrastive learning for few-shot image recognition", "track": "main", "status": "Withdraw", "tldr": "", "abstract": "Few-shot learning aims to recognize new classes with few annotated instances within each category. Recently, metric-based meta-learning approaches have shown the superior performance in tackling few-shot learning problems. Despite their success, existing metric-based few-shot approaches often fail to push the fine-grained sub-categories apart in the embedding space given no fine-grained labels. This may result in poor generalization to fine-grained sub-categories, and thus affects model interpretation. To alleviate this problem, we introduce contrastive loss into few-shot classification for learning latent fine-grained structure in the embedding space. 
Furthermore, to overcome the drawbacks of random image transformation used in current contrastive learning in producing noisy and inaccurate image pairs (i.e., views), we develop a learning-to-learn algorithm to automatically generate different views of the same image. Extensive experiments on standard few-shot learning benchmarks and few-shot fine-grained image classification demonstrate the superiority of our method. ", "keywords": "Few-shot learning;Contrastive learning;Metric-based meta learning", "primary_area": "", "supplementary_material": "", "author": "Xu Luo;Yuxuan Chen;Liangjian Wen;Lili Pan;Zenglin Xu", "authorids": "~Xu_Luo1;~Yuxuan_Chen2;~Liangjian_Wen1;~Lili_Pan2;~Zenglin_Xu1", "gender": "M;M;M;F;M", "homepage": "https://frankluox.github.io/;;;;https://faculty.fudan.edu.cn/xuzenglin/en/index.htm", "dblp": "06/2622-3;;231/7379;60/5610-1;68/1538", "google_scholar": "https://scholar.google.com/citations?hl=en;45GyXBUAAAAJ;jwHflLcAAAAJ;gXpdHzMAAAAJ;gF0H9nEAAAAJ", "orcid": "0000-0001-9827-1244;;0009-0000-3493-6403;;0000-0001-5550-6461", "linkedin": ";;;;", "or_profile": "~Xu_Luo1;~Yuxuan_Chen2;~Liangjian_Wen1;~Lili_Pan2;~Zenglin_Xu1", "aff": "Huawei Technologies Ltd.;;Huawei Noah\u2019s Ark ;University of Electronic Science and Technology of China;Harbin Institute of Technology Shenzhen", "aff_domain": "huawei.com;;huawei.com;uestc.edu.cn;hit.edu.cn", "position": "Intern;;Researcher;Associate Professor;Full Professor", "bibtex": "", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer5;AnonReviewer4;AnonReviewer3", "site": "https://openreview.net/forum?id=LQTsVy-Xli", "pdf_size": 0, "rating": "4;4;5;7", "confidence": "4;4;4;3", "wc_review": "275;229;148;117", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "0;0;0;0", "reply_reviewers": "0;0;0;0", "reply_authors": "0;0;0;0", "rating_avg": [ 5.0, 1.224744871391589 ], "confidence_avg": [ 3.75, 0.4330127018922193 ], "wc_review_avg": [ 192.25, 62.88630614052634 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 0, 0 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 0, 0 ], "replies_avg": [ 5, 0 ], "authors#_avg": [ 5, 0 ], "corr_rating_confidence": -0.9428090415820632, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:1-9Y8PMipywJ:scholar.google.com/&scioq=Auto-view+contrastive+learning+for+few-shot+image+recognition&hl=en&as_sdt=0,5", "gs_version_total": 0, "aff_unique_index": "0;0;1;2", "aff_unique_norm": "Huawei;University of Electronic Science and Technology of China;Harbin Institute of Technology", "aff_unique_dep": "Huawei Technologies;;", "aff_unique_url": "https://www.huawei.com;https://www.uestc.edu.cn;https://www.hit.edu.cn/", "aff_unique_abbr": "Huawei;UESTC;HIT", "aff_campus_unique_index": "1", "aff_campus_unique": ";Shenzhen", "aff_country_unique_index": "0;0;0;0", "aff_country_unique": "China" }, { "title": "Emergent Symbols through Binding in External Memory", "status": "Spotlight", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/2965", "id": "LSFCEb3GYU7", "poster": "", "openreview": "https://openreview.net/forum?id=LSFCEb3GYU7", "slides": "https://iclr.cc/virtual/2021/poster/2965", "video": "https://iclr.cc/virtual/2021/poster/2965", "author_site": "Taylor Webb, Ishan Sinha, Jonathan Cohen", "tldr": "", "abstract": "A key aspect of human intelligence is the ability to infer abstract rules directly from high-dimensional sensory data, and to do so given only a limited amount of training experience. 
Deep neural network algorithms have proven to be a powerful tool for learning directly from high-dimensional data, but currently lack this capacity for data-efficient induction of abstract rules, leading some to argue that symbol-processing mechanisms will be necessary to account for this capacity. In this work, we take a step toward bridging this gap by introducing the Emergent Symbol Binding Network (ESBN), a recurrent network augmented with an external memory that enables a form of variable-binding and indirection. This binding mechanism allows symbol-like representations to emerge through the learning process without the need to explicitly incorporate symbol-processing machinery, enabling the ESBN to learn rules in a manner that is abstracted away from the particular entities to which those rules apply. Across a series of tasks, we show that this architecture displays nearly perfect generalization of learned rules to novel entities given only a limited number of training examples, and outperforms a number of other competitive neural network architectures.", "keywords": "abstract rules;out-of-distribution generalization;external memory;indirection;variable binding", "primary_area": "", "supplementary_material": "", "author": "Taylor Whittington Webb;Ishan Sinha;Jonathan Cohen", "authorids": "~Taylor_Whittington_Webb1;sinha.ishan@gmail.com;~Jonathan_Cohen1", "gender": "M;;M", "homepage": "https://scholar.google.com/citations?user=WCmrJoQAAAAJ&hl=en;;https://jdc.princeton.edu", "dblp": "183/6144;;31/5509-3", "google_scholar": "WCmrJoQAAAAJ;;https://scholar.google.com.tw/citations?user=NCkkQAMAAAAJ", "orcid": ";;0000-0003-2316-0763", "linkedin": ";;", "or_profile": "~Taylor_Whittington_Webb1;sinha.ishan@gmail.com;~Jonathan_Cohen1", "aff": "University of California, Los Angeles;;Princeton University", "aff_domain": "ucla.edu;;princeton.edu", "position": "Postdoc;;Full Professor", "bibtex": "@inproceedings{\nwebb2021emergent,\ntitle={Emergent Symbols through Binding in External Memory},\nauthor={Taylor Whittington Webb and Ishan Sinha and Jonathan Cohen},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=LSFCEb3GYU7}\n}", "github": "[![github](/images/github_icon.svg) taylorwwebb/emergent_symbols](https://github.com/taylorwwebb/emergent_symbols) + [![Papers with Code](/images/pwc_icon.svg) 1 community implementation](https://paperswithcode.com/paper/?openreview=LSFCEb3GYU7)", "project": "", "reviewers": "AnonReviewer4;AnonReviewer2;AnonReviewer3;AnonReviewer1", "pdf_size": 0, "rating": "6;7;7;7", "confidence": "4;4;4;4", "wc_review": "569;607;443;430", "wc_reply_reviewers": "166;0;0;0", "wc_reply_authors": "1410;210;821;585", "reply_reviewers": "1;0;0;0", "reply_authors": "3;1;2;1", "rating_avg": [ 6.75, 0.4330127018922193 ], "confidence_avg": [ 4.0, 0.0 ], "wc_review_avg": [ 512.25, 77.06936810432533 ], "wc_reply_reviewers_avg": [ 41.5, 71.88010851410841 ], "wc_reply_authors_avg": [ 756.5, 435.6882486365681 ], "reply_reviewers_avg": [ 0.25, 0.4330127018922193 ], "reply_authors_avg": [ 1.75, 0.82915619758885 ], "replies_avg": [ 13, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 81, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=6169432592073428363&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 4, "pdf": "https://openreview.net/pdf?id=LSFCEb3GYU7", "email": "ucla.edu;;princeton.edu", "author_num": 3, "aff_unique_index": "0;1", "aff_unique_norm": "University of California, Los 
Angeles;Princeton University", "aff_unique_dep": ";", "aff_unique_url": "https://www.ucla.edu;https://www.princeton.edu", "aff_unique_abbr": "UCLA;Princeton", "aff_campus_unique_index": "0", "aff_campus_unique": "Los Angeles;", "aff_country_unique_index": "0;0", "aff_country_unique": "United States" }, { "id": "LT0KSFnQDWF", "title": "Improving Graph Neural Network Expressivity via Subgraph Isomorphism Counting", "track": "main", "status": "Reject", "tldr": "", "abstract": "While Graph Neural Networks (GNNs) have achieved remarkable results in a variety of applications, recent studies exposed important shortcomings in their ability to capture the structure of the underlying graph. It has been shown that the expressive power of standard GNNs is bounded by the Weisfeiler-Lehman (WL) graph isomorphism test, from which they inherit proven limitations such as the inability to detect and count graph substructures. On the other hand, there is significant empirical evidence, e.g. in network science and bioinformatics, that substructures are often informative for downstream tasks, suggesting that it is desirable to design GNNs capable of leveraging this important source of information. To this end, we propose a novel topologically-aware message passing scheme based on substructure encoding. We show that our architecture allows incorporating domain-specific inductive biases and that it is strictly more expressive than the WL test. Importantly, in contrast to recent works on the expressivity of GNNs, we do not attempt to adhere to the WL hierarchy; this allows us to retain multiple attractive properties of standard GNNs such as locality and linear network complexity, while being able to disambiguate even hard instances of graph isomorphism. We extensively evaluate our method on graph classification and regression tasks and show state-of-the-art results on multiple datasets including molecular graphs and social networks.", "keywords": "graph neural networks;graph representation learning;network analysis;network motifs;subgraph isomoprhism", "primary_area": "", "supplementary_material": "/attachment/ecf737279553dc1b43294da62adc15eab22fdb39.zip", "author": "Giorgos Bouritsas;Fabrizio Frasca;Stefanos Zafeiriou;Michael M. Bronstein", "authorids": "~Giorgos_Bouritsas1;~Fabrizio_Frasca1;~Stefanos_Zafeiriou1;~Michael_M._Bronstein1", "gender": ";M;M;M", "homepage": "http://users.uoa.gr/~gbouritsas/;https://noired.github.io;http://www.imperial.ac.uk/people/s.zafeiriou/;http://www.inf.usi.ch/bronstein/", "dblp": "190/1675;228/1840;25/1885.html;07/2668", "google_scholar": "eNUJDXUAAAAJ;PT2CDA4AAAAJ;QKOH5iYAAAAJ;UU3N6-UAAAAJ", "orcid": "0000-0002-8476-4918;0000-0002-5165-1394;;", "linkedin": "giorgos-bouritsas;;;mbronstein/", "or_profile": "~Giorgos_Bouritsas1;~Fabrizio_Frasca1;~Stefanos_Zafeiriou1;~Michael_M._Bronstein1", "aff": "EPFL - EPF Lausanne;Imperial College London;Imperial College London;Imperial College London", "aff_domain": "epfl.ch;imperial.ac.uk;ic.ac.uk;imperial.ac.uk", "position": "Intern;PhD student;Full Professor;Professor", "bibtex": "@misc{\nbouritsas2021improving,\ntitle={Improving Graph Neural Network Expressivity via Subgraph Isomorphism Counting},\nauthor={Giorgos Bouritsas and Fabrizio Frasca and Stefanos Zafeiriou and Michael M. 
Bronstein},\nyear={2021},\nurl={https://openreview.net/forum?id=LT0KSFnQDWF}\n}", "github": "", "project": "", "reviewers": "AnonReviewer3;AnonReviewer1;AnonReviewer2;AnonReviewer4", "site": "https://openreview.net/forum?id=LT0KSFnQDWF", "pdf_size": 0, "rating": "3;4;5;6", "confidence": "5;4;5;4", "wc_review": "263;349;459;256", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "1125;714;1223;387", "reply_reviewers": "0;0;0;0", "reply_authors": "2;1;2;1", "rating_avg": [ 4.5, 1.118033988749895 ], "confidence_avg": [ 4.5, 0.5 ], "wc_review_avg": [ 331.75, 82.0895090739371 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 862.25, 334.2973040573316 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.5, 0.5 ], "replies_avg": [ 14, 0 ], "authors#_avg": [ 4, 0 ], "corr_rating_confidence": -0.4472135954999579, "gs_citation": 555, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=4001320536989009260&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 10, "aff_unique_index": "0;1;1;1", "aff_unique_norm": "EPFL;Imperial College London", "aff_unique_dep": ";", "aff_unique_url": "https://www.epfl.ch;https://www.imperial.ac.uk", "aff_unique_abbr": "EPFL;ICL", "aff_campus_unique_index": "0", "aff_campus_unique": "Lausanne;", "aff_country_unique_index": "0;1;1;1", "aff_country_unique": "Switzerland;United Kingdom" }, { "title": "Proximal Gradient Descent-Ascent: Variable Convergence under K\u0141 Geometry", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/2697", "id": "LVotkZmYyDi", "poster": "", "openreview": "https://openreview.net/forum?id=LVotkZmYyDi", "slides": "https://iclr.cc/virtual/2021/poster/2697", "video": "https://iclr.cc/virtual/2021/poster/2697", "author_site": "Ziyi Chen, Yi Zhou, Tengyu Xu, Yingbin Liang", "tldr": "", "abstract": "The gradient descent-ascent (GDA) algorithm has been widely applied to solve minimax optimization problems. In order to achieve convergent policy parameters for minimax optimization, it is important that GDA generates convergent variable sequences rather than convergent sequences of function value or gradient norm. However, the variable convergence of GDA has been proved only under convexity geometries, and it remains poorly understood in general nonconvex minimax optimization. This paper fills such a gap by studying the convergence of a more general proximal-GDA for regularized nonconvex-strongly-concave minimax optimization. Specifically, we show that proximal-GDA admits a novel Lyapunov function, which monotonically decreases in the minimax optimization process and drives the variable sequences to a critical point. By leveraging this Lyapunov function and the KL geometry that parameterizes the local geometries of general nonconvex functions, we formally establish the variable convergence of proximal-GDA to a certain critical point $x^*$, i.e., $x_t\\to x^*, y_t\\to y^*(x^*)$. Furthermore, over the full spectrum of the KL-parameterized geometry, we show that proximal-GDA achieves different types of convergence rates ranging from sublinear convergence up to finite-step convergence, depending on the geometry associated with the KL parameter. This is the first theoretical result on the variable convergence for nonconvex minimax optimization. 
", "keywords": "Kurdyka-\u0141ojasiewicz geometry;minimax;nonconvex;proximal gradient descent-ascent;variable convergence", "primary_area": "", "supplementary_material": "", "author": "Ziyi Chen;Yi Zhou;Tengyu Xu;Yingbin Liang", "authorids": "~Ziyi_Chen2;~Yi_Zhou2;~Tengyu_Xu1;~Yingbin_Liang1", "gender": "M;M;;F", "homepage": ";https://sites.google.com/site/yizhouhomepage/home;;https://sites.google.com/view/yingbinliang/home", "dblp": "37/1439-2;;;51/332", "google_scholar": "zjSBVOIAAAAJ;4fK8bYIAAAAJ;;lGgLAiIAAAAJ", "orcid": ";;;", "linkedin": "ziyi-chen-84616184/;;;", "or_profile": "~Ziyi_Chen2;~Yi_Zhou2;~Tengyu_Xu1;~Yingbin_Liang1", "aff": "University of Utah;University of Utah;;The Ohio State University", "aff_domain": "utah.edu;utah.edu;;osu.edu", "position": "PhD student;Assistant Professor;;Professor", "bibtex": "@inproceedings{\nchen2021proximal,\ntitle={Proximal Gradient Descent-Ascent: Variable Convergence under K{\\L} Geometry},\nauthor={Ziyi Chen and Yi Zhou and Tengyu Xu and Yingbin Liang},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=LVotkZmYyDi}\n}", "github": "", "project": "", "reviewers": "AnonReviewer4;AnonReviewer1;AnonReviewer3;AnonReviewer2", "pdf_size": 0, "rating": "5;7;8;8", "confidence": "4;3;5;4", "wc_review": "534;707;356;287", "wc_reply_reviewers": "0;111;0;0", "wc_reply_authors": "769;463;127;142", "reply_reviewers": "0;1;0;0", "reply_authors": "1;1;1;1", "rating_avg": [ 7.0, 1.224744871391589 ], "confidence_avg": [ 4.0, 0.7071067811865476 ], "wc_review_avg": [ 471.0, 163.3600318315346 ], "wc_reply_reviewers_avg": [ 27.75, 48.064409910036346 ], "wc_reply_authors_avg": [ 375.25, 263.99467324171525 ], "reply_reviewers_avg": [ 0.25, 0.4330127018922193 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 10, 0 ], "authors#_avg": [ 4, 0 ], "corr_rating_confidence": 0.28867513459481287, "gs_citation": 37, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=17819071836422478120&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 7, "pdf": "https://openreview.net/pdf?id=LVotkZmYyDi", "email": "utah.edu;utah.edu;;osu.edu", "author_num": 4, "aff_unique_index": "0;0;1", "aff_unique_norm": "University of Utah;Ohio State University", "aff_unique_dep": ";", "aff_unique_url": "https://www.utah.edu;https://www.osu.edu", "aff_unique_abbr": "Utah;OSU", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0;0", "aff_country_unique": "United States" }, { "title": "Long Live the Lottery: The Existence of Winning Tickets in Lifelong Learning", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/3031", "id": "LXMSvPmsm0g", "poster": "", "openreview": "https://openreview.net/forum?id=LXMSvPmsm0g", "slides": "https://iclr.cc/virtual/2021/poster/3031", "video": "https://iclr.cc/virtual/2021/poster/3031", "author_site": "Tianlong Chen, Zhenyu Zhang, Sijia Liu, Shiyu Chang, Zhangyang Wang", "tldr": "", "abstract": "The lottery ticket hypothesis states that a highly sparsified sub-network can be trained in isolation, given the appropriate weight initialization. This paper extends that hypothesis from one-shot task learning, and demonstrates for the first time that such extremely compact and independently trainable sub-networks can be also identified in the lifelong learning scenario, which we call lifelong tickets. We show that the resulting lifelong ticket can further be leveraged to improve the performance of learning over continual tasks. 
However, it is highly non-trivial to conduct network pruning in the lifelong setting. Two critical roadblocks arise: i) As many tasks now arrive sequentially, finding tickets in a greedy weight pruning fashion will inevitably suffer from the intrinsic bias, that the earlier emerging tasks impact more; ii) As lifelong learning is consistently challenged by catastrophic forgetting, the compact network capacity of tickets might amplify the risk of forgetting. In view of those, we introduce two pruning options, e.g., top-down and bottom-up, for finding lifelong tickets. Compared to the top-down pruning that extends vanilla (iterative) pruning over sequential tasks, we show that the bottom-up one, which can dynamically shrink and (re-)expand model capacity, effectively avoids the undesirable excessive pruning in the early stage. We additionally introduce lottery teaching that further overcomes forgetting via knowledge distillation aided by external unlabeled data. Unifying those ingredients, we demonstrate the existence of very competitive lifelong tickets, e.g., achieving 3-8% of the dense model size with even higher accuracy, compared to strong class-incremental learning baselines on CIFAR-10/CIFAR-100/Tiny-ImageNet datasets. Codes available at https://github.com/VITA-Group/Lifelong-Learning-LTH.", "keywords": "lottery tickets;winning tickets;lifelong learning", "primary_area": "", "supplementary_material": "/attachment/158a5ed465b4ec20e83600b104905bb303919713.zip", "author": "Tianlong Chen;Zhenyu Zhang;Sijia Liu;Shiyu Chang;Zhangyang Wang", "authorids": "~Tianlong_Chen1;~Zhenyu_Zhang4;~Sijia_Liu1;~Shiyu_Chang2;~Zhangyang_Wang1", "gender": "M;M;M;Unspecified;M", "homepage": "https://tianlong-chen.github.io;https://zhenyu.gallery;https://lsjxjtu.github.io/;http://people.csail.mit.edu/chang87/;https://vita-group.github.io", "dblp": ";01/1844-15;128/6972-1;28/9988;119/4026", "google_scholar": "LE3ctn0AAAAJ;ZLyJRxoAAAAJ;C7dO_UgAAAAJ;r21asW4AAAAJ;pxFyKAIAAAAJ", "orcid": "0000-0001-7774-8197;;;;", "linkedin": "tianlong-chen-783862167/;zhenyu-allen-zhang-a9b1391a3/;;;", "or_profile": "~Tianlong_Chen1;~Zhenyu_Zhang4;~Sijia_Liu1;~Shiyu_Chang2;~Zhangyang_Wang1", "aff": "University of Texas, Austin;University of Science and Technology of China;Michigan State University;International Business Machines;University of Texas, Austin", "aff_domain": "utexas.edu;ustc.edu;msu.edu;ibm.com;utexas.edu", "position": "PhD student;MS student;Assistant Professor;Researcher;Assistant Professor", "bibtex": "@inproceedings{\nchen2021long,\ntitle={Long Live the Lottery: The Existence of Winning Tickets in Lifelong Learning},\nauthor={Tianlong Chen and Zhenyu Zhang and Sijia Liu and Shiyu Chang and Zhangyang Wang},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=LXMSvPmsm0g}\n}", "github": "", "project": "", "reviewers": "AnonReviewer5;AnonReviewer6;AnonReviewer3", "pdf_size": 0, "rating": "5;7;8", "confidence": "3;4;3", "wc_review": "286;871;658", "wc_reply_reviewers": "0;0;0", "wc_reply_authors": "825;745;670", "reply_reviewers": "0;0;0", "reply_authors": "2;2;1", "rating_avg": [ 6.666666666666667, 1.247219128924647 ], "confidence_avg": [ 3.3333333333333335, 0.4714045207910317 ], "wc_review_avg": [ 605.0, 241.74780247191492 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 746.6666666666666, 63.28945848682508 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.6666666666666667, 0.4714045207910317 ], "replies_avg": [ 18, 0 ], 
"authors#_avg": [ 5, 0 ], "corr_rating_confidence": 0.18898223650461363, "gs_citation": 83, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=14841002616571717879&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 2, "pdf": "https://openreview.net/pdf?id=LXMSvPmsm0g", "email": "utexas.edu;ustc.edu;msu.edu;ibm.com;utexas.edu", "author_num": 5, "aff_unique_index": "0;1;2;3;0", "aff_unique_norm": "University of Texas at Austin;University of Science and Technology of China;Michigan State University;International Business Machines Corporation", "aff_unique_dep": ";;;", "aff_unique_url": "https://www.utexas.edu;http://www.ustc.edu.cn;https://www.msu.edu;https://www.ibm.com", "aff_unique_abbr": "UT Austin;USTC;MSU;IBM", "aff_campus_unique_index": "0;0", "aff_campus_unique": "Austin;", "aff_country_unique_index": "0;1;0;0;0", "aff_country_unique": "United States;China" }, { "title": "Fast And Slow Learning Of Recurrent Independent Mechanisms", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/2822", "id": "Lc28QAB4ypz", "poster": "", "openreview": "https://openreview.net/forum?id=Lc28QAB4ypz", "slides": "https://iclr.cc/virtual/2021/poster/2822", "video": "https://iclr.cc/virtual/2021/poster/2822", "author_site": "Kanika Madan, Nan Rosemary Ke, Anirudh Goyal, Bernhard Schoelkopf, Yoshua Bengio", "tldr": "", "abstract": "Decomposing knowledge into interchangeable pieces promises a generalization advantage when there are changes in distribution. A learning agent interacting with its environment is likely to be faced with situations requiring novel combinations of existing pieces of knowledge. We hypothesize that such a decomposition of knowledge is particularly relevant for being able to generalize in a systematic way to out-of-distribution changes. To study these ideas, we propose a particular training framework in which we assume that the pieces of knowledge an agent needs and its reward function are stationary and can be re-used across tasks. An attention mechanism dynamically selects which modules can be adapted to the current task, and the parameters of the \\textit{selected} modules are allowed to change quickly as the learner is confronted with variations in what it experiences, while the parameters of the attention mechanisms act as stable, slowly changing, meta-parameters. We focus on pieces of knowledge captured by an ensemble of modules sparsely communicating with each other via a bottleneck of attention. We find that meta-learning the modular aspects of the proposed system greatly helps in achieving faster adaptation in a reinforcement learning setup involving navigation in a partially observed grid world with image-level input. 
We also find that reversing the role of parameters and meta-parameters does not work nearly as well, suggesting a particular role for fast adaptation of the dynamically selected modules.", "keywords": "modular representations;better generalization;learning mechanisms", "primary_area": "", "supplementary_material": "/attachment/090355a0e7cae0d35fc0d83417ed761068075971.zip", "author": "Kanika Madan;Nan Rosemary Ke;Anirudh Goyal;Bernhard Sch\u00f6lkopf;Yoshua Bengio", "authorids": "~Kanika_Madan3;~Nan_Rosemary_Ke1;~Anirudh_Goyal1;~Bernhard_Sch\u00f6lkopf1;~Yoshua_Bengio1", "gender": ";F;M;;M", "homepage": ";https://nke001.github.io/;https://anirudh9119.github.io/;;http://yoshuabengio.org", "dblp": ";120/5291;172/1039;;56/953", "google_scholar": ";https://scholar.google.ca/citations?user=dxwPYhQAAAAJ;krrh6OUAAAAJ;;kukA0LcAAAAJ", "orcid": ";;;;", "linkedin": ";;;;yoshuabengio/?originalSubdomain=ca", "or_profile": "~Kanika_Madan3;~Nan_Rosemary_Ke1;~Anirudh_Goyal1;~Bernhard_Sch\u00f6lkopf1;~Yoshua_Bengio1", "aff": ";Mila;University of Montreal;;University of Montreal", "aff_domain": ";mila.quebec;umontreal.ca;;umontreal.ca", "position": ";PhD student;PhD student;;Full Professor", "bibtex": "@inproceedings{\nmadan2021fast,\ntitle={Fast And Slow Learning Of Recurrent Independent Mechanisms},\nauthor={Kanika Madan and Nan Rosemary Ke and Anirudh Goyal and Bernhard Sch{\\\"o}lkopf and Yoshua Bengio},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=Lc28QAB4ypz}\n}", "github": "", "project": "", "reviewers": "AnonReviewer3;AnonReviewer2;AnonReviewer1;AnonReviewer5", "pdf_size": 0, "rating": "5;7;7;7", "confidence": "3;5;3;4", "wc_review": "329;821;476;723", "wc_reply_reviewers": "0;128;134;0", "wc_reply_authors": "1396;1214;1589;807", "reply_reviewers": "0;1;1;0", "reply_authors": "4;3;4;1", "rating_avg": [ 6.5, 0.8660254037844386 ], "confidence_avg": [ 3.75, 0.82915619758885 ], "wc_review_avg": [ 587.25, 195.02355626949273 ], "wc_reply_reviewers_avg": [ 65.5, 65.5343421421166 ], "wc_reply_authors_avg": [ 1251.5, 288.86545310922867 ], "reply_reviewers_avg": [ 0.5, 0.5 ], "reply_authors_avg": [ 3.0, 1.224744871391589 ], "replies_avg": [ 20, 0 ], "authors#_avg": [ 5, 0 ], "corr_rating_confidence": 0.5222329678670935, "gs_citation": 55, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=12812547036167110347&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 6, "pdf": "https://openreview.net/pdf?id=Lc28QAB4ypz", "email": ";mila.quebec;umontreal.ca;;umontreal.ca", "author_num": 5, "aff_unique_index": "0;1;1", "aff_unique_norm": "Mila;University of Montreal", "aff_unique_dep": "Quebec Artificial Intelligence Institute;", "aff_unique_url": "https://mila.quebec;https://wwwumontreal.ca", "aff_unique_abbr": "Mila;UM", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0;0", "aff_country_unique": "Canada" }, { "id": "LcPefbNSwx_", "title": "Factor Normalization for Deep Neural Network Models", "track": "main", "status": "Reject", "tldr": "", "abstract": "Deep neural network (DNN) models often involve features of high dimensions. In most cases, the high-dimensional features can be decomposed into two parts. The first part is a low-dimensional factor. The second part is the residual feature, with much-reduced variability and inter-feature correlation. This leads to a number of interesting theoretical findings for deep neural network training. 
Accordingly, we are inspired to develop a new factor normalization method for better performance. The proposed method leads to a new deep learning model with two important features. First, it allows factor-related feature extraction. Second, it allows adaptive learning rates for factors and residuals, respectively. This leads to fast convergence speed on both training and validation datasets. A number of empirical experiments are presented to demonstrate its superior performance. The code is available at https://github.com/HazardNeo4869/FactorNormalization", "keywords": "factor normalization;ultrahigh dimensional features;adaptive learning rate;factor decomposition", "primary_area": "", "supplementary_material": "/attachment/dcdde06437a430fa26dc10408b2ddda84820657c.zip", "author": "Haobo Qi;Jing Zhou;Hansheng Wang", "authorids": "qihaobo_gsm@pku.edu.cn;~Jing_Zhou3;hansheng@pku.edu.cn", "gender": ";F;", "homepage": ";http://stat.ruc.edu.cn/jxtd/jsdw/jjshtjx/db27e9fe838941e791d850927d7a9e8e.htm;", "dblp": ";;", "google_scholar": ";;", "orcid": ";0000-0002-1099-7612;", "linkedin": ";%E9%9D%99-%E5%91%A8-792a08200/;", "or_profile": "qihaobo_gsm@pku.edu.cn;~Jing_Zhou3;hansheng@pku.edu.cn", "aff": ";Renmin University of China;", "aff_domain": ";ruc.edu.cn;", "position": ";Associate Professor;", "bibtex": "@misc{\nqi2021factor,\ntitle={Factor Normalization for Deep Neural Network Models},\nauthor={Haobo Qi and Jing Zhou and Hansheng Wang},\nyear={2021},\nurl={https://openreview.net/forum?id=LcPefbNSwx_}\n}", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer2;AnonReviewer4;AnonReviewer3", "site": "https://openreview.net/forum?id=LcPefbNSwx_", "pdf_size": 0, "rating": "4;4;4;5", "confidence": "3;3;3;4", "wc_review": "254;618;133;898", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "0;0;0;0", "reply_reviewers": "0;0;0;0", "reply_authors": "0;0;0;0", "rating_avg": [ 4.25, 0.4330127018922193 ], "confidence_avg": [ 3.25, 0.4330127018922193 ], "wc_review_avg": [ 475.75, 302.15093496462987 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 0, 0 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 0, 0 ], "replies_avg": [ 5, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": 1.0, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:aou0CMDDaUUJ:scholar.google.com/&scioq=Factor+Normalization+for+Deep+Neural+Network+Models&hl=en&as_sdt=0,5", "gs_version_total": 0, "aff_unique_index": "0", "aff_unique_norm": "Renmin University of China", "aff_unique_dep": "", "aff_unique_url": "http://www.ruc.edu.cn", "aff_unique_abbr": "RUC", "aff_country_unique_index": "0", "aff_country_unique": "China" }, { "title": "Learning from Demonstration with Weakly Supervised Disentanglement", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/3178", "id": "Ldau9eHU-qO", "poster": "", "openreview": "https://openreview.net/forum?id=Ldau9eHU-qO", "slides": "https://iclr.cc/virtual/2021/poster/3178", "video": "https://iclr.cc/virtual/2021/poster/3178", "author_site": "Yordan Hristov, Subramanian Ramamoorthy", "tldr": "", "abstract": "Robotic manipulation tasks, such as wiping with a soft sponge, require control from multiple rich sensory modalities. Human-robot interaction, aimed at teaching robots, is difficult in this setting as there is potential for mismatch between human and machine comprehension of the rich data streams. 
We treat the task of interpretable learning from demonstration as an optimisation problem over a probabilistic generative model. To account for the high-dimensionality of the data, a high-capacity neural network is chosen to represent the model. The latent variables in this model are explicitly aligned with high-level notions and concepts that are manifested in a set of demonstrations. We show that such alignment is best achieved through the use of labels from the end user, in an appropriately restricted vocabulary, in contrast to the conventional approach of the designer picking a prior over the latent variables. Our approach is evaluated in the context of two table-top robot manipulation tasks performed by a PR2 robot \u2013 that of dabbing liquids with a sponge (forcefully pressing a sponge and moving it along a surface) and pouring between different containers. The robot provides visual information, arm joint positions and arm joint efforts. We have made videos of the tasks and data available - see supplementary materials at: https://sites.google.com/view/weak-label-lfd.", "keywords": "representation learning for robotics;physical symbol grounding;semi-supervised learning", "primary_area": "", "supplementary_material": "/attachment/39868fdd1ca17a6a10f6c0e6729b6f8b277cc279.zip", "author": "Yordan Hristov;Subramanian Ramamoorthy", "authorids": "~Yordan_Hristov1;~Subramanian_Ramamoorthy1", "gender": ";M", "homepage": "https://yordanh.github.io/;http://rad.inf.ed.ac.uk/", "dblp": "202/1717;97/5598", "google_scholar": ";K_v3RvMAAAAJ", "orcid": ";0000-0002-6300-5103", "linkedin": ";subramanian-ramamoorthy-9650595/", "or_profile": "~Yordan_Hristov1;~Subramanian_Ramamoorthy1", "aff": "University of Edinburgh;Edinburgh University, University of Edinburgh", "aff_domain": "ed.ac.uk;inf.ed.ac.uk", "position": "PhD student;Full Professor", "bibtex": "@inproceedings{\nhristov2021learning,\ntitle={Learning from Demonstration with Weakly Supervised Disentanglement},\nauthor={Yordan Hristov and Subramanian Ramamoorthy},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=Ldau9eHU-qO}\n}", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer4;AnonReviewer2", "pdf_size": 0, "rating": "5;7;7", "confidence": "4;4;3", "wc_review": "535;348;473", "wc_reply_reviewers": "0;0;0", "wc_reply_authors": "558;353;275", "reply_reviewers": "0;0;0", "reply_authors": "1;1;1", "rating_avg": [ 6.333333333333333, 0.9428090415820634 ], "confidence_avg": [ 3.6666666666666665, 0.4714045207910317 ], "wc_review_avg": [ 452.0, 77.77317446695015 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 395.3333333333333, 119.3491609615343 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 7, 0 ], "authors#_avg": [ 2, 0 ], "corr_rating_confidence": -0.49999999999999983, "gs_citation": 12, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=3791116520748802118&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 4, "pdf": "https://openreview.net/pdf?id=Ldau9eHU-qO", "email": "ed.ac.uk;inf.ed.ac.uk", "author_num": 2, "aff_unique_index": "0;0", "aff_unique_norm": "University of Edinburgh", "aff_unique_dep": "", "aff_unique_url": "https://www.ed.ac.uk", "aff_unique_abbr": "Edinburgh", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0", "aff_country_unique": "United Kingdom" }, { "id": "LgqmtA-wzXi", "title": "Solving Non-Stationary Bandit Problems with an RNN and 
an Energy Minimization Loss", "track": "main", "status": "Withdraw", "tldr": "", "abstract": "We consider a Multi-Armed Bandit problem in which the rewards are non-stationary and are dependent on past actions and potentially on past contexts. At the heart of our method, we employ a recurrent neural network, which models these sequences. \nIn order to balance between exploration and exploitation, we present an energy minimization term that prevents the neural network from becoming too confident in support of a certain action. This term provably limits the gap between the maximal and minimal probabilities assigned by the network. In a diverse set of experiments, we demonstrate that our method is at least as effective as methods suggested to solve the sub-problem of Rotting Bandits, can solve intuitive extensions of various benchmark problems, and is effective in a real-world recommendation system scenario.", "keywords": "Recurrent Neural Networks;Mutli Arm-Bandits", "primary_area": "", "supplementary_material": "/attachment/fde026b9250e384e732ff48f6fedc3b0cb8bc0f1.zip", "author": "Michael Rotman;Lior Wolf", "authorids": "~Michael_Rotman1;~Lior_Wolf1", "gender": ";M", "homepage": "https://rotmanmichael.com;http://www.cs.tau.ac.il/~wolf", "dblp": "217/3007;83/4103", "google_scholar": "tzlpNi8AAAAJ;UbFrXTsAAAAJ", "orcid": ";0000-0001-5578-8892", "linkedin": ";", "or_profile": "~Michael_Rotman1;~Lior_Wolf1", "aff": "General Electric;Tel Aviv University", "aff_domain": "ge.com;tau.ac.il", "position": "Researcher;Full Professor", "bibtex": "", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer4;AnonReviewer2;AnonReviewer3", "site": "https://openreview.net/forum?id=LgqmtA-wzXi", "pdf_size": 0, "rating": "2;3;4;5", "confidence": "5;4;3;3", "wc_review": "698;672;526;363", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "0;0;0;0", "reply_reviewers": "0;0;0;0", "reply_authors": "0;0;0;0", "rating_avg": [ 3.5, 1.118033988749895 ], "confidence_avg": [ 3.75, 0.82915619758885 ], "wc_review_avg": [ 564.75, 133.66258825864475 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 0, 0 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 0, 0 ], "replies_avg": [ 6, 0 ], "authors#_avg": [ 2, 0 ], "corr_rating_confidence": -0.9438798074485388, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:ghjOrn9GuH0J:scholar.google.com/&scioq=Solving+Non-Stationary+Bandit+Problems+with+an+RNN+and+an+Energy+Minimization+Loss&hl=en&as_sdt=0,33", "gs_version_total": 0, "aff_unique_index": "0;1", "aff_unique_norm": "General Electric;Tel Aviv University", "aff_unique_dep": ";", "aff_unique_url": "https://www.ge.com;https://www.tau.ac.il", "aff_unique_abbr": "GE;TAU", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;1", "aff_country_unique": "United States;Israel" }, { "id": "LhAqAxwH5cn", "title": "Robust Loss Functions for Complementary Labels Learning", "track": "main", "status": "Reject", "tldr": "", "abstract": "In ordinary-label learning, the correct label is given to each training sample. Similarly, a complementary label is also provided for each training sample in complementary-label learning. A complementary label indicates a class that the example does not belong to. Robust learning of classifiers has been investigated from many viewpoints under label noise, but little attention has been paid to complementary-label learning. 
In this paper, we present a new complementary-label learning algorithm with a robust loss function. We also provide two sufficient conditions on a loss function so that the minimizer of the risk for complementary labels is theoretically guaranteed to be consistent with the minimizer of the risk for ordinary labels. Finally, the empirical results validate our method\u2019s superiority to current state-of-the-art techniques. In particular, on CIFAR-10 our algorithm achieves much higher test accuracy than the gradient ascent algorithm, while our model uses fewer than half the parameters of the ResNet-34 used in that work.", "keywords": "Complementary Labels;Robustness;Machine Learning", "primary_area": "", "supplementary_material": "", "author": "Defu Liu;Guowu Yang", "authorids": "~Defu_Liu1;guowu@uestc.edu.cn", "gender": "M;", "homepage": ";", "dblp": ";", "google_scholar": "fYC9G_EAAAAJ;", "orcid": ";", "linkedin": ";", "or_profile": "~Defu_Liu1;guowu@uestc.edu.cn", "aff": "University of Electronic Science and Technology of China;", "aff_domain": "uestc.edu.cn;", "position": "Postdoc;", "bibtex": "@misc{\nliu2021robust,\ntitle={Robust Loss Functions for Complementary Labels Learning},\nauthor={Defu Liu and Guowu Yang},\nyear={2021},\nurl={https://openreview.net/forum?id=LhAqAxwH5cn}\n}", "github": "", "project": "", "reviewers": "AnonReviewer4;AnonReviewer1;AnonReviewer3;AnonReviewer2", "site": "https://openreview.net/forum?id=LhAqAxwH5cn", "pdf_size": 0, "rating": "3;5;7;7", "confidence": "4;4;3;2", "wc_review": "573;312;195;234", "wc_reply_reviewers": "0;47;0;0", "wc_reply_authors": "389;229;160;216", "reply_reviewers": "0;1;0;0", "reply_authors": "1;1;1;1", "rating_avg": [ 5.5, 1.6583123951777 ], "confidence_avg": [ 3.25, 0.82915619758885 ], "wc_review_avg": [ 328.5, 147.31344134192236 ], "wc_reply_reviewers_avg": [ 11.75, 20.351596988934308 ], "wc_reply_authors_avg": [ 248.5, 85.16014325962585 ], "reply_reviewers_avg": [ 0.25, 0.4330127018922193 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 12, 0 ], "authors#_avg": [ 2, 0 ], "corr_rating_confidence": -0.8181818181818182, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:rS37kndhhBIJ:scholar.google.com/&scioq=Robust+Loss+Functions+for+Complementary+Labels+Learning&hl=en&as_sdt=0,33", "gs_version_total": 0, "aff_unique_index": "0", "aff_unique_norm": "University of Electronic Science and Technology of China", "aff_unique_dep": "", "aff_unique_url": "https://www.uestc.edu.cn", "aff_unique_abbr": "UESTC", "aff_country_unique_index": "0", "aff_country_unique": "China" }, { "title": "Anatomy of Catastrophic Forgetting: Hidden Representations and Task Semantics", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/2907", "id": "LhY8QdUGSuw", "poster": "", "openreview": "https://openreview.net/forum?id=LhY8QdUGSuw", "slides": "https://iclr.cc/virtual/2021/poster/2907", "video": "https://iclr.cc/virtual/2021/poster/2907", "author_site": "Vinay Ramasesh, Ethan Dyer, Maithra Raghu", "tldr": "", "abstract": "Catastrophic forgetting is a recurring challenge to developing versatile deep learning models. Despite its ubiquity, there is limited understanding of its connections to neural network (hidden) representations and task semantics. In this paper, we address this important knowledge gap. 
Through quantitative analysis of neural representations, we find that deeper layers are disproportionately responsible for forgetting, with sequential training resulting in an erasure of earlier task representational subspaces. Methods to mitigate forgetting stabilize these deeper layers, but show diversity on precise effects, with some increasing feature reuse while others store task representations orthogonally, preventing interference. These insights also enable the development of an analytic argument and empirical picture relating forgetting to task semantic similarity, where we find that maximal forgetting occurs for task sequences with intermediate similarity.", "keywords": "Catastrophic forgetting;continual learning;representation analysis;representation learning", "primary_area": "", "supplementary_material": "/attachment/48dc49eb873201ad44237ead36a1d2a451cc8a53.zip", "author": "Vinay Venkatesh Ramasesh;Ethan Dyer;Maithra Raghu", "authorids": "~Vinay_Venkatesh_Ramasesh1;~Ethan_Dyer1;~Maithra_Raghu1", "gender": "M;M;F", "homepage": "http://ramasesh.github.io;;http://maithraraghu.com/", "dblp": ";;", "google_scholar": ";;tiE4g64AAAAJ", "orcid": ";;", "linkedin": ";;", "or_profile": "~Vinay_Venkatesh_Ramasesh1;~Ethan_Dyer1;~Maithra_Raghu1", "aff": ";Google;Google Brain", "aff_domain": ";google.com;cornell.edu", "position": ";Staff;Senior Research Scientist", "bibtex": "@inproceedings{\nramasesh2021anatomy,\ntitle={Anatomy of Catastrophic Forgetting: Hidden Representations and Task Semantics},\nauthor={Vinay Venkatesh Ramasesh and Ethan Dyer and Maithra Raghu},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=LhY8QdUGSuw}\n}", "github": "", "project": "", "reviewers": "AnonReviewer2;AnonReviewer3;AnonReviewer4;AnonReviewer1", "pdf_size": 0, "rating": "6;6;7;7", "confidence": "5;4;4;3", "wc_review": "1093;335;568;563", "wc_reply_reviewers": "0;0;0;161", "wc_reply_authors": "551;569;353;1016", "reply_reviewers": "0;0;0;1", "reply_authors": "1;1;1;3", "rating_avg": [ 6.5, 0.5 ], "confidence_avg": [ 4.0, 0.7071067811865476 ], "wc_review_avg": [ 639.75, 278.0947455454705 ], "wc_reply_reviewers_avg": [ 40.25, 69.71504500464731 ], "wc_reply_authors_avg": [ 622.25, 242.61427719736528 ], "reply_reviewers_avg": [ 0.25, 0.4330127018922193 ], "reply_authors_avg": [ 1.5, 0.8660254037844386 ], "replies_avg": [ 12, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": -0.7071067811865475, "gs_citation": 211, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=18149737172198321521&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 3, "pdf": "https://openreview.net/pdf?id=LhY8QdUGSuw", "email": ";google.com;cornell.edu", "author_num": 3, "aff_unique_index": "0;0", "aff_unique_norm": "Google", "aff_unique_dep": "Google", "aff_unique_url": "https://www.google.com", "aff_unique_abbr": "Google", "aff_campus_unique_index": "0;0", "aff_campus_unique": "Mountain View", "aff_country_unique_index": "0;0", "aff_country_unique": "United States" }, { "title": "X2T: Training an X-to-Text Typing Interface with Online Learning from User Feedback", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/3231", "id": "LiX3ECzDPHZ", "poster": "", "openreview": "https://openreview.net/forum?id=LiX3ECzDPHZ", "slides": "https://iclr.cc/virtual/2021/poster/3231", "video": "https://iclr.cc/virtual/2021/poster/3231", "author_site": "Jensen Gao, Siddharth Reddy, Glen Berseth, Nick Hardy, Nikhilesh Natraj, Karunesh 
Ganguly, Anca Dragan, Sergey Levine", "tldr": "", "abstract": "We aim to help users communicate their intent to machines using flexible, adaptive interfaces that translate arbitrary user input into desired actions. In this work, we focus on assistive typing applications in which a user cannot operate a keyboard, but can instead supply other inputs, such as webcam images that capture eye gaze or neural activity measured by a brain implant. Standard methods train a model on a fixed dataset of user inputs, then deploy a static interface that does not learn from its mistakes; in part, because extracting an error signal from user behavior can be challenging. We investigate a simple idea that would enable such interfaces to improve over time, with minimal additional effort from the user: online learning from user feedback on the accuracy of the interface's actions. In the typing domain, we leverage backspaces as feedback that the interface did not perform the desired action. We propose an algorithm called x-to-text (X2T) that trains a predictive model of this feedback signal, and uses this model to fine-tune any existing, default interface for translating user input into actions that select words or characters. We evaluate X2T through a small-scale online user study with 12 participants who type sentences by gazing at their desired words, a large-scale observational study on handwriting samples from 60 users, and a pilot study with one participant using an electrocorticography-based brain-computer interface. The results show that X2T learns to outperform a non-adaptive default interface, stimulates user co-adaptation to the interface, personalizes the interface to individual users, and can leverage offline data collected from the default interface to improve its initial performance and accelerate online learning.", "keywords": "reinforcement learning;human-computer interaction", "primary_area": "", "supplementary_material": "", "author": "Jensen Gao;Siddharth Reddy;Glen Berseth;Nicholas Hardy;Nikhilesh Natraj;Karunesh Ganguly;Anca Dragan;Sergey Levine", "authorids": "jenseng@berkeley.edu;~Siddharth_Reddy1;~Glen_Berseth1;nhardy01@gmail.com;nikhilesh.natraj@ucsf.edu;karunesh.ganguly@ucsf.edu;~Anca_Dragan1;~Sergey_Levine1", "gender": ";M;M;;;;F;M", "homepage": ";https://people.eecs.berkeley.edu/~reddy/;http://fracturedplane.com/;;;;http://www.ancadragan.com/;https://people.eecs.berkeley.edu/~svlevine/", "dblp": ";176/5053;147/5478;;;;;80/7594", "google_scholar": ";7GSWYLQAAAAJ;https://scholar.google.ca/citations?user=-WZcuuwAAAAJ;;;;;8R35rCwAAAAJ", "orcid": ";;0000-0001-7351-8028;;;;;", "linkedin": ";;glen-berseth-0523278b?trk=hp-identity-name;;;;;", "or_profile": "jenseng@berkeley.edu;~Siddharth_Reddy1;~Glen_Berseth1;nhardy01@gmail.com;nikhilesh.natraj@ucsf.edu;karunesh.ganguly@ucsf.edu;~Anca_Dragan1;~Sergey_Levine1", "aff": ";University of California, Berkeley;University of California, Berkeley;;;;University of California, Berkeley;Google", "aff_domain": ";berkeley.edu;berkeley.edu;;;;berkeley.edu;google.com", "position": ";PhD student;Postdoc;;;;Associate Professor;Research Scientist", "bibtex": "@inproceedings{\ngao2021xt,\ntitle={X2T: Training an X-to-Text Typing Interface with Online Learning from User Feedback},\nauthor={Jensen Gao and Siddharth Reddy and Glen Berseth and Nicholas Hardy and Nikhilesh Natraj and Karunesh Ganguly and Anca Dragan and Sergey Levine},\nbooktitle={International Conference on Learning 
Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=LiX3ECzDPHZ}\n}", "github": "", "project": "", "reviewers": "AnonReviewer2;AnonReviewer1;AnonReviewer3", "pdf_size": 0, "rating": "4;7;8", "confidence": "4;4;4", "wc_review": "298;676;897", "wc_reply_reviewers": "576;82;259", "wc_reply_authors": "1240;485;356", "reply_reviewers": "3;1;1", "reply_authors": "5;2;2", "rating_avg": [ 6.333333333333333, 1.699673171197595 ], "confidence_avg": [ 4.0, 0.0 ], "wc_review_avg": [ 623.6666666666666, 247.32479769638053 ], "wc_reply_reviewers_avg": [ 305.6666666666667, 204.356442412652 ], "wc_reply_authors_avg": [ 693.6666666666666, 389.88915803796795 ], "reply_reviewers_avg": [ 1.6666666666666667, 0.9428090415820634 ], "reply_authors_avg": [ 3.0, 1.4142135623730951 ], "replies_avg": [ 18, 0 ], "authors#_avg": [ 8, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 10, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=10242488567331410860&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 3, "pdf": "https://openreview.net/pdf?id=LiX3ECzDPHZ", "email": ";berkeley.edu;berkeley.edu;;;;berkeley.edu;google.com", "author_num": 8, "aff_unique_index": "0;0;0;1", "aff_unique_norm": "University of California, Berkeley;Google", "aff_unique_dep": ";Google", "aff_unique_url": "https://www.berkeley.edu;https://www.google.com", "aff_unique_abbr": "UC Berkeley;Google", "aff_campus_unique_index": "0;0;0;1", "aff_campus_unique": "Berkeley;Mountain View", "aff_country_unique_index": "0;0;0;0", "aff_country_unique": "United States" }, { "id": "LjFGgI-_tT0", "title": "BayesAdapter: Being Bayesian, Inexpensively and Robustly, via Bayesian Fine-tuning", "track": "main", "status": "Reject", "tldr": "", "abstract": "Despite their theoretical appealingness, Bayesian neural networks (BNNs) are falling far behind in terms of adoption in real-world applications compared with normal NNs, mainly due to their limited scalability in training, and low fidelity in their uncertainty estimates. In this work, we develop a new framework, named BayesAdapter, to address these issues and bring Bayesian deep learning to the masses. The core notion of BayesAdapter is to adapt pre-trained deterministic NNs to be BNNs via Bayesian fine-tuning. We implement Bayesian fine-tuning with a plug-and-play instantiation of stochastic variational inference, and propose exemplar reparameterization to reduce gradient variance and stabilize the fine-tuning. Together, they enable training BNNs as if one were training deterministic NNs with minimal added overheads. During Bayesian fine-tuning, we further propose an uncertainty regularization to supervise and calibrate the uncertainty quantification of learned BNNs at low cost. To empirically evaluate BayesAdapter, we conduct extensive experiments on a diverse set of challenging benchmarks, and observe satisfactory training efficiency, competitive predictive performance, and calibrated and faithful uncertainty estimates. 
", "keywords": "Bayesian neural networks;Bayesian fine-tuning;uncertainty estimation;OOD detection", "primary_area": "", "supplementary_material": "/attachment/1c497653db290634d98b7b7d955514bbd1bb0a5c.zip", "author": "Zhijie Deng;Xiao Yang;Hao Zhang;Yinpeng Dong;Jun Zhu", "authorids": "~Zhijie_Deng1;~Xiao_Yang4;~Hao_Zhang2;~Yinpeng_Dong2;~Jun_Zhu2", "gender": "M;M;M;M;M", "homepage": "https://thudzj.github.io/;https://ml.cs.tsinghua.edu.cn/~xiaoyang/;https://cseweb.ucsd.edu/~haozhang/;https://dongyp13.github.io;http://ml.cs.tsinghua.edu.cn/~jun", "dblp": "209/4959;57/33851;55/2270-25;183/0980;50/2644-1", "google_scholar": "J3dR0sUAAAAJ;bwkwp0MAAAAJ;H1d4BS8AAAAJ;6_4ad84AAAAJ;axsP38wAAAAJ", "orcid": "0000-0002-0932-1631;0000-0001-9502-9962;;;", "linkedin": ";;;;", "or_profile": "~Zhijie_Deng1;~Xiao_Yang4;~Hao_Zhang2;~Yinpeng_Dong2;~Jun_Zhu2", "aff": "Tsinghua University;Tsinghua University;University of California, Berkeley;Tsinghua University;Tsinghua University", "aff_domain": "tsinghua.edu.cn;mail.tsinghua.edu.cn;berkeley.edu;tsinghua.edu.cn;mail.tsinghua.edu.cn", "position": "PhD student;PhD student;Postdoc;PhD student;Professor", "bibtex": "@misc{\ndeng2021bayesadapter,\ntitle={BayesAdapter: Being Bayesian, Inexpensively and Robustly, via Bayesian Fine-tuning},\nauthor={Zhijie Deng and Xiao Yang and Hao Zhang and Yinpeng Dong and Jun Zhu},\nyear={2021},\nurl={https://openreview.net/forum?id=LjFGgI-_tT0}\n}", "github": "", "project": "", "reviewers": "AnonReviewer2;AnonReviewer4;AnonReviewer3;AnonReviewer1", "site": "https://openreview.net/forum?id=LjFGgI-_tT0", "pdf_size": 0, "rating": "5;6;6;6", "confidence": "4;3;4;4", "wc_review": "601;617;313;479", "wc_reply_reviewers": "0;219;60;241", "wc_reply_authors": "1015;1050;354;882", "reply_reviewers": "0;2;1;2", "reply_authors": "2;4;1;3", "rating_avg": [ 5.75, 0.4330127018922193 ], "confidence_avg": [ 3.75, 0.4330127018922193 ], "wc_review_avg": [ 502.5, 121.7322882393985 ], "wc_reply_reviewers_avg": [ 130.0, 102.52072961113767 ], "wc_reply_authors_avg": [ 825.25, 279.20187588911364 ], "reply_reviewers_avg": [ 1.25, 0.82915619758885 ], "reply_authors_avg": [ 2.5, 1.118033988749895 ], "replies_avg": [ 22, 0 ], "authors#_avg": [ 5, 0 ], "corr_rating_confidence": -0.3333333333333333, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:qkxU5C8SMrMJ:scholar.google.com/&scioq=BayesAdapter:+Being+Bayesian,+Inexpensively+and+Robustly,+via+Bayesian+Fine-tuning&hl=en&as_sdt=0,5", "gs_version_total": 0, "aff_unique_index": "0;0;1;0;0", "aff_unique_norm": "Tsinghua University;University of California, Berkeley", "aff_unique_dep": ";", "aff_unique_url": "https://www.tsinghua.edu.cn;https://www.berkeley.edu", "aff_unique_abbr": "THU;UC Berkeley", "aff_campus_unique_index": "1", "aff_campus_unique": ";Berkeley", "aff_country_unique_index": "0;0;1;0;0", "aff_country_unique": "China;United States" }, { "id": "Ljcb2tylYn1", "title": "CNN Based Analysis of the Luria\u2019s Alternating Series Test for Parkinson\u2019s Disease Diagnostics", "track": "main", "status": "Withdraw", "tldr": "", "abstract": "Deep-learning based image classification is applied in this studies to the Luria's alternating series tests to support diagnostics of the Parkinson's disease. Luria's alternating series tests belong to the family of fine-motor drawing tests and been used in neurology and psychiatry for nearly a century. 
The introduction of digital tables and, later, tablet PCs has allowed deviating from the classical paper-and-pen setting and observing kinematic and pressure parameters that describe the test. While such a setting has led to highly accurate machine learning models, the visual component of the tests is left unused. Namely, the shapes of the drawn lines are not used to classify the drawings, which eventually has shifted the assessment paradigm from being visual-based to being based on numeric parameters. The approach proposed in this paper allows combining the two assessment paradigms by augmenting the initial drawings with the kinematic and pressure parameters. The paper demonstrates that the resulting network achieves accuracy similar to that of a human practitioner.", "keywords": "Parkinson's disease;drawing tests;data augmentation;CNN;diagnostics support", "primary_area": "", "supplementary_material": "", "author": "Sergei Zarembo;Sven Nomm;Kadri Medijainen;Pille Taba;Aaro Toomela", "authorids": "sezare@taltech.ee;~Sven_Nomm1;kadri.medijainen@ut.ee;pille.taba@kliinikum.ee;aaro.toomela@tlu.ee", "gender": ";;;;", "homepage": ";;;;", "dblp": ";;;;", "google_scholar": ";;;;", "orcid": ";;;;", "linkedin": ";;;;", "or_profile": ";;;;", "aff": ";;;;", "aff_domain": ";;;;", "position": ";;;;", "bibtex": "", "github": "", "project": "", "reviewers": "AnonReviewer3;AnonReviewer2;AnonReviewer4;AnonReviewer1", "site": "https://openreview.net/forum?id=Ljcb2tylYn1", "pdf_size": 0, "rating": "2;4;5;5", "confidence": "5;4;2;5", "wc_review": "560;85;28;239", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "0;0;0;0", "reply_reviewers": "0;0;0;0", "reply_authors": "0;0;0;0", "rating_avg": [ 4.0, 1.224744871391589 ], "confidence_avg": [ 4.0, 1.224744871391589 ], "wc_review_avg": [ 228.0, 206.63615366145393 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 0, 0 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 0, 0 ], "replies_avg": [ 5, 0 ], "authors#_avg": [ 5, 0 ], "corr_rating_confidence": -0.49999999999999994, "gs_citation": 8, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=9339975620893565870&as_sdt=400005&sciodt=0,14&hl=en", "gs_version_total": 2 }, { "title": "Adaptive Federated Optimization", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/2691", "id": "LkFG3lB13U5", "poster": "", "openreview": "https://openreview.net/forum?id=LkFG3lB13U5", "slides": "https://iclr.cc/virtual/2021/poster/2691", "video": "https://iclr.cc/virtual/2021/poster/2691", "author_site": "Sashank Reddi, Zachary Charles, Manzil Zaheer, Zachary Garrett, Keith Rush, Jakub Kone\u010dn\u00fd, Sanjiv Kumar, H. Brendan McMahan", "tldr": "", "abstract": "Federated learning is a distributed machine learning paradigm in which a large number of clients coordinate with a central server to learn a model without sharing their own training data. Standard federated optimization methods such as Federated Averaging (FedAvg) are often difficult to tune and exhibit unfavorable convergence behavior. In non-federated settings, adaptive optimization methods have had notable success in combating such issues. In this work, we propose federated versions of adaptive optimizers, including Adagrad, Adam, and Yogi, and analyze their convergence in the presence of heterogeneous data for general non-convex settings. Our results highlight the interplay between client heterogeneity and communication efficiency. 
We also perform extensive experiments on these methods and show that the use of adaptive optimizers can significantly improve the performance of federated learning.", "keywords": "Federated learning;optimization;adaptive optimization;distributed optimization", "primary_area": "", "supplementary_material": "/attachment/6f8380100620a35ffbd4e24181a875c2dcd511e1.zip", "author": "Sashank J. Reddi;Zachary Charles;Manzil Zaheer;Zachary Garrett;Keith Rush;Jakub Kone\u010dn\u00fd;Sanjiv Kumar;Hugh Brendan McMahan", "authorids": "~Sashank_J._Reddi1;~Zachary_Charles1;~Manzil_Zaheer1;zachgarrett@google.com;krush@google.com;~Jakub_Kone\u010dn\u00fd1;~Sanjiv_Kumar1;~Hugh_Brendan_McMahan1", "gender": "M;;M;;;M;;M", "homepage": ";;https://www.aclweb.org/anthology/people/m/manzil-zaheer/;;;http://jakubkonecny.com/;http://www.sanjivk.com/;", "dblp": "50/10452;;40/10701;;;139/0872;;", "google_scholar": "70lgwYwAAAAJ;;A33FhJMAAAAJ;;;https://scholar.google.sk/citations?user=4vq7eXQAAAAJ;https://scholar.google.com/citations?hl=en;", "orcid": ";;;;;;;", "linkedin": ";;;;;;;", "or_profile": "~Sashank_J._Reddi1;~Zachary_Charles1;~Manzil_Zaheer1;zachgarrett@google.com;krush@google.com;~Jakub_Kone\u010dn\u00fd1;~Sanjiv_Kumar1;~Hugh_Brendan_McMahan1", "aff": "Google;;Google DeepMind;;;Google;Google;Google", "aff_domain": "google.com;;deepmind.com;;;google.com;google.com;google.com", "position": "Research Scientist;;Researcher;;;Research Scientist;Research Scientist;Research Scientist", "bibtex": "@inproceedings{\nreddi2021adaptive,\ntitle={Adaptive Federated Optimization},\nauthor={Sashank J. Reddi and Zachary Charles and Manzil Zaheer and Zachary Garrett and Keith Rush and Jakub Kone{\\v{c}}n{\\'y} and Sanjiv Kumar and Hugh Brendan McMahan},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=LkFG3lB13U5}\n}", "github": "[![github](/images/github_icon.svg) google-research/federated](https://github.com/google-research/federated/tree/master/optimization) + [![Papers with Code](/images/pwc_icon.svg) 4 community implementations](https://paperswithcode.com/paper/?openreview=LkFG3lB13U5)", "project": "", "reviewers": "AnonReviewer2;AnonReviewer3;AnonReviewer1;AnonReviewer4", "pdf_size": 0, "rating": "6;6;6;7", "confidence": "4;4;5;4", "wc_review": "593;371;157;611", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "788;462;453;436", "reply_reviewers": "0;0;0;0", "reply_authors": "1;1;1;1", "rating_avg": [ 6.25, 0.4330127018922193 ], "confidence_avg": [ 4.25, 0.4330127018922193 ], "wc_review_avg": [ 433.0, 185.27277187973414 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 534.75, 146.51173161218182 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 10, 0 ], "authors#_avg": [ 8, 0 ], "corr_rating_confidence": -0.3333333333333333, "gs_citation": 1819, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=4151755605437418819&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 7, "pdf": "https://openreview.net/pdf?id=LkFG3lB13U5", "email": "google.com;;deepmind.com;;;google.com;google.com;google.com", "author_num": 8, "aff_unique_index": "0;0;0;0;0", "aff_unique_norm": "Google", "aff_unique_dep": "Google", "aff_unique_url": "https://www.google.com", "aff_unique_abbr": "Google", "aff_campus_unique_index": "0;0;0;0", "aff_campus_unique": "Mountain View;", "aff_country_unique_index": "0;1;0;0;0", "aff_country_unique": "United States;United Kingdom" }, { "title": "Winning the L2RPN Challenge: Power 
Grid Management via Semi-Markov Afterstate Actor-Critic", "status": "Spotlight", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/3003", "id": "LmUJqB1Cz8", "poster": "", "openreview": "https://openreview.net/forum?id=LmUJqB1Cz8", "slides": "https://iclr.cc/virtual/2021/poster/3003", "video": "https://iclr.cc/virtual/2021/poster/3003", "author_site": "Deunsol Yoon, Sunghoon Hong, Byung-Jun Lee, Kee-Eung Kim", "tldr": "", "abstract": "Safe and reliable electricity transmission in power grids is crucial for modern society. It is thus quite natural that there has been a growing interest in the automatic management of power grids, exemplified by the Learning to Run a Power Network Challenge (L2RPN), modeling the problem as a reinforcement learning (RL) task. However, it is highly challenging to manage a real-world scale power grid, mostly due to the massive scale of its state and action space. In this paper, we present an off-policy actor-critic approach that effectively tackles the unique challenges in power grid management by RL, adopting the hierarchical policy together with the afterstate representation. Our agent ranked first in the latest challenge (L2RPN WCCI 2020), being able to avoid disastrous situations while maintaining the highest level of operational efficiency in every test scenario. This paper provides a formal description of the algorithmic aspect of our approach, as well as further experimental studies on diverse power grids.", "keywords": "power grid management;deep reinforcement learning;graph neural network", "primary_area": "", "supplementary_material": "", "author": "Deunsol Yoon;Sunghoon Hong;Byung-Jun Lee;Kee-Eung Kim", "authorids": "~Deunsol_Yoon1;~Sunghoon_Hong2;~Byung-Jun_Lee1;~Kee-Eung_Kim4", "gender": "M;M;M;M", "homepage": ";https://sunghoonhong.github.io;https://dmlab.korea.ac.kr/professor.html;http://ailab.kaist.ac.kr", "dblp": "225/5388.html;;130/1678-1;35/6703", "google_scholar": ";C5Vy-ZAAAAAJ;FwoohI4AAAAJ;https://scholar.google.com/citations?hl=ko", "orcid": ";;;", "linkedin": ";;;", "or_profile": "~Deunsol_Yoon1;~Sunghoon_Hong2;~Byung-Jun_Lee1;~Kee-Eung_Kim2", "aff": "Korea Advanced Institute of Science & Technology;Korea Advanced Institute of Science & Technology;Korea Advanced Institute of Science & Technology;Korea Advanced Institute of Science & Technology", "aff_domain": "kaist.ac.kr;kaist.ac.kr;kaist.ac.kr;kaist.ac.kr", "position": "MS student;MS student;PhD student;Full Professor", "bibtex": "@inproceedings{\nyoon2021winning,\ntitle={Winning the L2{\\{}RPN{\\}} Challenge: Power Grid Management via Semi-Markov Afterstate Actor-Critic},\nauthor={Deunsol Yoon and Sunghoon Hong and Byung-Jun Lee and Kee-Eung Kim},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=LmUJqB1Cz8}\n}", "github": "", "project": "", "reviewers": "AnonReviewer2;AnonReviewer1;AnonReviewer5;AnonReviewer3", "pdf_size": 0, "rating": "7;7;7;9", "confidence": "4;3;2;4", "wc_review": "464;470;474;321", "wc_reply_reviewers": "15;0;0;0", "wc_reply_authors": "172;411;194;198", "reply_reviewers": "1;0;0;0", "reply_authors": "1;2;1;2", "rating_avg": [ 7.5, 0.8660254037844386 ], "confidence_avg": [ 3.25, 0.82915619758885 ], "wc_review_avg": [ 432.25, 64.32874551862487 ], "wc_reply_reviewers_avg": [ 3.75, 6.49519052838329 ], "wc_reply_authors_avg": [ 243.75, 97.06795300200783 ], "reply_reviewers_avg": [ 0.25, 0.4330127018922193 ], "reply_authors_avg": [ 1.5, 0.5 ], "replies_avg": [ 15, 0 ], 
"authors#_avg": [ 4, 0 ], "corr_rating_confidence": 0.5222329678670935, "gs_citation": 70, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=7017403933610012598&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 4, "pdf": "https://openreview.net/pdf?id=LmUJqB1Cz8", "email": "kaist.ac.kr;kaist.ac.kr;kaist.ac.kr;kaist.ac.kr", "author_num": 4, "aff_unique_index": "0;0;0;0", "aff_unique_norm": "Korea Advanced Institute of Science and Technology", "aff_unique_dep": "", "aff_unique_url": "https://www.kaist.ac.kr", "aff_unique_abbr": "KAIST", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0;0;0", "aff_country_unique": "South Korea" }, { "id": "LnVNgfvrQjC", "title": "CAFENet: Class-Agnostic Few-Shot Edge Detection Network", "track": "main", "status": "Reject", "tldr": "", "abstract": "We tackle a novel few-shot learning challenge, few-shot semantic edge detection, aiming to localize boundaries of novel categories using only a few labeled samples. Reliable boundary information has been shown to boost the performance of semantic segmentation and localization, while also playing a key role in its own right in object reconstruction, image generation and medical imaging. Few-shot semantic edge detection allows recovery of accurate boundaries with just a few examples. In this work, we present a Class-Agnostic Few-shot Edge detection Network (CAFENet) based on meta-learning strategy. CAFENet employs a semantic segmentation module in small-scale to compensate for lack of semantic information in edge labels. The predicted segmentation mask is used to generate an attention map to highlight the target object region, and make the decoder module concentrate on that region. We also propose a new regularization method based on multi-split matching. In meta-training, the metric-learning problem with high-dimensional vectors are divided into smaller subproblems with low-dimensional sub-vectors. Since there are no existing datasets for few-shot semantic edge detection, we construct two new datasets, FSE-1000 and SBD-5i, and evaluate the performance of the proposed CAFENet on them. 
Extensive simulation results confirm that the proposed CAFENet achieves better performance compared to the baseline methods using fine-tuning or few-shot segmentation.", "keywords": "Few-shot edge detection;Few-shot learning;Semantic edge detection", "primary_area": "", "supplementary_material": "/attachment/78a55a1772235733be1ed5d28ad07bd79f4a93bd.zip", "author": "Younghyun Park;Jun Seo;Jaekyun Moon", "authorids": "~Younghyun_Park1;~Jun_Seo1;~Jaekyun_Moon2", "gender": "M;M;M", "homepage": "https://github.com/MoonLab-YH;;http://comstolab.kaist.ac.kr/people.html", "dblp": "137/2568;222/1700;78/2744", "google_scholar": ";;", "orcid": ";;", "linkedin": ";;", "or_profile": "~Younghyun_Park1;~Jun_Seo1;~Jaekyun_Moon2", "aff": "Korea Advanced Institute of Science & Technology;Korea Advanced Institute of Science & Technology;KAIST", "aff_domain": "kaist.ac.kr;kaist.ac.kr;kaist.edu", "position": "MS student;PhD student;Full Professor", "bibtex": "@misc{\npark2021cafenet,\ntitle={{\\{}CAFEN{\\}}et: Class-Agnostic Few-Shot Edge Detection Network},\nauthor={Younghyun Park and Jun Seo and Jaekyun Moon},\nyear={2021},\nurl={https://openreview.net/forum?id=LnVNgfvrQjC}\n}", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer2;AnonReviewer4;AnonReviewer3", "site": "https://openreview.net/forum?id=LnVNgfvrQjC", "pdf_size": 0, "rating": "4;4;4;6", "confidence": "5;5;5;3", "wc_review": "658;196;365;273", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "582;475;469;278", "reply_reviewers": "0;0;0;0", "reply_authors": "1;1;1;1", "rating_avg": [ 4.5, 0.8660254037844386 ], "confidence_avg": [ 4.5, 0.8660254037844386 ], "wc_review_avg": [ 373.0, 175.08426542667962 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 451.0, 109.5331000200396 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 9, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": -1.0, "gs_citation": 8, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=17418988577107749512&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 6, "aff_unique_index": "0;0;0", "aff_unique_norm": "Korea Advanced Institute of Science and Technology", "aff_unique_dep": "", "aff_unique_url": "https://www.kaist.ac.kr", "aff_unique_abbr": "KAIST", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0;0", "aff_country_unique": "South Korea" }, { "id": "Lnomatc-1s", "title": "Learning-Augmented Sketches for Hessians", "track": "main", "status": "Reject", "tldr": "", "abstract": "We study learning-based sketching for Hessians, which is known to provide considerable speedups to second order optimization. A number of works have shown how to sketch or subsample the Hessian to speed up each iteration, but such sketches are usually specific to the matrix at hand, rather than being learned from a distribution. We extend such schemes to learned sketches, where we learn potentially different sketches for the different iterations, and show empirically that learned sketches, compared with their \"non-learned\" counterparts, improve the approximation accuracy for a large number of important problems, including LASSO, SVM, and matrix estimation with nuclear norm constraints.
", "keywords": "", "primary_area": "", "supplementary_material": "/attachment/d4746dd214c8a381a8b659c386be7706b7e4b512.zip", "author": "Yi Li;Honghao Lin;David Woodruff", "authorids": "~Yi_Li8;~Honghao_Lin1;~David_Woodruff2", "gender": "M;M;M", "homepage": ";https://honghlin.github.io;http://www.cs.cmu.edu/~dwoodruf/", "dblp": "59/871-2;https://dblp.uni-trier.de/pid/264/2663.html;w/DPWoodruff", "google_scholar": ";;https://scholar.google.com.tw/citations?user=0G2t-6sAAAAJ", "orcid": ";;", "linkedin": ";;", "or_profile": "~Yi_Li8;~Honghao_Lin1;~David_Woodruff1", "aff": "Nanyang Technological University;Shanghai University of Finance and Economics;Carnegie Mellon University", "aff_domain": "ntu.edu.sg;shufe.edu;cmu.edu", "position": "Assistant Professor;Researcher;Associate Professor", "bibtex": "@misc{\nli2021learningaugmented,\ntitle={Learning-Augmented Sketches for Hessians},\nauthor={Yi Li and Honghao Lin and David Woodruff},\nyear={2021},\nurl={https://openreview.net/forum?id=Lnomatc-1s}\n}", "github": "", "project": "", "reviewers": "AnonReviewer2;AnonReviewer3;AnonReviewer4", "site": "https://openreview.net/forum?id=Lnomatc-1s", "pdf_size": 0, "rating": "4;6;6", "confidence": "5;4;3", "wc_review": "544;1213;203", "wc_reply_reviewers": "0;10;0", "wc_reply_authors": "461;837;330", "reply_reviewers": "0;1;0", "reply_authors": "2;1;1", "rating_avg": [ 5.333333333333333, 0.9428090415820634 ], "confidence_avg": [ 4.0, 0.816496580927726 ], "wc_review_avg": [ 653.3333333333334, 419.5158585268924 ], "wc_reply_reviewers_avg": [ 3.3333333333333335, 4.714045207910316 ], "wc_reply_authors_avg": [ 542.6666666666666, 214.88653336638438 ], "reply_reviewers_avg": [ 0.3333333333333333, 0.4714045207910317 ], "reply_authors_avg": [ 1.3333333333333333, 0.4714045207910317 ], "replies_avg": [ 10, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": -0.8660254037844385, "gs_citation": 2, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=18215798707448204443&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 4, "aff_unique_index": "0;1;2", "aff_unique_norm": "Nanyang Technological University;Shanghai University of Finance and Economics;Carnegie Mellon University", "aff_unique_dep": ";;", "aff_unique_url": "https://www.ntu.edu.sg;http://www.sufe.edu.cn;https://www.cmu.edu", "aff_unique_abbr": "NTU;SUFE;CMU", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;1;2", "aff_country_unique": "Singapore;China;United States" }, { "id": "LpSGtq6F5xN", "title": "A Mixture of Variational Autoencoders for Deep Clustering", "track": "main", "status": "Reject", "tldr": "", "abstract": "In this study, we propose a deep clustering algorithm that utilizes a variational autoencoder (VAE) framework with a multi encoder-decoder neural architecture. This setup enforces a complementary structure that guides the learned latent representations towards a more meaningful space arrangement. It differs from previous VAE-based clustering algorithms by employing a new generative model that uses multiple encoder-decoders.\nWe show that this modeling results in both better clustering capabilities and improved data generation. 
The proposed method is evaluated on standard datasets and is shown to outperform state-of-the-art deep clustering methods significantly.", "keywords": "deep clustering;variational auto encoder;VAE", "primary_area": "", "supplementary_material": "", "author": "Avi Caciularu;Jacob Goldberger", "authorids": "~Avi_Caciularu1;~Jacob_Goldberger1", "gender": "M;M", "homepage": "http://aviclu.github.io/;http://www.eng.biu.ac.il/goldbej/", "dblp": "https://dblp.uni-trier.de/pid/207/8509;65/6574", "google_scholar": "https://scholar.google.co.il/citations?user=fPG_0aQAAAAJ;https://scholar.google.co.il/citations?user=vgzrOK4AAAAJ", "orcid": ";", "linkedin": "avicaciularu/;", "or_profile": "~Avi_Caciularu1;~Jacob_Goldberger1", "aff": ";Bar-Ilan University", "aff_domain": ";biu.ac.il", "position": ";Full Professor", "bibtex": "@misc{\ncaciularu2021a,\ntitle={A Mixture of Variational Autoencoders for Deep Clustering},\nauthor={Avi Caciularu and Jacob Goldberger},\nyear={2021},\nurl={https://openreview.net/forum?id=LpSGtq6F5xN}\n}", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer3;AnonReviewer4;AnonReviewer2", "site": "https://openreview.net/forum?id=LpSGtq6F5xN", "pdf_size": 0, "rating": "5;5;5;6", "confidence": "4;3;4;4", "wc_review": "441;274;339;220", "wc_reply_reviewers": "81;0;0;0", "wc_reply_authors": "249;185;270;42", "reply_reviewers": "1;0;0;0", "reply_authors": "1;1;1;1", "rating_avg": [ 5.25, 0.4330127018922193 ], "confidence_avg": [ 3.75, 0.4330127018922193 ], "wc_review_avg": [ 318.5, 82.32405480781422 ], "wc_reply_reviewers_avg": [ 20.25, 35.074028853269766 ], "wc_reply_authors_avg": [ 186.5, 89.10808044167487 ], "reply_reviewers_avg": [ 0.25, 0.4330127018922193 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 11, 0 ], "authors#_avg": [ 2, 0 ], "corr_rating_confidence": 0.3333333333333333, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:Uv4EweXBZlUJ:scholar.google.com/&scioq=A+Mixture+of+Variational+Autoencoders+for+Deep+Clustering&hl=en&as_sdt=0,5", "gs_version_total": 0, "aff_unique_index": "0", "aff_unique_norm": "Bar-Ilan University", "aff_unique_dep": "", "aff_unique_url": "https://www.biu.ac.il", "aff_unique_abbr": "BIU", "aff_country_unique_index": "0", "aff_country_unique": "Israel" }, { "id": "Lq1srMWfUAi", "title": "CoNES: Convex Natural Evolutionary Strategies", "track": "main", "status": "Withdraw", "tldr": "", "abstract": "We present a novel algorithm -- convex natural evolutionary strategies (CoNES) -- for optimizing high-dimensional blackbox functions by leveraging tools from convex optimization and information geometry. CoNES is formulated as an efficiently-solvable convex program that adapts the evolutionary strategies (ES) gradient estimate to promote rapid convergence. The resulting algorithm is invariant to the parameterization of the belief distribution. Our numerical results demonstrate that CoNES vastly outperforms conventional blackbox optimization methods on a suite of functions used for benchmarking blackbox optimizers. 
Furthermore, CoNES demonstrates the ability to converge faster than conventional blackbox methods on a selection of OpenAI's MuJoCo reinforcement learning tasks for locomotion.", "keywords": "blackbox optimization;evolutionary strategies", "primary_area": "", "supplementary_material": "/attachment/4d78594e1014df2686ef25bf68c149432a0ec3d5.zip", "author": "Sushant Veer;Anirudha Majumdar", "authorids": "~Sushant_Veer1;~Anirudha_Majumdar1", "gender": "M;M", "homepage": ";https://irom-lab.princeton.edu/majumdar/", "dblp": "173/5950;116/6436", "google_scholar": "1FiIlQsAAAAJ;ibu3FwsAAAAJ", "orcid": ";", "linkedin": ";", "or_profile": "~Sushant_Veer1;~Anirudha_Majumdar1", "aff": "Princeton University;Princeton University", "aff_domain": "princeton.edu;princeton.edu", "position": "Postdoc;Associate Professor", "bibtex": "", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer3;AnonReviewer4", "site": "https://openreview.net/forum?id=Lq1srMWfUAi", "pdf_size": 0, "rating": "2;3;6", "confidence": "5;5;4", "wc_review": "789;268;460", "wc_reply_reviewers": "587;0;0", "wc_reply_authors": "1224;660;388", "reply_reviewers": "1;0;0", "reply_authors": "3;1;1", "rating_avg": [ 3.6666666666666665, 1.699673171197595 ], "confidence_avg": [ 4.666666666666667, 0.4714045207910317 ], "wc_review_avg": [ 505.6666666666667, 215.13458320058373 ], "wc_reply_reviewers_avg": [ 195.66666666666666, 276.7144537043356 ], "wc_reply_authors_avg": [ 757.3333333333334, 348.16598851058893 ], "reply_reviewers_avg": [ 0.3333333333333333, 0.4714045207910317 ], "reply_authors_avg": [ 1.6666666666666667, 0.9428090415820634 ], "replies_avg": [ 11, 0 ], "authors#_avg": [ 2, 0 ], "corr_rating_confidence": -0.9707253433941506, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:KGOz34gNipwJ:scholar.google.com/&scioq=CoNES:+Convex+Natural+Evolutionary+Strategies&hl=en&as_sdt=0,5", "gs_version_total": 5, "aff_unique_index": "0;0", "aff_unique_norm": "Princeton University", "aff_unique_dep": "", "aff_unique_url": "https://www.princeton.edu", "aff_unique_abbr": "Princeton", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0", "aff_country_unique": "United States" }, { "id": "LtS9mII3jFi", "title": "HyperReal: Complex-Valued Layer Functions For Complex-Valued Scaling Invariance", "track": "main", "status": "Withdraw", "tldr": "", "abstract": "Complex-valued measurements in MRI and SAR imaging often have complex-valued scaling ambiguity, calling for models that are invariant to complex-valued scaling of pixels. Deep Complex Networks (DCN) extends real-valued algebra to complex-valued algebra in neural networks, but it does not address the issue of complex-valued scaling. SurReal complex-valued networks adopt a manifold view of complex numbers and derive a distance metric that is invariant to complex scaling. With distance features, it achieves complex-scaling invariance. However, rich complex-valued information is lost in this representation, and additionally, SurReal is also prevented from using complex-valued non-linearity, limiting its expressive power. We simplify the manifold formulation of SurReal and propose a new layer function that achieves complex-scaling invariance within the complex domain. We can then build hierarchical complex-valued features with complex-scaling invariance. Our so-called HyperReal model results in a much leaner model with better generalization. 
Benchmarked on MSTAR, HyperReal beats DCN (and matches SurReal) with only 3% (40%) of their respective parameter counts.", "keywords": "Complex Deep Learning;Invariance;Equivariance;Manifold;SAR Imaging", "primary_area": "", "supplementary_material": "", "author": "Utkarsh Singhal;Yifei Xing;Stella Yu", "authorids": "~Utkarsh_Singhal1;xingyifei2016@berkley.edu;~Stella_Yu2", "gender": "M;;F", "homepage": ";;http://www.eecs.umich.edu/~stellayu", "dblp": ";;58/5089", "google_scholar": "lvA86MYAAAAJ;;https://scholar.google.com/citations?hl=en", "orcid": ";;", "linkedin": ";;", "or_profile": "~Utkarsh_Singhal1;xingyifei2016@berkley.edu;~Stella_Yu2", "aff": "University of California, Berkeley;;University of California, Berkeley", "aff_domain": "berkeley.edu;;berkeley.edu", "position": "PhD student;;Director, ICSI Vision Group", "bibtex": "", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer4;AnonReviewer3", "site": "https://openreview.net/forum?id=LtS9mII3jFi", "pdf_size": 0, "rating": "5;5;5", "confidence": "3;4;3", "wc_review": "320;486;90", "wc_reply_reviewers": "0;0;0", "wc_reply_authors": "0;0;0", "reply_reviewers": "0;0;0", "reply_authors": "0;0;0", "rating_avg": [ 5.0, 0.0 ], "confidence_avg": [ 3.3333333333333335, 0.4714045207910317 ], "wc_review_avg": [ 298.6666666666667, 162.3685793358911 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 0, 0 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 0, 0 ], "replies_avg": [ 4, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:6Jasj-oNatUJ:scholar.google.com/&scioq=HyperReal:+Complex-Valued+Layer+Functions+For+Complex-Valued+Scaling+Invariance&hl=en&as_sdt=0,5", "gs_version_total": 0, "aff_unique_index": "0;0", "aff_unique_norm": "University of California, Berkeley", "aff_unique_dep": "", "aff_unique_url": "https://www.berkeley.edu", "aff_unique_abbr": "UC Berkeley", "aff_campus_unique_index": "0;0", "aff_campus_unique": "Berkeley", "aff_country_unique_index": "0;0", "aff_country_unique": "United States" }, { "id": "LtgEkhLScK3", "title": "Probabilistic Mixture-of-Experts for Efficient Deep Reinforcement Learning", "track": "main", "status": "Reject", "tldr": "", "abstract": "Deep reinforcement learning (DRL) has successfully solved various problems recently, typically with a unimodal policy representation. However, grasping the decomposable and hierarchical structures within a complex task can be essential for further improving its learning efficiency and performance, which may lead to a multimodal policy or a mixture-of-experts (MOE). To the best of our knowledge, existing general-purpose DRL algorithms do not deploy MOE methods as policy function approximators, either because of the lack of differentiability or because of the absence of an explicit probabilistic representation. In this work, we propose a differentiable probabilistic mixture-of-experts (PMOE) embedded in the end-to-end training scheme for generic off-policy and on-policy algorithms using stochastic policies, e.g., Soft Actor-Critic (SAC) and Proximal Policy Optimisation (PPO). Experimental results testify to the advantageous performance of our method over unimodal policies and three different MOE methods, as well as a method based on option frameworks, based on two types of DRL algorithms.
We also demonstrate the distinguishable primitives learned with PMOE in different environments.", "keywords": "Deep Reinforcement Learning;Sample Efficiency;Gaussian Mixture Models;Mixture-of-Experts", "primary_area": "", "supplementary_material": "/attachment/12ae2601295f51432aba6e999a8b9e0ee8f225a6.zip", "author": "Jie Ren;Yewen Li;Zihan Ding;Wei Pan;Hao Dong", "authorids": "~Jie_Ren4;~Yewen_Li1;~Zihan_Ding1;~Wei_Pan2;~Hao_Dong3", "gender": ";M;M;M;M", "homepage": "https://jieren98.github.io/;https://scholar.google.com/citations?user=W5796yEAAAAJ&hl=zh-CN;https://quantumiracle.github.io/webpage/;http://panweihit.github.io;https://zsdonghao.github.io", "dblp": ";55/2231;;;14/1525-3.html", "google_scholar": "wlVxP3QAAAAJ;W5796yEAAAAJ;t5DgPBAAAAAJ;GqryWPsAAAAJ;xLFL4sMAAAAJ", "orcid": ";0009-0008-0073-123X;;0000-0003-1121-9879;0000-0003-2261-9122", "linkedin": ";;;wei-pan-6b558b17/;", "or_profile": "~Jie_Ren4;~Yewen_Li1;~Zihan_Ding1;~Wei_Pan2;~Hao_Dong3", "aff": "Xi'an University of Electronic Science and Technology;Xidian University;Princeton University;Delft University of Technology;Peking University", "aff_domain": "xidian.edu.cn;xidian.edu.cn;princeton.edu;tudelft.nl;pku.edu.cn", "position": "Undergrad student;Undergrad student;PhD student;Assistant Professor;Assistant Professor", "bibtex": "@misc{\nren2021probabilistic,\ntitle={Probabilistic Mixture-of-Experts for Efficient Deep Reinforcement Learning},\nauthor={Jie Ren and Yewen Li and Zihan Ding and Wei Pan and Hao Dong},\nyear={2021},\nurl={https://openreview.net/forum?id=LtgEkhLScK3}\n}", "github": "", "project": "", "reviewers": "AnonReviewer3;AnonReviewer2;AnonReviewer1;AnonReviewer4", "site": "https://openreview.net/forum?id=LtgEkhLScK3", "pdf_size": 0, "rating": "3;4;6;6", "confidence": "5;4;4;4", "wc_review": "334;579;452;350", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "217;434;166;167", "reply_reviewers": "0;0;0;0", "reply_authors": "1;1;1;1", "rating_avg": [ 4.75, 1.299038105676658 ], "confidence_avg": [ 4.25, 0.4330127018922193 ], "wc_review_avg": [ 428.75, 97.8452221623519 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 246.0, 110.48303037118416 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 9, 0 ], "authors#_avg": [ 5, 0 ], "corr_rating_confidence": -0.7777777777777777, "gs_citation": 22, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=4548001461972204318&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 3, "aff_unique_index": "0;1;2;3;4", "aff_unique_norm": "Xi'an University of Electronic Science and Technology;Xidian University;Princeton University;Delft University of Technology;Peking University", "aff_unique_dep": ";;;;", "aff_unique_url": "http://www.xidian.edu.cn/;http://www.xidian.edu.cn/;https://www.princeton.edu;https://www.tudelft.nl;http://www.pku.edu.cn", "aff_unique_abbr": "Xidian University;Xidian;Princeton;TU Delft;Peking U", "aff_campus_unique_index": "0", "aff_campus_unique": "Xi'an;", "aff_country_unique_index": "0;0;1;2;0", "aff_country_unique": "China;United States;Netherlands" }, { "title": "Protecting DNNs from Theft using an Ensemble of Diverse Models", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/2696", "id": "LucJxySuJcE", "poster": "", "openreview": "https://openreview.net/forum?id=LucJxySuJcE", "slides": "https://iclr.cc/virtual/2021/poster/2696", "video": "https://iclr.cc/virtual/2021/poster/2696", "author_site": "Sanjay Kariyappa, Atul Prakash, Moinuddin K Qureshi", "tldr": 
"", "abstract": "Several recent works have demonstrated highly effective model stealing (MS) attacks on Deep Neural Networks (DNNs) in black-box settings, even when the training data is unavailable. These attacks typically use some form of Out of Distribution (OOD) data to query the target model and use the predictions obtained to train a clone model. Such a clone model learns to approximate the decision boundary of the target model, achieving high accuracy on in-distribution examples. We propose Ensemble of Diverse Models (EDM) to defend against such MS attacks. EDM is made up of models that are trained to produce dissimilar predictions for OOD inputs. By using a different member of the ensemble to service different queries, our defense produces predictions that are highly discontinuous in the input space for the adversary's OOD queries. Such discontinuities cause the clone model trained on these predictions to have poor generalization on in-distribution examples. Our evaluations on several image classification tasks demonstrate that EDM defense can severely degrade the accuracy of clone models (up to $39.7\\%$). Our defense has minimal impact on the target accuracy, negligible computational costs during inference, and is compatible with existing defenses for MS attacks.", "keywords": "Model stealing;machine learning security", "primary_area": "", "supplementary_material": "/attachment/3d185c122b2fc5e08fd7d1dc17f89aa7aafe5986.zip", "author": "Sanjay Kariyappa;Atul Prakash;Moinuddin K Qureshi", "authorids": "~Sanjay_Kariyappa1;~Atul_Prakash1;~Moinuddin_K_Qureshi2", "gender": "M;;M", "homepage": "https://sanjaykariyappa.github.io/;https://www.eecs.umich.edu/~aprakash;https://www.cc.gatech.edu/~moin/", "dblp": "223/6062;p/AtulPrakash;", "google_scholar": "qd9U-h4AAAAJ;kIkHa2IAAAAJ;", "orcid": ";0000-0002-4907-3687;", "linkedin": "sanjay-kariyappa-74583924/;atul-prakash-8729a44/;", "or_profile": "~Sanjay_Kariyappa1;~Atul_Prakash1;~Moinuddin_K_Qureshi2", "aff": "Georgia Institute of Technology;University of Michigan;", "aff_domain": "gatech.edu;umich.edu;", "position": "PhD student;Professor;", "bibtex": "@inproceedings{\nkariyappa2021protecting,\ntitle={Protecting {\\{}DNN{\\}}s from Theft using an Ensemble of Diverse Models},\nauthor={Sanjay Kariyappa and Atul Prakash and Moinuddin K Qureshi},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=LucJxySuJcE}\n}", "github": "", "project": "", "reviewers": "AnonReviewer3;AnonReviewer1;AnonReviewer2;AnonReviewer4", "pdf_size": 0, "rating": "5;6;6;7", "confidence": "5;4;3;3", "wc_review": "384;972;179;317", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "695;791;434;569", "reply_reviewers": "0;0;0;0", "reply_authors": "1;1;1;1", "rating_avg": [ 6.0, 0.7071067811865476 ], "confidence_avg": [ 3.75, 0.82915619758885 ], "wc_review_avg": [ 463.0, 303.0239264480612 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 622.25, 134.20390270033133 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 9, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": -0.8528028654224418, "gs_citation": 37, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=6774410133971591053&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 3, "pdf": "https://openreview.net/pdf?id=LucJxySuJcE", "email": "gatech.edu;umich.edu;", "author_num": 3, "aff_unique_index": "0;1", "aff_unique_norm": "Georgia Institute of Technology;University of Michigan", 
"aff_unique_dep": ";", "aff_unique_url": "https://www.gatech.edu;https://www.umich.edu", "aff_unique_abbr": "Georgia Tech;UM", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0", "aff_country_unique": "United States" }, { "id": "LuyryrCs6Ez", "title": "CURI: A Benchmark for Productive Concept Learning Under Uncertainty", "track": "main", "status": "Reject", "tldr": "", "abstract": "Humans can learn and reason under substantial uncertainty in a space of infinitely many concepts, including structured relational concepts (\u201ca scene with objects that have the same color\u201d) and ad-hoc categories defined through goals (\u201cobjects that could fall on one\u2019s head\u201d). In contrast, standard classification benchmarks: 1) consider only a fixed set of category labels, 2) do not evaluate compositional concept learning and 3) do not explicitly capture a notion of reasoning under uncertainty. We introduce a new few-shot, meta-learning benchmark, Compositional Reasoning Under Uncertainty (CURI) to bridge this gap. CURI evaluates different aspects of productive and systematic generalization, including abstract understandings of disentangling, productive generalization, learning boolean operations, variable binding, etc. Importantly, it also defines a model-independent \u201ccompositionality gap\u201d to evaluate difficulty of generalizing out-of-distribution along each of these axes. Extensive evaluations across a range of modeling choices spanning different modalities (image, schemas, and sounds), splits, privileged auxiliary concept information, and choices of negatives reveal substantial scope for modeling advances on the proposed task. All code and datasets will be available online.", "keywords": "compositional learning;meta-learning;systematicity;reasoning", "primary_area": "", "supplementary_material": "", "author": "Shanmukha Ramakrishna Vedantam;Arthur Szlam;Maximilian Nickel;Ari S. Morcos;Brenden M. Lake", "authorids": "~Shanmukha_Ramakrishna_Vedantam1;~Arthur_Szlam1;~Maximilian_Nickel1;~Ari_S._Morcos1;~Brenden_M._Lake1", "gender": "M;M;M;M;M", "homepage": "http://vrama91.github.io;;https://mnick.github.io/;https://cims.nyu.edu/~brenden/;http://www.arimorcos.com", "dblp": "154/6748.html;22/6733;83/10622;47/9567;217/3720", "google_scholar": "v1CRzeAAAAAJ;;KDqGTIUAAAAJ;vspmOX8AAAAJ;v-A_7UsAAAAJ", "orcid": ";;0000-0001-5006-0827;;", "linkedin": ";;;;", "or_profile": "~Shanmukha_Ramakrishna_Vedantam1;~Arthur_Szlam1;~Maximilian_Nickel1;~Brenden_M._Lake1;~Ari_Morcos1", "aff": "Meta Facebook;CUNY City College;Meta Facebook;New York University;Meta AI (FAIR)", "aff_domain": "fb.com;;fb.com;nyu.edu;meta.com", "position": "Research Scientist;;Research Scientist;Assistant Professor;Research Scientist", "bibtex": "@misc{\nvedantam2021curi,\ntitle={{\\{}CURI{\\}}: A Benchmark for Productive Concept Learning Under Uncertainty},\nauthor={Shanmukha Ramakrishna Vedantam and Arthur Szlam and Maximilian Nickel and Ari S. Morcos and Brenden M. 
Lake},\nyear={2021},\nurl={https://openreview.net/forum?id=LuyryrCs6Ez}\n}", "github": "", "project": "", "reviewers": "AnonReviewer2;AnonReviewer1;AnonReviewer4", "site": "https://openreview.net/forum?id=LuyryrCs6Ez", "pdf_size": 0, "rating": "5;6;6", "confidence": "4;3;3", "wc_review": "625;594;340", "wc_reply_reviewers": "0;0;0", "wc_reply_authors": "776;755;284", "reply_reviewers": "0;0;0", "reply_authors": "1;1;1", "rating_avg": [ 5.666666666666667, 0.4714045207910317 ], "confidence_avg": [ 3.3333333333333335, 0.4714045207910317 ], "wc_review_avg": [ 519.6666666666666, 127.67232363445973 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 605.0, 227.1431266844762 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 9, 0 ], "authors#_avg": [ 5, 0 ], "corr_rating_confidence": -0.9999999999999997, "gs_citation": 29, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=8315298936162694296&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 8, "aff_unique_index": "0;1;0;2;0", "aff_unique_norm": "Meta;City College of New York;New York University", "aff_unique_dep": "Meta Platforms, Inc.;;", "aff_unique_url": "https://meta.com;https://www.ccny.cuny.edu;https://www.nyu.edu", "aff_unique_abbr": "Meta;CCNY;NYU", "aff_campus_unique_index": "1", "aff_campus_unique": ";New York", "aff_country_unique_index": "0;0;0;0;0", "aff_country_unique": "United States" }, { "id": "LvJ8hLSusrv", "title": "Gradient-based tuning of Hamiltonian Monte Carlo hyperparameters", "track": "main", "status": "Reject", "tldr": "", "abstract": "Hamiltonian Monte Carlo (HMC) is one of the most successful sampling methods in machine learning. However, its performance is significantly affected by the choice of hyperparameter values, which require careful tuning. Existing approaches for automating this task either optimise a proxy for mixing speed or consider the HMC chain as an implicit variational distribution and optimize a tractable lower bound that is too loose to be useful in practice. Instead, we propose to optimize an objective that quantifies directly the speed of convergence to the target distribution. Our objective can be easily optimized using stochastic gradient descent. We evaluate our proposed method and compare to baselines on a variety of problems including synthetic 2D distributions, the posteriors of variational autoencoders and the Boltzmann distribution for molecular configurations of a 22 atom molecule. 
We find our method is competitive with or improves upon alternative baselines on all problems we consider.", "keywords": "Hamiltonian Monte Carlo;HMC;MCMC;Variational Inference", "primary_area": "", "supplementary_material": "/attachment/e2588f417732f376b2c28b6fa0df900d217576ef.zip", "author": "Andrew Campbell;Wenlong Chen;Vincent Stimper;Jos\u00e9 Miguel Hern\u00e1ndez-Lobato;Yichuan Zhang", "authorids": "~Andrew_Campbell4;wc327@cam.ac.uk;vs488@cam.ac.uk;~Jos\u00e9_Miguel_Hern\u00e1ndez-Lobato1;~Yichuan_Zhang1", "gender": ";;;;M", "homepage": ";;;;https://yichuan-zhang.github.io/", "dblp": "93/3398;;;;12/7841", "google_scholar": ";;;;", "orcid": "0000-0003-2086-0238;;;;", "linkedin": ";;;;", "or_profile": "~Andrew_Campbell4;wc327@cam.ac.uk;vs488@cam.ac.uk;~Jos\u00e9_Miguel_Hern\u00e1ndez-Lobato1;~Yichuan_Zhang1", "aff": "University of Oxford;;;;", "aff_domain": "ox.ac.uk;;;;", "position": "PhD student;;;;", "bibtex": "@misc{\ncampbell2021gradientbased,\ntitle={Gradient-based tuning of Hamiltonian Monte Carlo hyperparameters},\nauthor={Andrew Campbell and Wenlong Chen and Vincent Stimper and Jos{\\'e} Miguel Hern{\\'a}ndez-Lobato and Yichuan Zhang},\nyear={2021},\nurl={https://openreview.net/forum?id=LvJ8hLSusrv}\n}", "github": "", "project": "", "reviewers": "AnonReviewer4;AnonReviewer2;AnonReviewer3;AnonReviewer1", "site": "https://openreview.net/forum?id=LvJ8hLSusrv", "pdf_size": 0, "rating": "4;5;5;6", "confidence": "4;4;5;3", "wc_review": "517;630;385;407", "wc_reply_reviewers": "162;245;0;0", "wc_reply_authors": "974;1157;870;374", "reply_reviewers": "1;1;0;0", "reply_authors": "2;3;2;1", "rating_avg": [ 5.0, 0.7071067811865476 ], "confidence_avg": [ 4.0, 0.7071067811865476 ], "wc_review_avg": [ 484.75, 97.63804330280283 ], "wc_reply_reviewers_avg": [ 101.75, 105.89706086572942 ], "wc_reply_authors_avg": [ 843.75, 290.0192881516676 ], "reply_reviewers_avg": [ 0.5, 0.5 ], "reply_authors_avg": [ 2.0, 0.7071067811865476 ], "replies_avg": [ 15, 0 ], "authors#_avg": [ 5, 0 ], "corr_rating_confidence": -0.5, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:Y2TObP797ooJ:scholar.google.com/&scioq=Gradient-based+tuning+of+Hamiltonian+Monte+Carlo+hyperparameters&hl=en&as_sdt=0,5", "gs_version_total": 0, "aff_unique_index": "0", "aff_unique_norm": "University of Oxford", "aff_unique_dep": "", "aff_unique_url": "https://www.ox.ac.uk", "aff_unique_abbr": "Oxford", "aff_country_unique_index": "0", "aff_country_unique": "United Kingdom" }, { "id": "Lvb2BKqL49a", "title": "Regularized Mutual Information Neural Estimation", "track": "main", "status": "Reject", "tldr": "", "abstract": "With the variational lower bound of mutual information (MI), the estimation of MI can be understood as an optimization task via stochastic gradient descent. In this work, we start by showing how Mutual Information Neural Estimator (MINE) searches for the optimal function $T$ that maximizes the Donsker-Varadhan representation. With our synthetic dataset, we directly observe the neural network outputs during the optimization to investigate why MINE succeeds or fails: We discover the drifting phenomenon, where the constant term of $T$ is shifting through the optimization process, and analyze the instability caused by the interaction between the $logsumexp$ and the insufficient batch size. Next, through theoretical and experimental evidence, we propose a novel lower bound that effectively regularizes the neural network to alleviate the problems of MINE. 
We also introduce an averaging strategy that produces an unbiased estimate by utilizing multiple batches to mitigate the batch size limitation. Finally, we show that $L^2$ regularization achieves significant improvements in both discrete and continuous settings.", "keywords": "Information Theory;Regularization", "primary_area": "", "supplementary_material": "/attachment/9487689522a21e9f72ec956eb9a851156f2d1e30.zip", "author": "Kwanghee Choi;Siyeong Lee", "authorids": "~Kwanghee_Choi1;~Siyeong_Lee1", "gender": "M;M", "homepage": ";", "dblp": "84/3338;213/8232", "google_scholar": "IGXBRggAAAAJ;iGSaIU0AAAAJ", "orcid": ";", "linkedin": ";siyeong/", "or_profile": "~Kwanghee_Choi1;~Siyeong_Lee1", "aff": "Sogang University;Naver Labs", "aff_domain": "sogang.ac.kr;naverlabs.com", "position": "Undergrad student;Researcher", "bibtex": "@misc{\nchoi2021regularized,\ntitle={Regularized Mutual Information Neural Estimation},\nauthor={Kwanghee Choi and Siyeong Lee},\nyear={2021},\nurl={https://openreview.net/forum?id=Lvb2BKqL49a}\n}", "github": "", "project": "", "reviewers": "AnonReviewer3;AnonReviewer2;AnonReviewer4;AnonReviewer1", "site": "https://openreview.net/forum?id=Lvb2BKqL49a", "pdf_size": 0, "rating": "3;5;6;7", "confidence": "5;4;3;2", "wc_review": "788;700;259;79", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "1282;1055;235;11", "reply_reviewers": "0;0;0;0", "reply_authors": "2;2;1;1", "rating_avg": [ 5.25, 1.479019945774904 ], "confidence_avg": [ 3.5, 1.118033988749895 ], "wc_review_avg": [ 456.5, 296.098378921601 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 645.75, 534.7716218162666 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.5, 0.5 ], "replies_avg": [ 11, 0 ], "authors#_avg": [ 2, 0 ], "corr_rating_confidence": -0.9827076298239908, "gs_citation": 15, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=17781334326497129101&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 0, "aff_unique_index": "0;1", "aff_unique_norm": "Sogang University;NAVER LABS", "aff_unique_dep": ";", "aff_unique_url": "https://www.sogang.ac.kr;https://labs.naver.com", "aff_unique_abbr": "Sogang;Naver Labs", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0", "aff_country_unique": "South Korea" }, { "title": "Quantifying Differences in Reward Functions", "status": "Spotlight", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/3348", "id": "LwEQnp6CYev", "poster": "", "openreview": "https://openreview.net/forum?id=LwEQnp6CYev", "slides": "https://iclr.cc/virtual/2021/poster/3348", "video": "https://iclr.cc/virtual/2021/poster/3348", "author_site": "Adam Gleave, Michael Dennis, Shane Legg, Stuart Russell, Jan Leike", "tldr": "", "abstract": "For many tasks, the reward function is inaccessible to introspection or too complex to be specified procedurally, and must instead be learned from user data. Prior work has evaluated learned reward functions by evaluating policies optimized for the learned reward. However, this method cannot distinguish between the learned reward function failing to reflect user preferences and the policy optimization process failing to optimize the learned reward. Moreover, this method can only tell us about behavior in the evaluation environment, but the reward may incentivize very different behavior in even a slightly different deployment environment. 
To address these problems, we introduce the Equivalent-Policy Invariant Comparison (EPIC) distance to quantify the difference between two reward functions directly, without a policy optimization step. We prove EPIC is invariant on an equivalence class of reward functions that always induce the same optimal policy. Furthermore, we find EPIC can be efficiently approximated and is more robust than baselines to the choice of coverage distribution. Finally, we show that EPIC distance bounds the regret of optimal policies even under different transition dynamics, and we confirm empirically that it predicts policy training success. Our source code is available at https://github.com/HumanCompatibleAI/evaluating-rewards.", "keywords": "rl;irl;reward learning;distance;benchmarks", "primary_area": "", "supplementary_material": "", "author": "Adam Gleave;Michael D Dennis;Shane Legg;Stuart Russell;Jan Leike", "authorids": "~Adam_Gleave1;~Michael_D_Dennis1;~Shane_Legg1;~Stuart_Russell1;~Jan_Leike1", "gender": "M;M;M;M;M", "homepage": "https://gleave.me;;http://www.vetta.org;https://people.eecs.berkeley.edu/~russell/;https://jan.leike.name", "dblp": "189/0008.html;;36/5739;;https://dblp.uni-trier.de/pers/hd/l/Leike:Jan", "google_scholar": "lBunDH0AAAAJ;WXXu26AAAAAJ;;https://scholar.google.com.tw/citations?user=KJGrjCAAAAAJ;beiWcokAAAAJ", "orcid": "0000-0002-3467-528X;;;;", "linkedin": "adamgleave/;;;;", "or_profile": "~Adam_Gleave1;~Michael_D_Dennis1;~Shane_Legg1;~Stuart_Russell1;~Jan_Leike1", "aff": "University of California, Berkeley;University of California, Berkeley;Google DeepMind;University of California, Berkeley;OpenAI", "aff_domain": "berkeley.edu;berkeley.edu;deepmind.com;berkeley.edu;openai.com", "position": "PhD student;PhD student;Chief Scientist;Full Professor;Alignment Team Lead", "bibtex": "@inproceedings{\ngleave2021quantifying,\ntitle={Quantifying Differences in Reward Functions},\nauthor={Adam Gleave and Michael D Dennis and Shane Legg and Stuart Russell and Jan Leike},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=LwEQnp6CYev}\n}", "github": "[![github](/images/github_icon.svg) HumanCompatibleAI/evaluating-rewards](https://github.com/HumanCompatibleAI/evaluating-rewards)", "project": "", "reviewers": "AnonReviewer4;AnonReviewer3;AnonReviewer2;AnonReviewer1", "pdf_size": 0, "rating": "6;7;7;8", "confidence": "2;3;4;4", "wc_review": "293;172;332;881", "wc_reply_reviewers": "0;40;14;110", "wc_reply_authors": "478;676;957;593", "reply_reviewers": "0;1;1;2", "reply_authors": "1;1;2;3", "rating_avg": [ 7.0, 0.7071067811865476 ], "confidence_avg": [ 3.25, 0.82915619758885 ], "wc_review_avg": [ 419.5, 272.8997068521694 ], "wc_reply_reviewers_avg": [ 41.0, 42.34383071948026 ], "wc_reply_authors_avg": [ 676.0, 176.81487493986472 ], "reply_reviewers_avg": [ 1.0, 0.7071067811865476 ], "reply_authors_avg": [ 1.75, 0.82915619758885 ], "replies_avg": [ 17, 0 ], "authors#_avg": [ 5, 0 ], "corr_rating_confidence": 0.8528028654224418, "gs_citation": 85, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=3868524216566349741&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 14, "pdf": "https://openreview.net/pdf?id=LwEQnp6CYev", "email": "berkeley.edu;berkeley.edu;deepmind.com;berkeley.edu;openai.com", "author_num": 5, "aff_unique_index": "0;0;1;0;2", "aff_unique_norm": "University of California, Berkeley;Google;OpenAI", "aff_unique_dep": ";Google DeepMind;", "aff_unique_url": 
"https://www.berkeley.edu;https://deepmind.com;https://openai.com", "aff_unique_abbr": "UC Berkeley;DeepMind;OpenAI", "aff_campus_unique_index": "0;0;0", "aff_campus_unique": "Berkeley;", "aff_country_unique_index": "0;0;1;0;0", "aff_country_unique": "United States;United Kingdom" }, { "id": "LxBFTZT3UOU", "title": "A straightforward line search approach on the expected empirical loss for stochastic deep learning problems", "track": "main", "status": "Reject", "tldr": "", "abstract": " A fundamental challenge in deep learning is that the optimal step sizes for update steps of stochastic gradient descent are unknown. In traditional optimization, line searches are used to determine good step sizes, however, in deep learning, it is too costly to search for good step sizes on the expected empirical loss due to noisy losses. This empirical work shows that it is possible to approximate the expected empirical loss on vertical cross sections for common deep learning tasks considerably cheaply. This is achieved by applying traditional one-dimensional function fitting to measured noisy losses of such cross sections. The step to a minimum of the resulting approximation is then used as step size for the optimization. This approach leads to a robust and straightforward optimization method which performs well across datasets and architectures without the need of hyperparameter tuning.\n", "keywords": "Empirical Optimization;Expected Loss;Line Search", "primary_area": "", "supplementary_material": "/attachment/0a6acc139c6975dea933a56a99b0b16bb034864b.zip", "author": "Maximus Mutschler;Andreas Zell", "authorids": "~Maximus_Mutschler1;~Andreas_Zell1", "gender": ";M", "homepage": "https://uni-tuebingen.de/fakultaeten/mathematisch-naturwissenschaftliche-fakultaet/fachbereiche/informatik/lehrstuehle/kognitive-systeme/the-chair/staff/maximus-mutschler/;https://uni-tuebingen.de/fakultaeten/mathematisch-naturwissenschaftliche-fakultaet/fachbereiche/informatik/lehrstuehle/kognitive-systeme/", "dblp": ";05/4192", "google_scholar": ";", "orcid": ";", "linkedin": ";", "or_profile": "~Maximus_Mutschler1;~Andreas_Zell1", "aff": "University of Tuebingen;Eberhard-Karls-Universit\u00e4t T\u00fcbingen", "aff_domain": "uni-tuebingen.de;uni-tuebingen.de", "position": "PhD student;Full Professor", "bibtex": "@misc{\nmutschler2021a,\ntitle={A straightforward line search approach on the expected empirical loss for stochastic deep learning problems},\nauthor={Maximus Mutschler and Andreas Zell},\nyear={2021},\nurl={https://openreview.net/forum?id=LxBFTZT3UOU}\n}", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer4;AnonReviewer2;AnonReviewer3", "site": "https://openreview.net/forum?id=LxBFTZT3UOU", "pdf_size": 0, "rating": "3;4;4;4", "confidence": "5;4;4;4", "wc_review": "839;619;426;405", "wc_reply_reviewers": "0;83;0;66", "wc_reply_authors": "2256;682;650;836", "reply_reviewers": "0;1;0;1", "reply_authors": "4;2;1;1", "rating_avg": [ 3.75, 0.4330127018922193 ], "confidence_avg": [ 4.25, 0.4330127018922193 ], "wc_review_avg": [ 572.25, 175.1447615545495 ], "wc_reply_reviewers_avg": [ 37.25, 37.7317836843158 ], "wc_reply_authors_avg": [ 1106.0, 667.6660842067687 ], "reply_reviewers_avg": [ 0.5, 0.5 ], "reply_authors_avg": [ 2.0, 1.224744871391589 ], "replies_avg": [ 16, 0 ], "authors#_avg": [ 2, 0 ], "corr_rating_confidence": -1.0, "gs_citation": 0, "gs_cited_by_link": 
"https://scholar.google.com/scholar?q=related:Xvsn1GfalpQJ:scholar.google.com/&scioq=A+straightforward+line+search+approach+on+the+expected+empirical+loss+for+stochastic+deep+learning+problems&hl=en&as_sdt=0,33", "gs_version_total": 3, "aff_unique_index": "0;1", "aff_unique_norm": "University of Tuebingen;Eberhard Karls University of T\u00fcbingen", "aff_unique_dep": ";", "aff_unique_url": "https://www.uni-tuebingen.de/;https://www.uni-tuebingen.de/", "aff_unique_abbr": "Uni T\u00fcbingen;Uni T\u00fcbingen", "aff_campus_unique_index": "1", "aff_campus_unique": ";T\u00fcbingen", "aff_country_unique_index": "0;0", "aff_country_unique": "Germany" }, { "id": "LxhlyKH6VP", "title": "ProGAE: A Geometric Autoencoder-based Generative Model for Disentangling Protein Conformational Space", "track": "main", "status": "Reject", "tldr": "", "abstract": "Understanding the protein conformational landscape is critical, as protein function, as well as modulations thereof due to ligand binding or changes in environment, are intimately connected with structural variations. This work focuses on learning a generative neural network on a simulated ensemble of protein structures obtained using molecular simulation to characterize the distinct structural fluctuations of a protein bound to various drug molecules. Specifically, we use a geometric autoencoder framework to learn separate latent space encodings of the intrinsic and extrinsic geometries of the system. For this purpose, the proposed Protein Geometric AutoEncoder (ProGAE) model is trained on the length of the alpha-carbon pseudobonds and the orientation of the backbone bonds of the protein. Using ProGAE latent embeddings, we reconstruct and generate the conformational ensemble of a protein at or near the experimental resolution. Empowered by the disentangled latent space learning, the intrinsic latent embedding help in geometric error correction, whereas the extrinsic latent embedding is successfully used for classification or property prediction of different drugs bound to a specific protein. Additionally, ProGAE is able to be transferred to the structures of a different state of the same protein or to a completely different protein of different size, where only the dense layer decoding from the latent representation needs to be retrained. 
Results show that our geometric learning-based method enjoys both accuracy and efficiency for generating complex structural variations, charting the path toward scalable and improved approaches for analyzing and enhancing molecular simulations.", "keywords": "generative models;deep learning;interpretability", "primary_area": "", "supplementary_material": "/attachment/ffaade4b9074d12fc9f1ab9bd9c10ae8681b0281.zip", "author": "Norman Joseph Tatro;Payel Das;Pin-Yu Chen;Vijil Chenthamarakshan;Rongjie Lai", "authorids": "~Norman_Joseph_Tatro1;~Payel_Das1;~Pin-Yu_Chen1;~Vijil_Chenthamarakshan1;~Rongjie_Lai4", "gender": "M;F;M;M;M", "homepage": ";;http://www.pinyuchen.com;https://researcher.watson.ibm.com/researcher/view.php?person=us-ecvijil;https://www.rongjielai.com", "dblp": ";56/7926;39/8969;;", "google_scholar": "jS3UIPgAAAAJ;;jxwlCUUAAAAJ;g9hboJ0AAAAJ;Wp3DnKUAAAAJ", "orcid": "0000-0003-4699-2757;;0000-0003-1039-8369;;", "linkedin": "joseph-tatro/;;pin-yu-chen-940062a2;;", "or_profile": "~Norman_Joseph_Tatro1;~Payel_Das1;~Pin-Yu_Chen1;~Vijil_Chenthamarakshan1;~Rongjie_Lai4", "aff": "Rensselaer Polytechnic Institute;IBM, International Business Machines;International Business Machines;International Business Machines;Rensselaer Polytechnic Institute", "aff_domain": "rpi.edu;us.ibm.com;ibm.com;ibm.com;rpi.edu", "position": "PhD student;Principal Researcher;Research Staff Member;Senior Technical Staff member;Associate Professor", "bibtex": "@misc{\ntatro2021progae,\ntitle={Pro{\\{}GAE{\\}}: A Geometric Autoencoder-based Generative Model for Disentangling Protein Conformational Space},\nauthor={Norman Joseph Tatro and Payel Das and Pin-Yu Chen and Vijil Chenthamarakshan and Rongjie Lai},\nyear={2021},\nurl={https://openreview.net/forum?id=LxhlyKH6VP}\n}", "github": "", "project": "", "reviewers": "AnonReviewer2;AnonReviewer3;AnonReviewer4;AnonReviewer1", "site": "https://openreview.net/forum?id=LxhlyKH6VP", "pdf_size": 0, "rating": "4;5;5;7", "confidence": "4;2;4;4", "wc_review": "379;279;428;592", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "741;585;634;285", "reply_reviewers": "0;0;0;0", "reply_authors": "1;1;1;1", "rating_avg": [ 5.25, 1.0897247358851685 ], "confidence_avg": [ 3.5, 0.8660254037844386 ], "wc_review_avg": [ 419.5, 113.14702824201791 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 561.25, 169.17502031919489 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 9, 0 ], "authors#_avg": [ 5, 0 ], "corr_rating_confidence": 0.13245323570650439, "gs_citation": 3, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=16396899287476369976&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 0, "aff_unique_index": "0;1;2;2;0", "aff_unique_norm": "Rensselaer Polytechnic Institute;International Business Machines;International Business Machines Corporation", "aff_unique_dep": ";;", "aff_unique_url": "https://www.rpi.edu;https://www.ibm.com;https://www.ibm.com", "aff_unique_abbr": "RPI;IBM;IBM", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0;0;0;0", "aff_country_unique": "United States" }, { "id": "LzhEvTWpzH", "title": "Switching-Aligned-Words Data Augmentation for Neural Machine Translation", "track": "main", "status": "Reject", "tldr": "", "abstract": "In neural machine translation (NMT), data augmentation methods such as back-translation make it possible to use extra monolingual data to help improve translation performance, while it needs extra training data and the in-domain 
monolingual data is not always available. In this paper, we present a novel data augmentation method for neural machine translation by using only the original training data without extra data. More accurately, we randomly replace words or mixup with their aligned alternatives in another language when training neural machine translation models. Since aligned word pairs appear in the same position of each other during training, it is helpful to form bilingual embeddings which are proved useful to provide a performance boost \\citep{liu2019shared}. Experiments on both small and large scale datasets show that our method significantly outperforms the baseline models.", "keywords": "Machine Translation;Data augmentation", "primary_area": "", "supplementary_material": "", "author": "Fengshun Xiao;Zuchao Li;hai zhao", "authorids": "~Fengshun_Xiao1;~Zuchao_Li1;~hai_zhao1", "gender": ";M;M", "homepage": ";https://zcli-charlie.github.io/;http://bcmi.sjtu.edu.cn/~zhaohai/", "dblp": ";198/9339;25/1145-1.html", "google_scholar": "O6gNeEMAAAAJ;PyzBf5oAAAAJ;https://scholar.google.com.tw/citations?user=4dU5KS0AAAAJ", "orcid": ";;", "linkedin": ";;", "or_profile": "~Fengshun_Xiao1;~Zuchao_Li1;~hai_zhao1", "aff": "Shanghai Jiaotong University;Shanghai Jiaotong University;Shanghai Jiaotong University", "aff_domain": "sjtu.edu.cn;sjtu.edu.cn;sjtu.edu.cn", "position": "MS student;PhD student;Full Professor", "bibtex": "@misc{\nxiao2021switchingalignedwords,\ntitle={Switching-Aligned-Words Data Augmentation for Neural Machine Translation},\nauthor={Fengshun Xiao and Zuchao Li and hai zhao},\nyear={2021},\nurl={https://openreview.net/forum?id=LzhEvTWpzH}\n}", "github": "", "project": "", "reviewers": "AnonReviewer2;AnonReviewer4;AnonReviewer3;AnonReviewer1", "site": "https://openreview.net/forum?id=LzhEvTWpzH", "pdf_size": 0, "rating": "2;3;4;4", "confidence": "5;4;5;4", "wc_review": "202;346;418;433", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "52;111;108;198", "reply_reviewers": "0;0;0;0", "reply_authors": "1;1;1;1", "rating_avg": [ 3.25, 0.82915619758885 ], "confidence_avg": [ 4.5, 0.5 ], "wc_review_avg": [ 349.75, 91.4231234425952 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 117.25, 52.20811718497421 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 9, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": -0.30151134457776363, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:9TvQDPYSDOAJ:scholar.google.com/&scioq=Switching-Aligned-Words+Data+Augmentation+for+Neural+Machine+Translation&hl=en&as_sdt=0,5", "gs_version_total": 0, "aff_unique_index": "0;0;0", "aff_unique_norm": "Shanghai Jiao Tong University", "aff_unique_dep": "", "aff_unique_url": "https://www.sjtu.edu.cn", "aff_unique_abbr": "SJTU", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0;0", "aff_country_unique": "China" }, { "id": "M3NDrHEGyyO", "title": "Accelerating Safe Reinforcement Learning with Constraint-mismatched Policies", "track": "main", "status": "Reject", "tldr": "", "abstract": "We consider the problem of reinforcement learning when provided with (1) a baseline control policy and (2) a set of constraints that the controlled system must satisfy. The baseline policy can arise from a teacher agent, demonstration data or even a heuristic while the constraints might encode safety, fairness or other application-specific requirements. 
Importantly, the baseline policy may be sub-optimal for the task at hand, and is not guaranteed to satisfy the specified constraints. The key challenge therefore lies in effectively leveraging the baseline policy for faster learning, while still ensuring that the constraints are minimally violated. To reconcile these potentially competing aspects, we propose an iterative policy optimization algorithm that alternates between maximizing expected return on the task, minimizing distance to the baseline policy, and projecting the policy onto the constraint-satisfying set. We analyze the convergence of our algorithm theoretically and provide a finite-sample guarantee. In our empirical experiments on five different control tasks, our algorithm consistently outperforms several state-of-the-art methods, achieving 10 times fewer constraint violations and 40% higher reward on average.", "keywords": "Reinforcement learning with constraints;Safe reinforcement learning", "primary_area": "", "supplementary_material": "", "author": "Tsung-Yen Yang;Justinian Rosca;Karthik R Narasimhan;Peter Ramadge", "authorids": "~Tsung-Yen_Yang2;justinian.rosca@siemens.com;~Karthik_R_Narasimhan1;~Peter_Ramadge1", "gender": ";;M;M", "homepage": "https://sites.google.com/view/tyjimmyyang;;http://www.karthiknarasimhan.com;http://ee.princeton.edu/people/faculty/peter-j-ramadge", "dblp": "204/7980;;147/0322;77/3256", "google_scholar": "g-hQdY8AAAAJ;;euc0GX4AAAAJ;BOMboVoAAAAJ", "orcid": ";;;", "linkedin": "tsung-yen-yang;;;", "or_profile": "~Tsung-Yen_Yang2;justinian.rosca@siemens.com;~Karthik_R_Narasimhan1;~Peter_Ramadge1", "aff": "Princeton University;;Princeton University;Princeton University", "aff_domain": "princeton.edu;;princeton.edu;princeton.edu", "position": "PhD student;;Assistant Professor;Full Professor", "bibtex": "@misc{\nyang2021accelerating,\ntitle={Accelerating Safe Reinforcement Learning with Constraint-mismatched Policies},\nauthor={Tsung-Yen Yang and Justinian Rosca and Karthik R Narasimhan and Peter Ramadge},\nyear={2021},\nurl={https://openreview.net/forum?id=M3NDrHEGyyO}\n}", "github": "", "project": "", "reviewers": "AnonReviewer4;AnonReviewer3;AnonReviewer2;AnonReviewer1", "site": "https://openreview.net/forum?id=M3NDrHEGyyO", "pdf_size": 0, "rating": "5;5;6;7", "confidence": "3;4;4;4", "wc_review": "657;678;270;726", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "1213;787;457;1006", "reply_reviewers": "0;0;0;0", "reply_authors": "2;1;1;2", "rating_avg": [ 5.75, 0.82915619758885 ], "confidence_avg": [ 3.75, 0.4330127018922193 ], "wc_review_avg": [ 582.75, 182.2901190410495 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 865.75, 279.9690831145468 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.5, 0.5 ], "replies_avg": [ 13, 0 ], "authors#_avg": [ 4, 0 ], "corr_rating_confidence": 0.5222329678670935, "gs_citation": 9, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=15400171359750411868&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 3, "aff_unique_index": "0;0;0", "aff_unique_norm": "Princeton University", "aff_unique_dep": "", "aff_unique_url": "https://www.princeton.edu", "aff_unique_abbr": "Princeton", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0;0", "aff_country_unique": "United States" }, { "id": "M4qXqdw3xC", "title": "Boundary Effects in CNNs: Feature or Bug?", "track": "main", "status": "Reject", "tldr": "", "abstract": "Recent studies have shown that the addition of zero padding drives 
convolutional neural networks (CNNs) to encode a significant amount of absolute position information in their internal representations, while a lack of padding precludes position encoding. Additionally, various studies have used image patches on background canvases (e.g., to accommodate that inputs to CNNs must be rectangular) without consideration that different backgrounds may contain varying levels of position information according to their color. These studies give rise to deeper questions about the role of boundary information in CNNs, that are explored in this paper: (i) What boundary heuristics (e.g., padding type, canvas color) enable optimal encoding of absolute position information for a particular downstream task?; (ii) Where in the latent representations do boundary effects destroy semantic and location information?; (iii) Does encoding position information affect the learning of semantic representations?; (iv) Does encoding position information always improve performance? To provide answers to these questions, we perform the largest case study to date on the role that padding and border heuristics play in CNNs. We first show that zero padding injects optimal position information into CNNs relative to other common padding types. We then design a series of novel tasks which allow us to accurately quantify boundary effects as a function of the distance to the border. A number of semantic objectives reveal the destructive effect of dealing with the border on semantic representations. Further, we demonstrate that the encoding of position information improves separability of learned semantic features. Finally, we demonstrate the implications of these findings on a number of real-world tasks to show that position information can act as a feature or a bug.", "keywords": "Boundary Effects;Absolute Position Information;Padding;Canvas color;Location Dependent Task", "primary_area": "", "supplementary_material": "", "author": "Md Amirul Islam;Matthew Kowal;Sen Jia;Konstantinos G. Derpanis;Neil Bruce", "authorids": "~Md_Amirul_Islam1;~Matthew_Kowal1;~Sen_Jia1;~Konstantinos_G._Derpanis1;~Neil_Bruce1", "gender": "M;M;;M;M", "homepage": "http://www.scs.ryerson.ca/~amirul/;https://mkowal2.github.io/;;http://socs.uoguelph.ca/~brucen/;https://csprofkgd.github.io/", "dblp": ";247/6389;35/3232;https://dblp.uni-trier.de/pers/hd/b/Bruce:Neil_D=_B=;39/253", "google_scholar": "https://scholar.google.ca/citations?user=AeibrqUAAAAJ;FCg8QxUAAAAJ;;Gnezf-4AAAAJ;https://scholar.google.ca/citations?user=3Br8x_gAAAAJ", "orcid": ";;;0000-0002-5710-1107;", "linkedin": ";mkowal2/;;;", "or_profile": "~Md_Amirul_Islam1;~Matthew_Kowal1;~Sen_Jia1;~Neil_Bruce1;~Kosta_Derpanis1", "aff": "Ryerson University;York University;University of Waterloo;University of Guelph;Samsung", "aff_domain": "ryerson.ca;yorku.ca;uwaterloo.ca;uoguelph.ca;samsung.com", "position": "PhD student;PhD student;Postdoc;Associate Professor;Researcher", "bibtex": "@misc{\nislam2021boundary,\ntitle={Boundary Effects in {\\{}CNN{\\}}s: Feature or Bug?},\nauthor={Md Amirul Islam and Matthew Kowal and Sen Jia and Konstantinos G. 
Derpanis and Neil Bruce},\nyear={2021},\nurl={https://openreview.net/forum?id=M4qXqdw3xC}\n}", "github": "", "project": "", "reviewers": "AnonReviewer4;AnonReviewer2;AnonReviewer1;AnonReviewer3", "site": "https://openreview.net/forum?id=M4qXqdw3xC", "pdf_size": 0, "rating": "3;3;7;8", "confidence": "4;4;4;3", "wc_review": "965;861;543;187", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "845;844;812;438", "reply_reviewers": "0;0;0;0", "reply_authors": "2;2;2;2", "rating_avg": [ 5.25, 2.277608394786075 ], "confidence_avg": [ 3.75, 0.4330127018922193 ], "wc_review_avg": [ 639.0, 303.75977350531457 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 734.75, 171.8420422946608 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 2.0, 0.0 ], "replies_avg": [ 13, 0 ], "authors#_avg": [ 5, 0 ], "corr_rating_confidence": -0.6970966755769258, "gs_citation": 3, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=13940964568206515268&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 0, "aff_unique_index": "0;1;2;3;4", "aff_unique_norm": "Ryerson University;York University;University of Waterloo;University of Guelph;Samsung", "aff_unique_dep": ";;;;Samsung", "aff_unique_url": "https://www.ryerson.ca;https://www.yorku.ca;https://uwaterloo.ca;https://www.uoguelph.ca;https://www.samsung.com", "aff_unique_abbr": "Ryerson;York U;UW;U of G;Samsung", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0;0;0;1", "aff_country_unique": "Canada;South Korea" }, { "id": "M6PP1Gq076C", "title": "Neural Bootstrapper", "track": "main", "status": "Withdraw", "tldr": "", "abstract": "Bootstrapping has been a primary tool for uncertainty quantification, and its theoretical and computational properties have been investigated in the fields of statistics and machine learning. However, due to its repetitive computations, the computational burden of implementing bootstrap procedures for neural networks is heavy, and this seriously hinders their practical use for uncertainty estimation in modern deep learning. To overcome this inconvenience, we propose a procedure called \\emph{Neural Bootstrapper} (NeuBoots). We show that NeuBoots stably generates valid bootstrap samples that coincide with the desired target samples, with minimal extra computational cost compared to traditional bootstrapping. \nConsequently, NeuBoots makes it feasible to construct bootstrap confidence intervals for the outputs of neural networks and to quantify their predictive uncertainty. We also apply NeuBoots to deep convolutional neural networks to assess its utility in image classification tasks, including calibration, detection of out-of-distribution samples, and active learning. Empirical results demonstrate that NeuBoots is significantly beneficial for the above purposes. 
", "keywords": "Bootstrapping;Uncertainty Estimation;Deep Learning", "primary_area": "", "supplementary_material": "", "author": "Minsuk Shin;Hyungjoo Cho;Sungbin Lim", "authorids": "mshin@mailbox.sc.edu;~Hyungjoo_Cho1;~Sungbin_Lim1", "gender": ";M;M", "homepage": ";;https://www.sungbin-lim.net", "dblp": ";;206/6907", "google_scholar": ";Pl95pGIAAAAJ;https://scholar.google.com/citations?hl=ko", "orcid": ";;0000-0003-2684-2022", "linkedin": ";;sungbin-lim-43b739b5/", "or_profile": "mshin@mailbox.sc.edu;~Hyungjoo_Cho1;~Sungbin_Lim1", "aff": ";Seoul National University;Ulsan National Institute of Science and Technology", "aff_domain": ";snu.ac.kr;unist.ac.kr", "position": ";PhD student;Assistant Professor", "bibtex": "", "github": "", "project": "", "reviewers": "AnonReviewer3;AnonReviewer1;AnonReviewer4;AnonReviewer2", "site": "https://openreview.net/forum?id=M6PP1Gq076C", "pdf_size": 0, "rating": "3;5;5;5", "confidence": "5;3;3;4", "wc_review": "614;1083;439;435", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "0;0;0;0", "reply_reviewers": "0;0;0;0", "reply_authors": "0;0;0;0", "rating_avg": [ 4.5, 0.8660254037844386 ], "confidence_avg": [ 3.75, 0.82915619758885 ], "wc_review_avg": [ 642.75, 264.2540207830337 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 0, 0 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 0, 0 ], "replies_avg": [ 7, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": -0.8703882797784891, "gs_citation": 11, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=8280944237750706681&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 10, "aff_unique_index": "0;1", "aff_unique_norm": "Seoul National University;Ulsan National Institute of Science and Technology", "aff_unique_dep": ";", "aff_unique_url": "https://www.snu.ac.kr;https://www.unist.ac.kr", "aff_unique_abbr": "SNU;UNIST", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0", "aff_country_unique": "South Korea" }, { "id": "M71R_ivbTQP", "title": "Extract Local Inference Chains of Deep Neural Nets", "track": "main", "status": "Reject", "tldr": "", "abstract": "We study how to explain the main steps/chains of inference that a deep neural net (DNN) relies on to produce predictions in a local region of data space. This problem is related to network pruning and interpretable machine learning but the highlighted differences are: (1) fine-tuning of neurons/filters is forbidden: only exact copies are allowed; (2) we target an extremely high pruning rate, e.g., $\\geq 95\\%$; (3) the interpretation is for the whole inference process in a local region rather than for individual neurons/filters or on a single sample. In this paper, we introduce an efficient method, \\name, to extract the local inference chains by optimizing a differentiable sparse scoring for the filters and layers to preserve the outputs on given data from a local region. Thereby, \\name~can extract an extremely small sub-network composed of filters exactly copied from the original DNN by removing the filters/layers with small scores. We then visualize the sub-network by applying existing interpretation technique to the retained layer/filter/neurons and on any sample from the local region. Its architecture reveals how the inference process stitches and integrates the information layer by layer and filter by filter. 
We provide detailed and insightful case studies together with three quantitative analyses over thousands of trials to demonstrate the quality, sparsity, fidelity and accuracy of the interpretation within the assigned local regions and over unseen data. In our empirical study, \\name~significantly enriches the interpretation and makes the inner mechanism of DNNs more transparent than before. ", "keywords": "Model Interpretability;Model Pruning;Attribution;Model Visualization", "primary_area": "", "supplementary_material": "/attachment/e772e2a9263bc376b69f25e4a457c87602f03e3d.zip", "author": "Haiyan Zhao;Tianyi Zhou;Guodong Long;Jing Jiang;Chengqi Zhang", "authorids": "~Haiyan_Zhao2;~Tianyi_Zhou1;~Guodong_Long2;~Jing_Jiang6;~Chengqi_Zhang1", "gender": "M;M;M;F;M", "homepage": "http://haiyan.tech/;https://tianyizhou.github.io/;https://www.uts.edu.au/staff/guodong.long;https://www.uts.edu.au/staff/jing.jiang;https://research.polyu.edu.hk/en/persons/chengqi-zhang", "dblp": ";88/8205-1;34/10089;68/1974-2;71/964", "google_scholar": ";OKvgizMAAAAJ;https://scholar.google.com.au/citations?user=Pl8m7hMAAAAJ;https://scholar.google.com.au/citations?hl=en;https://scholar.google.com.au/citations?user=B6lBmqEAAAAJ", "orcid": ";0000-0001-5348-0632;0000-0003-3740-9515;;0000-0001-5715-7154", "linkedin": ";tianyizhou;;;chengqi-zhang-55aa8910/", "or_profile": "~Haiyan_Zhao2;~Tianyi_Zhou1;~Guodong_Long2;~Jing_Jiang6;~Chengqi_Zhang1", "aff": ";University of Washington, Seattle;University of Technology Sydney;University of Technology Sydney;University of Technology Sydney", "aff_domain": ";uw.edu;uts.edu.au;uts.edu.au;uts.edu.au", "position": ";PhD student;Associate Professor;Lecturer;Full Professor", "bibtex": "@misc{\nzhao2021extract,\ntitle={Extract Local Inference Chains of Deep Neural Nets},\nauthor={Haiyan Zhao and Tianyi Zhou and Guodong Long and Jing Jiang and Chengqi Zhang},\nyear={2021},\nurl={https://openreview.net/forum?id=M71R_ivbTQP}\n}", "github": "", "project": "", "reviewers": "AnonReviewer3;AnonReviewer2;AnonReviewer1;AnonReviewer4", "site": "https://openreview.net/forum?id=M71R_ivbTQP", "pdf_size": 0, "rating": "5;6;6;6", "confidence": "2;3;3;4", "wc_review": "820;396;404;1412", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "609;576;730;1604", "reply_reviewers": "0;0;0;0", "reply_authors": "1;1;1;3", "rating_avg": [ 5.75, 0.4330127018922193 ], "confidence_avg": [ 3.0, 0.7071067811865476 ], "wc_review_avg": [ 758.0, 414.7047142244708 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 879.75, 422.05827500476755 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.5, 0.8660254037844386 ], "replies_avg": [ 12, 0 ], "authors#_avg": [ 5, 0 ], "corr_rating_confidence": 0.816496580927726, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:LlbIgNvD8o8J:scholar.google.com/&scioq=Extract+Local+Inference+Chains+of+Deep+Neural+Nets&hl=en&as_sdt=0,5", "gs_version_total": 0, "aff_unique_index": "0;1;1;1", "aff_unique_norm": "University of Washington;University of Technology Sydney", "aff_unique_dep": ";", "aff_unique_url": "https://www.washington.edu;https://www.uts.edu.au", "aff_unique_abbr": "UW;UTS", "aff_campus_unique_index": "0", "aff_campus_unique": "Seattle;", "aff_country_unique_index": "0;1;1;1", "aff_country_unique": "United States;Australia" }, { "title": "Generating Furry Cars: Disentangling Object Shape and Appearance across Multiple Domains", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/3300", 
"id": "M88oFvqp_9", "poster": "", "openreview": "https://openreview.net/forum?id=M88oFvqp_9", "slides": "https://iclr.cc/virtual/2021/poster/3300", "video": "https://iclr.cc/virtual/2021/poster/3300", "author_site": "Utkarsh Ojha, Krishna Kumar Singh, Yong Jae Lee", "tldr": "", "abstract": "We consider the novel task of learning disentangled representations of object shape and appearance across multiple domains (e.g., dogs and cars). The goal is to learn a generative model that learns an intermediate distribution, which borrows a subset of properties from each domain, enabling the generation of images that did not exist in any domain exclusively. This challenging problem requires an accurate disentanglement of object shape, appearance, and background from each domain, so that the appearance and shape factors from the two domains can be interchanged. We augment an existing approach that can disentangle factors within a single domain but struggles to do so across domains. Our key technical contribution is to represent object appearance with a differentiable histogram of visual features, and to optimize the generator so that two images with the same latent appearance factor but different latent shape factors produce similar histograms. On multiple multi-domain datasets, we demonstrate our method leads to accurate and consistent appearance and shape transfer across domains.", "keywords": "multi-domain disentanglement;generative adversarial networks;appearance transfer", "primary_area": "", "supplementary_material": "", "author": "Utkarsh Ojha;Krishna Kumar Singh;Yong Jae Lee", "authorids": "~Utkarsh_Ojha1;~Krishna_Kumar_Singh4;~Yong_Jae_Lee2", "gender": "M;M;M", "homepage": "https://utkarshojha.github.io/;http://krsingh.cs.ucdavis.edu/;https://pages.cs.wisc.edu/~yongjaelee/", "dblp": "194/5532;97/7285;15/5471", "google_scholar": "QGdSgfoAAAAJ;3TMipekAAAAJ;4GTpCxcAAAAJ", "orcid": ";;", "linkedin": "utkarsh-ojha-16a20b11b/;krishna-kumar-singh-66586128;", "or_profile": "~Utkarsh_Ojha1;~Krishna_Kumar_Singh3;~Yong_Jae_Lee1", "aff": "University of California, Davis;Adobe Research;University of California, Davis", "aff_domain": "ucdavis.edu;adobe.com;cs.ucdavis.edu", "position": "PhD student;Research Scientist;Associate Professor", "bibtex": "@inproceedings{\nojha2021generating,\ntitle={Generating Furry Cars: Disentangling Object Shape and Appearance across Multiple Domains},\nauthor={Utkarsh Ojha and Krishna Kumar Singh and Yong Jae Lee},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=M88oFvqp_9}\n}", "github": "", "project": "", "reviewers": "AnonReviewer2;AnonReviewer3;AnonReviewer1;AnonReviewer4", "pdf_size": 0, "rating": "5;5;7;7", "confidence": "4;4;4;3", "wc_review": "960;288;222;367", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "1005;605;88;510", "reply_reviewers": "0;0;0;0", "reply_authors": "2;1;1;1", "rating_avg": [ 6.0, 1.0 ], "confidence_avg": [ 3.75, 0.4330127018922193 ], "wc_review_avg": [ 459.25, 293.6301883321945 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 552.0, 325.9900305224072 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.25, 0.4330127018922193 ], "replies_avg": [ 11, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": -0.5773502691896257, "gs_citation": 13, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=15513193294117291512&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 5, "pdf": "https://openreview.net/pdf?id=M88oFvqp_9", "email": 
"ucdavis.edu;adobe.com;cs.ucdavis.edu", "author_num": 3, "aff_unique_index": "0;1;0", "aff_unique_norm": "University of California, Davis;Adobe", "aff_unique_dep": ";Adobe Research", "aff_unique_url": "https://www.ucdavis.edu;https://research.adobe.com", "aff_unique_abbr": "UC Davis;Adobe", "aff_campus_unique_index": "0;0", "aff_campus_unique": "Davis;", "aff_country_unique_index": "0;0;0", "aff_country_unique": "United States" }, { "id": "M9hdyCNlWaf", "title": "Sparse Uncertainty Representation in Deep Learning with Inducing Weights", "track": "main", "status": "Reject", "tldr": "", "abstract": "Bayesian neural networks and deep ensembles represent two modern paradigms of uncertainty quantification in deep learning. Yet these approaches struggle to scale mainly due to memory inefficiency issues, since they require parameter storage several times higher than their deterministic counterparts. To address this, we augment the weight matrix of each layer with a small number of inducing weights, thereby projecting the uncertainty quantification into such low dimensional spaces. We further extend Matheron's conditional Gaussian sampling rule to enable fast weight sampling, whichenable our inference method to maintain reasonable run-time as compared with ensembles. Importantly, our approach achieves competitive performance to the state-of-the-art in prediction and uncertainty estimation tasks with fully connected neural networks and ResNets, while reducing the parameter size to $\\leq 47.9\\%$ of that of a single neural network. ", "keywords": "Bayesian neural networks;uncertainty estimation;memory efficiency", "primary_area": "", "supplementary_material": "/attachment/3f34d14de08a382bccc43dc3ee9761f316599de3.zip", "author": "Hippolyt Ritter;Martin Kukla;Cheng Zhang;Yingzhen Li", "authorids": "~Hippolyt_Ritter1;~Martin_Kukla1;~Cheng_Zhang1;~Yingzhen_Li1", "gender": ";;F;F", "homepage": ";;http://cheng-zhang.org;http://yingzhenli.net/home/en/", "dblp": "203/4484;;82/6384-5;117/9230", "google_scholar": ";DksFc5QAAAAJ;r40iAwIAAAAJ;https://scholar.google.se/citations?hl=en", "orcid": ";;;", "linkedin": ";;;", "or_profile": "~Hippolyt_Ritter1;~Martin_Kukla1;~Cheng_Zhang1;~Yingzhen_Li1", "aff": "University College London;Microsoft Research;Microsoft;Imperial College London", "aff_domain": "ucl.ac.uk;research.microsoft.com;microsoft.com;imperial.ac.uk", "position": "PhD student;Researcher;Principal Researcher;Lecturer", "bibtex": "@misc{\nritter2021sparse,\ntitle={Sparse Uncertainty Representation in Deep Learning with Inducing Weights},\nauthor={Hippolyt Ritter and Martin Kukla and Cheng Zhang and Yingzhen Li},\nyear={2021},\nurl={https://openreview.net/forum?id=M9hdyCNlWaf}\n}", "github": "", "project": "", "reviewers": "AnonReviewer3;AnonReviewer2;AnonReviewer4;AnonReviewer1", "site": "https://openreview.net/forum?id=M9hdyCNlWaf", "pdf_size": 0, "rating": "5;6;6;6", "confidence": "2;4;4;4", "wc_review": "222;1031;526;277", "wc_reply_reviewers": "130;177;84;0", "wc_reply_authors": "909;971;852;579", "reply_reviewers": "1;1;2;0", "reply_authors": "2;3;3;1", "rating_avg": [ 5.75, 0.4330127018922193 ], "confidence_avg": [ 3.5, 0.8660254037844386 ], "wc_review_avg": [ 514.0, 319.71315268534073 ], "wc_reply_reviewers_avg": [ 97.75, 65.31605851549831 ], "wc_reply_authors_avg": [ 827.75, 149.65522877601038 ], "reply_reviewers_avg": [ 1.0, 0.7071067811865476 ], "reply_authors_avg": [ 2.25, 0.82915619758885 ], "replies_avg": [ 21, 0 ], "authors#_avg": [ 4, 0 ], "corr_rating_confidence": 1.0, "gs_citation": 
22, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=1497620184076566613&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 7, "aff_unique_index": "0;1;1;2", "aff_unique_norm": "University College London;Microsoft;Imperial College London", "aff_unique_dep": ";Microsoft Research;", "aff_unique_url": "https://www.ucl.ac.uk;https://www.microsoft.com/en-us/research;https://www.imperial.ac.uk", "aff_unique_abbr": "UCL;MSR;ICL", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;1;1;0", "aff_country_unique": "United Kingdom;United States" }, { "id": "MA8eT-vUPvZ", "title": "Adaptive Risk Minimization: A Meta-Learning Approach for Tackling Group Shift", "track": "main", "status": "Reject", "tldr": "", "abstract": "A fundamental assumption of most machine learning algorithms is that the training and test data are drawn from the same underlying distribution. However, this assumption is violated in almost all practical applications: machine learning systems are regularly tested under distribution shift, due to temporal correlations, particular end users, or other factors. In this work, we consider the setting where the training data are structured into groups and test time shifts correspond to changes in the group distribution. Prior work has approached this problem by attempting to be robust to all possible test time distributions, which may degrade average performance. In contrast, we propose to use ideas from meta-learning to learn models that are adaptable, such that they can adapt to shift at test time using a batch of unlabeled test points. We acquire such models by learning to adapt to training batches sampled according to different distributions, which simulate structural shifts that may occur at test time. Our primary contribution is to introduce the framework of adaptive risk minimization (ARM), a formalization of this setting that lends itself to meta-learning. 
We develop meta-learning methods for solving the ARM problem, and compared to a variety of prior methods, these methods provide substantial gains on image classification problems in the presence of shift.", "keywords": "meta-learning;distribution shift;distributional robustness;test time adaptation", "primary_area": "", "supplementary_material": "", "author": "Marvin Mengxin Zhang;Henrik Marklund;Nikita Dhawan;Abhishek Gupta;Sergey Levine;Chelsea Finn", "authorids": "~Marvin_Mengxin_Zhang2;marklund@cs.stanford.edu;nikitadhawan@berkeley.edu;~Abhishek_Gupta1;~Sergey_Levine1;~Chelsea_Finn1", "gender": ";;;M;M;F", "homepage": ";;;https://homes.cs.washington.edu/~abhgupta/;https://people.eecs.berkeley.edu/~svlevine/;https://ai.stanford.edu/~cbfinn/", "dblp": ";;;18/6404-4;80/7594;131/1783", "google_scholar": ";;;1wLVDP4AAAAJ;8R35rCwAAAAJ;vfPE6hgAAAAJ", "orcid": ";;;;;", "linkedin": ";;;;;", "or_profile": "~Marvin_Mengxin_Zhang2;marklund@cs.stanford.edu;nikitadhawan@berkeley.edu;~Abhishek_Gupta1;~Sergey_Levine1;~Chelsea_Finn1", "aff": ";;;University of California, Berkeley;Google;Google", "aff_domain": ";;;berkeley.edu;google.com;google.com", "position": ";;;PhD student;Research Scientist;Research Scientist", "bibtex": "@misc{\nzhang2021adaptive,\ntitle={Adaptive Risk Minimization: A Meta-Learning Approach for Tackling Group Shift},\nauthor={Marvin Mengxin Zhang and Henrik Marklund and Nikita Dhawan and Abhishek Gupta and Sergey Levine and Chelsea Finn},\nyear={2021},\nurl={https://openreview.net/forum?id=MA8eT-vUPvZ}\n}", "github": "", "project": "", "reviewers": "AnonReviewer3;AnonReviewer2;AnonReviewer1", "site": "https://openreview.net/forum?id=MA8eT-vUPvZ", "pdf_size": 0, "rating": "5;6;7", "confidence": "3;4;5", "wc_review": "240;584;305", "wc_reply_reviewers": "0;167;51", "wc_reply_authors": "739;627;357", "reply_reviewers": "0;1;1", "reply_authors": "2;1;2", "rating_avg": [ 6.0, 0.816496580927726 ], "confidence_avg": [ 4.0, 0.816496580927726 ], "wc_review_avg": [ 376.3333333333333, 149.22093984722414 ], "wc_reply_reviewers_avg": [ 72.66666666666667, 69.87767088912516 ], "wc_reply_authors_avg": [ 574.3333333333334, 160.33575881741277 ], "reply_reviewers_avg": [ 0.6666666666666666, 0.4714045207910317 ], "reply_authors_avg": [ 1.6666666666666667, 0.4714045207910317 ], "replies_avg": [ 13, 0 ], "authors#_avg": [ 6, 0 ], "corr_rating_confidence": 1.0, "gs_citation": 139, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=4785971656428156243&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 4, "aff_unique_index": "0;1;1", "aff_unique_norm": "University of California, Berkeley;Google", "aff_unique_dep": ";Google", "aff_unique_url": "https://www.berkeley.edu;https://www.google.com", "aff_unique_abbr": "UC Berkeley;Google", "aff_campus_unique_index": "0;1;1", "aff_campus_unique": "Berkeley;Mountain View", "aff_country_unique_index": "0;0;0", "aff_country_unique": "United States" }, { "id": "MAF2IYqkEYD", "title": "Unsupervised Learning of Slow Features for Data Efficient Regression", "track": "main", "status": "Withdraw", "tldr": "", "abstract": "Research in computational neuroscience suggests that the human brain's unparalleled data efficiency is a result of highly efficient mechanisms to extract and organize slowly changing high level features from continuous sensory inputs.\nIn this paper, we apply this \\textit{slowness principle} to a state of the art representation learning method with the goal of performing data efficient learning of down-stream regression tasks.\nTo this end, 
we propose the \\textit{slow variational autoencoder} (S-VAE), an extension to the $\\beta$-VAE which applies a temporal similarity constraint to the latent representations.\nWe empirically compare our method to the $\\beta$-VAE and the Temporal Difference VAE (TD-VAE), a state-of-the-art method for next frame prediction in latent space with temporal abstraction.\nWe evaluate the three methods against their data-efficiency on down-stream tasks using a synthetic 2D ball tracking dataset and a dataset generated using the DeepMind Lab environment.\nIn both tasks, the proposed method outperformed the baselines both with dense and sparse labeled data.\nFurthermore, the S-VAE achieved similar performance compared to the baselines with 1/5 to 1/11 of data.", "keywords": "Representation Learning;Semi-supervised Learning;Data Efficiency;Slowness Principle", "primary_area": "", "supplementary_material": "", "author": "Oliver Struckmeier;Kshitij Tiwari;Ville Kyrki", "authorids": "~Oliver_Struckmeier1;kshitij.tiwari@oulu.fi;ville.kyrki@aalto.fi", "gender": "M;;", "homepage": ";;", "dblp": ";;", "google_scholar": "https://scholar.google.fi/citations?user=TSZpN5gAAAAJ;;", "orcid": "0000-0003-4536-3190;;", "linkedin": "oliverstruckmeier/;;", "or_profile": "~Oliver_Struckmeier1;kshitij.tiwari@oulu.fi;ville.kyrki@aalto.fi", "aff": "Aalto University;;", "aff_domain": "aalto.fi;;", "position": "PhD student;;", "bibtex": "", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer3;AnonReviewer2;AnonReviewer4", "site": "https://openreview.net/forum?id=MAF2IYqkEYD", "pdf_size": 0, "rating": "3;4;4;5", "confidence": "4;4;4;4", "wc_review": "391;447;321;493", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "175;185;251;71", "reply_reviewers": "0;0;0;0", "reply_authors": "1;1;1;1", "rating_avg": [ 4.0, 0.7071067811865476 ], "confidence_avg": [ 4.0, 0.0 ], "wc_review_avg": [ 413.0, 64.2339474110069 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 170.5, 64.44183423832689 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 12, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 1, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=18047538814578423594&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 0, "aff_unique_index": "0", "aff_unique_norm": "Aalto University", "aff_unique_dep": "", "aff_unique_url": "https://www.aalto.fi", "aff_unique_abbr": "Aalto", "aff_country_unique_index": "0", "aff_country_unique": "Finland" }, { "id": "MBIy8WLgsw", "title": "Efficient Model Performance Estimation via Feature Histories", "track": "main", "status": "Withdraw", "tldr": "", "abstract": "An essential step in the task of model selection, such as hyper-parameter optimization (HPO) or neural architecture search (NAS), is the process of estimating a candidate model's (hyper-parameter or architecture) performance. Due to the high computational cost of training models until full convergence, it is necessary to develop efficient methods that can accurately estimate a model's best performance using only a small time budget. To this end, we propose a novel performance estimation method which uses a history of model features observed during the early stages of training to obtain an estimate of final performance. Our method is versatile. It can be combined with different search algorithms and applied to various configuration spaces in HPO and NAS. 
Using a sampling-based search algorithm and parallel computing, our method can find an architecture which is better than DARTS and with an 80\\% reduction in search time.", "keywords": "Hyperparameter Optimization;Neural Architecture Search", "primary_area": "", "supplementary_material": "", "author": "Shengcao Cao;Xiaofang Wang;Kris M. Kitani", "authorids": "~Shengcao_Cao1;~Xiaofang_Wang1;~Kris_M._Kitani1", "gender": "M;M;M", "homepage": "https://shengcao-cao.github.io/;http://www.cs.cmu.edu/~xiaofan2/;http://www.cs.cmu.edu/~kkitani/", "dblp": "236/4681;;42/163", "google_scholar": "yMYTz3AAAAAJ;YQomDVsAAAAJ;yv3sH74AAAAJ", "orcid": ";;0000-0002-9389-4060", "linkedin": ";;", "or_profile": "~Shengcao_Cao1;~Xiaofang_Wang1;~Kris_M._Kitani1", "aff": "Carnegie Mellon University;Carnegie Mellon University;Carnegie Mellon University", "aff_domain": "cmu.edu;cmu.edu;cmu.edu", "position": "MS student;PhD student;Associate Professor", "bibtex": "", "github": "", "project": "", "reviewers": "AnonReviewer2;AnonReviewer3;AnonReviewer1;AnonReviewer4", "site": "https://openreview.net/forum?id=MBIy8WLgsw", "pdf_size": 0, "rating": "4;4;5;6", "confidence": "5;4;4;2", "wc_review": "363;305;220;187", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "0;0;0;0", "reply_reviewers": "0;0;0;0", "reply_authors": "0;0;0;0", "rating_avg": [ 4.75, 0.82915619758885 ], "confidence_avg": [ 3.75, 1.0897247358851685 ], "wc_review_avg": [ 268.75, 69.38434621728449 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 0, 0 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 0, 0 ], "replies_avg": [ 5, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": -0.899228803025897, "gs_citation": 2, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=10333142299389334231&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 4, "aff_unique_index": "0;0;0", "aff_unique_norm": "Carnegie Mellon University", "aff_unique_dep": "", "aff_unique_url": "https://www.cmu.edu", "aff_unique_abbr": "CMU", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0;0", "aff_country_unique": "United States" }, { "title": "IDF++: Analyzing and Improving Integer Discrete Flows for Lossless Compression", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/2779", "id": "MBOyiNnYthd", "poster": "", "openreview": "https://openreview.net/forum?id=MBOyiNnYthd", "slides": "https://iclr.cc/virtual/2021/poster/2779", "video": "https://iclr.cc/virtual/2021/poster/2779", "author_site": "Rianne van den Berg, Alexey Gritsenko, Mostafa Dehghani, Casper S\u00f8nderby, Tim Salimans", "tldr": "", "abstract": "In this paper we analyse and improve integer discrete flows for lossless compression. Integer discrete flows are a recently proposed class of models that learn invertible transformations for integer-valued random variables. Their discrete nature makes them particularly suitable for lossless compression with entropy coding schemes. We start by investigating a recent theoretical claim that states that invertible flows for discrete random variables are less flexible than their continuous counterparts. We demonstrate with a proof that this claim does not hold for integer discrete flows due to the embedding of data with finite support into the countably infinite integer lattice. 
Furthermore, we zoom in on the effect of gradient bias due to the straight-through estimator in integer discrete flows, and demonstrate that its influence is highly dependent on architecture choices and less prominent than previously thought. Finally, we show how different architecture modifications improve the performance of this model class for lossless compression, and that they also enable more efficient compression: a model with half the number of flow layers performs on par with or better than the original integer discrete flow model.", "keywords": "normalizing flows;lossless source compression;generative modeling", "primary_area": "", "supplementary_material": "", "author": "Rianne van den Berg;Alexey A. Gritsenko;Mostafa Dehghani;Casper Kaae S\u00f8nderby;Tim Salimans", "authorids": "~Rianne_van_den_Berg1;~Alexey_A._Gritsenko1;~Mostafa_Dehghani1;~Casper_Kaae_S\u00f8nderby1;~Tim_Salimans1", "gender": "F;M;M;M;Not Specified", "homepage": "https://research.google/people/RiannevandenBerg/;http://mostafadehghani.com/;http://casperkaae.github.ai;;", "dblp": "198/1077;125/4062;;116/2791;30/11478", "google_scholar": "KARgiboAAAAJ;https://scholar.google.nl/citations?user=MiHOX3QAAAAJ;https://scholar.google.dk/citations?user=yzGdbKoAAAAJ;;https://scholar.google.nl/citations?user=zTy9cUwAAAAJ", "orcid": "0000-0001-5076-2802;;;;", "linkedin": ";;;;agritsenko/", "or_profile": "~Rianne_van_den_Berg1;~Mostafa_Dehghani1;~Casper_Kaae_S\u00f8nderby1;~Tim_Salimans1;~Alexey_Alexeevich_Gritsenko1", "aff": "Google;Google DeepMind;Google;Google;Google", "aff_domain": "google.com;google.com;google.com;google.com;google.com", "position": "Research scientist;Research Scientist;Research Scientist;Research Scientist;Researcher", "bibtex": "@inproceedings{\nberg2021idf,\ntitle={{\\{}IDF{\\}}++: Analyzing and Improving Integer Discrete Flows for Lossless Compression},\nauthor={Rianne van den Berg and Alexey A. 
Gritsenko and Mostafa Dehghani and Casper Kaae S{\\o}nderby and Tim Salimans},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=MBOyiNnYthd}\n}", "github": "", "project": "", "reviewers": "AnonReviewer2;AnonReviewer1;AnonReviewer4;AnonReviewer3", "pdf_size": 0, "rating": "6;7;7;7", "confidence": "4;4;4;4", "wc_review": "626;238;551;286", "wc_reply_reviewers": "11;50;71;52", "wc_reply_authors": "1051;179;500;712", "reply_reviewers": "1;1;1;1", "reply_authors": "2;1;1;2", "rating_avg": [ 6.75, 0.4330127018922193 ], "confidence_avg": [ 4.0, 0.0 ], "wc_review_avg": [ 425.25, 166.25789454940178 ], "wc_reply_reviewers_avg": [ 46.0, 21.805962487356524 ], "wc_reply_authors_avg": [ 610.5, 317.31096734906595 ], "reply_reviewers_avg": [ 1.0, 0.0 ], "reply_authors_avg": [ 1.5, 0.5 ], "replies_avg": [ 16, 0 ], "authors#_avg": [ 5, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 54, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=10516598853901370692&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 4, "pdf": "https://openreview.net/pdf?id=MBOyiNnYthd", "email": "google.com;google.com;google.com;google.com;google.com", "author_num": 5, "aff_unique_index": "0;0;0;0;0", "aff_unique_norm": "Google", "aff_unique_dep": "Google", "aff_unique_url": "https://www.google.com", "aff_unique_abbr": "Google", "aff_campus_unique_index": "0;0;0;0", "aff_campus_unique": "Mountain View;", "aff_country_unique_index": "0;1;0;0;0", "aff_country_unique": "United States;United Kingdom" }, { "id": "MBdafA3G9k", "title": "Visual Imitation with Reinforcement Learning using Recurrent Siamese Networks", "track": "main", "status": "Reject", "tldr": "", "abstract": "It would be desirable for a reinforcement learning (RL) based agent to learn behaviour by merely watching a demonstration. However, defining rewards that facilitate this goal within the RL paradigm remains a challenge. Here we address this problem with Siamese networks, trained to compute distances between observed behaviours and an agent's behaviours. We use an RNN-based comparator model to learn such distances in space and time between motion clips while training an RL policy to minimize this distance. Through experimentation, we have also found that the inclusion of multi-task data and an additional image encoding loss helps enforce temporal consistency and improve policy learning. These two components appear to balance reward for matching a specific instance of a behaviour versus that behaviour in general. Furthermore, we focus here on a particularly challenging form of this problem where only a single demonstration is provided for a given task -- the one-shot learning setting. We demonstrate our approach on humanoid, dog and raptor agents in 2D and a 3D quadruped and humanoid. In these environments, we show that our method outperforms the state-of-the-art, GAIfO (i.e. 
GAIL without access to actions) and TCNs.", "keywords": "Reinforcement Learning;Imitation learning", "primary_area": "", "supplementary_material": "", "author": "Glen Berseth;Florian Golemo;Christopher Pal", "authorids": "~Glen_Berseth1;~Florian_Golemo1;~Christopher_Pal1", "gender": "M;M;", "homepage": "http://fracturedplane.com/;https://fgolemo.github.io/;https://scholar.google.ca/citations?user=1ScWJOoAAAAJ&hl=en&oi=ao", "dblp": "147/5478;08/8643;45/1217", "google_scholar": "https://scholar.google.ca/citations?user=-WZcuuwAAAAJ;https://scholar.google.de/citations?user=qvRf9xsAAAAJ;https://scholar.google.ca/citations?user=1ScWJOoAAAAJ", "orcid": "0000-0001-7351-8028;0000-0001-9238-7764;", "linkedin": "glen-berseth-0523278b?trk=hp-identity-name;;", "or_profile": "~Glen_Berseth1;~Florian_Golemo1;~Christopher_Pal1", "aff": "University of California, Berkeley;Mila;Polytechnique Montreal", "aff_domain": "berkeley.edu;mila.quebec;polymtl.ca", "position": "Postdoc;Postdoc;Full Professor", "bibtex": "@misc{\nberseth2021visual,\ntitle={Visual Imitation with Reinforcement Learning using Recurrent Siamese Networks},\nauthor={Glen Berseth and Florian Golemo and Christopher Pal},\nyear={2021},\nurl={https://openreview.net/forum?id=MBdafA3G9k}\n}", "github": "", "project": "", "reviewers": "AnonReviewer3;AnonReviewer2;AnonReviewer1;AnonReviewer4", "site": "https://openreview.net/forum?id=MBdafA3G9k", "pdf_size": 0, "rating": "4;4;5;6", "confidence": "4;3;4;4", "wc_review": "1152;397;529;624", "wc_reply_reviewers": "175;268;116;0", "wc_reply_authors": "1150;1155;813;256", "reply_reviewers": "2;2;2;0", "reply_authors": "4;3;3;1", "rating_avg": [ 4.75, 0.82915619758885 ], "confidence_avg": [ 3.75, 0.4330127018922193 ], "wc_review_avg": [ 675.5, 286.67446694813964 ], "wc_reply_reviewers_avg": [ 139.75, 97.19149911386283 ], "wc_reply_authors_avg": [ 843.5, 366.4222291291837 ], "reply_reviewers_avg": [ 1.5, 0.8660254037844386 ], "reply_authors_avg": [ 2.75, 1.0897247358851685 ], "replies_avg": [ 23, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": 0.5222329678670935, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:Hkd9gp2KhIgJ:scholar.google.com/&scioq=Visual+Imitation+with+Reinforcement+Learning+using+Recurrent+Siamese+Networks&hl=en&as_sdt=0,33", "gs_version_total": 2, "aff_unique_index": "0;1;2", "aff_unique_norm": "University of California, Berkeley;Mila;Polytechnique Montreal", "aff_unique_dep": ";Quebec Artificial Intelligence Institute;", "aff_unique_url": "https://www.berkeley.edu;https://mila.quebec;https://www.polymtl.ca", "aff_unique_abbr": "UC Berkeley;Mila;PolyMTL", "aff_campus_unique_index": "0;2", "aff_campus_unique": "Berkeley;;Montreal", "aff_country_unique_index": "0;1;1", "aff_country_unique": "United States;Canada" }, { "title": "Projected Latent Markov Chain Monte Carlo: Conditional Sampling of Normalizing Flows", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/2591", "id": "MBpHUFrcG2x", "poster": "", "openreview": "https://openreview.net/forum?id=MBpHUFrcG2x", "slides": "https://iclr.cc/virtual/2021/poster/2591", "video": "https://iclr.cc/virtual/2021/poster/2591", "author_site": "Chris Cannella, Mohammadreza Soltani, VAHID TAROKH", "tldr": "", "abstract": "We introduce Projected Latent Markov Chain Monte Carlo (PL-MCMC), a technique for sampling from the exact conditional distributions learned by normalizing flows. 
As a conditional sampling method, PL-MCMC enables Monte Carlo Expectation Maximization (MC-EM) training of normalizing flows from incomplete data. Through experimental tests applying normalizing flows to missing data tasks for a variety of data sets, we demonstrate the efficacy of PL-MCMC for conditional sampling from normalizing flows.", "keywords": "Conditional Sampling;Normalizing Flows;Markov Chain Monte Carlo;Missing Data Inference", "primary_area": "", "supplementary_material": "/attachment/7142f0503843ba6a0b7b8b11d57c8ef8da7e7880.zip", "author": "Chris Cannella;Mohammadreza Soltani;Vahid Tarokh", "authorids": "~Chris_Cannella1;~Mohammadreza_Soltani1;~Vahid_Tarokh1", "gender": "M;M;", "homepage": ";https://mrezasoltani.github.io/;", "dblp": ";150/5633;", "google_scholar": "http://scholar.google.com/citations?user=T5vA9UIAAAAJ;;", "orcid": ";;", "linkedin": ";mohammadreza-soltani-99bb1ba0/;", "or_profile": "~Chris_Cannella1;~Mohammadreza_Soltani1;~Vahid_Tarokh1", "aff": "Duke University;Duke University;", "aff_domain": "duke.edu;duke.edu;", "position": "PhD student;Postdoc;", "bibtex": "@inproceedings{\ncannella2021projected,\ntitle={Projected Latent Markov Chain Monte Carlo: Conditional Sampling of Normalizing Flows},\nauthor={Chris Cannella and Mohammadreza Soltani and Vahid Tarokh},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=MBpHUFrcG2x}\n}", "github": "", "project": "", "reviewers": "AnonReviewer4;AnonReviewer3;AnonReviewer1", "pdf_size": 0, "rating": "6;6;7", "confidence": "4;3;4", "wc_review": "392;219;323", "wc_reply_reviewers": "0;0;0", "wc_reply_authors": "694;444;736", "reply_reviewers": "0;0;0", "reply_authors": "1;1;1", "rating_avg": [ 6.333333333333333, 0.4714045207910317 ], "confidence_avg": [ 3.6666666666666665, 0.4714045207910317 ], "wc_review_avg": [ 311.3333333333333, 71.10711794343955 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 624.6666666666666, 128.89616837680717 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 7, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": 0.4999999999999999, "gs_citation": 5, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=10645930535193696883&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 3, "pdf": "https://openreview.net/pdf?id=MBpHUFrcG2x", "email": "duke.edu;duke.edu;", "author_num": 3, "aff_unique_index": "0;0", "aff_unique_norm": "Duke University", "aff_unique_dep": "", "aff_unique_url": "https://www.duke.edu", "aff_unique_abbr": "Duke", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0", "aff_country_unique": "United States" }, { "id": "MCe-j2-mVnA", "title": "Overcoming barriers to the training of effective learned optimizers", "track": "main", "status": "Reject", "tldr": "", "abstract": "In this work we focus on general-purpose learned optimizers capable of training a wide variety of problems with no user-specified hyperparameters. We introduce a new, neural network parameterized, hierarchical optimizer with access to additional features such as validation loss to enable automatic regularization. Most learned optimizers have been trained on only a single task, or a small number of tasks. We train our optimizers on thousands of tasks, making use of orders of magnitude more compute, resulting in optimizers that generalize better to unseen tasks. 
The learned optimizers not only perform well, but learn behaviors that are distinct from existing first order optimizers. For instance, they generate update steps that have implicit regularization and adapt as the problem hyperparameters (e.g. batch size) or architecture (e.g. neural network width) change. Finally, these learned optimizers show evidence of being useful for out of distribution tasks such as training themselves from scratch.", "keywords": "learned optimizers;meta-learning", "primary_area": "", "supplementary_material": "", "author": "Luke Metz;Niru Maheswaranathan;C. Daniel Freeman;Ben Poole;Jascha Sohl-Dickstein", "authorids": "~Luke_Metz1;~Niru_Maheswaranathan1;~C._Daniel_Freeman1;~Ben_Poole1;~Jascha_Sohl-Dickstein2", "gender": "M;M;M;M;M", "homepage": "http://lukemetz.com;https://github.com/danielfreeman11/;https://cs.stanford.edu/~poole;http://sohldickstein.com;http://niru.dev/", "dblp": ";190/7046;16/10397;51/7117;155/7407", "google_scholar": "jCOmCb4AAAAJ;t5Xsx0IAAAAJ;i5FMLA4AAAAJ;-3zYIjQAAAAJ;bEOT7ScAAAAJ", "orcid": ";;;;", "linkedin": ";daniel-freeman-6952136?trk=hp-identity-name;;;", "or_profile": "~Luke_Metz1;~C._Daniel_Freeman1;~Ben_Poole1;~Jascha_Sohl-Dickstein1;~Niru_Maheswaranathan2", "aff": "Google;Google Research;Google;Google;Google", "aff_domain": "google.com;google.com;google.com;google.com;google.com", "position": "Research Scientist;Software Engineer;Research Scientist;Research Scientist;Research Engineer", "bibtex": "@misc{\nmetz2021overcoming,\ntitle={Overcoming barriers to the training of effective learned optimizers},\nauthor={Luke Metz and Niru Maheswaranathan and C. Daniel Freeman and Ben Poole and Jascha Sohl-Dickstein},\nyear={2021},\nurl={https://openreview.net/forum?id=MCe-j2-mVnA}\n}", "github": "", "project": "", "reviewers": "AnonReviewer2;AnonReviewer3;AnonReviewer4", "site": "https://openreview.net/forum?id=MCe-j2-mVnA", "pdf_size": 0, "rating": "4;5;7", "confidence": "3;2;3", "wc_review": "547;233;131", "wc_reply_reviewers": "0;0;0", "wc_reply_authors": "713;370;272", "reply_reviewers": "0;0;0", "reply_authors": "1;1;1", "rating_avg": [ 5.333333333333333, 1.247219128924647 ], "confidence_avg": [ 2.6666666666666665, 0.4714045207910317 ], "wc_review_avg": [ 303.6666666666667, 177.02981544235863 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 451.6666666666667, 189.07200274557368 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 7, 0 ], "authors#_avg": [ 5, 0 ], "corr_rating_confidence": 0.18898223650461363, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:45Vh3j_p1UMJ:scholar.google.com/&scioq=Overcoming+barriers+to+the+training+of+effective+learned+optimizers&hl=en&as_sdt=0,5", "gs_version_total": 0, "aff_unique_index": "0;0;0;0;0", "aff_unique_norm": "Google", "aff_unique_dep": "Google", "aff_unique_url": "https://www.google.com", "aff_unique_abbr": "Google", "aff_campus_unique_index": "0;0;0;0;0", "aff_campus_unique": "Mountain View", "aff_country_unique_index": "0;0;0;0;0", "aff_country_unique": "United States" }, { "id": "MD3D5UbTcb1", "title": "A Unified View on Graph Neural Networks as Graph Signal Denoising", "track": "main", "status": "Reject", "tldr": "", "abstract": "Graph Neural Networks (GNNs) have risen to prominence in learning representations for graph structured data. A single GNN layer typically consists of a feature transformation and a feature aggregation operation. 
The former normally uses feed-forward networks to transform features, while the latter aggregates the transformed features over the graph. Numerous recent works have proposed GNN models with different designs in the aggregation operation. In this work, we establish mathematically that the aggregation processes in a group of representative GNN models including GCN, GAT, PPNP, and APPNP can be regarded as (approximately) solving a graph denoising problem with a smoothness assumption. Such a unified view across GNNs not only provides a new perspective to understand a variety of aggregation operations but also enables us to develop a unified graph neural network framework UGNN. To demonstrate its promising potential, we instantiate a novel GNN model, ADA-UGNN, derived from UGNN, to handle graphs with adaptive smoothness across nodes. Comprehensive experiments show the effectiveness of ADA-UGNN. ", "keywords": "Graph Neural Networks;Graph Signal Denoising;Smoothness", "primary_area": "", "supplementary_material": "", "author": "Yao Ma;Xiaorui Liu;Tong Zhao;Yozen Liu;Jiliang Tang;Neil Shah", "authorids": "~Yao_Ma3;~Xiaorui_Liu1;~Tong_Zhao3;yliu2@snap.com;~Jiliang_Tang1;nshah@snap.com", "gender": "M;M;M;;M;", "homepage": "https://yaoma24.github.io/;https://sites.google.com/ncsu.edu/xiaorui/;https://tzhao.io/;;https://www.cse.msu.edu/~tangjili/;", "dblp": "212/7871.html;172/0995;94/6503-3;;64/10812;", "google_scholar": "wf9TTOIAAAAJ;NhvN1KoAAAAJ;05cRc-MAAAAJ;;WtzKMWAAAAAJ;", "orcid": ";0000-0001-8217-5688;0000-0001-7660-1732;;0000-0001-7125-3898;", "linkedin": ";;;;;", "or_profile": "~Yao_Ma3;~Xiaorui_Liu1;~Tong_Zhao3;yliu2@snap.com;~Jiliang_Tang1;nshah@snap.com", "aff": "Michigan State University;Michigan State University;Amazon;;Michigan State University;", "aff_domain": "msu.edu;msu.edu;amazon.com;;msu.edu;", "position": "PhD student;PhD student;Applied Scientist Intern;;Assistant Professor;", "bibtex": "@misc{\nma2021a,\ntitle={A Unified View on Graph Neural Networks as Graph Signal Denoising},\nauthor={Yao Ma and Xiaorui Liu and Tong Zhao and Yozen Liu and Jiliang Tang and Neil Shah},\nyear={2021},\nurl={https://openreview.net/forum?id=MD3D5UbTcb1}\n}", "github": "", "project": "", "reviewers": "AnonReviewer3;AnonReviewer2;AnonReviewer4;AnonReviewer5;AnonReviewer1", "site": "https://openreview.net/forum?id=MD3D5UbTcb1", "pdf_size": 0, "rating": "3;3;6;6;7", "confidence": "4;5;3;3;4", "wc_review": "278;498;420;571;375", "wc_reply_reviewers": "0;0;0;0;0", "wc_reply_authors": "1321;1250;1909;750;716", "reply_reviewers": "0;0;0;0;0", "reply_authors": "3;2;3;1;1", "rating_avg": [ 5.0, 1.6733200530681511 ], "confidence_avg": [ 3.8, 0.7483314773547882 ], "wc_review_avg": [ 428.4, 100.71861794127241 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 1189.2, 437.2630329675721 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 2.0, 0.8944271909999159 ], "replies_avg": [ 16, 0 ], "authors#_avg": [ 6, 0 ], "corr_rating_confidence": -0.6388765649999399, "gs_citation": 207, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=16437545869045694114&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 7, "aff_unique_index": "0;0;1;0", "aff_unique_norm": "Michigan State University;Amazon", "aff_unique_dep": ";Amazon.com, Inc.", "aff_unique_url": "https://www.msu.edu;https://www.amazon.com", "aff_unique_abbr": "MSU;Amazon", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0;0;0", "aff_country_unique": "United States" }, { "id": "MDX3F0qAfm3", 
"title": "Can We Use Gradient Norm as a Measure of Generalization Error for Model Selection in Practice?", "track": "main", "status": "Reject", "tldr": "", "abstract": "The recent theoretical investigation (Li et al., 2020) on the upper bound of generalization error of deep neural networks (DNNs) demonstrates the potential of using the gradient norm as a measure that complements validation accuracy for model selection in practice. In this work, we carry out empirical studies using several commonly-used neural network architectures and benchmark datasets to understand the effectiveness and efficiency of using gradient norm as the model selection criterion, especially in the settings of hyper-parameter optimization. While strong correlations between the generalization error and the gradient norm measures have been observed, we find the computation of gradient norm is time consuming due to the high gradient complexity. To balance the trade-off between efficiency and effectiveness, we propose to use an accelerated approximation (Goodfellow, 2015) of gradient norm that only computes the loss gradient in the Fully-Connected Layer (FC Layer) of DNNs with significantly reduced computation cost (200~20,000 times faster). Our empirical studies clearly find that the use of approximated gradient norm, as one of the hyper-parameter search objectives, can select the models with lower generalization error, but the efficiency is still low (marginal accuracy improvement but with high computation overhead). Our results also show that the bandit-based or population-based algorithms, such as BOHB, perform poorer with gradient norm objectives, since the correlation between gradient norm and generalization error is not always consistent across phases of the training process. 
Finally, gradient norm also fails to predict the generalization performance of models based on different architectures, in comparison with state of the art algorithms and metrics.", "keywords": "", "primary_area": "", "supplementary_material": "/attachment/c63313cc5119d5d976378b740cb30a86ce4f3c89.zip", "author": "Haozhe An;Haoyi Xiong;Xuhong Li;Xingjian Li;Dejing Dou;Zhanxing Zhu", "authorids": "~Haozhe_An1;~Haoyi_Xiong1;~Xuhong_Li3;~Xingjian_Li1;~Dejing_Dou1;~Zhanxing_Zhu1", "gender": ";M;M;M;M;M", "homepage": "https://haozhe-an.github.io;https://sites.google.com/site/haoyixiongshomepage/;;https://zhanxingzhu.github.io/;https://ix.cs.uoregon.edu/~dou/;", "dblp": "263/7358;06/2700;79/8061-2;87/7756.html;26/2854.html;76/5330-2.html", "google_scholar": ";f_Kcie0AAAAJ;https://scholar.google.com/citations?hl=en;a2sHceIAAAAJ;qBHsQ04AAAAJ;https://scholar.google.com/citations?hl=zh-CN", "orcid": ";;;;;", "linkedin": ";;;;;xuhong-li-4b2776a9/", "or_profile": "~Haozhe_An1;~Haoyi_Xiong1;~Xingjian_Li1;~Zhanxing_Zhu1;~Dejing_Dou4;~Xuhong_LI1", "aff": "Apple;Baidu;Baidu;Peking University;University of Oregon;Baidu", "aff_domain": "apple.com;baidu.com;baidu.com;pku.edu.cn;uoregon.edu;baidu.com", "position": "Intern;Principal Researcher;Senior Researcher;Assistant Professor;Full Professor;Researcher", "bibtex": "@misc{\nan2021can,\ntitle={Can We Use Gradient Norm as a Measure of Generalization Error for Model Selection in Practice?},\nauthor={Haozhe An and Haoyi Xiong and Xuhong Li and Xingjian Li and Dejing Dou and Zhanxing Zhu},\nyear={2021},\nurl={https://openreview.net/forum?id=MDX3F0qAfm3}\n}", "github": "", "project": "", "reviewers": "AnonReviewer3;AnonReviewer4;AnonReviewer1;AnonReviewer2", "site": "https://openreview.net/forum?id=MDX3F0qAfm3", "pdf_size": 0, "rating": "4;4;4;6", "confidence": "4;3;4;3", "wc_review": "965;217;683;430", "wc_reply_reviewers": "83;0;0;0", "wc_reply_authors": "921;449;634;266", "reply_reviewers": "1;0;0;0", "reply_authors": "2;1;1;1", "rating_avg": [ 4.5, 0.8660254037844386 ], "confidence_avg": [ 3.5, 0.5 ], "wc_review_avg": [ 573.75, 279.70821850635707 ], "wc_reply_reviewers_avg": [ 20.75, 35.94005425705421 ], "wc_reply_authors_avg": [ 567.5, 242.0377036744482 ], "reply_reviewers_avg": [ 0.25, 0.4330127018922193 ], "reply_authors_avg": [ 1.25, 0.4330127018922193 ], "replies_avg": [ 11, 0 ], "authors#_avg": [ 6, 0 ], "corr_rating_confidence": -0.5773502691896257, "gs_citation": 2, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=14939680043599992870&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 0, "aff_unique_index": "0;1;1;2;3;1", "aff_unique_norm": "Apple;Baidu;Peking University;University of Oregon", "aff_unique_dep": "Apple Inc.;Baidu, Inc.;;", "aff_unique_url": "https://www.apple.com;https://www.baidu.com;http://www.pku.edu.cn;https://www.uoregon.edu", "aff_unique_abbr": "Apple;Baidu;Peking U;UO", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;1;1;1;0;1", "aff_country_unique": "United States;China" }, { "title": "Into the Wild with AudioScope: Unsupervised Audio-Visual Separation of On-Screen Sounds", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/3161", "id": "MDsQkFP1Aw", "poster": "", "openreview": "https://openreview.net/forum?id=MDsQkFP1Aw", "slides": "https://iclr.cc/virtual/2021/poster/3161", "video": "https://iclr.cc/virtual/2021/poster/3161", "author_site": "Efthymios Tzinis, Scott Wisdom, Aren Jansen, Shawn Hershey, Tal Remez, Dan Ellis, John Hershey", "tldr": 
"", "abstract": "Recent progress in deep learning has enabled many advances in sound separation and visual scene understanding. However, extracting sound sources which are apparent in natural videos remains an open problem. In this work, we present AudioScope, a novel audio-visual sound separation framework that can be trained without supervision to isolate on-screen sound sources from real in-the-wild videos. Prior audio-visual separation work assumed artificial limitations on the domain of sound classes (e.g., to speech or music), constrained the number of sources, and required strong sound separation or visual segmentation labels. AudioScope overcomes these limitations, operating on an open domain of sounds, with variable numbers of sources, and without labels or prior visual segmentation. The training procedure for AudioScope uses mixture invariant training (MixIT) to separate synthetic mixtures of mixtures (MoMs) into individual sources, where noisy labels for mixtures are provided by an unsupervised audio-visual coincidence model. Using the noisy labels, along with attention between video and audio features, AudioScope learns to identify audio-visual similarity and to suppress off-screen sounds. We demonstrate the effectiveness of our approach using a dataset of video clips extracted from open-domain YFCC100m video data. This dataset contains a wide diversity of sound classes recorded in unconstrained conditions, making the application of previous methods unsuitable. For evaluation and semi-supervised experiments, we collected human labels for presence of on-screen and off-screen sounds on a small subset of clips.", "keywords": "Audio-visual sound separation;in-the-wild data;unsupervised learning;self-supervised learning;universal sound separation", "primary_area": "", "supplementary_material": "/attachment/17e65e06117cfc8ebf1dff8a98f582dab572610a.zip", "author": "Efthymios Tzinis;Scott Wisdom;Aren Jansen;Shawn Hershey;Tal Remez;Dan Ellis;John R. Hershey", "authorids": "~Efthymios_Tzinis1;~Scott_Wisdom1;arenjansen@google.com;shershey@google.com;talremez@google.com;~Dan_Ellis1;~John_R._Hershey1", "gender": "M;M;;;;;", "homepage": "https://www.etzinis.com;https://stwisdom.github.io/;;;;;", "dblp": "201/7067;149/0119;;;;;", "google_scholar": "https://scholar.google.gr/citations?user=IuKsc4IAAAAJ;kJM6N7IAAAAJ;;;;;", "orcid": "0000-0002-1047-1338;;;;;;", "linkedin": "etzinis/;;;;;;", "or_profile": "~Efthymios_Tzinis1;~Scott_Wisdom1;arenjansen@google.com;shershey@google.com;talremez@google.com;~Dan_Ellis1;~John_R._Hershey1", "aff": "Google;Google Research;;;;;", "aff_domain": "google.com;google.com;;;;;", "position": "Research Intern;Research Scientist;;;;;", "bibtex": "@inproceedings{\ntzinis2021into,\ntitle={Into the Wild with AudioScope: Unsupervised Audio-Visual Separation of On-Screen Sounds},\nauthor={Efthymios Tzinis and Scott Wisdom and Aren Jansen and Shawn Hershey and Tal Remez and Dan Ellis and John R. 
Hershey},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=MDsQkFP1Aw}\n}", "github": "", "project": "", "reviewers": "AnonReviewer3;AnonReviewer1;AnonReviewer4;AnonReviewer2", "pdf_size": 0, "rating": "6;6;6;7", "confidence": "4;4;4;4", "wc_review": "402;307;884;566", "wc_reply_reviewers": "0;101;0;0", "wc_reply_authors": "990;1102;2597;1135", "reply_reviewers": "0;1;0;0", "reply_authors": "2;3;4;2", "rating_avg": [ 6.25, 0.4330127018922193 ], "confidence_avg": [ 4.0, 0.0 ], "wc_review_avg": [ 539.75, 219.28562994414386 ], "wc_reply_reviewers_avg": [ 25.25, 43.73428289111415 ], "wc_reply_authors_avg": [ 1456.0, 660.9451565750369 ], "reply_reviewers_avg": [ 0.25, 0.4330127018922193 ], "reply_authors_avg": [ 2.75, 0.82915619758885 ], "replies_avg": [ 19, 0 ], "authors#_avg": [ 7, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 86, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=7473733958271851327&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 8, "pdf": "https://openreview.net/pdf?id=MDsQkFP1Aw", "email": "google.com;google.com;;;;;", "author_num": 7, "aff_unique_index": "0;0", "aff_unique_norm": "Google", "aff_unique_dep": "Google", "aff_unique_url": "https://www.google.com", "aff_unique_abbr": "Google", "aff_campus_unique_index": "0;0", "aff_campus_unique": "Mountain View", "aff_country_unique_index": "0;0", "aff_country_unique": "United States" }, { "id": "ME1ugH3uXr", "title": "Domain Adaptation via Anaomaly Detection", "track": "main", "status": "Withdraw", "tldr": "", "abstract": "Domain shift in finetuning from pre-training can significantly impact the performance of deep neural networks. In NLP, this led to domain specific models such as SciBERT, BioBERT, ClinicalBERT, and FinBERT; each pre-trained on a different, manually curated, domain-specific corpus. In this work, we present a novel domain-adaptation framework to tailor pre-training so as to reap the benefits of domain specific pre-training even if we do not have access to large domain specific pre-training corpus. The need for such a method is clear as it is infeasible to collect a large pre-training corpus for every possible domain. Our method is completely unsupervised and unlike related methods, works well in the setting where the target domain data is limited in size. We draw a connection between the task of adapting a large corpus to a target domain and that of anomaly detection, resulting in a scalable and efficient domain adaptation framework. We evaluate our framework and various baselines on eight tasks across four different domains: Biomedical, Computer Science, News, and Movie reviews. Our framework outperforms all the baseline methods and yields an average gain of $1.07\\%$ in performance. 
We also evaluate it on one of the GLUE task, sentiment analysis and achieve an improvement of $0.4\\%$ in accuracy.", "keywords": "Domain Adaptation;Data Selection", "primary_area": "", "supplementary_material": "", "author": "Vivek Madan;Ashish Khetan;Zohar Karnin", "authorids": "~Vivek_Madan2;~Ashish_Khetan1;~Zohar_Karnin1", "gender": "M;M;", "homepage": ";http://khetan2.web.engr.illinois.edu/;", "dblp": "52/11466.html;175/1775;16/4051", "google_scholar": ";AaauqDAAAAAJ;", "orcid": ";;", "linkedin": ";ashishkhetan09/;", "or_profile": "~Vivek_Madan2;~Ashish_Khetan1;~Zohar_Karnin1", "aff": "Amazon;Amazon;Amazon", "aff_domain": "amazon.com;amazon.com;amazon.com", "position": "Scientist;Applied Scientist;Principal Researcher", "bibtex": "", "github": "", "project": "", "reviewers": "AnonReviewer2;AnonReviewer1;AnonReviewer3;AnonReviewer4", "site": "https://openreview.net/forum?id=ME1ugH3uXr", "pdf_size": 0, "rating": "4;4;4;5", "confidence": "4;4;5;3", "wc_review": "556;249;401;366", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "0;0;0;0", "reply_reviewers": "0;0;0;0", "reply_authors": "0;0;0;0", "rating_avg": [ 4.25, 0.4330127018922193 ], "confidence_avg": [ 4.0, 0.7071067811865476 ], "wc_review_avg": [ 393.0, 109.65628116984453 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 0, 0 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 0, 0 ], "replies_avg": [ 5, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": -0.816496580927726, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:wbdnoOmYOmoJ:scholar.google.com/&scioq=Domain+Adaptation+via+Anaomaly+Detection&hl=en&as_sdt=0,33", "gs_version_total": 0, "aff_unique_index": "0;0;0", "aff_unique_norm": "Amazon", "aff_unique_dep": "Amazon.com, Inc.", "aff_unique_url": "https://www.amazon.com", "aff_unique_abbr": "Amazon", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0;0", "aff_country_unique": "United States" }, { "id": "MG8Zde0ip6u", "title": "A Siamese Neural Network for Behavioral Biometrics Authentication", "track": "main", "status": "Reject", "tldr": "", "abstract": "The raise in popularity of personalized web and mobile applications brings about a need of robust authentication systems. Although password authentication is the most popular authentication mechanism, it has also several drawbacks. Behavioral Biometrics Authentication has emerged as a complementary risk-based authentication approach which aims at profiling users based on their behavior while interacting with computers/smartphones. In this work we propose a novel Siamese Neural Network to perform a few-shot verification of user's behavior. We develop our approach to identify behavior from either human-computer or human-smartphone interaction. For computer interaction our approach learns from mouse and keyboard dynamics, while for smartphone interaction it learns from holding patterns and touch patterns. We show that our approach has a few-shot classification accuracy of up to 99.8% and 90.8% for mobile and web interactions, respectively. 
We also test our approach on a database that contains over 100K different web interactions collected in the wild.", "keywords": "Deep Learning;Few-shot Learning;Behavioral Biometrics;Biometric Authentication", "primary_area": "", "supplementary_material": "", "author": "Jes\u00fas Solano;Esteban Rivera;Alejandra Castelblanco;Lizzy Tengana;Christian Lopez;Martin Ochoa", "authorids": "~Jes\u00fas_Solano1;esteban.rivera@appgate.com;alejandra.castelblanco@appgate.com;lizzy.tengana@appgate.com;christian.lopez@appgate.com;martin.ochoa@appgate.com", "gender": ";;;;;", "homepage": ";;;;;", "dblp": ";;;;;", "google_scholar": ";;;;;", "orcid": ";;;;;", "linkedin": ";;;;;", "or_profile": ";;;;;", "aff": ";;;;;", "aff_domain": ";;;;;", "position": ";;;;;", "bibtex": "@misc{\nsolano2021a,\ntitle={A Siamese Neural Network for Behavioral Biometrics Authentication},\nauthor={Jes{\\'u}s Solano and Esteban Rivera and Alejandra Castelblanco and Lizzy Tengana and Christian Lopez and Martin Ochoa},\nyear={2021},\nurl={https://openreview.net/forum?id=MG8Zde0ip6u}\n}", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer3;AnonReviewer5", "site": "https://openreview.net/forum?id=MG8Zde0ip6u", "pdf_size": 0, "rating": "4;5;9", "confidence": "3;4;4", "wc_review": "163;506;587", "wc_reply_reviewers": "0;0;0", "wc_reply_authors": "400;471;282", "reply_reviewers": "0;0;0", "reply_authors": "1;1;1", "rating_avg": [ 6.0, 2.160246899469287 ], "confidence_avg": [ 3.6666666666666665, 0.4714045207910317 ], "wc_review_avg": [ 418.6666666666667, 183.78308470101982 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 384.3333333333333, 77.95012650549211 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 8, 0 ], "authors#_avg": [ 6, 0 ], "corr_rating_confidence": 0.654653670707977, "gs_citation": 1, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=11843813534929454770&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 0 }, { "title": "Learning perturbation sets for robust machine learning", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/2570", "id": "MIDckA56aD", "poster": "", "openreview": "https://openreview.net/forum?id=MIDckA56aD", "slides": "https://iclr.cc/virtual/2021/poster/2570", "video": "https://iclr.cc/virtual/2021/poster/2570", "author_site": "Eric Wong, Zico Kolter", "tldr": "", "abstract": "Although much progress has been made towards robust deep learning, a significant gap in robustness remains between real-world perturbations and more narrowly defined sets typically studied in adversarial defenses. In this paper, we aim to bridge this gap by learning perturbation sets from data, in order to characterize real-world effects for robust training and evaluation. Specifically, we use a conditional generator that defines the perturbation set over a constrained region of the latent space. We formulate desirable properties that measure the quality of a learned perturbation set, and theoretically prove that a conditional variational autoencoder naturally satisfies these criteria. Using this framework, our approach can generate a variety of perturbations at different complexities and scales, ranging from baseline spatial transformations, through common image corruptions, to lighting variations. 
We measure the quality of our learned perturbation sets both quantitatively and qualitatively, finding that our models are capable of producing a diverse set of meaningful perturbations beyond the limited data seen during training. Finally, we leverage our learned perturbation sets to train models which are empirically and certifiably robust to adversarial image corruptions and adversarial lighting variations, while improving generalization on non-adversarial data. All code and configuration files for reproducing the experiments as well as pretrained model weights can be found at https://github.com/locuslab/perturbation_learning. ", "keywords": "adversarial examples;perturbation sets;robust machine learning;conditional variational autoencoder", "primary_area": "", "supplementary_material": "/attachment/48288b06357fe1f52981eef71ebbb93fd9ee3222.zip", "author": "Eric Wong;J Zico Kolter", "authorids": "~Eric_Wong1;~J_Zico_Kolter1", "gender": "M;M", "homepage": "http://riceric22.github.io/;http://www.zicokolter.com", "dblp": "64/1811-1.html;67/2526", "google_scholar": "pWnTMRkAAAAJ;UXh1I6UAAAAJ", "orcid": ";", "linkedin": ";", "or_profile": "~Eric_Wong1;~Zico_Kolter1", "aff": "Massachusetts Institute of Technology;Carnegie Mellon University", "aff_domain": "mit.edu;cmu.edu", "position": "Postdoc;Full Professor", "bibtex": "@inproceedings{\nwong2021learning,\ntitle={Learning perturbation sets for robust machine learning},\nauthor={Eric Wong and J Zico Kolter},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=MIDckA56aD}\n}", "github": "[![github](/images/github_icon.svg) locuslab/perturbation_learning](https://github.com/locuslab/perturbation_learning)", "project": "", "reviewers": "AnonReviewer3;AnonReviewer4;AnonReviewer1;AnonReviewer2", "pdf_size": 0, "rating": "5;6;6;8", "confidence": "4;3;3;3", "wc_review": "499;399;309;398", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "824;437;559;411", "reply_reviewers": "0;0;0;0", "reply_authors": "1;1;1;1", "rating_avg": [ 6.25, 1.0897247358851685 ], "confidence_avg": [ 3.25, 0.4330127018922193 ], "wc_review_avg": [ 401.25, 67.23233968857546 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 557.75, 163.559431094633 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 9, 0 ], "authors#_avg": [ 2, 0 ], "corr_rating_confidence": -0.6622661785325219, "gs_citation": 89, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=14923687105877479161&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 4, "pdf": "https://openreview.net/pdf?id=MIDckA56aD", "email": "mit.edu;cmu.edu", "author_num": 2, "aff_unique_index": "0;1", "aff_unique_norm": "Massachusetts Institute of Technology;Carnegie Mellon University", "aff_unique_dep": ";", "aff_unique_url": "https://web.mit.edu;https://www.cmu.edu", "aff_unique_abbr": "MIT;CMU", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0", "aff_country_unique": "United States" }, { "title": "Auto Seg-Loss: Searching Metric Surrogates for Semantic Segmentation", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/2992", "id": "MJAqnaC2vO1", "poster": "", "openreview": "https://openreview.net/forum?id=MJAqnaC2vO1", "slides": "https://iclr.cc/virtual/2021/poster/2992", "video": "https://iclr.cc/virtual/2021/poster/2992", "author_site": "Hao Li, Chenxin Tao, Xizhou Zhu, Xiaogang Wang, Gao Huang, Jifeng Dai", "tldr": "", "abstract": 
"Designing proper loss functions is essential in training deep networks. Especially in the field of semantic segmentation, various evaluation metrics have been proposed for diverse scenarios. Despite the success of the widely adopted cross-entropy loss and its variants, the mis-alignment between the loss functions and evaluation metrics degrades the network performance. Meanwhile, manually designing loss functions for each specific metric requires expertise and significant manpower. In this paper, we propose to automate the design of metric-specific loss functions by searching differentiable surrogate losses for each metric. We substitute the non-differentiable operations in the metrics with parameterized functions, and conduct parameter search to optimize the shape of loss surfaces. Two constraints are introduced to regularize the search space and make the search efficient. Extensive experiments on PASCAL VOC and Cityscapes demonstrate that the searched surrogate losses outperform the manually designed loss functions consistently. The searched losses can generalize well to other datasets and networks. Code shall be released at https://github.com/fundamentalvision/Auto-Seg-Loss.", "keywords": "Loss Function Search;Metric Surrogate;Semantic Segmentation", "primary_area": "", "supplementary_material": "/attachment/fdba3303cd27a3c6f4da6db486a04f4de333df63.zip", "author": "Hao Li;Chenxin Tao;Xizhou Zhu;Xiaogang Wang;Gao Huang;Jifeng Dai", "authorids": "haoli@link.cuhk.edu.hk;~Chenxin_Tao2;~Xizhou_Zhu1;~Xiaogang_Wang2;~Gao_Huang1;~Jifeng_Dai1", "gender": ";;;M;M;M", "homepage": ";;;http://www.ee.cuhk.edu.hk/~xgwang/;http://www.gaohuang.net;https://jifengdai.org/", "dblp": ";;170/1608;91/6236-1.html;;14/9399", "google_scholar": ";;02RXI00AAAAJ;https://scholar.google.com.hk/citations?user=-B5JgjsAAAAJ;-P9LwcgAAAAJ;SH_-B_AAAAAJ", "orcid": ";;;;;", "linkedin": ";;;;;", "or_profile": "haoli@link.cuhk.edu.hk;~Chenxin_Tao2;~Xizhou_Zhu1;~Xiaogang_Wang2;~Gao_Huang1;~Jifeng_Dai1", "aff": ";;SenseTime;The Chinese University of Hong Kong;Tsinghua University;SenseTime Group Ltd", "aff_domain": ";;sensetime.com;cuhk.edu.hk;tsinghua.edu.cn;sensetime.com", "position": ";;Researcher;Full Professor;Assistant Professor;Executive Research Director", "bibtex": "@inproceedings{\nli2021auto,\ntitle={Auto Seg-Loss: Searching Metric Surrogates for Semantic Segmentation},\nauthor={Hao Li and Chenxin Tao and Xizhou Zhu and Xiaogang Wang and Gao Huang and Jifeng Dai},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=MJAqnaC2vO1}\n}", "github": "", "project": "", "reviewers": "AnonReviewer3;AnonReviewer2;AnonReviewer4;AnonReviewer5", "pdf_size": 0, "rating": "5;5;7;7", "confidence": "3;3;4;3", "wc_review": "324;319;452;373", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "673;841;980;842", "reply_reviewers": "0;0;0;0", "reply_authors": "2;2;2;2", "rating_avg": [ 6.0, 1.0 ], "confidence_avg": [ 3.25, 0.4330127018922193 ], "wc_review_avg": [ 367.0, 53.4181617055473 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 834.0, 108.80027573494472 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 2.0, 0.0 ], "replies_avg": [ 14, 0 ], "authors#_avg": [ 6, 0 ], "corr_rating_confidence": 0.5773502691896257, "gs_citation": 32, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=17098330638181161712&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 4, "pdf": "https://openreview.net/pdf?id=MJAqnaC2vO1", "email": 
";;sensetime.com;cuhk.edu.hk;tsinghua.edu.cn;sensetime.com", "author_num": 6, "aff_unique_index": "0;1;2;3", "aff_unique_norm": "SenseTime;Chinese University of Hong Kong;Tsinghua University;SenseTime Group", "aff_unique_dep": ";;;", "aff_unique_url": "https://www.sensetime.com;https://www.cuhk.edu.hk;https://www.tsinghua.edu.cn;https://www.sensetime.com", "aff_unique_abbr": "SenseTime;CUHK;THU;SenseTime", "aff_campus_unique_index": "1", "aff_campus_unique": ";Hong Kong SAR", "aff_country_unique_index": "0;0;0;0", "aff_country_unique": "China" }, { "title": "Unbiased Teacher for Semi-Supervised Object Detection", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/2757", "id": "MJIve1zgR_", "poster": "", "openreview": "https://openreview.net/forum?id=MJIve1zgR_", "slides": "https://iclr.cc/virtual/2021/poster/2757", "video": "https://iclr.cc/virtual/2021/poster/2757", "author_site": "Yen-Cheng Liu, Chih-Yao Ma, Zijian He, Chia-Wen Kuo, Kan Chen, Peizhao Zhang, Bichen Wu, Zsolt Kira, Peter Vajda", "tldr": "", "abstract": "Semi-supervised learning, i.e., training networks with both labeled and unlabeled data, has made significant progress recently. However, existing works have primarily focused on image classification tasks and neglected object detection which requires more annotation effort. In this work, we revisit the Semi-Supervised Object Detection (SS-OD) and identify the pseudo-labeling bias issue in SS-OD. To address this, we introduce Unbiased Teacher, a simple yet effective approach that jointly trains a student and a gradually progressing teacher in a mutually-beneficial manner. Together with a class-balance loss to downweight overly confident pseudo-labels, Unbiased Teacher consistently improved state-of-the-art methods by significant margins on COCO-standard, COCO-additional, and VOC datasets. 
Specifically, Unbiased Teacher achieves 6.8 absolute mAP improvements against state-of-the-art method when using 1% of labeled data on MS-COCO, achieves around 10 mAP improvements against the supervised baseline when using only 0.5, 1, 2% of labeled data on MS-COCO.", "keywords": "Object Detection", "primary_area": "", "supplementary_material": "/attachment/8387b7f33d8163fe078557a90daaad822ee0bfa3.zip", "author": "Yen-Cheng Liu;Chih-Yao Ma;Zijian He;Chia-Wen Kuo;Kan Chen;Peizhao Zhang;Bichen Wu;Zsolt Kira;Peter Vajda", "authorids": "~Yen-Cheng_Liu1;~Chih-Yao_Ma1;zijian@fb.com;~Chia-Wen_Kuo1;~Kan_Chen1;~Peizhao_Zhang1;~Bichen_Wu1;~Zsolt_Kira1;~Peter_Vajda1", "gender": ";M;;M;M;M;M;M;", "homepage": "https://ycliu93.github.io/;https://chihyaoma.github.io/;;https://sites.google.com/view/chiawen-kuo/home;http://wind09.github.io/;;;https://faculty.cc.gatech.edu/~zk15;https://sites.google.com/site/vajdap", "dblp": "29/7584;198/0963;;;;23/8011.html;130/1371;36/4127;44/5953", "google_scholar": "yeAeAhsAAAAJ;HrrtgKkAAAAJ;;iip65VkAAAAJ;https://scholar.google.com.hk/citations?user=BYrARP4AAAAJ;eqQQkM4AAAAJ;K3QJPdMAAAAJ;2a5XgNAAAAAJ;k8QB5VUAAAAJ", "orcid": ";;;;0000-0003-1415-5495;;;0000-0002-2626-2004;", "linkedin": ";kevin-chih-yao-ma-9b5b3063/;;;;;bichenwu/;;p%C3%A9ter-vajda-9a03aaa/", "or_profile": "~Yen-Cheng_Liu1;~Chih-Yao_Ma1;zijian@fb.com;~Chia-Wen_Kuo1;~Kan_Chen1;~Peizhao_Zhang1;~Bichen_Wu1;~Zsolt_Kira1;~Peter_Vajda1", "aff": "Georgia Institute of Technology;Meta;;Amazon;Meta Facebook;Meta;Meta Facebook;Georgia Tech Research Institute;Meta", "aff_domain": "gatech.edu;meta.com;;amazon.com;fb.com;meta.com;fb.com;gtri.gatech.edu;meta.com", "position": "PhD student;Research Scientist;;Intern;Research Scientist;Research Scientist;Research Scientist;Senior Research Scientist;Researcher", "bibtex": "@inproceedings{\nliu2021unbiased,\ntitle={Unbiased Teacher for Semi-Supervised Object Detection},\nauthor={Yen-Cheng Liu and Chih-Yao Ma and Zijian He and Chia-Wen Kuo and Kan Chen and Peizhao Zhang and Bichen Wu and Zsolt Kira and Peter Vajda},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=MJIve1zgR_}\n}", "github": "[![github](/images/github_icon.svg) facebookresearch/unbiased-teacher](https://github.com/facebookresearch/unbiased-teacher) + [![Papers with Code](/images/pwc_icon.svg) 3 community implementations](https://paperswithcode.com/paper/?openreview=MJIve1zgR_)", "project": "", "reviewers": "AnonReviewer2;AnonReviewer1;AnonReviewer3;AnonReviewer4", "pdf_size": 0, "rating": "6;7;7;9", "confidence": "3;5;4;4", "wc_review": "266;388;278;109", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "957;1520;355;193", "reply_reviewers": "0;0;0;0", "reply_authors": "2;3;1;1", "rating_avg": [ 7.25, 1.0897247358851685 ], "confidence_avg": [ 4.0, 0.7071067811865476 ], "wc_review_avg": [ 260.25, 99.42930905925073 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 756.25, 524.8492045340262 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.75, 0.82915619758885 ], "replies_avg": [ 13, 0 ], "authors#_avg": [ 9, 0 ], "corr_rating_confidence": 0.3244428422615251, "gs_citation": 598, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=860392753310305868&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 3, "pdf": "https://openreview.net/pdf?id=MJIve1zgR_", "email": "gatech.edu;meta.com;;amazon.com;fb.com;meta.com;fb.com;gtri.gatech.edu;meta.com", "author_num": 9, "aff_unique_index": "0;1;2;1;1;1;3;1", 
"aff_unique_norm": "Georgia Institute of Technology;Meta;Amazon;Georgia Tech Research Institute", "aff_unique_dep": ";Meta Platforms, Inc.;Amazon.com, Inc.;", "aff_unique_url": "https://www.gatech.edu;https://meta.com;https://www.amazon.com;https://www.gtri.gatech.edu", "aff_unique_abbr": "Georgia Tech;Meta;Amazon;GTRI", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0;0;0;0;0;0;0", "aff_country_unique": "United States" }, { "id": "MJmYbFnJAGa", "title": "Mime: Mimicking Centralized Stochastic Algorithms in Federated Learning", "track": "main", "status": "Reject", "tldr": "", "abstract": "Federated learning is a challenging optimization problem due to the heterogeneity of the data across different clients. Such heterogeneity has been observed to induce \\emph{client drift} and significantly degrade the performance of algorithms designed for this setting.\nIn contrast, centralized learning with centrally collected data does not experience such drift, and has seen great empirical and theoretical progress with innovations such as momentum, adaptivity, etc.\nIn this work, we propose a general framework {\\sc Mime} which mitigates client-drift and adapts arbitrary centralized optimization algorithms (e.g. SGD, Adam, etc.) to federated learning.\n{\\sc Mime} uses a combination of \\emph{control-variates} and \\emph{server-level statistics} (e.g. momentum) at every client-update step to ensure that each local update mimics that of the centralized method. Our thorough theoretical and empirical analyses strongly establish \\mime's superiority over other baselines.", "keywords": "Federated learning;Federated optimization;Adaptive optimization;Adam;Variance Reduction;Distributed optimization;Decentralized optimization", "primary_area": "", "supplementary_material": "", "author": "Sai Praneeth Karimireddy;Martin Jaggi;Satyen Kale;Mehryar Mohri;Sashank J. Reddi;Sebastian U Stich;Ananda Theertha Suresh", "authorids": "~Sai_Praneeth_Karimireddy1;~Martin_Jaggi1;~Satyen_Kale2;~Mehryar_Mohri1;~Sashank_J._Reddi1;~Sebastian_U_Stich1;~Ananda_Theertha_Suresh1", "gender": "M;M;;M;M;M;M", "homepage": "https://spkreddy.org;https://mlo.epfl.ch;https://www.satyenkale.com;;https://www.sstich.ch;https://theertha.info;https://cs.nyu.edu/~mohri/", "dblp": "217/3342;17/4402;52/4768;50/10452;04/10549;119/3884;03/5448", "google_scholar": "wKJeOQoAAAAJ;https://scholar.google.ch/citations?user=r1TJBr8AAAAJ;https://scholar.google.com/citations?hl=en;70lgwYwAAAAJ;https://scholar.google.ch/citations?user=8l-mDfQAAAAJ;K6ef57QAAAAJ;ktwwLjsAAAAJ", "orcid": ";0000-0003-1579-5558;;;;;", "linkedin": ";;;;;;mehryar-mohri-3737b981/", "or_profile": "~Sai_Praneeth_Karimireddy1;~Martin_Jaggi1;~Satyen_Kale2;~Sashank_J._Reddi1;~Sebastian_U_Stich1;~Ananda_Theertha_Suresh1;~Mehryar_Mohri2", "aff": "Swiss Federal Institute of Technology Lausanne;EPFL;Google;Google;Swiss Federal Institute of Technology Lausanne;Google;Google Research", "aff_domain": "epfl.ch;epfl.ch;google.com;google.com;epfl.ch;google.com;google.com", "position": "PhD student;Assistant Professor;Research Scientist;Research Scientist;Postdoc;Research Scientist;Principal Researcher", "bibtex": "@misc{\nkarimireddy2021mime,\ntitle={Mime: Mimicking Centralized Stochastic Algorithms in Federated Learning},\nauthor={Sai Praneeth Karimireddy and Martin Jaggi and Satyen Kale and Mehryar Mohri and Sashank J. 
Reddi and Sebastian U Stich and Ananda Theertha Suresh},\nyear={2021},\nurl={https://openreview.net/forum?id=MJmYbFnJAGa}\n}", "github": "", "project": "", "reviewers": "AnonReviewer2;AnonReviewer4;AnonReviewer3;AnonReviewer1", "site": "https://openreview.net/forum?id=MJmYbFnJAGa", "pdf_size": 0, "rating": "4;4;5;6", "confidence": "4;4;3;2", "wc_review": "316;240;319;418", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "148;196;904;442", "reply_reviewers": "0;0;0;0", "reply_authors": "1;1;2;1", "rating_avg": [ 4.75, 0.82915619758885 ], "confidence_avg": [ 3.25, 0.82915619758885 ], "wc_review_avg": [ 323.25, 63.20354024894492 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 422.5, 299.53088321573784 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.25, 0.4330127018922193 ], "replies_avg": [ 10, 0 ], "authors#_avg": [ 7, 0 ], "corr_rating_confidence": -1.0, "gs_citation": 248, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=14288155548381772698&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 3, "aff_unique_index": "0;1;2;2;0;2;2", "aff_unique_norm": "Swiss Federal Institute of Technology Lausanne;EPFL;Google", "aff_unique_dep": ";;Google", "aff_unique_url": "https://www.epfl.ch;https://www.epfl.ch;https://www.google.com", "aff_unique_abbr": "EPFL;EPFL;Google", "aff_campus_unique_index": "0;2;2;0;2;2", "aff_campus_unique": "Lausanne;;Mountain View", "aff_country_unique_index": "0;0;1;1;0;1;1", "aff_country_unique": "Switzerland;United States" }, { "title": "Contrastive Divergence Learning is a Time Reversal Adversarial Game", "status": "Spotlight", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/2898", "id": "MLSvqIHRidA", "poster": "", "openreview": "https://openreview.net/forum?id=MLSvqIHRidA", "slides": "https://iclr.cc/virtual/2021/poster/2898", "video": "https://iclr.cc/virtual/2021/poster/2898", "author_site": "Omer Yair, Tomer Michaeli", "tldr": "", "abstract": "Contrastive divergence (CD) learning is a classical method for fitting unnormalized statistical models to data samples. Despite its wide-spread use, the convergence properties of this algorithm are still not well understood. The main source of difficulty is an unjustified approximation which has been used to derive the gradient of the loss. In this paper, we present an alternative derivation of CD that does not require any approximation and sheds new light on the objective that is actually being optimized by the algorithm. Specifically, we show that CD is an adversarial learning procedure, where a discriminator attempts to classify whether a Markov chain generated from the model has been time-reversed. Thus, although predating generative adversarial networks (GANs) by more than a decade, CD is, in fact, closely related to these techniques. Our derivation settles well with previous observations, which have concluded that CD's update steps cannot be expressed as the gradients of any fixed objective function. 
In addition, as a byproduct, our derivation reveals a simple correction that can be used as an alternative to Metropolis-Hastings rejection, which is required when the underlying Markov chain is inexact (e.g., when using Langevin dynamics with a large step).", "keywords": "Unsupervised learning;energy based model;adversarial learning;contrastive divergence;noise contrastive estimation", "primary_area": "", "supplementary_material": "/attachment/1124c6c41a5beb640f3b369d8dc9de3ee2d20144.zip", "author": "Omer Yair;Tomer Michaeli", "authorids": "~Omer_Yair1;~Tomer_Michaeli1", "gender": "M;M", "homepage": ";https://tomer.net.technion.ac.il/", "dblp": "166/1235;70/3188.html", "google_scholar": "EF3AXOkAAAAJ;n2EbR2cAAAAJ", "orcid": ";", "linkedin": "yairomer/?originalSubdomain=il;", "or_profile": "~Omer_Yair1;~Tomer_Michaeli1", "aff": "Technion, Technion;Technion, Technion", "aff_domain": "technion.ac.il;technion.ac.il", "position": "PhD student;Associate Professor", "bibtex": "@inproceedings{\nyair2021contrastive,\ntitle={Contrastive Divergence Learning is a Time Reversal Adversarial Game},\nauthor={Omer Yair and Tomer Michaeli},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=MLSvqIHRidA}\n}", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer4;AnonReviewer2;AnonReviewer3", "pdf_size": 0, "rating": "6;7;7;8", "confidence": "4;3;3;4", "wc_review": "307;365;354;344", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "592;378;290;395", "reply_reviewers": "0;0;0;0", "reply_authors": "1;1;1;1", "rating_avg": [ 7.0, 0.7071067811865476 ], "confidence_avg": [ 3.5, 0.5 ], "wc_review_avg": [ 342.5, 21.80022935659164 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 413.75, 110.35935619601992 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 11, 0 ], "authors#_avg": [ 2, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 11, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=16875684033378405463&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 3, "pdf": "https://openreview.net/pdf?id=MLSvqIHRidA", "email": "technion.ac.il;technion.ac.il", "author_num": 2, "aff_unique_index": "0;0", "aff_unique_norm": "Technion - Israel Institute of Technology", "aff_unique_dep": "", "aff_unique_url": "https://www.technion.ac.il/en/", "aff_unique_abbr": "Technion", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0", "aff_country_unique": "Israel" }, { "id": "MMXhHXbNsa-", "title": "Blind Pareto Fairness and Subgroup Robustness", "track": "main", "status": "Reject", "tldr": "", "abstract": "With the wide adoption of machine learning algorithms across various application domains, there is a growing interest in the fairness properties of such algorithms. The vast majority of the activity in the field of group fairness addresses disparities between prede\ufb01ned groups based on protected features such as gender, age, and race, which need to be available at train, and often also at test, time. These approaches are static and retrospective, since algorithms designed to protect groups identified a priori cannot anticipate and protect the needs of different at-risk groups in the future. 
In this work we analyze the space of solutions for worst-case fairness beyond demographics, and propose Blind Pareto Fairness (BPF), a method that leverages no-regret dynamics to recover a fair minimax classi\ufb01er that reduces worst-case risk of any potential subgroup of suf\ufb01cient size, and guarantees that the remaining population receives the best possible level of service. BPF addresses fairness beyond demographics, that is, it does not rely on prede\ufb01ned notions of at-risk groups, neither at train nor at test time. Our experimental results show that the proposed framework improves worst-case risk in multiple standard datasets, while simultaneously providing better levels of service for the remaining population, in comparison to competing methods.", "keywords": "fairness;fairness in machine learning;fairness without demographics;robustness;subgroup robustness;blind fairness;pareto fairness", "primary_area": "", "supplementary_material": "/attachment/b48bf83d9a0f191c4a29560574d1776881767629.zip", "author": "Natalia Martinez;Martin Bertran;Afroditi Papadaki;Miguel R. D. Rodrigues;Guillermo Sapiro", "authorids": "~Natalia_Martinez1;martin.a.bertran@gmail.com;~Afroditi_Papadaki1;~Miguel_R._D._Rodrigues1;~Guillermo_Sapiro1", "gender": ";;;M;", "homepage": "https://scholar.google.com/citations?user=CehCYYoAAAAJ&hl=en;;https://apapadaki.github.io/;https://www.ee.ucl.ac.uk/iiml/;", "dblp": "87/2515;;220/5701;21/6763;82/5175", "google_scholar": "CehCYYoAAAAJ;;57FE_XsAAAAJ;;https://scholar.google.co.il/citations?user=ISRNX3gAAAAJ", "orcid": ";;0000-0002-1253-1947;;", "linkedin": ";;;;", "or_profile": "~Natalia_Martinez1;martin.a.bertran@gmail.com;~Afroditi_Papadaki1;~Miguel_R._D._Rodrigues1;~Guillermo_Sapiro1", "aff": "Duke University;;University College London;University College London;Duke University", "aff_domain": "duke.edu;;ucl.ac.uk;ucl.ac.uk;duke.edu", "position": "PhD student;;PhD student;Full Professor;Full Professor", "bibtex": "@misc{\nmartinez2021blind,\ntitle={Blind Pareto Fairness and Subgroup Robustness},\nauthor={Natalia Martinez and Martin Bertran and Afroditi Papadaki and Miguel R. D. 
Rodrigues and Guillermo Sapiro},\nyear={2021},\nurl={https://openreview.net/forum?id=MMXhHXbNsa-}\n}", "github": "", "project": "", "reviewers": "AnonReviewer4;AnonReviewer1;AnonReviewer5", "site": "https://openreview.net/forum?id=MMXhHXbNsa-", "pdf_size": 0, "rating": "6;6;6", "confidence": "4;4;4", "wc_review": "768;528;430", "wc_reply_reviewers": "0;0;0", "wc_reply_authors": "213;445;469", "reply_reviewers": "0;0;0", "reply_authors": "1;1;1", "rating_avg": [ 6.0, 0.0 ], "confidence_avg": [ 4.0, 0.0 ], "wc_review_avg": [ 575.3333333333334, 141.989044960831 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 375.6666666666667, 115.43925771687125 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 8, 0 ], "authors#_avg": [ 5, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 50, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=11688522208362271332&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 8, "aff_unique_index": "0;1;1;0", "aff_unique_norm": "Duke University;University College London", "aff_unique_dep": ";", "aff_unique_url": "https://www.duke.edu;https://www.ucl.ac.uk", "aff_unique_abbr": "Duke;UCL", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;1;1;0", "aff_country_unique": "United States;United Kingdom" }, { "id": "MP0LhG4YiiC", "title": "Analogical Reasoning for Visually Grounded Compositional Generalization", "track": "main", "status": "Reject", "tldr": "", "abstract": "Children acquire language subconsciously by observing the surrounding world and listening to descriptions. They can discover the meaning of words even without explicit language knowledge, and generalize to novel compositions effortlessly. In this paper, we bring this ability to AI, by studying the task of multimodal compositional generalization within the context of visually grounded language acquisition. We propose a multimodal transformer model augmented with a novel mechanism for analogical reasoning, which approximates novel compositions by learning semantic mapping and reasoning operations from previously seen compositions. Our proposed method, Analogical Reasoning Transformer Networks (ARTNet), is trained on raw multimedia data (video frames and transcripts), and after observing a set of compositions such as \"washing apple\" or \"cutting carrot\", it can generalize and recognize new compositions in new video frames, such as \"washing carrot\" or \"cutting apple\". To this end, ARTNet refers to relevant instances in the training data and uses their visual features and captions to establish analogies with the query image. Then it chooses a suitable verb and noun to create a new composition that describes the new image best. 
Extensive experiments on an instructional video dataset demonstrate that the proposed method achieves significantly better generalization capability and recognition accuracy compared to state-of-the-art transformer models.", "keywords": "", "primary_area": "", "supplementary_material": "/attachment/d9640708e9271ab7e837268b96fb6c4580af7524.zip", "author": "Bo Wu;Haoyu Qin;Alireza Zareian;Carl Vondrick;Shih-Fu Chang", "authorids": "~Bo_Wu6;~Haoyu_Qin1;~Alireza_Zareian2;~Carl_Vondrick2;~Shih-Fu_Chang3", "gender": ";M;;M;M", "homepage": ";;;http://www.cs.columbia.edu/~vondrick/;http://www.ee.columbia.edu/~sfchang/", "dblp": ";211/8134;154/6427;26/8610;c/ShihFuChang", "google_scholar": ";;Ioe0SGsAAAAJ;3MzhkFIAAAAJ;OMVTRscAAAAJ", "orcid": ";;;;", "linkedin": ";;alzareian/;;", "or_profile": "~Bo_Wu6;~Haoyu_Qin1;~Alireza_Zareian2;~Carl_Vondrick2;~Shih-Fu_Chang3", "aff": ";;Snap Inc.;Columbia University;Amazon", "aff_domain": ";;snap.com;columbia.edu;amazon.com", "position": ";;Researcher;Assistant Professor;Scholar", "bibtex": "@misc{\nwu2021analogical,\ntitle={Analogical Reasoning for Visually Grounded Compositional Generalization},\nauthor={Bo Wu and Haoyu Qin and Alireza Zareian and Carl Vondrick and Shih-Fu Chang},\nyear={2021},\nurl={https://openreview.net/forum?id=MP0LhG4YiiC}\n}", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer4;AnonReviewer2", "site": "https://openreview.net/forum?id=MP0LhG4YiiC", "pdf_size": 0, "rating": "3;5;7", "confidence": "3;4;4", "wc_review": "1002;564;425", "wc_reply_reviewers": "238;0;0", "wc_reply_authors": "1338;942;365", "reply_reviewers": "1;0;0", "reply_authors": "3;3;1", "rating_avg": [ 5.0, 1.632993161855452 ], "confidence_avg": [ 3.6666666666666665, 0.4714045207910317 ], "wc_review_avg": [ 663.6666666666666, 245.8757590509664 ], "wc_reply_reviewers_avg": [ 79.33333333333333, 112.19427594826554 ], "wc_reply_authors_avg": [ 881.6666666666666, 399.5099776253682 ], "reply_reviewers_avg": [ 0.3333333333333333, 0.4714045207910317 ], "reply_authors_avg": [ 2.3333333333333335, 0.9428090415820634 ], "replies_avg": [ 13, 0 ], "authors#_avg": [ 5, 0 ], "corr_rating_confidence": 0.8660254037844385, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:WRE2EIRps10J:scholar.google.com/&scioq=Analogical+Reasoning+for+Visually+Grounded+Compositional+Generalization&hl=en&as_sdt=0,5", "gs_version_total": 0, "aff_unique_index": "0;1;2", "aff_unique_norm": "Snap Inc.;Columbia University;Amazon", "aff_unique_dep": ";;Amazon.com, Inc.", "aff_unique_url": "https://www.snapinc.com;https://www.columbia.edu;https://www.amazon.com", "aff_unique_abbr": "Snap;Columbia;Amazon", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0;0", "aff_country_unique": "United States" }, { "id": "MPO4oML_JC", "title": "Coordinated Multi-Agent Exploration Using Shared Goals", "track": "main", "status": "Reject", "tldr": "", "abstract": "Exploration is critical for good results of deep reinforcement learning algorithms and has drawn much attention. However, existing multi-agent deep reinforcement learning algorithms still use mostly noise-based techniques. It was recognized recently that noise-based exploration is suboptimal in multi-agent settings, and exploration methods that consider agents' cooperation have been developed. However, existing methods suffer from a common challenge: agents struggle to identify states that are worth exploring, and don't coordinate their exploration efforts toward those states. 
To address this shortcoming, in this paper, we proposed coordinated multi-agent exploration (CMAE): agents share a common goal while exploring. The goal is selected by a normalized entropy-based technique from multiple projected state spaces. Then, agents are trained to reach the goal in a coordinated manner. We demonstrated that our approach needs only $1\\%-5\\%$ of the environment steps to achieve similar or better returns than state-of-the-art baselines on various sparse-reward tasks, including a sparse-reward version of the Starcraft multi-agent challenge (SMAC).", "keywords": "Multi-agent RL;Deep RL;Exploration", "primary_area": "", "supplementary_material": "/attachment/94dcabe1390297a58ba77a94f073eb3310f6d06c.zip", "author": "Iou-Jen Liu;Unnat Jain;Alex Schwing", "authorids": "~Iou-Jen_Liu1;~Unnat_Jain1;~Alex_Schwing1", "gender": "M;;Unspecified", "homepage": "https://ioujenliu.github.io/;;https://ece.illinois.edu/directory/profile/aschwing", "dblp": "130/1359;;79/9775", "google_scholar": "OQDOCs0AAAAJ;;3B2c31wAAAAJ", "orcid": ";;", "linkedin": "iou-jen-liu-29436bb2/;;", "or_profile": "~Iou-Jen_Liu1;~Unnat_Jain1;~Alex_Schwing1", "aff": "Microsoft;;University of Illinois, Urbana Champaign", "aff_domain": "microsoft.com;;illinois.edu", "position": "Intern;;Assistant Professor", "bibtex": "@misc{\nliu2021coordinated,\ntitle={Coordinated Multi-Agent Exploration Using Shared Goals},\nauthor={Iou-Jen Liu and Unnat Jain and Alex Schwing},\nyear={2021},\nurl={https://openreview.net/forum?id=MPO4oML_JC}\n}", "github": "", "project": "", "reviewers": "AnonReviewer2;AnonReviewer3;AnonReviewer1;AnonReviewer4", "site": "https://openreview.net/forum?id=MPO4oML_JC", "pdf_size": 0, "rating": "4;5;5;6", "confidence": "4;4;3;4", "wc_review": "1119;273;211;808", "wc_reply_reviewers": "0;321;0;0", "wc_reply_authors": "784;790;111;595", "reply_reviewers": "0;1;0;0", "reply_authors": "1;2;1;1", "rating_avg": [ 5.0, 0.7071067811865476 ], "confidence_avg": [ 3.75, 0.4330127018922193 ], "wc_review_avg": [ 602.75, 377.7713428782019 ], "wc_reply_reviewers_avg": [ 80.25, 138.9970773074024 ], "wc_reply_authors_avg": [ 570.0, 276.36117672350434 ], "reply_reviewers_avg": [ 0.25, 0.4330127018922193 ], "reply_authors_avg": [ 1.25, 0.4330127018922193 ], "replies_avg": [ 11, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:CmBjzFUHaEUJ:scholar.google.com/&scioq=Coordinated+Multi-Agent+Exploration+Using+Shared+Goals&hl=en&as_sdt=0,5", "gs_version_total": 0, "aff_unique_index": "0;1", "aff_unique_norm": "Microsoft;University of Illinois Urbana-Champaign", "aff_unique_dep": "Microsoft Corporation;", "aff_unique_url": "https://www.microsoft.com;https://illinois.edu", "aff_unique_abbr": "Microsoft;UIUC", "aff_campus_unique_index": "1", "aff_campus_unique": ";Urbana-Champaign", "aff_country_unique_index": "0;0", "aff_country_unique": "United States" }, { "id": "MRQJmsNPp8E", "title": "Learning Representations by Contrasting Clusters While Bootstrapping Instances", "track": "main", "status": "Reject", "tldr": "", "abstract": "Learning visual representations using large-scale unlabelled images is a holy grail for most of computer vision tasks. 
Recent contrastive learning methods have focused on encouraging the learned visual representations to be linearly separable among the individual items regardless of their semantic similarity; however, it could lead to a sub-optimal solution if a given downstream task is related to non-discriminative ones such as cluster analysis and information retrieval. In this work, we propose an advanced approach to consider the instance semantics in an unsupervised environment by both i) Contrasting batch-wise Cluster assignment features and ii) Bootstrapping an INstance representations without considering negatives simultaneously, referred to as C2BIN. Specifically, instances in a mini-batch are appropriately assigned to distinct clusters, each of which aims to capture apparent similarity among instances. Moreover, we introduce a pyramidal multi-heads technique, showing positive effects on the representations by capturing multi-scale semantics. Empirically, our method achieves comparable or better performance than both representation learning and clustering baselines on various benchmark datasets: CIFAR-10, CIFAR-100, and STL-10.", "keywords": "unsupervised;self-supervised;image clustering;visual representation learning", "primary_area": "", "supplementary_material": "", "author": "Junsoo Lee;Hojoon Lee;Inkyu Shin;Jaekyoung Bae;In So Kweon;Jaegul Choo", "authorids": "~Junsoo_Lee1;joonleesky@kaist.ac.kr;~Inkyu_Shin1;storm.b@kakaoenterprise.com;~In_So_Kweon2;~Jaegul_Choo1", "gender": "M;;M;;;M", "homepage": "https://ssuhan.github.io/;;https://dlsrbgg33.github.io/;;;https://sites.google.com/site/jaegulchoo/", "dblp": "87/3598;;232/3141;;;07/2074", "google_scholar": "https://scholar.google.co.kr/citations?user=Ww1oh28AAAAJ;;XpHl_HEAAAAJ;;;GHJYsLEAAAAJ", "orcid": ";;;;;", "linkedin": ";;;;;", "or_profile": "~Junsoo_Lee1;joonleesky@kaist.ac.kr;~Inkyu_Shin1;storm.b@kakaoenterprise.com;~In_So_Kweon2;~Jaegul_Choo1", "aff": "Korea Advanced Institute of Science & Technology;;Korea Advanced Institute of Science & Technology;;;Korea Advanced Institute of Science & Technology", "aff_domain": "kaist.ac.kr;;kaist.ac.kr;;;kaist.ac.kr", "position": "MS student;;MS student;;;Associate Professor", "bibtex": "@misc{\nlee2021learning,\ntitle={Learning Representations by Contrasting Clusters While Bootstrapping Instances},\nauthor={Junsoo Lee and Hojoon Lee and Inkyu Shin and Jaekyoung Bae and In So Kweon and Jaegul Choo},\nyear={2021},\nurl={https://openreview.net/forum?id=MRQJmsNPp8E}\n}", "github": "", "project": "", "reviewers": "AnonReviewer2;AnonReviewer1;AnonReviewer4", "site": "https://openreview.net/forum?id=MRQJmsNPp8E", "pdf_size": 0, "rating": "4;5;6", "confidence": "5;3;3", "wc_review": "198;599;454", "wc_reply_reviewers": "0;0;0", "wc_reply_authors": "101;995;576", "reply_reviewers": "0;0;0", "reply_authors": "1;2;1", "rating_avg": [ 5.0, 0.816496580927726 ], "confidence_avg": [ 3.6666666666666665, 0.9428090415820634 ], "wc_review_avg": [ 417.0, 165.78500133204653 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 557.3333333333334, 365.212571281743 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.3333333333333333, 0.4714045207910317 ], "replies_avg": [ 10, 0 ], "authors#_avg": [ 6, 0 ], "corr_rating_confidence": -0.8660254037844387, "gs_citation": 1, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=15485328198746745138&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 0, "aff_unique_index": "0;0;0", "aff_unique_norm": "Korea Advanced Institute of Science and 
Technology", "aff_unique_dep": "", "aff_unique_url": "https://www.kaist.ac.kr", "aff_unique_abbr": "KAIST", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0;0", "aff_country_unique": "South Korea" }, { "id": "MU0yqXIoleL", "title": "Convolutional Complex Knowledge Graph Embeddings", "track": "main", "status": "Withdraw", "tldr": "", "abstract": "In this paper, we study the problem of learning continuous vector representations of knowledge graphs for predicting missing links. We present a new approach called ConEx, which infers missing links by leveraging the composition of a 2D convolution with a Hermitian inner product of complex-valued embedding vectors. We evaluate ConEx against state-of-the-art approaches on the WN18RR, FB15K-237, KINSHIP and UMLS benchmark datasets. Our experimental results show that ConEx achieves a performance superior to that of state-of-the-art approaches such as RotatE, QuatE and TuckER in the link prediction task on all datasets while requiring at least 8 times fewer parameters. We ensure the reproducibility of our results by providing an open-source implementation which includes the training, evaluation scripts along with pre-trained models at https://github.com/conex-kge/ConEx.", "keywords": "complex knowledge graph embeddings;convolutions", "primary_area": "", "supplementary_material": "/attachment/dbe0a68638d146c9c0de12770421d51ae6789e1e.zip", "author": "Caglar Demir;Axel Ngonga", "authorids": "~Caglar_Demir1;~Axel_Ngonga1", "gender": "M;", "homepage": "https://dice-research.org/;http://dice-research.org", "dblp": ";65/4336", "google_scholar": ";ccQhjwkAAAAJ", "orcid": ";0000-0001-7112-3516", "linkedin": "caglar-demir/;", "or_profile": "~Caglar_Demir1;~Axel_Ngonga1", "aff": "eim;Universit\u00e4t Paderborn", "aff_domain": "paderborn.de;uni-paderborn.de", "position": "PhD student;Full Professor", "bibtex": "", "github": "", "project": "", "reviewers": "AnonReviewer3;AnonReviewer4;AnonReviewer1;AnonReviewer2", "site": "https://openreview.net/forum?id=MU0yqXIoleL", "pdf_size": 0, "rating": "4;4;4;5", "confidence": "5;5;5;3", "wc_review": "819;538;379;394", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "0;0;0;0", "reply_reviewers": "0;0;0;0", "reply_authors": "0;0;0;0", "rating_avg": [ 4.25, 0.4330127018922193 ], "confidence_avg": [ 4.5, 0.8660254037844386 ], "wc_review_avg": [ 532.5, 176.6755500911204 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 0, 0 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 0, 0 ], "replies_avg": [ 5, 0 ], "authors#_avg": [ 2, 0 ], "corr_rating_confidence": -1.0, "gs_citation": 58, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=6278000920278730390&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 9, "aff_unique_index": "1", "aff_unique_norm": ";University of Paderborn", "aff_unique_dep": ";", "aff_unique_url": ";https://www.uni-paderborn.de", "aff_unique_abbr": ";UPB", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "1", "aff_country_unique": ";Germany" }, { "id": "MWj_P-Lk3jC", "title": "Revisiting Parameter Sharing in Multi-Agent Deep Reinforcement Learning", "track": "main", "status": "Withdraw", "tldr": "", "abstract": "\"``Nonstationarity\" is a fundamental problem in cooperative multi-agent reinforcement learning (MARL). It results from every agent's policy changing during learning, while being part of the environment from the perspective of other agents. 
This causes information to inherently oscillate between agents during learning, greatly slowing convergence. We use the MAILP model of information transfer during multi-agent learning to show that increasing centralization during learning arbitrarily mitigates the slowing of convergence due to nonstationarity. The most centralized case of learning is parameter sharing, an uncommonly used MARL method, specific to environments with homogeneous agents. It bootstraps single-agent reinforcement learning (RL) methods and learns an identical policy for each agent. We experimentally replicate our theoretical result of increased learning centralization leading to better performance. We further apply parameter sharing to 8 more modern single-agent deep RL methods for the first time, achieving up to 44 times more average reward in 16% as many episodes compared to previous parameter sharing experiments. We finally give a formal proof of a set of methods that allow parameter sharing to serve in environments with heterogeneous agents.", "keywords": "Reinforcement Learning;Multi-agent Reinforcement Learning", "primary_area": "", "supplementary_material": "", "author": "J K Terry;Nathaniel Grammel;Ananth Hari;Luis Santos;Benjamin Black", "authorids": "~J_K_Terry1;ngrammel@umd.edu;ahari1@umd.edu;luis.santos@swarmlabs.com;benjamin.black@swarmlabs.com", "gender": "F;;;;", "homepage": ";;;;", "dblp": ";;;;", "google_scholar": "QcDnpLgAAAAJ;;;;", "orcid": ";;;;", "linkedin": ";;;;", "or_profile": "~J_K_Terry1;ngrammel@umd.edu;ahari1@umd.edu;luis.santos@swarmlabs.com;benjamin.black@swarmlabs.com", "aff": "University of Maryland, College Park;;;;", "aff_domain": "umd.edu;;;;", "position": "PhD student;;;;", "bibtex": "", "github": "", "project": "", "reviewers": "AnonReviewer4;AnonReviewer2;AnonReviewer1;AnonReviewer3", "site": "https://openreview.net/forum?id=MWj_P-Lk3jC", "pdf_size": 0, "rating": "3;3;5;7", "confidence": "4;5;4;5", "wc_review": "574;686;546;487", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "0;0;0;0", "reply_reviewers": "0;0;0;0", "reply_authors": "0;0;0;0", "rating_avg": [ 4.5, 1.6583123951777 ], "confidence_avg": [ 4.5, 0.5 ], "wc_review_avg": [ 573.25, 72.27508215145798 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 0, 0 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 0, 0 ], "replies_avg": [ 5, 0 ], "authors#_avg": [ 5, 0 ], "corr_rating_confidence": 0.30151134457776363, "gs_citation": 106, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=2153509577363500882&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 3, "aff_unique_index": "0", "aff_unique_norm": "University of Maryland", "aff_unique_dep": "", "aff_unique_url": "https://www/umd.edu", "aff_unique_abbr": "UMD", "aff_campus_unique_index": "0", "aff_campus_unique": "College Park", "aff_country_unique_index": "0", "aff_country_unique": "United States" }, { "id": "MY3WGKsXct_", "title": "Membership Attacks on Conditional Generative Models Using Image Difficulty", "track": "main", "status": "Reject", "tldr": "", "abstract": "Membership inference attacks (MIA) try to detect if data samples were used to train a Neural Network model. As training data is very valuable in machine learning, MIA can be used to detect the use of unauthorized data. Unlike the traditional MIA approaches, addressing classification models, we address conditional image generation models (e.g. image translation). \nDue to overfitting, reconstruction errors are typically lower for images used in training. 
A simple but effective approach for membership attacks can therefore use the reconstruction error.\nHowever, we observe that some images are \"universally\" easy, and others are difficult. Reconstruction error alone is less effective at discriminating between difficult images used in training and easy images that were never seen before. To overcome this, we propose to use a novel difficulty score that can be computed for each image, and its computation does not require a training set. Our membership error, obtained by subtracting the difficulty score from the reconstruction error, is shown to achieve high MIA accuracy on an extensive number of benchmarks.", "keywords": "Membership Inference Attack;Image translation", "primary_area": "", "supplementary_material": "", "author": "Avital Shafran;Shmuel Peleg;Yedid Hoshen", "authorids": "avital.shafran@mail.huji.ac.il;~Shmuel_Peleg1;~Yedid_Hoshen3", "gender": ";M;M", "homepage": ";http://www.cs.huji.ac.il/~peleg/;https://www.cs.huji.ac.il/~ydidh/", "dblp": ";p/ShmuelPeleg;136/0280", "google_scholar": ";CshJxRUAAAAJ;https://scholar.google.co.il/citations?user=6y1-qS4AAAAJ", "orcid": ";0000-0002-4468-2619;", "linkedin": ";speleg/;", "or_profile": "avital.shafran@mail.huji.ac.il;~Shmuel_Peleg1;~Yedid_Hoshen3", "aff": ";Hebrew University of Jerusalem;Hebrew University of Jerusalem", "aff_domain": ";huji.ac.il;huji.ac.il", "position": ";Full Professor;Assistant Professor", "bibtex": "@misc{\nshafran2021membership,\ntitle={Membership Attacks on Conditional Generative Models Using Image Difficulty},\nauthor={Avital Shafran and Shmuel Peleg and Yedid Hoshen},\nyear={2021},\nurl={https://openreview.net/forum?id=MY3WGKsXct_}\n}", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer2;AnonReviewer3;AnonReviewer4", "site": "https://openreview.net/forum?id=MY3WGKsXct_", "pdf_size": 0, "rating": "5;6;6;6", "confidence": "3;3;3;3", "wc_review": "462;245;303;229", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "763;294;356;320", "reply_reviewers": "0;0;0;0", "reply_authors": "1;1;1;1", "rating_avg": [ 5.75, 0.4330127018922193 ], "confidence_avg": [ 3.0, 0.0 ], "wc_review_avg": [ 309.75, 92.11236344812785 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 433.25, 191.6499086876902 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 9, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:ttAFrNh7dn0J:scholar.google.com/&scioq=Membership+Attacks+on+Conditional+Generative+Models+Using+Image+Difficulty&hl=en&as_sdt=0,5", "gs_version_total": 0, "aff_unique_index": "0;0", "aff_unique_norm": "Hebrew University of Jerusalem", "aff_unique_dep": "", "aff_unique_url": "https://www.huji.ac.il", "aff_unique_abbr": "HUJI", "aff_campus_unique_index": "0;0", "aff_campus_unique": "Jerusalem", "aff_country_unique_index": "0;0", "aff_country_unique": "Israel" }, { "id": "MY5iHZ0IZXl", "title": "ABSTRACTING INFLUENCE PATHS FOR EXPLAINING (CONTEXTUALIZATION OF) BERT MODELS", "track": "main", "status": "Reject", "tldr": "", "abstract": "While \u201cattention is all you need\u201d may be proving true, we do not yet know why: attention-based transformer models such as BERT are superior but how they contextualize information even for simple grammatical rules such as subject-verb number agreement(SVA) is uncertain. We introduce multi-partite patterns, abstractions of sets of paths through a neural network model. 
Patterns quantify and localize the effect of an input concept (e.g., a subject\u2019s number) on an output concept (e.g. corresponding verb\u2019s number) to paths passing through a sequence of model components, thus surfacing how BERT contextualizes information. We describe guided pattern refinement, an efficient search procedure for finding sufficient and sparse patterns representative of concept-critical paths. We discover that patterns generate succinct and meaningful explanations for BERT, highlighted by \u201ccopy\u201d and \u201ctransfer\u201d operations implemented by skip connections and attention heads, respectively. We also show how pattern visualizations help us understand how BERT contextualizes various grammatical concepts, such as SVA across clauses, and why it makes errors in some cases while succeeding in others.", "keywords": "interpretability;natural language processing;transformer;BERT", "primary_area": "", "supplementary_material": "", "author": "Kaiji Lu;Zifan Wang;Piotr Mardziel;Anupam Datta", "authorids": "~Kaiji_Lu1;~Zifan_Wang1;piotrm@gmail.com;~Anupam_Datta2", "gender": "M;M;;", "homepage": "https://www.linkedin.com/in/calebkaijilu/;https://www.zifanw.net;;", "dblp": "224/0239;;;", "google_scholar": ";HJOP3wMAAAAJ;;", "orcid": ";;;", "linkedin": ";zifan-wang-sail/;;", "or_profile": "~Kaiji_Lu1;~Zifan_Wang1;piotrm@gmail.com;~Anupam_Datta2", "aff": "Carnegie Mellon University;Carnegie Mellon University;;Carnegie-Mellon University", "aff_domain": "cmu.edu;cmu.edu;;", "position": "PhD student;PhD student;;", "bibtex": "@misc{\nlu2021abstracting,\ntitle={{\\{}ABSTRACTING{\\}} {\\{}INFLUENCE{\\}} {\\{}PATHS{\\}} {\\{}FOR{\\}} {\\{}EXPLAINING{\\}} ({\\{}CONTEXTUALIZATION{\\}} {\\{}OF{\\}}) {\\{}BERT{\\}} {\\{}MODELS{\\}}},\nauthor={Kaiji Lu and Zifan Wang and Piotr Mardziel and Anupam Datta},\nyear={2021},\nurl={https://openreview.net/forum?id=MY5iHZ0IZXl}\n}", "github": "", "project": "", "reviewers": "AnonReviewer3;AnonReviewer1;AnonReviewer4;AnonReviewer2", "site": "https://openreview.net/forum?id=MY5iHZ0IZXl", "pdf_size": 0, "rating": "6;6;6;6", "confidence": "4;4;3;3", "wc_review": "739;629;775;707", "wc_reply_reviewers": "59;85;0;75", "wc_reply_authors": "378;769;527;744", "reply_reviewers": "1;1;0;1", "reply_authors": "1;1;1;1", "rating_avg": [ 6.0, 0.0 ], "confidence_avg": [ 3.5, 0.5 ], "wc_review_avg": [ 712.5, 53.87717513010496 ], "wc_reply_reviewers_avg": [ 54.75, 32.942184202022794 ], "wc_reply_authors_avg": [ 604.5, 161.11253830785486 ], "reply_reviewers_avg": [ 0.75, 0.4330127018922193 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 13, 0 ], "authors#_avg": [ 4, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:0FTBxAjj9MQJ:scholar.google.com/&scioq=ABSTRACTING+INFLUENCE+PATHS+FOR+EXPLAINING+(CONTEXTUALIZATION+OF)+BERT+MODELS&hl=en&as_sdt=0,5", "gs_version_total": 0, "aff_unique_index": "0;0;0", "aff_unique_norm": "Carnegie Mellon University", "aff_unique_dep": "", "aff_unique_url": "https://www.cmu.edu", "aff_unique_abbr": "CMU", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0;0", "aff_country_unique": "United States" }, { "id": "M_KwRsbhi5e", "title": "Improving Learning to Branch via Reinforcement Learning", "track": "main", "status": "Reject", "tldr": "", "abstract": "Branch-and-Bound~(B\\&B) is a general and widely used algorithm paradigm for solving Mixed Integer Programming~(MIP). 
\nRecently there is a surge of interest in designing learning-based branching policies as a fast approximation of strong branching, a human-designed heuristic. In this work, we argue strong branching is not a good expert to imitate for its poor decision quality when turning off its side effects in solving linear programming. To obtain more effective and non-myopic policies than a local heuristic, we formulate the branching process in MIP as reinforcement learning~(RL) and design a policy characterization for the B\\&B process to improve our agent by novelty search evolutionary strategy. Across a range of NP-hard problems, our trained RL agent significantly outperforms expert-designed branching rules and the state-of-the-art learning-based branching methods in terms of both speed and effectiveness. Our results suggest that with carefully designed policy networks and learning algorithms, reinforcement learning has the potential to advance algorithms for solving MIPs.", "keywords": "Mixed Integer Programming;Branching and Bound;Strong Branching;Reinforcement Learning;Evolution Strategy;Novelty Search", "primary_area": "", "supplementary_material": "/attachment/d9ba115d123a2c65e99eea647780caa93d1fbfae.zip", "author": "Haoran Sun;Wenbo Chen;Hui Li;Le Song", "authorids": "~Haoran_Sun2;~Wenbo_Chen2;~Hui_Li2;~Le_Song1", "gender": "M;M;M;M", "homepage": ";https://wenbo11.github.io/;http://www.cc.gatech.edu/~lsong;", "dblp": ";80/1502-1;94/3481;", "google_scholar": "p7of_yoAAAAJ;https://scholar.google.com/citations?hl=en;Xl4E0CsAAAAJ;L-mvmpAAAAAJ", "orcid": ";;;", "linkedin": ";wenbo-chen-919603184;;", "or_profile": "~Haoran_Sun2;~Wenbo_Chen2;~Le_Song1;~Huiknight_Li1", "aff": "Georgia Institute of Technology;Georgia Institute of Technology;College of Computing, Georgia Institute of Technology;BioMap", "aff_domain": "gatech.edu;gatech.edu;cc.gatech.edu;biomap.com", "position": "PhD student;PhD student;Associate Professor;Principal Researcher", "bibtex": "@misc{\nsun2021improving,\ntitle={Improving Learning to Branch via Reinforcement Learning},\nauthor={Haoran Sun and Wenbo Chen and Hui Li and Le Song},\nyear={2021},\nurl={https://openreview.net/forum?id=M_KwRsbhi5e}\n}", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer3;AnonReviewer2;AnonReviewer4", "site": "https://openreview.net/forum?id=M_KwRsbhi5e", "pdf_size": 0, "rating": "4;7;7;8", "confidence": "5;3;3;3", "wc_review": "2730;205;343;498", "wc_reply_reviewers": "0;0;85;0", "wc_reply_authors": "1288;24;304;204", "reply_reviewers": "0;0;1;0", "reply_authors": "2;1;1;1", "rating_avg": [ 6.5, 1.5 ], "confidence_avg": [ 3.5, 0.8660254037844386 ], "wc_review_avg": [ 944.0, 1036.3438136062762 ], "wc_reply_reviewers_avg": [ 21.25, 36.80607966083864 ], "wc_reply_authors_avg": [ 455.0, 491.2870851141927 ], "reply_reviewers_avg": [ 0.25, 0.4330127018922193 ], "reply_authors_avg": [ 1.25, 0.4330127018922193 ], "replies_avg": [ 13, 0 ], "authors#_avg": [ 4, 0 ], "corr_rating_confidence": -0.9622504486493763, "gs_citation": 33, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=8893971940434755059&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 2, "aff_unique_index": "0;0;0;1", "aff_unique_norm": "Georgia Institute of Technology;BioMap", "aff_unique_dep": ";", "aff_unique_url": "https://www.gatech.edu;", "aff_unique_abbr": "Georgia Tech;", "aff_campus_unique_index": "1", "aff_campus_unique": ";Atlanta", "aff_country_unique_index": "0;0;0", "aff_country_unique": "United States;" }, { "id": "M_eaMB2DOxw", "title": "On 
Representing (Anti)Symmetric Functions", "track": "main", "status": "Reject", "tldr": "", "abstract": "Permutation-invariant, -equivariant, and -covariant functions and anti-symmetric functions are important in quantum physics, computer vision, and other disciplines. Applications often require most or all of the following properties: (a) a large class of such functions can be approximated, e.g. all continuous functions, (b) only the (anti)symmetric functions can be represented, (c) a fast algorithm for computing the approximation, (d) the representation itself is continuous or differentiable, (e) the architecture is suitable for learning the function from data. (Anti)symmetric neural networks have recently been developed and applied with great success. A few theoretical approximation results have been proven, but many questions are still open, especially for particles in more than one dimension and the anti-symmetric case, which this work focuses on. More concretely, we derive natural polynomial approximations in the symmetric case, and approximations based on a single generalized Slater determinant in the anti-symmetric case. Unlike some previous super-exponential and discontinuous approximations, these seem a more promising basis for future tighter bounds.", "keywords": "Neural network;approximation;universality;Slater determinant;Vandermonde matrix;equivariance;symmetry;anti-symmetry;symmetric polynomials;polarized basis;multilayer perceptron;continuity;smoothness", "primary_area": "", "supplementary_material": "/attachment/b7ec381a0b44a3d00d9115225f0f2cae5c103984.zip", "author": "Marcus Hutter", "authorids": "~Marcus_Hutter1", "gender": "", "homepage": "http://www.hutter1.net/", "dblp": "h/MarcusHutter", "google_scholar": "https://scholar.google.com.tw/citations?user=7hmCntEAAAAJ", "orcid": "0000-0002-3263-4097", "linkedin": "hutter1/", "or_profile": "~Marcus_Hutter1", "aff": "Australian National University", "aff_domain": "anu.edu.au", "position": "Full Professor", "bibtex": "@misc{\nhutter2021on,\ntitle={On Representing (Anti)Symmetric Functions},\nauthor={Marcus Hutter},\nyear={2021},\nurl={https://openreview.net/forum?id=M_eaMB2DOxw}\n}", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer2;AnonReviewer3;AnonReviewer4", "site": "https://openreview.net/forum?id=M_eaMB2DOxw", "pdf_size": 0, "rating": "4;4;4;6", "confidence": "4;4;4;3", "wc_review": "528;405;818;308", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "321;471;774;138", "reply_reviewers": "0;0;0;0", "reply_authors": "1;1;1;1", "rating_avg": [ 4.5, 0.8660254037844386 ], "confidence_avg": [ 3.75, 0.4330127018922193 ], "wc_review_avg": [ 514.75, 191.65512646417784 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 426.0, 232.9688820422161 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 9, 0 ], "authors#_avg": [ 1, 0 ], "corr_rating_confidence": -1.0, "gs_citation": 35, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=13004327261203793740&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 4, "aff_unique_index": "0", "aff_unique_norm": "Australian National University", "aff_unique_dep": "", "aff_unique_url": "https://www.anu.edu.au", "aff_unique_abbr": "ANU", "aff_country_unique_index": "0", "aff_country_unique": "Australia" }, { "id": "M_gk45ItxIp", "title": "Interpretable Reinforcement Learning With Neural Symbolic Logic", "track": "main", "status": "Withdraw", "tldr": "", "abstract": "Recent progress in deep reinforcement learning (DRL) 
can be largely attributed to the use of neural networks. However, this black-box approach fails to explain the learned policy in a human understandable way. To address this challenge and improve the transparency, we introduce symbolic logic into DRL and propose a Neural Symbolic Reinforcement Learning framework, in which states and actions are represented in an interpretable way using first-order logic. This framework features a relational reasoning module, which performs on task-level in Hierarchical Reinforcement Learning, enabling end-to-end learning with prior symbolic knowledge. Moreover, interpretability is enabled by extracting the logical rules learned by the reasoning module in a symbolic rule space, providing explainability on task level. Experimental results demonstrate better interpretability of subtasks, along with competing performance compared with existing approaches.", "keywords": "Interpretable Reinforcement Learning;Neural Symbolic Logic", "primary_area": "", "supplementary_material": "", "author": "Zhihao Ma;Yuzheng Zhuang;Paul Weng;Dong Li;Kun Shao;Wulong Liu;Hankz Hankui Zhuo;Jianye HAO", "authorids": "~Zhihao_Ma1;~Yuzheng_Zhuang1;~Paul_Weng1;~Dong_Li10;~Kun_Shao1;~Wulong_Liu1;~Hankz_Hankui_Zhuo2;~Jianye_HAO1", "gender": ";F;M;M;;M;M;M", "homepage": "https://xplan-lab.org/#Members;;http://weng.fr;;;;http://xplan-lab.org;http://www.icdai.org/jianye.html", "dblp": ";;http://dblp.uni-trier.de/pers/hd/w/Weng:Paul;47/4826-16;;36/9257.html;12/793;21/7664.html", "google_scholar": ";https://scholar.google.com/citations?hl=en;_Hd6AeQAAAAJ;;;https://scholar.google.ca/citations?user=od00FfIAAAAJ;;", "orcid": ";;;;;;;0000-0002-0422-8235", "linkedin": ";;paul-weng-69a15980/;;;wulong-liu-28006155/;;", "or_profile": "~Zhihao_Ma1;~Yuzheng_Zhuang1;~Paul_Weng1;~Dong_Li10;~Kun_Shao1;~Wulong_Liu1;~Hankz_Hankui_Zhuo2;~Jianye_HAO1", "aff": "SUN YAT-SEN UNIVERSITY;Huawei Technologies Ltd.;Shanghai Jiaotong University;Huawei Technologies Ltd.;;Huawei Noah's Ark Lab;;Tianjin University", "aff_domain": "sysu.edu.cn;huawei.com;sjtu.edu.cn;huawei.com;;huawei.com;;tju.edu.cn", "position": "MS student;Research Engineer;Assistant Professor;Principal Researcher;;Researcher;;Associate Professor", "bibtex": "", "github": "", "project": "", "reviewers": "AnonReviewer3;AnonReviewer1;AnonReviewer4;AnonReviewer2", "site": "https://openreview.net/forum?id=M_gk45ItxIp", "pdf_size": 0, "rating": "4;4;5;5", "confidence": "2;4;4;4", "wc_review": "203;420;736;394", "wc_reply_reviewers": "0;0;64;0", "wc_reply_authors": "530;0;681;703", "reply_reviewers": "0;0;1;0", "reply_authors": "1;0;1;1", "rating_avg": [ 4.5, 0.5 ], "confidence_avg": [ 3.5, 0.8660254037844386 ], "wc_review_avg": [ 438.25, 191.2385617494547 ], "wc_reply_reviewers_avg": [ 16.0, 27.712812921102035 ], "wc_reply_authors_avg": [ 478.5, 284.17468219389286 ], "reply_reviewers_avg": [ 0.25, 0.4330127018922193 ], "reply_authors_avg": [ 0.75, 0.4330127018922193 ], "replies_avg": [ 9, 0 ], "authors#_avg": [ 8, 0 ], "corr_rating_confidence": 0.5773502691896257, "gs_citation": 4, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=15635986770707216452&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 0, "aff_unique_index": "0;1;2;1;1;3", "aff_unique_norm": "Sun Yat-sen University;Huawei;Shanghai Jiao Tong University;Tianjin University", "aff_unique_dep": ";Huawei Technologies;;", "aff_unique_url": "http://www.sysu.edu.cn;https://www.huawei.com;https://www.sjtu.edu.cn;http://www.tju.edu.cn", "aff_unique_abbr": "SYSU;Huawei;SJTU;TJU", 
"aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0;0;0;0;0", "aff_country_unique": "China" }, { "id": "Ma0S4RcfpR_", "title": "A Representational Model of Grid Cells' Path Integration Based on Matrix Lie Algebras", "track": "main", "status": "Reject", "tldr": "", "abstract": "The grid cells in the mammalian medial entorhinal cortex exhibit striking hexagon firing patterns when the agent navigates in the open field. It is hypothesized that the grid cells are involved in path integration so that the agent is aware of its self-position by accumulating its self-motion. Assuming the grid cells form a vector representation of self-position, we elucidate a minimally simple recurrent model for grid cells' path integration based on two coupled matrix Lie algebras that underlie two coupled rotation systems that mirror the agent's self-motion: (1) When the agent moves along a certain direction, the vector is rotated by a generator matrix. (2) When the agent changes direction, the generator matrix is rotated by another generator matrix. Our experiments show that our model learns hexagonal grid response patterns that resemble the firing patterns observed from the grid cells in the brain. Furthermore, the learned model is capable of near exact path integration, and it is also capable of error correction. Our model is novel and simple, with explicit geometric and algebraic structures. ", "keywords": "grid cells;path integration;representational model;Lie algebras;error correction", "primary_area": "", "supplementary_material": "", "author": "Ruiqi Gao;Jianwen Xie;Xue-Xin Wei;Song-Chun Zhu;Ying Nian Wu", "authorids": "~Ruiqi_Gao1;~Jianwen_Xie1;~Xue-Xin_Wei2;~Song-Chun_Zhu1;~Ying_Nian_Wu1", "gender": "F;;;M;", "homepage": "http://www.stat.ucla.edu/~ruiqigao/;;;https://zhusongchun.net/;", "dblp": "206/7084;;;10/10313;", "google_scholar": "VdlgOXoAAAAJ;;;https://scholar.google.com.tw/citations?user=Al8dyb4AAAAJ;", "orcid": ";;;;", "linkedin": ";;;;", "or_profile": "~Ruiqi_Gao1;~Jianwen_Xie1;~Xue-Xin_Wei2;~Song-Chun_Zhu1;~Ying_Nian_Wu1", "aff": "University of California, Los Angeles;;;Peking University;", "aff_domain": "ucla.edu;;;pku.edu.cn;", "position": "PhD student;;;Full Professor;", "bibtex": "@misc{\ngao2021a,\ntitle={A Representational Model of Grid Cells' Path Integration Based on Matrix Lie Algebras},\nauthor={Ruiqi Gao and Jianwen Xie and Xue-Xin Wei and Song-Chun Zhu and Ying Nian Wu},\nyear={2021},\nurl={https://openreview.net/forum?id=Ma0S4RcfpR_}\n}", "github": "", "project": "", "reviewers": "AnonReviewer2;AnonReviewer4;AnonReviewer1;AnonReviewer3", "site": "https://openreview.net/forum?id=Ma0S4RcfpR_", "pdf_size": 0, "rating": "5;5;6;8", "confidence": "5;4;4;5", "wc_review": "289;245;404;272", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "1053;625;951;357", "reply_reviewers": "0;0;0;0", "reply_authors": "3;1;2;1", "rating_avg": [ 6.0, 1.224744871391589 ], "confidence_avg": [ 4.5, 0.5 ], "wc_review_avg": [ 302.5, 60.66506408139696 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 746.5, 274.8795190624431 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.75, 0.82915619758885 ], "replies_avg": [ 12, 0 ], "authors#_avg": [ 5, 0 ], "corr_rating_confidence": 0.40824829046386296, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:9Zjpo0lZBA4J:scholar.google.com/&scioq=A+Representational+Model+of+Grid+Cells%27+Path+Integration+Based+on+Matrix+Lie+Algebras&hl=en&as_sdt=0,33", "gs_version_total": 0, 
"aff_unique_index": "0;1", "aff_unique_norm": "University of California, Los Angeles;Peking University", "aff_unique_dep": ";", "aff_unique_url": "https://www.ucla.edu;http://www.pku.edu.cn", "aff_unique_abbr": "UCLA;Peking U", "aff_campus_unique_index": "0", "aff_campus_unique": "Los Angeles;", "aff_country_unique_index": "0;1", "aff_country_unique": "United States;China" }, { "title": "Hopper: Multi-hop Transformer for Spatiotemporal Reasoning", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/3260", "id": "MaZFq7bJif7", "poster": "", "openreview": "https://openreview.net/forum?id=MaZFq7bJif7", "slides": "https://iclr.cc/virtual/2021/poster/3260", "video": "https://iclr.cc/virtual/2021/poster/3260", "author_site": "Honglu Zhou, Asim Kadav, Farley Lai, Alexandru Niculescu-Mizil, Martin Min, Mubbasir Kapadia, Hans P Graf", "tldr": "", "abstract": "This paper considers the problem of spatiotemporal object-centric reasoning in videos. Central to our approach is the notion of object permanence, i.e., the ability to reason about the location of objects as they move through the video while being occluded, contained or carried by other objects. Existing deep learning based approaches often suffer from spatiotemporal biases when applied to video reasoning problems. We propose Hopper, which uses a Multi-hop Transformer for reasoning object permanence in videos. Given a video and a localization query, Hopper reasons over image and object tracks to automatically hop over critical frames in an iterative fashion to predict the final position of the object of interest. We demonstrate the effectiveness of using a contrastive loss to reduce spatiotemporal biases. We evaluate over CATER dataset and find that Hopper achieves 73.2% Top-1 accuracy using just 1 FPS by hopping through just a few critical frames. 
We also demonstrate Hopper can perform long-term reasoning by building a CATER-h dataset that requires multi-step reasoning to localize objects of interest correctly.", "keywords": "Multi-hop Reasoning;Object Permanence;Spatiotemporal Understanding;Video Recognition;Transformer", "primary_area": "", "supplementary_material": "/attachment/f45490b13af4702c9c3d806a11c871f8ece0be4e.zip", "author": "Honglu Zhou;Asim Kadav;Farley Lai;Alexandru Niculescu-Mizil;Martin Renqiang Min;Mubbasir Kapadia;Hans Peter Graf", "authorids": "~Honglu_Zhou1;~Asim_Kadav1;~Farley_Lai1;~Alexandru_Niculescu-Mizil1;~Martin_Renqiang_Min1;~Mubbasir_Kapadia2;~Hans_Peter_Graf1", "gender": "F;M;M;;M;M;", "homepage": "https://sites.google.com/view/hongluzhou/;http://asim.ai;https://farley.zyxdao.xyz;http://niculescu-mizil.org;http://www.cs.toronto.edu/~cuty;https://ivi.cs.rutgers.edu/;", "dblp": "184/9372;42/4496;135/8737;71/1974;29/7048;08/4943;69/3091", "google_scholar": "https://scholar.google.com/citations?hl=en;IphEjqcAAAAJ;xa-HsgkAAAAJ;https://scholar.google.com/citations?hl=en;T2M4JjEAAAAJ;xhkzmycAAAAJ;", "orcid": ";;0000-0002-2503-6208;;0000-0002-8563-6133;0000-0002-3501-0028;", "linkedin": "honglu-zhou-21058a169/;;farleylai/;niculescu-mizil/;martin-renqiang-min-955a8766;mubbasir-kapadia-aa9273a/;", "or_profile": "~Honglu_Zhou1;~Asim_Kadav1;~Farley_Lai1;~Alexandru_Niculescu-Mizil1;~Martin_Renqiang_Min1;~Mubbasir_Kapadia2;~Hans_Peter_Graf1", "aff": "Google;NEC Labs;NEC Laboratories America, Inc.;NEC-Labs;NEC Laboratories America;Rutgers University;", "aff_domain": "google.com;nec-labs.com;nec-labs.com;nec-labs.com;nec-labs.com;rutgers.edu;", "position": "Software Engineering Intern;Senior Researcher;Researcher;Senior Researcher;Researcher;Assistant Professor;", "bibtex": "@inproceedings{\nzhou2021hopper,\ntitle={Hopper: Multi-hop Transformer for Spatiotemporal Reasoning},\nauthor={Honglu Zhou and Asim Kadav and Farley Lai and Alexandru Niculescu-Mizil and Martin Renqiang Min and Mubbasir Kapadia and Hans Peter Graf},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=MaZFq7bJif7}\n}", "github": "[![github](/images/github_icon.svg) necla-ml/cater-h](https://github.com/necla-ml/cater-h)", "project": "", "reviewers": "AnonReviewer4;AnonReviewer2;AnonReviewer3;AnonReviewer1", "pdf_size": 0, "rating": "6;6;7;8", "confidence": "4;3;4;4", "wc_review": "917;250;320;1221", "wc_reply_reviewers": "250;0;0;0", "wc_reply_authors": "1784;626;639;1704", "reply_reviewers": "1;0;0;0", "reply_authors": "4;1;1;3", "rating_avg": [ 6.75, 0.82915619758885 ], "confidence_avg": [ 3.75, 0.4330127018922193 ], "wc_review_avg": [ 677.0, 407.2204562641715 ], "wc_reply_reviewers_avg": [ 62.5, 108.25317547305482 ], "wc_reply_authors_avg": [ 1188.25, 556.4882635779483 ], "reply_reviewers_avg": [ 0.25, 0.4330127018922193 ], "reply_authors_avg": [ 2.25, 1.299038105676658 ], "replies_avg": [ 16, 0 ], "authors#_avg": [ 7, 0 ], "corr_rating_confidence": 0.5222329678670935, "gs_citation": 21, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=15937741305189053323&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 9, "pdf": "https://openreview.net/pdf?id=MaZFq7bJif7", "email": "google.com;nec-labs.com;nec-labs.com;nec-labs.com;nec-labs.com;rutgers.edu;", "author_num": 7, "aff_unique_index": "0;1;2;1;2;3", "aff_unique_norm": "Google;NEC Laboratories;NEC Laboratories America;Rutgers University", "aff_unique_dep": "Google;;;", "aff_unique_url": 
"https://www.google.com;https://www.nec-labs.com;https://www.nec-labs.com;https://www.rutgers.edu", "aff_unique_abbr": "Google;NEC Labs;NEC Labs America;Rutgers", "aff_campus_unique_index": "0", "aff_campus_unique": "Mountain View;", "aff_country_unique_index": "0;0;0;0;0;0", "aff_country_unique": "United States" }, { "id": "MbG7JBt0Yvo", "title": "Sequence Metric Learning as Synchronization of Recurrent Neural Networks", "track": "main", "status": "Reject", "tldr": "", "abstract": "Sequence metric learning is becoming a widely adopted approach for various applications dealing with sequential multi-variate data such as activity recognition or natural language processing and is most of the time tackled with sequence alignment approaches or representation learning. \nIn this paper, we propose to study this subject from the point of view of dynamical system theory by drawing the analogy between synchronized trajectories produced by dynamical systems and the distance between similar sequences processed by a siamese recurrent neural network. \nIndeed, a siamese recurrent network comprises two identical sub-networks, two identical dynamical systems which can theoretically achieve complete synchronization if a coupling is introduced between them. \nWe therefore propose a new neural network model that implements this coupling with a new gate integrated into the classical Gated Recurrent Unit architecture. This model is thus able to simultaneously learn a similarity metric and the synchronization of unaligned multi-variate sequences in a weakly supervised way.\nOur experiments show that introducing such a coupling improves the performance of the siamese Gated Recurrent Unit architecture on an activity recognition dataset. ", "keywords": "Metric learning;sequence processing;siamese recurrent neural network;dynamical systems", "primary_area": "", "supplementary_material": "/attachment/477a8d08b3ecda709ff73cd3afa6b02945967759.zip", "author": "Paul Compagnon;Gr\u00e9goire Lefebvre;Stefan Duffner;Christophe Garcia", "authorids": "~Paul_Compagnon1;gregoire.lefebvre@orange.com;~Stefan_Duffner1;~Christophe_Garcia1", "gender": ";;;", "homepage": ";;;", "dblp": ";;;", "google_scholar": "https://scholar.google.fr/citations?user=b_Mb7rAAAAAJ;;;", "orcid": ";;;", "linkedin": ";;;", "or_profile": "~Paul_Compagnon1;gregoire.lefebvre@orange.com;~Stefan_Duffner1;~Christophe_Garcia1", "aff": "Orange-labs;;;", "aff_domain": "orange.com;;;", "position": "PhD student;;;", "bibtex": "@misc{\ncompagnon2021sequence,\ntitle={Sequence Metric Learning as Synchronization of Recurrent Neural Networks},\nauthor={Paul Compagnon and Gr{\\'e}goire Lefebvre and Stefan Duffner and Christophe Garcia},\nyear={2021},\nurl={https://openreview.net/forum?id=MbG7JBt0Yvo}\n}", "github": "", "project": "", "reviewers": "AnonReviewer4;AnonReviewer3;AnonReviewer1", "site": "https://openreview.net/forum?id=MbG7JBt0Yvo", "pdf_size": 0, "rating": "3;4;6", "confidence": "3;5;3", "wc_review": "635;849;302", "wc_reply_reviewers": "0;0;0", "wc_reply_authors": "0;0;0", "reply_reviewers": "0;0;0", "reply_authors": "0;0;0", "rating_avg": [ 4.333333333333333, 1.247219128924647 ], "confidence_avg": [ 3.6666666666666665, 0.9428090415820634 ], "wc_review_avg": [ 595.3333333333334, 225.06640995246025 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 0, 0 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 0, 0 ], "replies_avg": [ 8, 0 ], "authors#_avg": [ 4, 0 ], "corr_rating_confidence": -0.18898223650461365, "gs_citation": 3, 
"gs_cited_by_link": "https://scholar.google.com/scholar?cites=10814394111938781770&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 4, "aff_unique_index": "0", "aff_unique_norm": "Orange Labs", "aff_unique_dep": "", "aff_unique_url": "https://www.orange.com/en/innovation/orange-labs", "aff_unique_abbr": "Orange Labs", "aff_country_unique_index": "0", "aff_country_unique": "France" }, { "id": "MbM_gvIB3Y4", "title": "Which Mutual-Information Representation Learning Objectives are Sufficient for Control?", "track": "main", "status": "Reject", "tldr": "", "abstract": "Mutual information maximization provides an appealing formalism for learning representations of data. In the context of reinforcement learning, such representations can accelerate learning by discarding irrelevant and redundant information, while retaining the information necessary for control. Much of the prior work on these methods has addressed the practical difficulties of estimating mutual information from samples of high-dimensional observations, while comparatively less is understood about \\emph{which} mutual information objectives are sufficient for RL from a theoretical perspective. In this paper we identify conditions under which representations that maximize specific mutual-information objectives are theoretically sufficient for learning and representing the optimal policy. Somewhat surprisingly, we find that several popular objectives can yield insufficient representations given mild and common assumptions on the structure of the MDP. We corroborate our theoretical results with deep RL experiments on a simulated game environment with visual observations.", "keywords": "representation learning;reinforcement learning;information theory", "primary_area": "", "supplementary_material": "", "author": "Kate Rakelly;Abhishek Gupta;Carlos Florensa;Sergey Levine", "authorids": "~Kate_Rakelly1;~Abhishek_Gupta1;~Carlos_Florensa1;~Sergey_Levine1", "gender": "F;M;M;M", "homepage": "https://people.eecs.berkeley.edu/~rakelly/;https://homes.cs.washington.edu/~abhgupta/;;https://people.eecs.berkeley.edu/~svlevine/", "dblp": ";18/6404-4;;80/7594", "google_scholar": ";1wLVDP4AAAAJ;https://scholar.google.com/citations?hl=en;8R35rCwAAAAJ", "orcid": ";;;", "linkedin": ";;;", "or_profile": "~Kate_Rakelly1;~Abhishek_Gupta1;~Carlos_Florensa1;~Sergey_Levine1", "aff": "University of California, Berkeley;University of California, Berkeley;University of California, Berkeley;Google", "aff_domain": "berkeley.edu;berkeley.edu;berkeley.edu;google.com", "position": "PhD student;PhD student;PhD student;Research Scientist", "bibtex": "@misc{\nrakelly2021which,\ntitle={Which Mutual-Information Representation Learning Objectives are Sufficient for Control?},\nauthor={Kate Rakelly and Abhishek Gupta and Carlos Florensa and Sergey Levine},\nyear={2021},\nurl={https://openreview.net/forum?id=MbM_gvIB3Y4}\n}", "github": "", "project": "", "reviewers": "AnonReviewer4;AnonReviewer1;AnonReviewer2;AnonReviewer5;AnonReviewer3", "site": "https://openreview.net/forum?id=MbM_gvIB3Y4", "pdf_size": 0, "rating": "5;5;5;6;7", "confidence": "4;3;3;3;3", "wc_review": "599;300;500;1010;180", "wc_reply_reviewers": "206;20;65;164;0", "wc_reply_authors": "591;193;547;1451;126", "reply_reviewers": "1;1;1;1;0", "reply_authors": "2;2;3;4;1", "rating_avg": [ 5.6, 0.8 ], "confidence_avg": [ 3.2, 0.39999999999999997 ], "wc_review_avg": [ 517.8, 286.6066293720367 ], "wc_reply_reviewers_avg": [ 91.0, 80.68704976636585 ], "wc_reply_authors_avg": [ 581.6, 472.38187941537296 ], 
"reply_reviewers_avg": [ 0.8, 0.4 ], "reply_authors_avg": [ 2.4, 1.019803902718557 ], "replies_avg": [ 28, 0 ], "authors#_avg": [ 4, 0 ], "corr_rating_confidence": -0.375, "gs_citation": 41, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=5834613238475859552&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 7, "aff_unique_index": "0;0;0;1", "aff_unique_norm": "University of California, Berkeley;Google", "aff_unique_dep": ";Google", "aff_unique_url": "https://www.berkeley.edu;https://www.google.com", "aff_unique_abbr": "UC Berkeley;Google", "aff_campus_unique_index": "0;0;0;1", "aff_campus_unique": "Berkeley;Mountain View", "aff_country_unique_index": "0;0;0;0", "aff_country_unique": "United States" }, { "id": "McYsRk9-rso", "title": "Reducing Implicit Bias in Latent Domain Learning", "track": "main", "status": "Reject", "tldr": "", "abstract": "A fundamental shortcoming of deep neural networks is their specialization to a single task and domain. While recent techniques in multi-domain learning enable the learning of more domain-agnostic features, their success relies firmly on the presence of domain labels, typically requiring manual annotation and careful curation of datasets. Here we focus on latent domain learning, a highly realistic, yet less explored scenario: learning from data from different domains, without access to domain annotations. This is a particularly challenging problem, since standard models exhibit an implicit bias toward learning only the large domains in data, while disregarding smaller ones. To address this issue, we propose dynamic residual adapters that adaptively account for latent domains, and weighted domain transfer \u2013 a novel augmentation strategy designed specifically for this setting. Our techniques are evaluated on image classification tasks containing multiple unannotated domains, and we demonstrate they enhance performance, in particular, on the smallest of these.", "keywords": "Latent Domain Learning;CNN Architectures", "primary_area": "", "supplementary_material": "", "author": "Lucas Deecke;Timothy Hospedales;Hakan Bilen", "authorids": "~Lucas_Deecke1;~Timothy_Hospedales1;~Hakan_Bilen1", "gender": "M;M;M", "homepage": ";http://homepages.inf.ed.ac.uk/thospeda/;http://homepages.inf.ed.ac.uk/hbilen/", "dblp": "222/9834;32/3545;97/2993", "google_scholar": "6-x0_AsAAAAJ;https://scholar.google.fr/citations?user=nHhtvqkAAAAJ;PtBtfawAAAAJ", "orcid": ";0000-0003-4867-7486;0000-0002-6947-6918", "linkedin": ";timothyhospedales/;", "or_profile": "~Lucas_Deecke1;~Timothy_Hospedales1;~Hakan_Bilen1", "aff": "University of Edinburgh;Samsung AI Research Centre;University of Edinburgh", "aff_domain": "ed.ac.uk;samsung.com;ed.ac.uk", "position": "PhD student;Principal Researcher;Assistant Professor", "bibtex": "@misc{\ndeecke2021reducing,\ntitle={Reducing Implicit Bias in Latent Domain Learning},\nauthor={Lucas Deecke and Timothy Hospedales and Hakan Bilen},\nyear={2021},\nurl={https://openreview.net/forum?id=McYsRk9-rso}\n}", "github": "", "project": "", "reviewers": "AnonReviewer3;AnonReviewer1;AnonReviewer2;AnonReviewer4", "site": "https://openreview.net/forum?id=McYsRk9-rso", "pdf_size": 0, "rating": "4;5;6;6", "confidence": "4;4;2;3", "wc_review": "278;619;704;217", "wc_reply_reviewers": "182;209;0;0", "wc_reply_authors": "1305;1068;348;328", "reply_reviewers": "1;1;0;0", "reply_authors": "2;2;1;1", "rating_avg": [ 5.25, 0.82915619758885 ], "confidence_avg": [ 3.25, 0.82915619758885 ], "wc_review_avg": [ 454.5, 210.2789813557218 ], 
"wc_reply_reviewers_avg": [ 97.75, 98.21500649086167 ], "wc_reply_authors_avg": [ 762.25, 432.503395940425 ], "reply_reviewers_avg": [ 0.5, 0.5 ], "reply_authors_avg": [ 1.5, 0.5 ], "replies_avg": [ 14, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": -0.8181818181818182, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:37vYQkopGhoJ:scholar.google.com/&scioq=Reducing+Implicit+Bias+in+Latent+Domain+Learning&hl=en&as_sdt=0,33", "gs_version_total": 0, "aff_unique_index": "0;1;0", "aff_unique_norm": "University of Edinburgh;Samsung", "aff_unique_dep": ";AI Research", "aff_unique_url": "https://www.ed.ac.uk;https://www.samsung.com/global/researchers/samsung-ai-research-centre/", "aff_unique_abbr": "Edinburgh;SARC", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;1;0", "aff_country_unique": "United Kingdom;South Korea" }, { "id": "Mf4ZSXMZP7", "title": "Improving Post Training Neural Quantization: Layer-wise Calibration and Integer Programming", "track": "main", "status": "Reject", "tldr": "", "abstract": "Lately, post-training quantization methods have gained considerable attention, as they are simple to use, and require only a small unlabeled calibration set. This small dataset cannot be used to fine-tune the model without significant over-fitting. Instead, these methods only use the calibration set to set the activations' dynamic ranges. However, such methods always resulted in significant accuracy degradation, when used below 8-bits (except on small datasets). Here we aim to break the 8-bit barrier. To this end, we minimize the quantization errors of each layer separately by optimizing its parameters over the calibration set. We empirically demonstrate that this approach is: (1) much less susceptible to over-fitting than the standard fine-tuning approaches, and can be used even on a very small calibration set; and (2) more powerful than previous methods, which only set the activations' dynamic ranges. Furthermore, we demonstrate how to optimally allocate the bit-widths for each layer, while constraining accuracy degradation or model compression by proposing a novel integer programming formulation. Finally, we suggest model global statistics tuning, to correct biases introduced during quantization. Together, these methods yield state-of-the-art results for both vision and text models. For instance, on ResNet50, we obtain less than 1\\% accuracy degradation --- with 4-bit weights and activations in all layers, but the smallest two. 
Our code is available at, https://github.com/papers-submission/CalibTIP", "keywords": "Efficient Deep Learning;Quantization;Compression", "primary_area": "", "supplementary_material": "", "author": "Itay Hubara;Yury Nahshan;Yair Hanani;Ron Banner;Daniel Soudry", "authorids": "~Itay_Hubara1;~Yury_Nahshan1;~Yair_Hanani1;~Ron_Banner1;~Daniel_Soudry1", "gender": "M;M;M;M;M", "homepage": ";;;;https://soudry.github.io/", "dblp": "http://dblp.uni-trier.de/pers/hd/h/Hubara:Itay;228/7866;;03/5857;126/1779", "google_scholar": ";vdRZRhIAAAAJ;3A_kOosAAAAJ;;https://scholar.google.co.il/citations?user=AEBWEm8AAAAJ", "orcid": ";;;;0000-0001-9368-6352", "linkedin": "itay-hubara-57739b29/;;;https://il.linkedin.com/in/ron-banner-69403a51;daniel-soudry-2aa3a88/", "or_profile": "~Itay_Hubara1;~Yury_Nahshan1;~Yair_Hanani1;~Ron_Banner1;~Daniel_Soudry1", "aff": "Technion, Technion;Huawei Technologies Ltd.;Intel;Intel;Technion - Israel Institute of Technology", "aff_domain": "technion.ac.il;huawei.com;intel.com;intel.com;technion.ac.il", "position": "PhD student;Researcher;Researcher;Researcher;Assistant Professor", "bibtex": "@misc{\nhubara2021improving,\ntitle={Improving Post Training Neural Quantization: Layer-wise Calibration and Integer Programming},\nauthor={Itay Hubara and Yury Nahshan and Yair Hanani and Ron Banner and Daniel Soudry},\nyear={2021},\nurl={https://openreview.net/forum?id=Mf4ZSXMZP7}\n}", "github": "", "project": "", "reviewers": "AnonReviewer3;AnonReviewer5;AnonReviewer2;AnonReviewer4;AnonReviewer1", "site": "https://openreview.net/forum?id=Mf4ZSXMZP7", "pdf_size": 0, "rating": "4;4;6;6;7", "confidence": "2;5;3;4;5", "wc_review": "161;567;257;528;534", "wc_reply_reviewers": "0;0;0;0;24", "wc_reply_authors": "293;1407;573;436;540", "reply_reviewers": "0;0;0;0;1", "reply_authors": "1;2;2;1;1", "rating_avg": [ 5.4, 1.2 ], "confidence_avg": [ 3.8, 1.16619037896906 ], "wc_review_avg": [ 409.4, 166.94741687130113 ], "wc_reply_reviewers_avg": [ 4.8, 9.6 ], "wc_reply_authors_avg": [ 649.8, 390.9789764168912 ], "reply_reviewers_avg": [ 0.2, 0.4 ], "reply_authors_avg": [ 1.4, 0.4898979485566356 ], "replies_avg": [ 15, 0 ], "authors#_avg": [ 5, 0 ], "corr_rating_confidence": 0.3429971702850177, "gs_citation": 169, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=9100334331875007411&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 3, "aff_unique_index": "0;1;2;2;0", "aff_unique_norm": "Technion - Israel Institute of Technology;Huawei;Intel", "aff_unique_dep": ";Huawei Technologies;Intel Corporation", "aff_unique_url": "https://www.technion.ac.il/en/;https://www.huawei.com;https://www.intel.com", "aff_unique_abbr": "Technion;Huawei;Intel", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;1;2;2;0", "aff_country_unique": "Israel;China;United States" }, { "id": "Mh1Abj33qI", "title": "Data-driven Learning of Geometric Scattering Networks", "track": "main", "status": "Reject", "tldr": "", "abstract": "Many popular graph neural network (GNN) architectures, which are often considered as the current state of the art, rely on encoding graph structure via smoothness or similarity between neighbors. While this approach performs well on a surprising number of standard benchmarks, the efficacy of such models does not translate consistently to more complex domains, such as graph data in the biochemistry domain. 
We argue that these more complex domains require priors that encourage learning of longer range features rather than oversmoothed signals of standard GNN architectures. Here, we propose an alternative GNN architecture, based on a relaxation of recently proposed geometric scattering transforms, which consists of a cascade of graph wavelet filters. Our learned geometric scattering (LEGS) architecture adaptively tunes these wavelets and their scales to encourage band-pass features to emerge in learned representations. This results in a simplified GNN with significantly fewer learned parameters compared to competing methods. We demonstrate the predictive performance of our method on several biochemistry graph classification benchmarks, as well as the descriptive quality of its learned features in biochemical graph data exploration tasks. Our results show that the proposed LEGS network matches or outperforms popular GNNs, as well as the original geometric scattering construction, while retaining certain mathematical properties of its handcrafted (nonlearned) design.", "keywords": "Graph Neural Networks;GNNs;Geometric Scattering;Radial Basis Network;Graph Signal Processing;Wavelet", "primary_area": "", "supplementary_material": "/attachment/67a54430fad47fb334e0bf8dbeb3f76cc24d9c81.zip", "author": "Alexander Tong;Frederik Wenkel;Kincaid Macdonald;Smita Krishnaswamy;Guy Wolf", "authorids": "~Alexander_Tong1;frederik.wenkel@umontreal.ca;kincaid.macdonald@yale.edu;~Smita_Krishnaswamy1;~Guy_Wolf1", "gender": ";;;F;M", "homepage": "https://alextong.net;;;http://www.krishnaswamylab.org;http://guywolf.org", "dblp": "153/9296;;;74/2457;120/1308", "google_scholar": "CS80pt4AAAAJ;;;l2Pr9m8AAAAJ;g0k3SjcAAAAJ", "orcid": "0000-0002-2031-4096;;;;0000-0002-6740-059X", "linkedin": "atong01/;;;;", "or_profile": "~Alexander_Tong1;frederik.wenkel@umontreal.ca;kincaid.macdonald@yale.edu;~Smita_Krishnaswamy1;~Guy_Wolf1", "aff": "Yale University;;;Yale University;University of Montreal", "aff_domain": "yale.edu;;;yale.edu;umontreal.ca", "position": "PhD student;;;Associate Professor;Assistant Professor", "bibtex": "@misc{\ntong2021datadriven,\ntitle={Data-driven Learning of Geometric Scattering Networks},\nauthor={Alexander Tong and Frederik Wenkel and Kincaid Macdonald and Smita Krishnaswamy and Guy Wolf},\nyear={2021},\nurl={https://openreview.net/forum?id=Mh1Abj33qI}\n}", "github": "", "project": "", "reviewers": "AnonReviewer4;AnonReviewer3;AnonReviewer1;AnonReviewer2", "site": "https://openreview.net/forum?id=Mh1Abj33qI", "pdf_size": 0, "rating": "4;6;6;8", "confidence": "3;3;4;5", "wc_review": "536;260;839;1251", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "880;1012;1577;250", "reply_reviewers": "0;0;0;0", "reply_authors": "2;3;3;2", "rating_avg": [ 6.0, 1.4142135623730951 ], "confidence_avg": [ 3.75, 0.82915619758885 ], "wc_review_avg": [ 721.5, 367.9568588842991 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 929.75, 471.76073119750015 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 2.5, 0.5 ], "replies_avg": [ 16, 0 ], "authors#_avg": [ 5, 0 ], "corr_rating_confidence": 0.8528028654224418, "gs_citation": 5, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=11729808541605378757&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 7, "aff_unique_index": "0;0;1", "aff_unique_norm": "Yale University;University of Montreal", "aff_unique_dep": ";", "aff_unique_url": "https://www.yale.edu;https://www.umontreal.ca", "aff_unique_abbr": "Yale;UM", 
"aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0;1", "aff_country_unique": "United States;Canada" }, { "id": "MhTgnultR1K", "title": "A Real-time Contribution Measurement Method for Participants in Federated Learning", "track": "main", "status": "Reject", "tldr": "", "abstract": "Federated learning is a framework for protecting distributed data privacy and has participated in commercial activities. However, there is a lack of a sufficiently reasonable contribution measurement mechanism to distribute the reward for each agent. In the commercial union, if there is no mechanism like this, every agent will get the same reward. This is unfair to agents that provide better data, so such a mechanism is needed. To address this issue, this work proposes a real-time contribution measurement method. Firstly, the method defines the impact of each agent. Furthermore, we comprehensively consider the current round and the previous round to obtain the contribution rate of each agent. To verify effectiveness of the proposed method, the work conducts pseudo-distributed training and an experiment on the Penn Treebank dataset. Comparing the Shapley Value in game theory, the comparative experiment result shows that the proposed method is more sensitive to both data quantity and data quality under the premise of maintaining real-time.", "keywords": "Federated Learning;Contribution Evaluation;Multi-party Participation", "primary_area": "", "supplementary_material": "", "author": "Bingjie Yan;Yize Zhou;Boyi Liu;Jun Wang;Yuhan Zhang;Li Liu;Xiaolan Nie;Zhiwei Fan;Zhixuan Liang", "authorids": "~Bingjie_Yan1;yizezhou20001203@163.com;by.liu@ieee.org;20180581310080@hainanu.edu.cn;zhangyh01230@163.com;hainan_lily2001@163.com;niexiaolan25@163.com;hnufzw@gmail.com;~Zhixuan_Liang1", "gender": ";;;;;;;;M", "homepage": ";;;;;;;;http://www4.comp.polyu.edu.hk/~cszwen/zhixuan-liang.html", "dblp": ";;;;;;;;", "google_scholar": ";;;;;;;;", "orcid": ";;;;;;;;", "linkedin": ";;;;;;;;", "or_profile": "~Bingjie_Yan1;yizezhou20001203@163.com;by.liu@ieee.org;20180581310080@hainanu.edu.cn;zhangyh01230@163.com;hainan_lily2001@163.com;niexiaolan25@163.com;hnufzw@gmail.com;~Zhixuan_Liang1", "aff": ";;;;;;;;", "aff_domain": ";;;;;;;;", "position": ";;;;;;;;", "bibtex": "@misc{\nyan2021a,\ntitle={A Real-time Contribution Measurement Method for Participants in Federated Learning},\nauthor={Bingjie Yan and Yize Zhou and Boyi Liu and Jun Wang and Yuhan Zhang and Li Liu and Xiaolan Nie and Zhiwei Fan and Zhixuan Liang},\nyear={2021},\nurl={https://openreview.net/forum?id=MhTgnultR1K}\n}", "github": "", "project": "", "reviewers": "AnonReviewer3;AnonReviewer2;AnonReviewer4;AnonReviewer1", "site": "https://openreview.net/forum?id=MhTgnultR1K", "pdf_size": 0, "rating": "3;3;4;4", "confidence": "5;4;3;4", "wc_review": "339;290;272;371", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "56;110;93;95", "reply_reviewers": "0;0;0;0", "reply_authors": "1;1;1;1", "rating_avg": [ 3.5, 0.5 ], "confidence_avg": [ 4.0, 0.7071067811865476 ], "wc_review_avg": [ 318.0, 39.2109678533953 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 88.5, 19.880895352071043 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 9, 0 ], "authors#_avg": [ 9, 0 ], "corr_rating_confidence": -0.7071067811865475, "gs_citation": 4, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=6504683241632195983&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 0 }, { "title": "Exploring the 
Uncertainty Properties of Neural Networks\u2019 Implicit Priors in the Infinite-Width Limit", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/2865", "id": "MjvduJCsE4", "poster": "", "openreview": "https://openreview.net/forum?id=MjvduJCsE4", "slides": "https://iclr.cc/virtual/2021/poster/2865", "video": "https://iclr.cc/virtual/2021/poster/2865", "author_site": "Ben Adlam, Jaehoon Lee, Lechao Xiao, Jeffrey Pennington, Jasper Snoek", "tldr": "", "abstract": "Modern deep learning models have achieved great success in predictive accuracy for many data modalities. However, their application to many real-world tasks is restricted by poor uncertainty estimates, such as overconfidence on out-of-distribution (OOD) data and ungraceful failing under distributional shift. Previous benchmarks have found that ensembles of neural networks (NNs) are typically the best calibrated models on OOD data. Inspired by this, we leverage recent theoretical advances that characterize the function-space prior of an infinitely-wide NN as a Gaussian process, termed the neural network Gaussian process (NNGP). We use the NNGP with a softmax link function to build a probabilistic model for multi-class classification and marginalize over the latent Gaussian outputs to sample from the posterior. This gives us a better understanding of the implicit prior NNs place on function space and allows a direct comparison of the calibration of the NNGP and its finite-width analogue. We also examine the calibration of previous approaches to classification with the NNGP, which treat classification problems as regression to the one-hot labels. In this case the Bayesian posterior is exact, and we compare several heuristics to generate a categorical distribution over classes. We find these methods are well calibrated under distributional shift. Finally, we consider an infinite-width final layer in conjunction with a pre-trained embedding. This replicates the important practical use case of transfer learning and allows scaling to significantly larger datasets. 
As well as achieving competitive predictive accuracy, this approach is better calibrated than its finite width analogue.", "keywords": "Deep Learning;Uncertainty;Infinite-Width Limit;Neural Network Gaussian Process;Bayesian Neural Networks;Gaussian Process", "primary_area": "", "supplementary_material": "", "author": "Ben Adlam;Jaehoon Lee;Lechao Xiao;Jeffrey Pennington;Jasper Snoek", "authorids": "~Ben_Adlam1;~Jaehoon_Lee2;~Lechao_Xiao2;~Jeffrey_Pennington1;~Jasper_Snoek1", "gender": "M;;M;M;M", "homepage": "http://www.benadlam.com;https://jaehlee.github.io;https://sites.google.com/site/lechaoxiao/;;", "dblp": ";95/386-1.html;222/3238;https://dblp.org/pers/p/Pennington:Jeffrey.html;95/6097", "google_scholar": "Q93u3c0AAAAJ;d3YhiooAAAAJ;fvwzUnIAAAAJ;cn_FoswAAAAJ;FM2DTXwAAAAJ", "orcid": ";;;;", "linkedin": ";eejaehoon/;;jpennin;", "or_profile": "~Ben_Adlam1;~Jaehoon_Lee2;~Lechao_Xiao2;~Jeffrey_Pennington1;~Jasper_Snoek1", "aff": "Google;Google;Google Research, Brain Team;Google;Google", "aff_domain": "google.com;google.com;google.com;google.com;google.com", "position": "Research Scientist;Research Scientist;Research Scientist;Research Scientist;Research Scientist", "bibtex": "@inproceedings{\nadlam2021exploring,\ntitle={Exploring the Uncertainty Properties of Neural Networks{\\textquoteright} Implicit Priors in the Infinite-Width Limit},\nauthor={Ben Adlam and Jaehoon Lee and Lechao Xiao and Jeffrey Pennington and Jasper Snoek},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=MjvduJCsE4}\n}", "github": "", "project": "", "reviewers": "AnonReviewer5;AnonReviewer4;AnonReviewer2;AnonReviewer1", "pdf_size": 0, "rating": "5;6;6;7", "confidence": "4;4;3;4", "wc_review": "359;1023;760;694", "wc_reply_reviewers": "15;0;0;0", "wc_reply_authors": "1123;944;347;495", "reply_reviewers": "1;0;0;0", "reply_authors": "2;2;1;1", "rating_avg": [ 6.0, 0.7071067811865476 ], "confidence_avg": [ 3.75, 0.4330127018922193 ], "wc_review_avg": [ 709.0, 236.60198646672433 ], "wc_reply_reviewers_avg": [ 3.75, 6.49519052838329 ], "wc_reply_authors_avg": [ 727.25, 317.068111767803 ], "reply_reviewers_avg": [ 0.25, 0.4330127018922193 ], "reply_authors_avg": [ 1.5, 0.5 ], "replies_avg": [ 13, 0 ], "authors#_avg": [ 5, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 26, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=4194918161901179822&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 5, "pdf": "https://openreview.net/pdf?id=MjvduJCsE4", "email": "google.com;google.com;google.com;google.com;google.com", "author_num": 5, "aff_unique_index": "0;0;0;0;0", "aff_unique_norm": "Google", "aff_unique_dep": "Google", "aff_unique_url": "https://www.google.com", "aff_unique_abbr": "Google", "aff_campus_unique_index": "0;0;0;0;0", "aff_campus_unique": "Mountain View", "aff_country_unique_index": "0;0;0;0;0", "aff_country_unique": "United States" }, { "title": "Rao-Blackwellizing the Straight-Through Gumbel-Softmax Gradient Estimator", "status": "Oral", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/2724", "id": "Mk6PZtgAgfq", "poster": "", "openreview": "https://openreview.net/forum?id=Mk6PZtgAgfq", "slides": "https://iclr.cc/virtual/2021/poster/2724", "video": "https://iclr.cc/virtual/2021/poster/2724", "author_site": "Max B Paulus, Chris Maddison, Andreas Krause", "tldr": "", "abstract": "Gradient estimation in models with discrete latent variables is a challenging problem, because the simplest unbiased estimators tend to have 
high variance. To counteract this, modern estimators either introduce bias, rely on multiple function evaluations, or use learned, input-dependent baselines. Thus, there is a need for estimators that require minimal tuning, are computationally cheap, and have low mean squared error. In this paper, we show that the variance of the straight-through variant of the popular Gumbel-Softmax estimator can be reduced through Rao-Blackwellization without increasing the number of function evaluations. This provably reduces the mean squared error. We empirically demonstrate that this leads to variance reduction, faster convergence, and generally improved performance in two unsupervised latent variable models.", "keywords": "gumbel;softmax;gumbel-softmax;straight-through;straightthrough;rao;rao-blackwell", "primary_area": "", "supplementary_material": "/attachment/360b5c736614b92f048e24e213e058263c7a3290.zip", "author": "Max B Paulus;Chris J. Maddison;Andreas Krause", "authorids": "~Max_B_Paulus1;~Chris_J._Maddison1;~Andreas_Krause1", "gender": "M;M;M", "homepage": "https://ml.inf.ethz.ch/people/person-detail.MjIyMDk5.TGlzdC8xODA3LC0xNzg2MjE4NDI4.html;https://las.inf.ethz.ch/krausea;http://www.cs.toronto.edu/~cmaddis/", "dblp": "267/5373;87/1831-1.html;139/1388", "google_scholar": ";https://scholar.google.ch/citations?user=eDHv58AAAAAJ;https://scholar.google.ca/citations?user=WjCG3owAAAAJ", "orcid": ";0000-0001-7260-9673;", "linkedin": ";krausea/;", "or_profile": "~Max_B_Paulus1;~Andreas_Krause1;~Chris_J_Maddison1", "aff": "Swiss Federal Institute of Technology;ETH Zurich;Google", "aff_domain": "ethz.ch;ethz.ch;google.com", "position": "PhD student;Full Professor;Researcher", "bibtex": "@inproceedings{\npaulus2021raoblackwellizing,\ntitle={Rao-Blackwellizing the Straight-Through Gumbel-Softmax Gradient Estimator},\nauthor={Max B Paulus and Chris J. 
Maddison and Andreas Krause},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=Mk6PZtgAgfq}\n}", "github": "[![Papers with Code](/images/pwc_icon.svg) 5 community implementations](https://paperswithcode.com/paper/?openreview=Mk6PZtgAgfq)", "project": "", "reviewers": "AnonReviewer2;AnonReviewer1;AnonReviewer4", "pdf_size": 0, "rating": "7;7;8", "confidence": "4;3;3", "wc_review": "204;450;238", "wc_reply_reviewers": "0;111;0", "wc_reply_authors": "76;790;89", "reply_reviewers": "0;1;0", "reply_authors": "1;5;1", "rating_avg": [ 7.333333333333333, 0.4714045207910317 ], "confidence_avg": [ 3.3333333333333335, 0.4714045207910317 ], "wc_review_avg": [ 297.3333333333333, 108.84035199420398 ], "wc_reply_reviewers_avg": [ 37.0, 52.32590180780452 ], "wc_reply_authors_avg": [ 318.3333333333333, 333.56092230489 ], "reply_reviewers_avg": [ 0.3333333333333333, 0.4714045207910317 ], "reply_authors_avg": [ 2.3333333333333335, 1.8856180831641267 ], "replies_avg": [ 13, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": -0.4999999999999999, "gs_citation": 43, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=3146399779253755928&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 5, "pdf": "https://openreview.net/pdf?id=Mk6PZtgAgfq", "email": "ethz.ch;ethz.ch;google.com", "author_num": 3, "aff_unique_index": "0;1;2", "aff_unique_norm": "Swiss Federal Institute of Technology;ETH Zurich;Google", "aff_unique_dep": ";;Google", "aff_unique_url": "https://www.ethz.ch;https://www.ethz.ch;https://www.google.com", "aff_unique_abbr": "ETH Zurich;ETHZ;Google", "aff_campus_unique_index": "1", "aff_campus_unique": ";Mountain View", "aff_country_unique_index": "0;0;1", "aff_country_unique": "Switzerland;United States" }, { "id": "MkrAyYVmt7b", "title": "Perfect density models cannot guarantee anomaly detection", "track": "main", "status": "Reject", "tldr": "", "abstract": "Thanks to the tractability of their likelihood, some deep generative models show promise for seemingly straightforward but important applications like anomaly detection, uncertainty estimation, and active learning. However, the likelihood values empirically attributed to anomalies conflict with the expectations these proposed applications suggest. In this paper, we take a closer look at the behavior of distribution densities and show that these quantities carry less meaningful information than previously thought, beyond estimation\u00a0issues or the curse of dimensionality. 
We conclude that the use of these likelihoods for out-of-distribution detection relies on strong and implicit hypotheses and highlight the necessity of explicitly formulating these assumptions for reliable anomaly detection.", "keywords": "anomaly detection;out-of-distribution detection;OOD detection;outlier detection;density estimation", "primary_area": "", "supplementary_material": "", "author": "Charline Le Lan;Laurent Dinh", "authorids": "~Charline_Le_Lan2;~Laurent_Dinh1", "gender": "F;", "homepage": "http://csml.stats.ox.ac.uk/people/lelan/;https://laurent-dinh.github.io/", "dblp": "234/9001;131/6819", "google_scholar": "3geG4OkAAAAJ;h7OHSkoAAAAJ", "orcid": ";", "linkedin": ";", "or_profile": "~Charline_Le_Lan2;~Laurent_Dinh1", "aff": "University of Oxford;Google", "aff_domain": "ox.ac.uk;google.com", "position": "PhD student;Research Scientist", "bibtex": "@misc{\nlan2021perfect,\ntitle={Perfect density models cannot guarantee anomaly detection},\nauthor={Charline Le Lan and Laurent Dinh},\nyear={2021},\nurl={https://openreview.net/forum?id=MkrAyYVmt7b}\n}", "github": "", "project": "", "reviewers": "AnonReviewer3;AnonReviewer4;AnonReviewer1;AnonReviewer2", "site": "https://openreview.net/forum?id=MkrAyYVmt7b", "pdf_size": 0, "rating": "3;4;4;4", "confidence": "4;3;4;4", "wc_review": "295;1202;281;412", "wc_reply_reviewers": "127;2142;144;0", "wc_reply_authors": "308;3894;526;244", "reply_reviewers": "1;4;1;0", "reply_authors": "2;7;2;1", "rating_avg": [ 3.75, 0.4330127018922193 ], "confidence_avg": [ 3.75, 0.4330127018922193 ], "wc_review_avg": [ 547.5, 381.28368703630633 ], "wc_reply_reviewers_avg": [ 603.25, 890.138577694507 ], "wc_reply_authors_avg": [ 1243.0, 1534.1215727575177 ], "reply_reviewers_avg": [ 1.5, 1.5 ], "reply_authors_avg": [ 3.0, 2.345207879911715 ], "replies_avg": [ 28, 0 ], "authors#_avg": [ 2, 0 ], "corr_rating_confidence": -0.3333333333333333, "gs_citation": 60, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=6238566466640748238&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 11, "aff_unique_index": "0;1", "aff_unique_norm": "University of Oxford;Google", "aff_unique_dep": ";Google", "aff_unique_url": "https://www.ox.ac.uk;https://www.google.com", "aff_unique_abbr": "Oxford;Google", "aff_campus_unique_index": "1", "aff_campus_unique": ";Mountain View", "aff_country_unique_index": "0;1", "aff_country_unique": "United Kingdom;United States" }, { "title": "Open Question Answering over Tables and Text", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/2617", "id": "MmCRswl1UYl", "poster": "", "openreview": "https://openreview.net/forum?id=MmCRswl1UYl", "slides": "https://iclr.cc/virtual/2021/poster/2617", "video": "https://iclr.cc/virtual/2021/poster/2617", "author_site": "wenhu chen, Ming-Wei Chang, Eva Schlinger, William Yang Wang, William Cohen", "tldr": "", "abstract": "In open question answering (QA), the answer to a question is produced by retrieving and then analyzing documents that might contain answers to the question. Most open QA systems have considered only retrieving information from unstructured text. Here we consider for the first time open QA over {\\em both} tabular and textual data and present a new large-scale dataset \\emph{Open Table-and-Text Question Answering} (OTT-QA) to evaluate performance on this task. 
Most questions in OTT-QA require multi-hop inference across tabular data and unstructured text, and the evidence required to answer a question can be distributed in different ways over these two types of input, making evidence retrieval challenging---our baseline model using an iterative retriever and BERT-based reader achieves an exact match score less than 10\\%. We then propose two novel techniques to address the challenge of retrieving and aggregating evidence for OTT-QA. The first technique is to use ``early fusion'' to group multiple highly relevant tabular and textual units into a fused block, which provides more context for the retriever to search for. The second technique is to use a cross-block reader to model the cross-dependency between multiple retrieved evidence with global-local sparse attention. Combining these two techniques improves the score significantly, to above 27\\%.", "keywords": "Question Answering;Tabular Data;Open-domain;Retrieval", "primary_area": "", "supplementary_material": "/attachment/d390299cf512bb8528303cb87596c2aad142e10c.zip", "author": "Wenhu Chen;Ming-Wei Chang;Eva Schlinger;William Yang Wang;William W. Cohen", "authorids": "~Wenhu_Chen3;~Ming-Wei_Chang3;~Eva_Schlinger2;~William_Yang_Wang2;~William_W._Cohen2", "gender": "F;M;;M;M", "homepage": "https://eschling.com;https://wwcohen.github.io/;https://mingweichang.org/;https://wenhuchen.github.io/;https://www.cs.ucsb.edu/~william/", "dblp": ";c/WWCohen.html;69/4618;136/0957.html;08/9282", "google_scholar": ";8ys-38kAAAAJ;GiCqMFkAAAAJ;https://scholar.google.co.jp/citations?user=U8ShbhUAAAAJ;gf8Ms_8AAAAJ", "orcid": ";;;;", "linkedin": ";;ming-wei-chang-4962497/;;", "or_profile": "~Eva_Schlinger2;~William_W._Cohen2;~Ming-Wei_Chang2;~wenhu_chen1;~William_Wang1", "aff": "Google;Google DeepMind;Google Deepmind;University of California, Santa Barbara;UC Santa Barbara", "aff_domain": "google.com;google.com;google.com;ucsb.edu;ucsb.edu", "position": "Research software engineer;Principle Scientist;Research scientist;PhD student;Full Professor", "bibtex": "@inproceedings{\nchen2021open,\ntitle={Open Question Answering over Tables and Text},\nauthor={Wenhu Chen and Ming-Wei Chang and Eva Schlinger and William Yang Wang and William W. 
Cohen},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=MmCRswl1UYl}\n}", "github": "[![github](/images/github_icon.svg) wenhuchen/OTT-QA](https://github.com/wenhuchen/OTT-QA)", "project": "", "reviewers": "AnonReviewer2;AnonReviewer4;AnonReviewer3;AnonReviewer1", "pdf_size": 0, "rating": "6;6;7;7", "confidence": "4;4;4;4", "wc_review": "945;794;281;204", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "821;1322;254;74", "reply_reviewers": "0;0;0;0", "reply_authors": "2;2;1;1", "rating_avg": [ 6.5, 0.5 ], "confidence_avg": [ 4.0, 0.0 ], "wc_review_avg": [ 556.0, 319.17628358009307 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 617.75, 491.2374044186782 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.5, 0.5 ], "replies_avg": [ 13, 0 ], "authors#_avg": [ 5, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 206, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=3303883977664528561&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 4, "pdf": "https://openreview.net/pdf?id=MmCRswl1UYl", "email": "google.com;google.com;google.com;ucsb.edu;ucsb.edu", "author_num": 5, "aff_unique_index": "0;0;1;2;2", "aff_unique_norm": "Google;DeepMind;University of California, Santa Barbara", "aff_unique_dep": "Google;DeepMind;", "aff_unique_url": "https://www.google.com;https://deepmind.com;https://www.ucsb.edu", "aff_unique_abbr": "Google;DeepMind;UCSB", "aff_campus_unique_index": "0;2;2", "aff_campus_unique": "Mountain View;;Santa Barbara", "aff_country_unique_index": "0;1;1;0;0", "aff_country_unique": "United States;United Kingdom" }, { "id": "MmcywoW7PbJ", "title": "Learn Goal-Conditioned Policy with Intrinsic Motivation for Deep Reinforcement Learning", "track": "main", "status": "Reject", "tldr": "", "abstract": "It is of significance for an agent to learn a widely applicable and general-purpose policy that can achieve diverse goals including images and text descriptions. Considering such perceptually-specific goals, the frontier of deep reinforcement learning research is to learn a goal-conditioned policy without hand-crafted rewards. To learn this kind of policy, recent works usually take as the reward the non-parametric distance to a given goal in an explicit embedding space. From a different viewpoint, we propose a novel unsupervised learning approach named goal-conditioned policy with intrinsic motivation (GPIM), which jointly learns both an abstract-level policy and a goal-conditioned policy. The abstract-level policy is conditioned on a latent variable to optimize a discriminator and discovers diverse states that are further rendered into perceptually-specific goals for the goal-conditioned policy. The learned discriminator serves as an intrinsic reward function for the goal-conditioned policy to imitate the trajectory induced by the abstract-level policy. Experiments on various robotic tasks demonstrate the effectiveness and efficiency of our proposed GPIM method which substantially outperforms prior techniques. 
", "keywords": "unsupervised reinforcement learning;goal-conditioned policy;intrinsic reward", "primary_area": "", "supplementary_material": "", "author": "Jinxin Liu;Donglin Wang;Qiangxing Tian;Zhengyu Chen", "authorids": "~Jinxin_Liu1;~Donglin_Wang1;~Qiangxing_Tian1;~Zhengyu_Chen2", "gender": ";M;M;", "homepage": ";https://milab.westlake.edu.cn/;https://scholar.google.com/citations?user=kr9s_zQAAAAJ&hl=zh-CN&oi=ao;", "dblp": ";;;", "google_scholar": ";https://scholar.google.ca/citations?user=-fo6wdwAAAAJ;;", "orcid": ";0000-0002-8188-3735;;", "linkedin": ";;;", "or_profile": "~Jinxin_Liu1;~Donglin_Wang1;~Qiangxing_Tian1;~Zhengyu_Chen2", "aff": ";Westlake University;Zhejiang University;", "aff_domain": ";westlake.edu.cn;zju.edu.cn;", "position": ";Associate Professor;PhD student;", "bibtex": "@misc{\nliu2021learn,\ntitle={Learn Goal-Conditioned Policy with Intrinsic Motivation for Deep Reinforcement Learning},\nauthor={Jinxin Liu and Donglin Wang and Qiangxing Tian and Zhengyu Chen},\nyear={2021},\nurl={https://openreview.net/forum?id=MmcywoW7PbJ}\n}", "github": "", "project": "", "reviewers": "AnonReviewer4;AnonReviewer2;AnonReviewer3;AnonReviewer1", "site": "https://openreview.net/forum?id=MmcywoW7PbJ", "pdf_size": 0, "rating": "5;6;6;7", "confidence": "3;4;3;4", "wc_review": "828;466;377;746", "wc_reply_reviewers": "0;0;69;128", "wc_reply_authors": "481;269;503;452", "reply_reviewers": "0;0;1;1", "reply_authors": "1;1;1;2", "rating_avg": [ 6.0, 0.7071067811865476 ], "confidence_avg": [ 3.5, 0.5 ], "wc_review_avg": [ 604.25, 187.6917353001991 ], "wc_reply_reviewers_avg": [ 49.25, 53.48539520280279 ], "wc_reply_authors_avg": [ 426.25, 92.57260663932932 ], "reply_reviewers_avg": [ 0.5, 0.5 ], "reply_authors_avg": [ 1.25, 0.4330127018922193 ], "replies_avg": [ 13, 0 ], "authors#_avg": [ 4, 0 ], "corr_rating_confidence": 0.7071067811865476, "gs_citation": 23, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=14812738338352187643&as_sdt=400005&sciodt=0,14&hl=en", "gs_version_total": 7, "aff_unique_index": "0;1", "aff_unique_norm": "Westlake University;Zhejiang University", "aff_unique_dep": ";", "aff_unique_url": "https://www.westlake.edu.cn;https://www.zju.edu.cn", "aff_unique_abbr": "WU;ZJU", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0", "aff_country_unique": "China" }, { "title": "Complex Query Answering with Neural Link Predictors", "status": "Oral", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/3140", "id": "Mos9F9kDwkz", "poster": "", "openreview": "https://openreview.net/forum?id=Mos9F9kDwkz", "slides": "https://iclr.cc/virtual/2021/poster/3140", "video": "https://iclr.cc/virtual/2021/poster/3140", "author_site": "Erik Arakelyan, Daniel Daza, Pasquale Minervini, Michael Cochez", "tldr": "", "abstract": "Neural link predictors are immensely useful for identifying missing edges in large scale Knowledge Graphs. However, it is still not clear how to use these models for answering more complex queries that arise in a number of domains, such as queries using logical conjunctions ($\\land$), disjunctions ($\\lor$) and existential quantifiers ($\\exists$), while accounting for missing edges. In this work, we propose a framework for efficiently answering complex queries on incomplete Knowledge Graphs. We translate each query into an end-to-end differentiable objective, where the truth value of each atom is computed by a pre-trained neural link predictor. 
We then analyse two solutions to the optimisation problem, including gradient-based and combinatorial search. In our experiments, the proposed approach produces more accurate results than state-of-the-art methods --- black-box neural models trained on millions of generated queries --- without the need of training on a large and diverse set of complex queries. Using orders of magnitude less training data, we obtain relative improvements ranging from 8% up to 40% in Hits@3 across different knowledge graphs containing factual information. Finally, we demonstrate that it is possible to explain the outcome of our model in terms of the intermediate solutions identified for each of the complex query atoms. All our source code and datasets are available online, at https://github.com/uclnlp/cqd.", "keywords": "neural link prediction;complex query answering", "primary_area": "", "supplementary_material": "", "author": "Erik Arakelyan;Daniel Daza;Pasquale Minervini;Michael Cochez", "authorids": "erik.arakelyan.18@alumni.ucl.ac.uk;dfdazac@gmail.com;~Pasquale_Minervini1;~Michael_Cochez2", "gender": ";;M;M", "homepage": ";;https://www.neuralnoise.com;https://www.cochez.nl", "dblp": ";;58/10142;83/11448", "google_scholar": ";;https://scholar.google.it/citations?user=9sk6CSgAAAA;https://scholar.google.fi/citations?user=JuZrOtoAAAAJ", "orcid": ";;0000-0002-8442-602X;0000-0001-5726-4638", "linkedin": ";;pasquale-mauro-minervini-47a08324/;michaelcochez/", "or_profile": "erik.arakelyan.18@alumni.ucl.ac.uk;dfdazac@gmail.com;~Pasquale_Minervini1;~Michael_Cochez2", "aff": ";;University College London, University of London;VU Amsterdam", "aff_domain": ";;ucl.ac.uk;vu.nl", "position": ";;Postdoc;Assistant Professor", "bibtex": "@inproceedings{\narakelyan2021complex,\ntitle={Complex Query Answering with Neural Link Predictors},\nauthor={Erik Arakelyan and Daniel Daza and Pasquale Minervini and Michael Cochez},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=Mos9F9kDwkz}\n}", "github": "[![github](/images/github_icon.svg) uclnlp/cqd](https://github.com/uclnlp/cqd) + [![Papers with Code](/images/pwc_icon.svg) 2 community implementations](https://paperswithcode.com/paper/?openreview=Mos9F9kDwkz)", "project": "", "reviewers": "AnonReviewer2;AnonReviewer1;AnonReviewer4;AnonReviewer3", "pdf_size": 0, "rating": "6;8;9;9", "confidence": "5;4;4;2", "wc_review": "680;423;633;274", "wc_reply_reviewers": "0;0;5;0", "wc_reply_authors": "970;654;393;321", "reply_reviewers": "0;0;1;0", "reply_authors": "2;1;1;1", "rating_avg": [ 8.0, 1.224744871391589 ], "confidence_avg": [ 3.75, 1.0897247358851685 ], "wc_review_avg": [ 502.5, 163.6069986278093 ], "wc_reply_reviewers_avg": [ 1.25, 2.165063509461097 ], "wc_reply_authors_avg": [ 584.5, 254.72779589200704 ], "reply_reviewers_avg": [ 0.25, 0.4330127018922193 ], "reply_authors_avg": [ 1.25, 0.4330127018922193 ], "replies_avg": [ 15, 0 ], "authors#_avg": [ 4, 0 ], "corr_rating_confidence": -0.7492686492653551, "gs_citation": 162, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=8823088409575332587&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 10, "pdf": "https://openreview.net/pdf?id=Mos9F9kDwkz", "email": ";;ucl.ac.uk;vu.nl", "author_num": 4, "aff_unique_index": "0;1", "aff_unique_norm": "University College London;Vrije Universiteit Amsterdam", "aff_unique_dep": ";", "aff_unique_url": "https://www.ucl.ac.uk;https://www.vu.nl", "aff_unique_abbr": "UCL;VU", "aff_campus_unique_index": "1", 
"aff_campus_unique": ";Amsterdam", "aff_country_unique_index": "0;1", "aff_country_unique": "United Kingdom;Netherlands" }, { "id": "MpStQoD73Mj", "title": "Differentiable Weighted Finite-State Transducers", "track": "main", "status": "Reject", "tldr": "", "abstract": "We introduce a framework for automatic differentiation with weighted finite-state transducers (WFSTs) allowing them to be used dynamically at training time. Through the separation of graphs from operations on graphs, this framework enables the exploration of new structured loss functions which in turn eases the encoding of prior knowledge into learning algorithms. We show how the framework can combine pruning and back-off in transition models with various sequence-level loss functions. We also show how to learn over the latent decomposition of phrases into word pieces. Finally, to demonstrate that WFSTs can be used in the interior of a deep neural network, we propose a convolutional WFST layer which maps lower-level representations to higher-level representations and can be used as a drop-in replacement for a traditional convolution. We validate these algorithms with experiments in handwriting recognition and speech recognition.", "keywords": "weighted automata;automatic differentiation;sequence models", "primary_area": "", "supplementary_material": "", "author": "Awni Hannun;Vineel Pratap;Jacob Kahn;Wei-Ning Hsu", "authorids": "~Awni_Hannun1;vineelkpratap@fb.com;~Jacob_Kahn1;~Wei-Ning_Hsu2", "gender": "M;;M;", "homepage": "https://www.awnihannun.com/;;https://jacobkahn.me/;", "dblp": "https://dblp.uni-trier.de/pers/hd/h/Hannun:Awni;;232/2341;", "google_scholar": "3-mdTUAAAAAJ;;_-pugt8AAAAJ;", "orcid": ";;0000-0003-2911-2500;", "linkedin": ";;jacobdavidkahn/;", "or_profile": "~Awni_Hannun1;vineelkpratap@fb.com;~Jacob_Kahn1;~Wei-Ning_Hsu2", "aff": "Meta Facebook;;Meta AI;", "aff_domain": "facebook.com;;meta.com;", "position": "Research Scientist (FAIR);;Research Engineer;", "bibtex": "@misc{\nhannun2021differentiable,\ntitle={Differentiable Weighted Finite-State Transducers},\nauthor={Awni Hannun and Vineel Pratap and Jacob Kahn and Wei-Ning Hsu},\nyear={2021},\nurl={https://openreview.net/forum?id=MpStQoD73Mj}\n}", "github": "", "project": "", "reviewers": "AnonReviewer2;AnonReviewer3;AnonReviewer4;AnonReviewer1", "site": "https://openreview.net/forum?id=MpStQoD73Mj", "pdf_size": 0, "rating": "4;5;6;6", "confidence": "5;4;5;5", "wc_review": "835;687;993;432", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "771;531;716;409", "reply_reviewers": "0;0;0;0", "reply_authors": "1;1;1;1", "rating_avg": [ 5.25, 0.82915619758885 ], "confidence_avg": [ 4.75, 0.4330127018922193 ], "wc_review_avg": [ 736.75, 206.55795191664734 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 606.75, 144.70379227926267 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 10, 0 ], "authors#_avg": [ 4, 0 ], "corr_rating_confidence": 0.17407765595569782, "gs_citation": 36, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=10035108293445622537&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 4, "aff_unique_index": "0;0", "aff_unique_norm": "Meta", "aff_unique_dep": "Meta Platforms, Inc.", "aff_unique_url": "https://meta.com", "aff_unique_abbr": "Meta", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0", "aff_country_unique": "United States" }, { "id": "MqWHrrIfBMN", "title": "Asynchronous Edge Learning using Cloned Knowledge Distillation", "track": 
"main", "status": "Withdraw", "tldr": "", "abstract": "With the increasing demand for more and more data, the federated learning (FL) methods, which try to utilize highly distributed on-device local data in the training process, have been proposed.However, fledgling services provided by startup companies not only have limited number of clients, but also have minimal resources for constant communications between the server and multiple clients. In addition, in a real-world environment where the user pool changes dynamically, the FL system must be able to efficiently utilize rapid inflow and outflow of users, while at the same time experience minimal bottleneck due to network delays of multiple users. In this respect, we amend the federated learning scenario to a more flexible asynchronous edge learning. To solve the aforementioned learning problems, we propose an asynchronous model-based communication method with knowledge distillation. In particular, we dub our knowledge distillation scheme as ``\"cloned distillation\" and explain how it is different from other knowledge distillation method. In brief, we found that in knowledge distillation between the teacher and the student there exist two contesting traits in the student: to attend to the teacher's knowledge or to retain its own knowledge exclusive to the teacher. And in this edge learning scenario, the attending property should be amplified rather than the retaining property, because teachers are dispatched to the users to learn from them and recollected at the server to teach the core model. Our asynchronous edge learning method can elastically handle the dynamic inflow and outflow of users in a service with minimal communication cost, operate with essentially no bottleneck due to user delay, and protect user's privacy. 
We also found that it is robust to users who behave abnormally or maliciously.", "keywords": "Knowledge distillation;Federated learning", "primary_area": "", "supplementary_material": "", "author": "Sangho Lee;KiYoon Yoo;Nojun Kwak", "authorids": "~Sangho_Lee3;961230@snu.ac.kr;~Nojun_Kwak1", "gender": ";;M", "homepage": "http://mipal.snu.ac.kr/index.php/Main_Page;;http://mipal.snu.ac.kr", "dblp": ";;49/2806", "google_scholar": ";;h_8-1M0AAAAJ", "orcid": ";;0000-0002-1792-0327", "linkedin": ";;", "or_profile": "~Sangho_Lee3;961230@snu.ac.kr;~Nojun_Kwak1", "aff": "Seoul National University;;Seoul National University", "aff_domain": "snu.ac.kr;;snu.ac.kr", "position": "PhD student;;Full Professor", "bibtex": "", "github": "", "project": "", "reviewers": "AnonReviewer3;AnonReviewer4;AnonReviewer2", "site": "https://openreview.net/forum?id=MqWHrrIfBMN", "pdf_size": 0, "rating": "3;4;8", "confidence": "4;4;1", "wc_review": "252;347;609", "wc_reply_reviewers": "0;0;0", "wc_reply_authors": "0;0;0", "reply_reviewers": "0;0;0", "reply_authors": "0;0;0", "rating_avg": [ 5.0, 2.160246899469287 ], "confidence_avg": [ 3.0, 1.4142135623730951 ], "wc_review_avg": [ 402.6666666666667, 150.9665157870741 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 0, 0 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 0, 0 ], "replies_avg": [ 4, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": -0.9819805060619657, "gs_citation": 2, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=6178968814736375770&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 0, "aff_unique_index": "0;0", "aff_unique_norm": "Seoul National University", "aff_unique_dep": "", "aff_unique_url": "https://www.snu.ac.kr", "aff_unique_abbr": "SNU", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0", "aff_country_unique": "South Korea" }, { "id": "Ms51cV-vqFY", "title": "Meta-Semi: A Meta-learning Approach for Semi-supervised Learning", "track": "main", "status": "Withdraw", "tldr": "", "abstract": "Deep learning based semi-supervised learning (SSL) algorithms have led to promising results in recent years. However, they tend to introduce multiple tunable hyper-parameters, making them less practical in real SSL scenarios where the labeled data is scarce for extensive hyper-parameter search. In this paper, we propose a novel meta-learning based SSL algorithm (Meta-Semi) that requires tuning only one additional hyper-parameter, compared with a standard supervised deep learning algorithm, to achieve competitive performance under various conditions of SSL. We start by defining a meta optimization problem that minimizes the loss on labeled data through dynamically reweighting the loss on unlabeled samples, which are associated with soft pseudo labels during training. As the meta problem is computationally intensive to solve directly, we propose an efficient algorithm to dynamically obtain the approximate solutions. We show theoretically that Meta-Semi converges to the stationary point of the loss function on labeled data under mild conditions.
Empirically, Meta-Semi outperforms state-of-the-art SSL algorithms significantly on the challenging semi-supervised CIFAR-100 and STL-10 tasks, and achieves competitive performance on CIFAR-10 and SVHN.", "keywords": "Semi-supervised learning;Meta-learning", "primary_area": "", "supplementary_material": "", "author": "Yulin Wang;Jiayi Guo;Shiji Song;Gao Huang", "authorids": "~Yulin_Wang1;~Jiayi_Guo2;~Shiji_Song1;~Gao_Huang1", "gender": "M;M;M;M", "homepage": "https://www.wyl.cool/;https://jiayiguo821.github.io/;;http://www.gaohuang.net", "dblp": ";;72/5351;", "google_scholar": "gBP38gcAAAAJ;2p6GCEEAAAAJ;;-P9LwcgAAAAJ", "orcid": "0000-0002-1363-0234;;;", "linkedin": ";;;", "or_profile": "~Yulin_Wang1;~Jiayi_Guo2;~Shiji_Song1;~Gao_Huang1", "aff": "Tsinghua University;Tsinghua University;Tsinghua University;Tsinghua University", "aff_domain": "tsinghua.edu.cn;tsinghua.edu.cn;mail.tsinghua.edu.cn;tsinghua.edu.cn", "position": "PhD student;PhD student;Full Professor;Assistant Professor", "bibtex": "", "github": "", "project": "", "reviewers": "AnonReviewer4;AnonReviewer1;AnonReviewer3", "site": "https://openreview.net/forum?id=Ms51cV-vqFY", "pdf_size": 0, "rating": "4;5;5", "confidence": "4;3;3", "wc_review": "236;480;205", "wc_reply_reviewers": "0;0;0", "wc_reply_authors": "0;0;0", "reply_reviewers": "0;0;0", "reply_authors": "0;0;0", "rating_avg": [ 4.666666666666667, 0.4714045207910317 ], "confidence_avg": [ 3.3333333333333335, 0.4714045207910317 ], "wc_review_avg": [ 307.0, 122.98238356230809 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 0, 0 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 0, 0 ], "replies_avg": [ 4, 0 ], "authors#_avg": [ 4, 0 ], "corr_rating_confidence": -0.9999999999999997, "gs_citation": 41, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=7673416131796362836&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 6, "aff_unique_index": "0;0;0;0", "aff_unique_norm": "Tsinghua University", "aff_unique_dep": "", "aff_unique_url": "https://www.tsinghua.edu.cn", "aff_unique_abbr": "THU", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0;0;0", "aff_country_unique": "China" }, { "id": "Ms9zjhVB5R", "title": "SOAR: Second-Order Adversarial Regularization", "track": "main", "status": "Reject", "tldr": "", "abstract": "Adversarial training is a common approach to improving the robustness of deep neural networks against adversarial examples. In this work, we propose a novel regularization approach as an alternative. To derive the regularizer, we formulate the adversarial robustness problem under the robust optimization framework and approximate the loss function using a second-order Taylor series expansion. Our proposed second-order adversarial regularizer (SOAR) is an upper bound based on the Taylor approximation of the inner-max in the robust optimization objective.
We empirically show that the proposed method improves the robustness of networks against the $\\ell_\\infty$ and $\\ell_2$ bounded perturbations on CIFAR-10 and SVHN.", "keywords": "Adversarial Robustness", "primary_area": "", "supplementary_material": "/attachment/df1f4eb32fafaa9a025f36b7bcda62f3abdf67ff.zip", "author": "Avery Ma;Fartash Faghri;Nicolas Papernot;Amir-massoud Farahmand", "authorids": "~Avery_Ma1;~Fartash_Faghri1;~Nicolas_Papernot1;~Amir-massoud_Farahmand1", "gender": ";M;M;M", "homepage": "http://averyma.com/;;https://www.papernot.fr;http://academic.sologen.net/", "dblp": "201/5137;115/7922;162/1405;17/671", "google_scholar": "https://scholar.google.ca/citations?user=z6fDbkUAAAAJ;https://scholar.google.ca/citations?user=KUG_tG0AAAAJ;cGxq0cMAAAAJ;https://scholar.google.ca/citations?user=G5SAV7gAAAAJ", "orcid": ";;;", "linkedin": "avery-ma-326522158/;fartash-faghri;nicolaspapernot;amir-massoud-farahmand/", "or_profile": "~Avery_Ma1;~Fartash_Faghri1;~Nicolas_Papernot1;~Amir-massoud_Farahmand1", "aff": "University of Toronto;Department of Computer Science, University of Toronto;Google;Vector Institute", "aff_domain": "toronto.edu;cs.toronto.edu;google.com;vectorinstitute.ai", "position": "PhD student;PhD student;Research Scientist;Faculty Member", "bibtex": "@misc{\nma2021soar,\ntitle={{\\{}SOAR{\\}}: Second-Order Adversarial Regularization},\nauthor={Avery Ma and Fartash Faghri and Nicolas Papernot and Amir-massoud Farahmand},\nyear={2021},\nurl={https://openreview.net/forum?id=Ms9zjhVB5R}\n}", "github": "", "project": "", "reviewers": "AnonReviewer5;AnonReviewer3;AnonReviewer4", "site": "https://openreview.net/forum?id=Ms9zjhVB5R", "pdf_size": 0, "rating": "4;7;7", "confidence": "5;3;4", "wc_review": "685;403;276", "wc_reply_reviewers": "499;0;0", "wc_reply_authors": "1633;220;242", "reply_reviewers": "2;0;0", "reply_authors": "3;1;1", "rating_avg": [ 6.0, 1.4142135623730951 ], "confidence_avg": [ 4.0, 0.816496580927726 ], "wc_review_avg": [ 454.6666666666667, 170.9236346702494 ], "wc_reply_reviewers_avg": [ 166.33333333333334, 235.2308558747248 ], "wc_reply_authors_avg": [ 698.3333333333334, 660.9701623791769 ], "reply_reviewers_avg": [ 0.6666666666666666, 0.9428090415820634 ], "reply_authors_avg": [ 1.6666666666666667, 0.9428090415820634 ], "replies_avg": [ 12, 0 ], "authors#_avg": [ 4, 0 ], "corr_rating_confidence": -0.8660254037844387, "gs_citation": 14, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=16240042915066024279&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 3, "aff_unique_index": "0;0;1;2", "aff_unique_norm": "University of Toronto;Google;Vector Institute", "aff_unique_dep": ";Google;", "aff_unique_url": "https://www.utoronto.ca;https://www.google.com;https://vectorinstitute.ai/", "aff_unique_abbr": "U of T;Google;Vector Institute", "aff_campus_unique_index": "1;2", "aff_campus_unique": ";Toronto;Mountain View", "aff_country_unique_index": "0;0;1;0", "aff_country_unique": "Canada;United States" }, { "title": "Rank the Episodes: A Simple Approach for Exploration in Procedurally-Generated Environments", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/2671", "id": "MtEE0CktZht", "poster": "", "openreview": "https://openreview.net/forum?id=MtEE0CktZht", "slides": "https://iclr.cc/virtual/2021/poster/2671", "video": "https://iclr.cc/virtual/2021/poster/2671", "author_site": "Daochen Zha, Wenye Ma, Lei Yuan, Xia Hu, Ji Liu", "tldr": "", "abstract": "Exploration under sparse reward is a long-standing challenge of 
model-free reinforcement learning. The state-of-the-art methods address this challenge by introducing intrinsic rewards to encourage exploration in novel states or uncertain environment dynamics. Unfortunately, methods based on intrinsic rewards often fall short in procedurally-generated environments, where a different environment is generated in each episode so that the agent is not likely to visit the same state more than once. Motivated by how humans distinguish good exploration behaviors by looking into the entire episode, we introduce RAPID, a simple yet effective episode-level exploration method for procedurally-generated environments. RAPID regards each episode as a whole and gives an episodic exploration score from both per-episode and long-term views. Those highly scored episodes are treated as good exploration behaviors and are stored in a small ranking buffer. The agent then imitates the episodes in the buffer to reproduce the past good exploration behaviors. We demonstrate our method on several procedurally-generated MiniGrid environments, a first-person-view 3D Maze navigation task from MiniWorld, and several sparse MuJoCo tasks. The results show that RAPID significantly outperforms the state-of-the-art intrinsic reward strategies in terms of sample efficiency and final performance. The code is available at https://github.com/daochenzha/rapid", "keywords": "Reinforcement Learning;Exploration;Generalization of Reinforcement Learning;Self-Imitation", "primary_area": "", "supplementary_material": "/attachment/6d0ea78b3fe5a3831f2f01e96289d1b298d93190.zip", "author": "Daochen Zha;Wenye Ma;Lei Yuan;Xia Hu;Ji Liu", "authorids": "~Daochen_Zha1;mawenye@gmail.com;~Lei_Yuan1;~Xia_Hu4;~Ji_Liu1", "gender": ";;M;;M", "homepage": "http://dczha.com/;;;;http://jiliu-ml.org", "dblp": "167/0903;;;;51/4433-2.html", "google_scholar": "jK0NgMcAAAAJ;;WO80oyAAAAAJ;;RRzVwKkAAAAJ", "orcid": "0000-0002-6677-7504;;0009-0003-9126-3577;;", "linkedin": "daochen-zha;;lei-yuan-7b490b19/;;", "or_profile": "~Daochen_Zha1;mawenye@gmail.com;~Lei_Yuan1;~Xia_Hu4;~Ji_Liu1", "aff": "Texas A&M;;Kuaishou Technology;;", "aff_domain": "tamu.edu;;kuaishou.com;;", "position": "PhD student;;Principal Researcher;;", "bibtex": "@inproceedings{\nzha2021rank,\ntitle={Rank the Episodes: A Simple Approach for Exploration in Procedurally-Generated Environments},\nauthor={Daochen Zha and Wenye Ma and Lei Yuan and Xia Hu and Ji Liu},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=MtEE0CktZht}\n}", "github": "[![github](/images/github_icon.svg) maximecb/gym-miniworld](https://github.com/maximecb/gym-miniworld) + [![Papers with Code](/images/pwc_icon.svg) 2 community implementations](https://paperswithcode.com/paper/?openreview=MtEE0CktZht)", "project": "", "reviewers": "AnonReviewer3;AnonReviewer4;AnonReviewer2;AnonReviewer1", "pdf_size": 0, "rating": "6;7;7;7", "confidence": "3;3;4;3", "wc_review": "456;602;443;629", "wc_reply_reviewers": "36;49;39;60", "wc_reply_authors": "761;465;347;1106", "reply_reviewers": "1;1;1;1", "reply_authors": "1;1;1;2", "rating_avg": [ 6.75, 0.4330127018922193 ], "confidence_avg": [ 3.25, 0.4330127018922193 ], "wc_review_avg": [ 532.5, 83.67347249875554 ], "wc_reply_reviewers_avg": [ 46.0, 9.40744386111339 ], "wc_reply_authors_avg": [ 669.75, 293.5688803330489 ], "reply_reviewers_avg": [ 1.0, 0.0 ], "reply_authors_avg": [ 1.25, 0.4330127018922193 ], "replies_avg": [ 16, 0 ], "authors#_avg": [ 5, 0 ], "corr_rating_confidence": 
0.3333333333333333, "gs_citation": 61, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=17489085045316603336&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 6, "pdf": "https://openreview.net/pdf?id=MtEE0CktZht", "email": "tamu.edu;;kuaishou.com;;", "author_num": 5, "aff_unique_index": "0;1", "aff_unique_norm": "Texas A&M University;Kuaishou Technology", "aff_unique_dep": ";", "aff_unique_url": "https://www.tamu.edu;https://www.kuaishou.com", "aff_unique_abbr": "TAMU;Kuaishou", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;1", "aff_country_unique": "United States;China" }, { "title": "Uncertainty-aware Active Learning for Optimal Bayesian Classifier", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/3026", "id": "Mu2ZxFctAI", "poster": "", "openreview": "https://openreview.net/forum?id=Mu2ZxFctAI", "slides": "https://iclr.cc/virtual/2021/poster/3026", "video": "https://iclr.cc/virtual/2021/poster/3026", "author_site": "Guang Zhao, Edward Dougherty, Byung-Jun Yoon, Francis Alexander, Xiaoning Qian", "tldr": "", "abstract": "For pool-based active learning, in each iteration a candidate training sample is chosen for labeling by optimizing an acquisition function. In Bayesian classification, expected Loss Reduction~(ELR) methods maximize the expected reduction in the classification error given a new labeled candidate based on a one-step-look-ahead strategy. ELR is the optimal strategy with a single query; however, since such myopic strategies cannot identify the long-term effect of a query on the classification error, ELR may get stuck before reaching the optimal classifier. In this paper, inspired by the mean objective cost of uncertainty (MOCU), a metric quantifying the uncertainty directly affecting the classification error, we propose an acquisition function based on a weighted form of MOCU. Similar to ELR, the proposed method focuses on the reduction of the uncertainty that pertains to the classification error. But unlike any other existing scheme, it provides the critical advantage that the resulting Bayesian active learning algorithm guarantees convergence to the optimal classifier of the true model. 
We demonstrate its performance with both synthetic and real-world datasets.", "keywords": "Active learning;Bayesian classification", "primary_area": "", "supplementary_material": "/attachment/d61b4f976f35501c50b155575304bdc939a4f7bb.zip", "author": "Guang Zhao;Edward Dougherty;Byung-Jun Yoon;Francis Alexander;Xiaoning Qian", "authorids": "~Guang_Zhao1;~Edward_Dougherty1;~Byung-Jun_Yoon1;falexander@bnl.gov;~Xiaoning_Qian2", "gender": "M;;M;;M", "homepage": ";https://engineering.tamu.edu/electrical/profiles/edougherty.html;https://BioMLSP.com;;https://www.ece.tamu.edu/~xqian", "dblp": ";;14/1887;;62/4504", "google_scholar": ";;KxPLjXkAAAAJ;;dXGlddgAAAAJ", "orcid": "0000-0001-6843-8242;;0000-0001-9328-1101;;0000-0002-4347-2476", "linkedin": ";;;;", "or_profile": "~Guang_Zhao1;~Edward_Dougherty1;~Byung-Jun_Yoon1;falexander@bnl.gov;~Xiaoning_Qian2", "aff": "Texas A&M;Texas A&M University - College Station;Texas A&M University - College Station;;", "aff_domain": "tamu.edu;tamu.edu;tamu.edu;;", "position": "PhD student;Full Professor;Associate Professor;;", "bibtex": "@inproceedings{\nzhao2021uncertaintyaware,\ntitle={Uncertainty-aware Active Learning for Optimal Bayesian Classifier},\nauthor={Guang Zhao and Edward Dougherty and Byung-Jun Yoon and Francis Alexander and Xiaoning Qian},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=Mu2ZxFctAI}\n}", "github": "", "project": "", "reviewers": "AnonReviewer3;AnonReviewer4;AnonReviewer2;AnonReviewer1", "pdf_size": 0, "rating": "5;6;6;7", "confidence": "2;4;4;3", "wc_review": "487;506;668;470", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "382;459;1025;350", "reply_reviewers": "0;0;0;0", "reply_authors": "1;1;2;1", "rating_avg": [ 6.0, 0.7071067811865476 ], "confidence_avg": [ 3.25, 0.82915619758885 ], "wc_review_avg": [ 532.75, 79.11818691046957 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 554.0, 274.8026564645983 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.25, 0.4330127018922193 ], "replies_avg": [ 11, 0 ], "authors#_avg": [ 5, 0 ], "corr_rating_confidence": 0.4264014327112209, "gs_citation": 51, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=5814008961326782307&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 3, "pdf": "https://openreview.net/pdf?id=Mu2ZxFctAI", "email": "tamu.edu;tamu.edu;tamu.edu;;", "author_num": 5, "aff_unique_index": "0;0;0", "aff_unique_norm": "Texas A&M University", "aff_unique_dep": "", "aff_unique_url": "https://www.tamu.edu", "aff_unique_abbr": "TAMU", "aff_campus_unique_index": "1;1", "aff_campus_unique": ";College Station", "aff_country_unique_index": "0;0;0", "aff_country_unique": "United States" }, { "title": "Geometry-Aware Gradient Algorithms for Neural Architecture Search", "status": "Spotlight", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/2618", "id": "MuSYkd1hxRP", "poster": "", "openreview": "https://openreview.net/forum?id=MuSYkd1hxRP", "slides": "https://iclr.cc/virtual/2021/poster/2618", "video": "https://iclr.cc/virtual/2021/poster/2618", "author_site": "Liam Li, Mikhail Khodak, Nina Balcan, Ameet Talwalkar", "tldr": "", "abstract": "Recent state-of-the-art methods for neural architecture search (NAS) exploit gradient-based optimization by relaxing the problem into continuous optimization over architectures and shared-weights, a noisy process that remains poorly understood. 
We argue for the study of single-level empirical risk minimization to understand NAS with weight-sharing, reducing the design of NAS methods to devising optimizers and regularizers that can quickly obtain high-quality solutions to this problem. Invoking the theory of mirror descent, we present a geometry-aware framework that exploits the underlying structure of this optimization to return sparse architectural parameters, leading to simple yet novel algorithms that enjoy fast convergence guarantees and achieve state-of-the-art accuracy on the latest NAS benchmarks in computer vision. Notably, we exceed the best published results for both CIFAR and ImageNet on both the DARTS search space and NAS-Bench-201; on the latter we achieve near-oracle-optimal performance on CIFAR-10 and CIFAR-100. Together, our theory and experiments demonstrate a principled way to co-design optimizers and continuous relaxations of discrete NAS search spaces.", "keywords": "neural architecture search;automated machine learning;weight-sharing;optimization", "primary_area": "", "supplementary_material": "/attachment/40d1293e30dfebc4910bedf126623e6a574d2638.zip", "author": "Liam Li;Mikhail Khodak;Nina Balcan;Ameet Talwalkar", "authorids": "~Liam_Li1;~Mikhail_Khodak1;~Nina_Balcan1;~Ameet_Talwalkar1", "gender": ";;F;M", "homepage": ";;http://www.cs.cmu.edu/~ninamf/;http://www.cs.cmu.edu/~atalwalk/", "dblp": "23/2305;;b/MariaFlorinaBalcan;56/5528", "google_scholar": "xPSkgtIAAAAJ;;https://scholar.google.com.tw/citations?user=LWlN_BUAAAAJ;https://scholar.google.com.tw/citations?user=TW7U1W0AAAAJ", "orcid": ";;;", "linkedin": ";;;", "or_profile": "~Liam_Li1;~Mikhail_Khodak1;~Nina_Balcan1;~Ameet_Talwalkar1", "aff": ";;Carnegie Mellon University;Carnegie Mellon University", "aff_domain": ";;cmu.edu;cmu.edu", "position": ";;Full Professor;Associate Professor", "bibtex": "@inproceedings{\nli2021geometryaware,\ntitle={Geometry-Aware Gradient Algorithms for Neural Architecture Search},\nauthor={Liam Li and Mikhail Khodak and Nina Balcan and Ameet Talwalkar},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=MuSYkd1hxRP}\n}", "github": "[![github](/images/github_icon.svg) liamcli/gaea_release](https://github.com/liamcli/gaea_release)", "project": "", "reviewers": "AnonReviewer2;AnonReviewer1;AnonReviewer3", "pdf_size": 0, "rating": "6;7;8", "confidence": "4;3;4", "wc_review": "306;1415;241", "wc_reply_reviewers": "167;196;0", "wc_reply_authors": "777;1406;19", "reply_reviewers": "1;1;0", "reply_authors": "2;2;1", "rating_avg": [ 7.0, 0.816496580927726 ], "confidence_avg": [ 3.6666666666666665, 0.4714045207910317 ], "wc_review_avg": [ 654.0, 538.7621615023337 ], "wc_reply_reviewers_avg": [ 121.0, 86.3751507475771 ], "wc_reply_authors_avg": [ 734.0, 567.0561406656899 ], "reply_reviewers_avg": [ 0.6666666666666666, 0.4714045207910317 ], "reply_authors_avg": [ 1.6666666666666667, 0.4714045207910317 ], "replies_avg": [ 12, 0 ], "authors#_avg": [ 4, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 89, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=1063083377324377559&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 5, "pdf": "https://openreview.net/pdf?id=MuSYkd1hxRP", "email": ";;cmu.edu;cmu.edu", "author_num": 4, "aff_unique_index": "0;0", "aff_unique_norm": "Carnegie Mellon University", "aff_unique_dep": "", "aff_unique_url": "https://www.cmu.edu", "aff_unique_abbr": "CMU", "aff_campus_unique_index": "", "aff_campus_unique": "", 
"aff_country_unique_index": "0;0", "aff_country_unique": "United States" }, { "id": "Mub9VkGZoZe", "title": "Identifying Informative Latent Variables Learned by GIN via Mutual Information", "track": "main", "status": "Reject", "tldr": "", "abstract": "How to learn a good representation of data is one of the most important topics of machine learning. Disentanglement of representations, though believed to be the core feature of good representations, has caused a lot of debates and discussions in recent. Sorrenson et al. (2020), using the techniques developed in nonlinear independent analysis theory, show that general incompressible-flow networks (GIN) can recover the underlying latent variables that generate the data, and thus can provide a compact and disentangled representation. However, in this paper, we point out that the method taken by GIN for informative latent variables identification is not theoretically supported and can be disproved by experiments. We propose to use the mutual information between latent variables and the auxiliary variable to correctly identify informative latent variables. We directly verify the improvement brought by our method in experiments on synthetic data. We further show the advantage of our method on various downstream tasks including classification, outlier detection and adversarial attack defence.", "keywords": "", "primary_area": "", "supplementary_material": "", "author": "Chen Zhang;Yitong Sun;Mingtian Zhang", "authorids": "~Chen_Zhang6;~Yitong_Sun1;~Mingtian_Zhang1", "gender": "M;M;M", "homepage": "http://chenz.xyz;;http://tomo.wiki", "dblp": ";26/9557;230/8340", "google_scholar": "https://scholar.google.co.uk/citations?hl=en;LIfV_DIAAAAJ;", "orcid": ";;", "linkedin": ";;", "or_profile": "~Chen_Zhang6;~Yitong_Sun1;~Mingtian_Zhang1", "aff": "Huawei Technologies R&D (UK) Ltd.;;University College London", "aff_domain": "huawei.com;;ucl.ac.uk", "position": "Research Scientist;;PhD student", "bibtex": "@misc{\nzhang2021identifying,\ntitle={Identifying Informative Latent Variables Learned by {\\{}GIN{\\}} via Mutual Information},\nauthor={Chen Zhang and Yitong Sun and Mingtian Zhang},\nyear={2021},\nurl={https://openreview.net/forum?id=Mub9VkGZoZe}\n}", "github": "", "project": "", "reviewers": "AnonReviewer2;AnonReviewer4;AnonReviewer3;AnonReviewer1;AnonReviewer5", "site": "https://openreview.net/forum?id=Mub9VkGZoZe", "pdf_size": 0, "rating": "4;5;5;6;6", "confidence": "3;3;3;4;2", "wc_review": "357;692;316;373;703", "wc_reply_reviewers": "0;81;0;0;0", "wc_reply_authors": "161;729;185;599;394", "reply_reviewers": "0;1;0;0;0", "reply_authors": "1;1;1;1;1", "rating_avg": [ 5.2, 0.7483314773547882 ], "confidence_avg": [ 3.0, 0.6324555320336759 ], "wc_review_avg": [ 488.2, 171.93649990621537 ], "wc_reply_reviewers_avg": [ 16.2, 32.4 ], "wc_reply_authors_avg": [ 413.6, 223.7405640468442 ], "reply_reviewers_avg": [ 0.2, 0.4000000000000001 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 13, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:Ybt70RqTe6oJ:scholar.google.com/&scioq=Identifying+Informative+Latent+Variables+Learned+by+GIN+via+Mutual+Information&hl=en&as_sdt=0,33", "gs_version_total": 0, "aff_unique_index": "0;1", "aff_unique_norm": "Huawei;University College London", "aff_unique_dep": "R&D;", "aff_unique_url": "https://www.huawei.com/uk;https://www.ucl.ac.uk", "aff_unique_abbr": "Huawei;UCL", "aff_campus_unique_index": "", "aff_campus_unique": "", 
"aff_country_unique_index": "0;0", "aff_country_unique": "United Kingdom" }, { "id": "Mwuc0Plt_x2", "title": "RG-Flow: A hierarchical and explainable flow model based on renormalization group and sparse prior", "track": "main", "status": "Reject", "tldr": "", "abstract": "Flow-based generative models have become an important class of unsupervised learning approaches. In this work, we incorporate the key idea of renormalization group (RG) and sparse prior distribution to design a hierarchical flow-based generative model, called RG-Flow, which can separate different scale information of images with disentangle representations at each scale. We demonstrate our method mainly on the CelebA dataset and show that the disentangled representation at different scales enables semantic manipulation and style mixing of the images. To visualize the latent representation, we introduce the receptive fields for flow-based models and find receptive fields learned by RG-Flow are similar to convolutional neural networks. In addition, we replace the widely adopted Gaussian prior distribution by sparse prior distributions to further enhance the disentanglement of representations. From a theoretical perspective, the proposed method has $O(\\log L)$ complexity for image inpainting compared to previous flow-based models with $O(L^2)$ complexity.", "keywords": "Unsupervised learning;representation learning;flow-based generative model;renormalization group;sparse encoding", "primary_area": "", "supplementary_material": "/attachment/0d3f3320b5073eed18e2d0597ed4c544ac020f71.zip", "author": "Hong-Ye Hu;Dian Wu;Yi-Zhuang You;Bruno Olshausen;Yubei Chen", "authorids": "~Hong-Ye_Hu1;wdphy16@pku.edu.cn;yzyou@physics.ucsd.edu;~Bruno_Olshausen1;~Yubei_Chen1", "gender": ";;;M;M", "homepage": "https://www.hongyehu.com/;;;http://redwood.berkeley.edu/bruno/;https://redwood.berkeley.edu/people/yubei-chen/", "dblp": ";;;30/3869;30/10064", "google_scholar": "Hqf3jOUAAAAJ;;;4aqK_74AAAAJ;WeyLqFUAAAAJ", "orcid": "0000-0001-5841-831X;;;;", "linkedin": ";;;;yubei-chen-05998a39/", "or_profile": "~Hong-Ye_Hu1;wdphy16@pku.edu.cn;yzyou@physics.ucsd.edu;~Bruno_Olshausen1;~Yubei_Chen1", "aff": "University of California, San Diego;;;UC Berkeley;Facebook AI Research", "aff_domain": "ucsd.edu;;;;facebook.com", "position": "PhD student;;;Full Professor;Postdoc Researcher", "bibtex": "@misc{\nhu2021rgflow,\ntitle={{\\{}RG{\\}}-Flow: A hierarchical and explainable flow model based on renormalization group and sparse prior},\nauthor={Hong-Ye Hu and Dian Wu and Yi-Zhuang You and Bruno Olshausen and Yubei Chen},\nyear={2021},\nurl={https://openreview.net/forum?id=Mwuc0Plt_x2}\n}", "github": "", "project": "", "reviewers": "AnonReviewer2;AnonReviewer3;AnonReviewer4;AnonReviewer1", "site": "https://openreview.net/forum?id=Mwuc0Plt_x2", "pdf_size": 0, "rating": "5;5;6;6", "confidence": "3;4;3;4", "wc_review": "317;468;394;395", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "743;801;788;746", "reply_reviewers": "0;0;0;0", "reply_authors": "1;1;1;1", "rating_avg": [ 5.5, 0.5 ], "confidence_avg": [ 3.5, 0.5 ], "wc_review_avg": [ 393.5, 53.3970972993851 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 769.5, 25.441108466417102 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 10, 0 ], "authors#_avg": [ 5, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 18, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=17152798882486554858&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 11, 
"aff_unique_index": "0;1;2", "aff_unique_norm": "University of California, San Diego;University of California, Berkeley;Meta", "aff_unique_dep": ";;Facebook AI Research", "aff_unique_url": "https://www.ucsd.edu;https://www.berkeley.edu;https://research.facebook.com", "aff_unique_abbr": "UCSD;UC Berkeley;FAIR", "aff_campus_unique_index": "0;1", "aff_campus_unique": "San Diego;Berkeley;", "aff_country_unique_index": "0;0;0", "aff_country_unique": "United States" }, { "id": "MwxaStJXK6v", "title": "Double Q-learning: New Analysis and Sharper Finite-time Bound", "track": "main", "status": "Reject", "tldr": "", "abstract": "Double Q-learning \\citep{hasselt2010double} has gained significant success in practice due to its effectiveness in overcoming the overestimation issue of Q-learning. However, theoretical understanding of double Q-learning is rather limited and the only existing finite-time analysis was recently established in \\citet{xiong2020double} under a polynomial learning rate. This paper analyzes the more challenging case with a rescaled linear/constant learning rate for which the previous method does not appear to be applicable. We develop new analytical tools that achieve an order-level better finite-time convergence rate than the previously established result. Specifically, we show that synchronous double Q-learning attains an $\\epsilon$-accurate global optimum with a time complexity of $\\Omega\\left(\\frac{\\ln D}{(1-\\gamma)^7\\epsilon^2} \\right)$, and the asynchronous algorithm attains a time complexity of $\\tilde{\\Omega}\\left(\\frac{L}{(1-\\gamma)^7\\epsilon^2} \\right)$, where $D$ is the cardinality of the state-action space, $\\gamma$ is the discount factor, and $L$ is a parameter related to the sampling strategy for asynchronous double Q-learning. These results improve the order-level dependence of the convergence rate on all major parameters $(\\epsilon,1-\\gamma, D, L)$ provided in \\citet{xiong2020double}. 
The new analysis in this paper presents a more direct and succinct approach for characterizing the finite-time convergence rate of double Q-learning.", "keywords": "Double Q-learning;Finite-time analysis;Convergence rate;Stochastic approximation", "primary_area": "", "supplementary_material": "/attachment/4f7087edc0181abe6bf4b7db46028b6e654d9ee4.zip", "author": "Lin Zhao;Huaqing Xiong;Yingbin Liang;Wei Zhang", "authorids": "~Lin_Zhao3;~Huaqing_Xiong1;~Yingbin_Liang1;~Wei_Zhang40", "gender": "M;M;F;M", "homepage": "https://sites.google.com/view/lzhao;;https://sites.google.com/view/yingbinliang/home;https://www.wzhanglab.site/", "dblp": ";;51/332;", "google_scholar": "091lFhYAAAAJ;;lGgLAiIAAAAJ;HQ6j-KsAAAAJ", "orcid": "0000-0002-1078-887X;;;", "linkedin": ";;;", "or_profile": "~Lin_Zhao3;~Huaqing_Xiong1;~Yingbin_Liang1;~Wei_Zhang40", "aff": "National University of Singapore;Ohio State University;The Ohio State University;Southern University of Science and Technology of China", "aff_domain": "nus.edu.sg;osu.edu;osu.edu;sustech.edu.cn", "position": "Assistant Professor;PhD student;Professor;Professor", "bibtex": "@misc{\nzhao2021double,\ntitle={Double Q-learning: New Analysis and Sharper Finite-time Bound},\nauthor={Lin Zhao and Huaqing Xiong and Yingbin Liang and Wei Zhang},\nyear={2021},\nurl={https://openreview.net/forum?id=MwxaStJXK6v}\n}", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer3;AnonReviewer4;AnonReviewer2", "site": "https://openreview.net/forum?id=MwxaStJXK6v", "pdf_size": 0, "rating": "4;5;6;6", "confidence": "4;3;3;3", "wc_review": "699;303;582;231", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "739;225;702;202", "reply_reviewers": "0;0;0;0", "reply_authors": "1;1;1;1", "rating_avg": [ 5.25, 0.82915619758885 ], "confidence_avg": [ 3.25, 0.4330127018922193 ], "wc_review_avg": [ 453.75, 192.96291742197513 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 467.0, 253.96751760805947 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 9, 0 ], "authors#_avg": [ 4, 0 ], "corr_rating_confidence": -0.8703882797784891, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:pw555JrMHFMJ:scholar.google.com/&scioq=Double+Q-learning:+New+Analysis+and+Sharper+Finite-time+Bound&hl=en&as_sdt=0,5", "gs_version_total": 0, "aff_unique_index": "0;1;1;2", "aff_unique_norm": "National University of Singapore;Ohio State University;Southern University of Science and Technology", "aff_unique_dep": ";;", "aff_unique_url": "https://www.nus.edu.sg;https://www.osu.edu;https://www.sustech.edu.cn", "aff_unique_abbr": "NUS;OSU;SUSTech", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;1;1;2", "aff_country_unique": "Singapore;United States;China" }, { "title": "High-Capacity Expert Binary Networks", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/2953", "id": "MxaY4FzOTa", "poster": "", "openreview": "https://openreview.net/forum?id=MxaY4FzOTa", "slides": "https://iclr.cc/virtual/2021/poster/2953", "video": "https://iclr.cc/virtual/2021/poster/2953", "author_site": "Adrian Bulat, Brais Martinez, Georgios Tzimiropoulos", "tldr": "", "abstract": "Network binarization is a promising hardware-aware direction for creating efficient deep models. Despite its memory and computational advantages, reducing the accuracy gap between binary models and their real-valued counterparts remains an unsolved challenging research problem. 
To this end, we make the following 3 contributions: (a) To increase model capacity, we propose Expert Binary Convolution, which, for the first time, tailors conditional computing to binary networks by learning to select one data-specific expert binary filter at a time conditioned on input features. (b) To increase representation capacity, we propose to address the inherent information bottleneck in binary networks by introducing an efficient width expansion mechanism which keeps the binary operations within the same budget. (c) To improve network design, we propose a principled binary network growth mechanism that unveils a set of network topologies of favorable properties. Overall, our method improves upon prior work, with no increase in computational cost, by $\\sim6 \\%$, reaching a groundbreaking $\\sim 71\\%$ on ImageNet classification. Code will be made available $\\href{https://www.adrianbulat.com/binary-networks}{here}$.", "keywords": "", "primary_area": "", "supplementary_material": "", "author": "Adrian Bulat;Brais Martinez;Georgios Tzimiropoulos", "authorids": "~Adrian_Bulat1;~Brais_Martinez3;~Georgios_Tzimiropoulos1", "gender": ";M;M", "homepage": "https://www.adrianbulat.com;http://www.braismartinez.org/;https://ytzimiro.github.io/", "dblp": "185/6878;14/111;03/3273", "google_scholar": "https://scholar.google.co.uk/citations?user=5sKcsg0AAAAJ;https://scholar.google.co.uk/citations?user=-62MApgAAAAJ;https://scholar.google.co.uk/citations?user=D4JkWxf-8fwC", "orcid": "0000-0002-3185-4979;;", "linkedin": ";;", "or_profile": "~Adrian_Bulat1;~Brais_Martinez3;~Georgios_Tzimiropoulos1", "aff": "Samsung AI Center Cambridge;Samsung;Queen Mary University London", "aff_domain": "samsung.com;samsung.com;qmul.ac.uk", "position": "Research Scientist;Samsung AI Center;Associate Professor", "bibtex": "@inproceedings{\nbulat2021highcapacity,\ntitle={High-Capacity Expert Binary Networks},\nauthor={Adrian Bulat and Brais Martinez and Georgios Tzimiropoulos},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=MxaY4FzOTa}\n}", "github": "[![github](/images/github_icon.svg) 1adrianb/expert-binary-networks](https://github.com/1adrianb/expert-binary-networks)", "project": "", "reviewers": "AnonReviewer2;AnonReviewer1;AnonReviewer4;AnonReviewer3", "pdf_size": 0, "rating": "4;5;6;7", "confidence": "5;2;4;3", "wc_review": "508;465;432;130", "wc_reply_reviewers": "569;0;0;0", "wc_reply_authors": "3352;1345;828;32", "reply_reviewers": "3;0;0;0", "reply_authors": "6;2;2;1", "rating_avg": [ 5.5, 1.118033988749895 ], "confidence_avg": [ 3.5, 1.118033988749895 ], "wc_review_avg": [ 383.75, 148.96035546413012 ], "wc_reply_reviewers_avg": [ 142.25, 246.38422737667278 ], "wc_reply_authors_avg": [ 1389.25, 1225.9154487565609 ], "reply_reviewers_avg": [ 0.75, 1.299038105676658 ], "reply_authors_avg": [ 2.75, 1.920286436967152 ], "replies_avg": [ 25, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": -0.39999999999999997, "gs_citation": 73, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=12665350713453943927&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 5, "pdf": "https://openreview.net/pdf?id=MxaY4FzOTa", "email": "samsung.com;samsung.com;qmul.ac.uk", "author_num": 3, "aff_unique_index": "0;0;1", "aff_unique_norm": "Samsung;Queen Mary University of London", "aff_unique_dep": "AI Center;", "aff_unique_url": "https://www.samsung.com/global/innovation/ai-research/;https://www.qmul.ac.uk", "aff_unique_abbr": "SAC;QMUL", 
"aff_campus_unique_index": "0;2", "aff_campus_unique": "Cambridge;;London", "aff_country_unique_index": "0;1;0", "aff_country_unique": "United Kingdom;South Korea" }, { "title": "Beyond Categorical Label Representations for Image Classification", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/3226", "id": "MyHwDabUHZm", "poster": "", "openreview": "https://openreview.net/forum?id=MyHwDabUHZm", "slides": "https://iclr.cc/virtual/2021/poster/3226", "video": "https://iclr.cc/virtual/2021/poster/3226", "author_site": "Boyuan Chen, Yu Li, Sunand Raghupathi, Hod Lipson", "tldr": "", "abstract": "We find that the way we choose to represent data labels can have a profound effect on the quality of trained models. For example, training an image classifier to regress audio labels rather than traditional categorical probabilities produces a more reliable classification. This result is surprising, considering that audio labels are more complex than simpler numerical probabilities or text. We hypothesize that high dimensional, high entropy label representations are generally more useful because they provide a stronger error signal. We support this hypothesis with evidence from various label representations including constant matrices, spectrograms, shuffled spectrograms, Gaussian mixtures, and uniform random matrices of various dimensionalities. Our experiments reveal that high dimensional, high entropy labels achieve comparable accuracy to text (categorical) labels on standard image classification tasks, but features learned through our label representations exhibit more robustness under various adversarial attacks and better effectiveness with a limited amount of training data. These results suggest that label representation may play a more important role than previously thought.", "keywords": "Label Representation;Image Classification;Representation Learning", "primary_area": "", "supplementary_material": "", "author": "Boyuan Chen;Yu Li;Sunand Raghupathi;Hod Lipson", "authorids": "~Boyuan_Chen1;yl4019@columbia.edu;sr3587@columbia.edu;~Hod_Lipson1", "gender": "Not Specified;;;M", "homepage": "http://boyuanchen.com/;;;https://www.hodlipson.com/", "dblp": "193/7174-1;;;l/HodLipson", "google_scholar": "5DBpY6EAAAAJ;;;https://scholar.google.com/citations?hl=en", "orcid": ";;;0000-0003-0769-4618", "linkedin": "boyuan-chen-b30854a0/;;;hod-lipson-4018189/", "or_profile": "~Boyuan_Chen1;yl4019@columbia.edu;sr3587@columbia.edu;~Hod_Lipson1", "aff": "Columbia University;;;Columbia University", "aff_domain": "cs.columbia.edu;;;columbia.edu", "position": "PhD student;;;Full Professor", "bibtex": "@inproceedings{\nchen2021beyond,\ntitle={Beyond Categorical Label Representations for Image Classification},\nauthor={Boyuan Chen and Yu Li and Sunand Raghupathi and Hod Lipson},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=MyHwDabUHZm}\n}", "github": "[![github](/images/github_icon.svg) BoyuanChen/label_representations](https://github.com/BoyuanChen/label_representations)", "project": "", "reviewers": "AnonReviewer2;AnonReviewer1;AnonReviewer3;AnonReviewer4", "pdf_size": 0, "rating": "4;7;7;7", "confidence": "4;3;4;4", "wc_review": "225;155;483;382", "wc_reply_reviewers": "0;20;0;0", "wc_reply_authors": "627;1030;1780;2130", "reply_reviewers": "0;1;0;0", "reply_authors": "2;4;5;5", "rating_avg": [ 6.25, 1.299038105676658 ], "confidence_avg": [ 3.75, 0.4330127018922193 ], "wc_review_avg": [ 311.25, 
128.79901979440683 ], "wc_reply_reviewers_avg": [ 5.0, 8.660254037844387 ], "wc_reply_authors_avg": [ 1391.75, 594.0237263779958 ], "reply_reviewers_avg": [ 0.25, 0.4330127018922193 ], "reply_authors_avg": [ 4.0, 1.224744871391589 ], "replies_avg": [ 23, 0 ], "authors#_avg": [ 4, 0 ], "corr_rating_confidence": -0.3333333333333333, "gs_citation": 3, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=6100870767960512656&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 7, "pdf": "https://openreview.net/pdf?id=MyHwDabUHZm", "email": "cs.columbia.edu;;;columbia.edu", "author_num": 4, "aff_unique_index": "0;0", "aff_unique_norm": "Columbia University", "aff_unique_dep": "", "aff_unique_url": "https://www.columbia.edu", "aff_unique_abbr": "Columbia", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0", "aff_country_unique": "United States" }, { "id": "N07ebsD-lHp", "title": "Defending against black-box adversarial attacks with gradient-free trained sign activation neural networks", "track": "main", "status": "Reject", "tldr": "", "abstract": "While machine learning models today can achieve high accuracies on classification tasks, they can be deceived by minor imperceptible distortions to the data. These are known as adversarial attacks and can be lethal in the black-box setting which does not require knowledge of the target model type or its parameters. Binary neural networks that have sign activation and are trained with gradient descent have been shown to be harder to attack than conventional sigmoid activation networks but their improvements are marginal. We instead train sign activation networks with a novel gradient-free stochastic coordinate descent algorithm and propose an ensemble of such networks as a defense model. We evaluate the robustness of our model (a hard problem in itself) on image, text, and medical ECG data and find it to be more robust than ensembles of binary, full precision, and convolutional neural networks, and than random forests while attaining comparable clean test accuracy. In order to explain our model's robustness we show that an adversary targeting a single network in our ensemble fails to attack (and thus non-transferable to) other networks in the ensemble. Thus a datapoint requires a large distortion to fool the majority of networks in our ensemble and is likely to be detected in advance. 
This property of non-transferability arises naturally from the non-convexity of sign activation networks and randomization in our gradient-free training algorithm without any adversarial defense effort.", "keywords": "sign activation neural network;gradient-free training;stochastic coordinate descent;black box adversarial attack;hopskipjump;transferability;image distortion", "primary_area": "", "supplementary_material": "/attachment/2c05ec7265095bc3c9da11d5dd9803409e9ce6df.zip", "author": "Yunzhe Xue;Meiyan Xie;Zhibo Yang;Usman Roshan", "authorids": "yx277@njit.edu;mx42@njit.edu;zy328@njit.edu;~Usman_Roshan1", "gender": ";;;M", "homepage": ";;;https://web.njit.edu/~usman/", "dblp": ";;;r/UsmanRoshan", "google_scholar": ";;;https://scholar.google.com.tw/citations?user=8GkpkAIAAAAJ", "orcid": ";;;", "linkedin": ";;;", "or_profile": "yx277@njit.edu;mx42@njit.edu;zy328@njit.edu;~Usman_Roshan1", "aff": ";;;New Jersey Institute of Technology", "aff_domain": ";;;njit.edu", "position": ";;;Associate Professor", "bibtex": "@misc{\nxue2021defending,\ntitle={Defending against black-box adversarial attacks with gradient-free trained sign activation neural networks},\nauthor={Yunzhe Xue and Meiyan Xie and Zhibo Yang and Usman Roshan},\nyear={2021},\nurl={https://openreview.net/forum?id=N07ebsD-lHp}\n}", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer4;AnonReviewer2", "site": "https://openreview.net/forum?id=N07ebsD-lHp", "pdf_size": 0, "rating": "3;4;5", "confidence": "4;4;4", "wc_review": "321;288;454", "wc_reply_reviewers": "0;0;0", "wc_reply_authors": "157;341;234", "reply_reviewers": "0;0;0", "reply_authors": "1;1;1", "rating_avg": [ 4.0, 0.816496580927726 ], "confidence_avg": [ 4.0, 0.0 ], "wc_review_avg": [ 354.3333333333333, 71.75111303821163 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 244.0, 75.44976253552205 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 8, 0 ], "authors#_avg": [ 4, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:dY78J_8g0fEJ:scholar.google.com/&scioq=Defending+against+black-box+adversarial+attacks+with+gradient-free+trained+sign+activation+neural+networks&hl=en&as_sdt=0,5", "gs_version_total": 0, "aff_unique_index": "0", "aff_unique_norm": "New Jersey Institute of Technology", "aff_unique_dep": "", "aff_unique_url": "https://www.njit.edu", "aff_unique_abbr": "NJIT", "aff_country_unique_index": "0", "aff_country_unique": "United States" }, { "title": "Selective Classification Can Magnify Disparities Across Groups", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/3060", "id": "N0M_4BkQ05i", "poster": "", "openreview": "https://openreview.net/forum?id=N0M_4BkQ05i", "slides": "https://iclr.cc/virtual/2021/poster/3060", "video": "https://iclr.cc/virtual/2021/poster/3060", "author_site": "Erik Jones, Shiori Sagawa, Pang Wei Koh, Ananya Kumar, Percy Liang", "tldr": "", "abstract": "Selective classification, in which models can abstain on uncertain predictions, is a natural approach to improving accuracy in settings where errors are costly but abstentions are manageable. In this paper, we find that while selective classification can improve average accuracies, it can simultaneously magnify existing accuracy disparities between various groups within a population, especially in the presence of spurious correlations. We observe this behavior consistently across five vision and NLP datasets. 
Surprisingly, increasing abstentions can even decrease accuracies on some groups. To better understand this phenomenon, we study the margin distribution, which captures the model\u2019s confidences over all predictions. For symmetric margin distributions, we prove that whether selective classification monotonically improves or worsens accuracy is fully determined by the accuracy at full coverage (i.e., without any abstentions) and whether the distribution satisfies a property we call left-log-concavity. Our analysis also shows that selective classification tends to magnify full-coverage accuracy disparities. Motivated by our analysis, we train distributionally-robust models that achieve similar full-coverage accuracies across groups and show that selective classification uniformly improves each group on these models. Altogether, our results suggest that selective classification should be used with care and underscore the importance of training models to perform equally well across groups at full coverage.", "keywords": "selective classification;group disparities;log-concavity;robustness", "primary_area": "", "supplementary_material": "", "author": "Erik Jones;Shiori Sagawa;Pang Wei Koh;Ananya Kumar;Percy Liang", "authorids": "~Erik_Jones3;~Shiori_Sagawa1;~Pang_Wei_Koh1;~Ananya_Kumar1;~Percy_Liang1", "gender": "M;Unspecified;M;M;", "homepage": "http://people.eecs.berkeley.edu/~erjones/;https://cs.stanford.edu/~ssagawa/;http://cs.stanford.edu/~pangwei;https://ananyakumar.wordpress.com/;https://cs.stanford.edu/~pliang/", "dblp": "264/5304;248/7578;10/10453;192/0474;04/1701", "google_scholar": "_-CU2CsAAAAJ;9EnJFEEAAAAJ;Nn990CkAAAAJ;tP5IBFkAAAAJ;pouyVyUAAAAJ", "orcid": ";;;;", "linkedin": "erik-jones-879239133/;;;;", "or_profile": "~Erik_Jones3;~Shiori_Sagawa1;~Pang_Wei_Koh1;~Ananya_Kumar1;~Percy_Liang1", "aff": "University of California, Berkeley;Stanford University;Stanford University;Stanford University;Stanford University", "aff_domain": "berkeley.edu;stanford.edu;stanford.edu;stanford.edu;stanford.edu", "position": "PhD student;PhD student;PhD student;PhD student;Associate Professor", "bibtex": "@inproceedings{\njones2021selective,\ntitle={Selective Classification Can Magnify Disparities Across Groups},\nauthor={Erik Jones and Shiori Sagawa and Pang Wei Koh and Ananya Kumar and Percy Liang},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=N0M_4BkQ05i}\n}", "github": "", "project": "", "reviewers": "AnonReviewer3;AnonReviewer2;AnonReviewer1;AnonReviewer4", "pdf_size": 0, "rating": "5;7;7;8", "confidence": "3;4;3;4", "wc_review": "237;559;290;494", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "701;1073;398;413", "reply_reviewers": "0;0;0;0", "reply_authors": "1;2;1;1", "rating_avg": [ 6.75, 1.0897247358851685 ], "confidence_avg": [ 3.5, 0.5 ], "wc_review_avg": [ 395.0, 134.80170622065583 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 646.25, 274.38419688458737 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.25, 0.4330127018922193 ], "replies_avg": [ 11, 0 ], "authors#_avg": [ 5, 0 ], "corr_rating_confidence": 0.6882472016116854, "gs_citation": 62, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=2080495898610235130&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 4, "pdf": "https://openreview.net/pdf?id=N0M_4BkQ05i", "email": "berkeley.edu;stanford.edu;stanford.edu;stanford.edu;stanford.edu", "author_num": 5, "aff_unique_index": "0;1;1;1;1", "aff_unique_norm": 
"University of California, Berkeley;Stanford University", "aff_unique_dep": ";", "aff_unique_url": "https://www.berkeley.edu;https://www.stanford.edu", "aff_unique_abbr": "UC Berkeley;Stanford", "aff_campus_unique_index": "0;1;1;1;1", "aff_campus_unique": "Berkeley;Stanford", "aff_country_unique_index": "0;0;0;0;0", "aff_country_unique": "United States" }, { "title": "Universal Weakly Supervised Segmentation by Pixel-to-Segment Contrastive Learning", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/3167", "id": "N33d7wjgzde", "poster": "", "openreview": "https://openreview.net/forum?id=N33d7wjgzde", "slides": "https://iclr.cc/virtual/2021/poster/3167", "video": "https://iclr.cc/virtual/2021/poster/3167", "author_site": "Tsung-Wei Ke, Jyh-Jing Hwang, Stella Yu", "tldr": "", "abstract": "Weakly supervised segmentation requires assigning a label to every pixel based on training instances with partial annotations such as image-level tags, object bounding boxes, labeled points and scribbles. This task is challenging, as coarse annotations (tags, boxes) lack precise pixel localization whereas sparse annotations (points, scribbles) lack broad region coverage. Existing methods tackle these two types of weak supervision differently: Class activation maps are used to localize coarse labels and iteratively refine the segmentation model, whereas conditional random fields are used to propagate sparse labels to the entire image.\n\nWe formulate weakly supervised segmentation as a semi-supervised metric learning problem, where pixels of the same (different) semantics need to be mapped to the same (distinctive) features. We propose 4 types of contrastive relationships between pixels and segments in the feature space, capturing low-level image similarity, semantic annotation, co-occurrence, and feature affinity They act as priors; the pixel-wise feature can be learned from training images with any partial annotations in a data-driven fashion. In particular, unlabeled pixels in training images participate not only in data-driven grouping within each image, but also in discriminative feature learning within and across images. We deliver a universal weakly supervised segmenter with significant gains on Pascal VOC and DensePose. 
Our code is publicly available at https://github.com/twke18/SPML.", "keywords": "weakly supervised representation learning;representation learning for computer vision;metric learning;semantic segmentation", "primary_area": "", "supplementary_material": "", "author": "Tsung-Wei Ke;Jyh-Jing Hwang;Stella Yu", "authorids": "~Tsung-Wei_Ke2;~Jyh-Jing_Hwang1;~Stella_Yu2", "gender": ";M;F", "homepage": "https://twke18.github.io/;http://jyhjinghwang.github.io/;http://www.eecs.umich.edu/~stellayu", "dblp": "173/4984;156/0239;58/5089", "google_scholar": "WTEFsHMAAAAJ;ClTTUWkAAAAJ;https://scholar.google.com/citations?hl=en", "orcid": ";;", "linkedin": ";;", "or_profile": "~Tsung-Wei_Ke2;~Jyh-Jing_Hwang1;~Stella_Yu2", "aff": "University of California, Berkeley;Waymo;University of California, Berkeley", "aff_domain": "berkeley.edu;waymo.com;berkeley.edu", "position": "PhD student;Researcher;Director, ICSI Vision Group", "bibtex": "@inproceedings{\nke2021universal,\ntitle={Universal Weakly Supervised Segmentation by Pixel-to-Segment Contrastive Learning},\nauthor={Tsung-Wei Ke and Jyh-Jing Hwang and Stella Yu},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=N33d7wjgzde}\n}", "github": "[![github](/images/github_icon.svg) twke18/SPML](https://github.com/twke18/SPML)", "project": "", "reviewers": "AnonReviewer2;AnonReviewer3;AnonReviewer4;AnonReviewer1;AnonReviewer5", "pdf_size": 0, "rating": "5;6;6;7;7", "confidence": "4;5;4;3;5", "wc_review": "208;525;260;311;226", "wc_reply_reviewers": "0;0;0;0;0", "wc_reply_authors": "237;630;232;331;290", "reply_reviewers": "0;0;0;0;0", "reply_authors": "1;1;1;1;1", "rating_avg": [ 6.2, 0.7483314773547882 ], "confidence_avg": [ 4.2, 0.7483314773547882 ], "wc_review_avg": [ 306.0, 114.98347707388223 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 344.0, 147.5628679580334 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 13, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": -0.07142857142857145, "gs_citation": 95, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=2575509645382870246&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 10, "pdf": "https://openreview.net/pdf?id=N33d7wjgzde", "email": "berkeley.edu;waymo.com;berkeley.edu", "author_num": 3, "aff_unique_index": "0;1;0", "aff_unique_norm": "University of California, Berkeley;Waymo", "aff_unique_dep": ";", "aff_unique_url": "https://www.berkeley.edu;https://www.waymo.com", "aff_unique_abbr": "UC Berkeley;Waymo", "aff_campus_unique_index": "0;0", "aff_campus_unique": "Berkeley;", "aff_country_unique_index": "0;0;0", "aff_country_unique": "United States" }, { "title": "My Body is a Cage: the Role of Morphology in Graph-Based Incompatible Control", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/2942", "id": "N3zUDGN5lO", "poster": "", "openreview": "https://openreview.net/forum?id=N3zUDGN5lO", "slides": "https://iclr.cc/virtual/2021/poster/2942", "video": "https://iclr.cc/virtual/2021/poster/2942", "author_site": "Vitaly Kurin, Maximilian Igl, Tim Rocktaeschel, Wendelin Boehmer, Shimon Whiteson", "tldr": "", "abstract": "Multitask Reinforcement Learning is a promising way to obtain models with better performance, generalisation, data efficiency, and robustness. Most existing work is limited to compatible settings, where the state and action space dimensions are the same across tasks. 
Graph Neural Networks (GNN) are one way to address incompatible environments, because they can process graphs of arbitrary size. They also allow practitioners to inject biases encoded in the structure of the input graph. Existing work in graph-based continuous control uses the physical morphology of the agent to construct the input graph, i.e., encoding limb features as node labels and using edges to connect the nodes if their corresponding limbs are physically connected.\nIn this work, we present a series of ablations on existing methods that show that morphological information encoded in the graph does not improve their performance. Motivated by the hypothesis that any benefits GNNs extract from the graph structure are outweighed by difficulties they create for message passing, we also propose Amorpheus, a transformer-based approach. Further results show that, while Amorpheus ignores the morphological information that GNNs encode, it nonetheless substantially outperforms GNN-based methods.", "keywords": "Deep Reinforcement Learning;Multitask Reinforcement Learning;Graph Neural Networks;Continuous Control;Incompatible Environments", "primary_area": "", "supplementary_material": "", "author": "Vitaly Kurin;Maximilian Igl;Tim Rockt\u00e4schel;Wendelin Boehmer;Shimon Whiteson", "authorids": "~Vitaly_Kurin1;~Maximilian_Igl1;~Tim_Rockt\u00e4schel1;~Wendelin_Boehmer1;~Shimon_Whiteson1", "gender": "M;M;M;;M", "homepage": "https://yobibyte.github.io/;https://maximilianigl.com;https://reinforceAI.net;;http://rockt.ai", "dblp": "200/8275;207/8245.html;08/9988;https://dblp.uni-trier.de/pers/w/Whiteson:Shimon.html;43/11537", "google_scholar": "https://scholar.google.co.uk/citations?user=yk6C1SgAAAAJ;https://scholar.google.com/citations?hl=en;https://scholar.google.de/citations?user=wI5MV8IAAAAJ;;https://scholar.google.co.uk/citations?user=mWBY8aIAAAAJ", "orcid": ";;0000-0002-4398-6792;;", "linkedin": ";maximilian-igl-21116992/;wendelin-boehmer;;rockt/", "or_profile": "~Vitaly_Kurin1;~Maximilian_Igl1;~Wendelin_Boehmer1;~Shimon_Whiteson1;~Tim_Rocktaeschel1", "aff": "Microsoft;University of Oxford;Delft University of Technology;University of Oxford;Department of Computer Science, University College London", "aff_domain": "microsoft.com;oxford.ac.uk;tudelft.nl;ox.ac.uk;cs.ucl.ac.uk", "position": "Research Intern;PhD student;Assistant Professor;Professor;Assistant Professor", "bibtex": "@inproceedings{\nkurin2021my,\ntitle={My Body is a Cage: the Role of Morphology in Graph-Based Incompatible Control},\nauthor={Vitaly Kurin and Maximilian Igl and Tim Rockt{\\\"a}schel and Wendelin Boehmer and Shimon Whiteson},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=N3zUDGN5lO}\n}", "github": "[![github](/images/github_icon.svg) yobibyte/amorpheus](https://github.com/yobibyte/amorpheus)", "project": "", "reviewers": "AnonReviewer4;AnonReviewer1;AnonReviewer2;AnonReviewer3", "pdf_size": 0, "rating": "7;7;7;7", "confidence": "4;4;4;2", "wc_review": "338;368;506;247", "wc_reply_reviewers": "10;0;178;48", "wc_reply_authors": "478;432;1184;320", "reply_reviewers": "1;0;2;1", "reply_authors": "1;1;2;1", "rating_avg": [ 7.0, 0.0 ], "confidence_avg": [ 3.5, 0.8660254037844386 ], "wc_review_avg": [ 364.75, 92.92839985709428 ], "wc_reply_reviewers_avg": [ 59.0, 71.0 ], "wc_reply_authors_avg": [ 603.5, 340.04227678334354 ], "reply_reviewers_avg": [ 1.0, 0.7071067811865476 ], "reply_authors_avg": [ 1.25, 0.4330127018922193 ], "replies_avg": [ 14, 0 ],
"authors#_avg": [ 5, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 95, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=5888918560420712353&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 12, "pdf": "https://openreview.net/pdf?id=N3zUDGN5lO", "email": "microsoft.com;oxford.ac.uk;tudelft.nl;ox.ac.uk;cs.ucl.ac.uk", "author_num": 5, "aff_unique_index": "0;1;2;1;3", "aff_unique_norm": "Microsoft;University of Oxford;Delft University of Technology;University College London", "aff_unique_dep": "Microsoft Corporation;;;Department of Computer Science", "aff_unique_url": "https://www.microsoft.com;https://www.ox.ac.uk;https://www.tudelft.nl;https://www.ucl.ac.uk", "aff_unique_abbr": "Microsoft;Oxford;TU Delft;UCL", "aff_campus_unique_index": "1", "aff_campus_unique": ";London", "aff_country_unique_index": "0;1;2;1;1", "aff_country_unique": "United States;United Kingdom;Netherlands" }, { "id": "N5Zacze7uru", "title": "Neural Lyapunov Model Predictive Control", "track": "main", "status": "Reject", "tldr": "", "abstract": "With a growing interest in data-driven control techniques, Model Predictive Control (MPC) provides a significant opportunity to exploit the surplus of data reliably, particularly while taking safety and stability into account. In this paper, we aim to infer the terminal cost of an MPC controller from transitions generated by an initial \\emph{unknown} demonstrator. We propose an algorithm to alternatively learn the terminal cost and update the MPC parameters according to a stability metric. We design the terminal cost as a Lyapunov function neural network and theoretically show that, under limited approximation error, our proposed approach guarantees that the size of the stability region (region of attraction) is greater than or equal to the one from the initial demonstrator. We also present theorems that characterize the stability and performance of the learned MPC in the presence of model uncertainties and sub-optimality due to function approximation. Empirically, we demonstrate the efficacy of the proposed algorithm on non-linear continuous control tasks with soft constraints. Our results show that the proposed approach can improve upon the initial demonstrator also in practice and achieve better task performance than other learning-based baselines. 
", "keywords": "optimal control;mpc;lyapunov neural networks;safe-learning;safety", "primary_area": "", "supplementary_material": "/attachment/19fd5270d49c027cf73f7d98b2e2a7cee60cd246.zip", "author": "Mayank Mittal;Marco Gallieri;Alessio Quaglino;Seyed Sina Mirrazavi Salehian;Jan Koutnik", "authorids": "~Mayank_Mittal1;~Marco_Gallieri1;~Alessio_Quaglino1;sina@nnaisense.com;~Jan_Koutnik1", "gender": "M;M;M;;", "homepage": "https://mayankm96.github.io;;;;", "dblp": ";143/6023;;;https://dblp.org/pers/hd/k/Koutn=iacute=k:Jan.html", "google_scholar": "iVXG-IkAAAAJ;https://scholar.google.ch/citations?user=moNjsXoAAAAJ;;;", "orcid": ";;;;", "linkedin": "mayankm-0096/;https://ch.linkedin.com/in/marco-gallieri-166a0421;;;", "or_profile": "~Mayank_Mittal1;~Marco_Gallieri1;~Alessio_Quaglino1;sina@nnaisense.com;~Jan_Koutnik1", "aff": "Swiss Federal Institute of Technology;NNAISENSE;NNAISENSE;;", "aff_domain": "ethz.ch;nnaisense.com;nnaisense.com;;", "position": "MS student;Senior Researcher;Research Scientist;;", "bibtex": "@misc{\nmittal2021neural,\ntitle={Neural Lyapunov Model Predictive Control},\nauthor={Mayank Mittal and Marco Gallieri and Alessio Quaglino and Seyed Sina Mirrazavi Salehian and Jan Koutnik},\nyear={2021},\nurl={https://openreview.net/forum?id=N5Zacze7uru}\n}", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer2;AnonReviewer4", "site": "https://openreview.net/forum?id=N5Zacze7uru", "pdf_size": 0, "rating": "3;5;7", "confidence": "4;4;3", "wc_review": "740;443;859", "wc_reply_reviewers": "0;0;476", "wc_reply_authors": "1398;1231;1345", "reply_reviewers": "0;0;1", "reply_authors": "3;3;3", "rating_avg": [ 5.0, 1.632993161855452 ], "confidence_avg": [ 3.6666666666666665, 0.4714045207910317 ], "wc_review_avg": [ 680.6666666666666, 174.93681398976287 ], "wc_reply_reviewers_avg": [ 158.66666666666666, 224.38855189653108 ], "wc_reply_authors_avg": [ 1324.6666666666667, 69.67703272161415 ], "reply_reviewers_avg": [ 0.3333333333333333, 0.4714045207910317 ], "reply_authors_avg": [ 3.0, 0.0 ], "replies_avg": [ 14, 0 ], "authors#_avg": [ 5, 0 ], "corr_rating_confidence": -0.8660254037844385, "gs_citation": 16, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=18438023659259065068&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 0, "aff_unique_index": "0;1;1", "aff_unique_norm": "Swiss Federal Institute of Technology;NNAISENSE", "aff_unique_dep": ";", "aff_unique_url": "https://www.ethz.ch;https://www.nnaiseNSE.com", "aff_unique_abbr": "ETH Zurich;NNAISENSE", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;1;1", "aff_country_unique": "Switzerland;China" }, { "title": "FairFil: Contrastive Neural Debiasing Method for Pretrained Text Encoders", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/2555", "id": "N6JECD-PI5w", "poster": "", "openreview": "https://openreview.net/forum?id=N6JECD-PI5w", "slides": "https://iclr.cc/virtual/2021/poster/2555", "video": "https://iclr.cc/virtual/2021/poster/2555", "author_site": "Pengyu Cheng, Weituo Hao, Siyang Yuan, Shijing Si, Lawrence Carin", "tldr": "", "abstract": "Pretrained text encoders, such as BERT, have been applied increasingly in various natural language processing (NLP) tasks, and have recently demonstrated significant performance gains. However, recent studies have demonstrated the existence of social bias in these pretrained NLP models. 
Although prior works have made progress on word-level debiasing, improved sentence-level fairness of pretrained encoders still lacks exploration. In this paper, we propose the first neural debiasing method for a pretrained sentence encoder, which transforms the pretrained encoder outputs into debiased representations via a fair filter (FairFil) network. To learn the FairFil, we introduce a contrastive learning framework that not only minimizes the correlation between filtered embeddings and bias words but also preserves rich semantic information of the original sentences. On real-world datasets, our FairFil effectively reduces the bias degree of pretrained text encoders, while consistently showing desirable performance on downstream tasks. Moreover, our post hoc method does not require any retraining of the text encoders, further enlarging FairFil's application space.", "keywords": "Fairness;Contrastive Learning;Mutual Information;Pretrained Text Encoders", "primary_area": "", "supplementary_material": "", "author": "Pengyu Cheng;Weituo Hao;Siyang Yuan;Shijing Si;Lawrence Carin", "authorids": "~Pengyu_Cheng1;~Weituo_Hao1;~Siyang_Yuan1;~Shijing_Si1;~Lawrence_Carin2", "gender": "M;;F;M;M", "homepage": "https://linear95.github.io/;;;https://www.linkedin.com/in/shijing-si-81751395/;https://people.ee.duke.edu/~lcarin/", "dblp": "223/6048;;242/8930;254/1233;", "google_scholar": "eeQ_yCkAAAAJ;;;7OnnQlAAAAAJ;yuxwFscAAAAJ", "orcid": "0000-0003-0421-8376;;;;", "linkedin": ";;;shijing-si-81751395/;", "or_profile": "~Pengyu_Cheng1;~Weituo_Hao1;~Siyang_Yuan1;~Shijing_Si1;~Lawrence_Carin2", "aff": "Duke University;;Duke University;Pingan Technology;Duke University", "aff_domain": "duke.edu;;duke.edu;pingan.com.cn;duke.edu", "position": "PhD student;;PhD student;Researcher;Full Professor", "bibtex": "@inproceedings{\ncheng2021fairfil,\ntitle={FairFil: Contrastive Neural Debiasing Method for Pretrained Text Encoders},\nauthor={Pengyu Cheng and Weituo Hao and Siyang Yuan and Shijing Si and Lawrence Carin},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=N6JECD-PI5w}\n}", "github": "", "project": "", "reviewers": "AnonReviewer2;AnonReviewer3;AnonReviewer4;AnonReviewer1", "pdf_size": 0, "rating": "6;6;7;7", "confidence": "4;4;3;4", "wc_review": "302;378;481;263", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "323;212;219;178", "reply_reviewers": "0;0;0;0", "reply_authors": "1;1;1;1", "rating_avg": [ 6.5, 0.5 ], "confidence_avg": [ 3.75, 0.4330127018922193 ], "wc_review_avg": [ 356.0, 83.17752100177067 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 233.0, 54.226377345347345 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 9, 0 ], "authors#_avg": [ 5, 0 ], "corr_rating_confidence": -0.5773502691896257, "gs_citation": 130, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=13618753574669209531&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 7, "pdf": "https://openreview.net/pdf?id=N6JECD-PI5w", "email": "duke.edu;;duke.edu;pingan.com.cn;duke.edu", "author_num": 5, "aff_unique_index": "0;0;1;0", "aff_unique_norm": "Duke University;PingAn Technology", "aff_unique_dep": ";", "aff_unique_url": "https://www.duke.edu;https://www.pingan.com", "aff_unique_abbr": "Duke;", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0;1;0", "aff_country_unique": "United States;China" }, { "id": "N6SmiyDrkR5", "title": "What's in the Box?
Exploring the Inner Life of Neural Networks with Robust Rules", "track": "main", "status": "Reject", "tldr": "", "abstract": "We propose a novel method for exploring how neurons within a neural network interact. In particular, we consider activation values of a network for given data, and propose to mine noise-robust rules of the form $X \\rightarrow Y$ , where $X$ and $Y$ are sets of neurons in different layers. To ensure we obtain a small and non-redundant set of high quality rules, we formalize the problem in terms of the Minimum Description Length principle, by which we identify the best set of rules as the one that best compresses the activation data. To discover good rule sets, we propose the unsupervised ExplaiNN algorithm. Extensive evaluation shows that our rules give clear insight in how networks perceive the world: they identify shared, resp. class-specific traits, compositionality within the network, as well as locality in convolutional layers. Our rules are easily interpretable, but also super-charge prototyping as they identify which groups of neurons to consider in unison.", "keywords": "Neural Networks;CNN;explaining;interpretable;Rules;black box", "primary_area": "", "supplementary_material": "", "author": "Jonas Fischer;Anna Ol\u00e1h;Jilles Vreeken", "authorids": "~Jonas_Fischer1;aolah@mmci.uni-saarland.de;jv@cispa.de", "gender": ";;", "homepage": ";;", "dblp": ";;", "google_scholar": ";;", "orcid": ";;", "linkedin": ";;", "or_profile": ";;", "aff": ";;", "aff_domain": ";;", "position": ";;", "bibtex": "@misc{\nfischer2021whats,\ntitle={What's in the Box? Exploring the Inner Life of Neural Networks with Robust Rules},\nauthor={Jonas Fischer and Anna Ol{\\'a}h and Jilles Vreeken},\nyear={2021},\nurl={https://openreview.net/forum?id=N6SmiyDrkR5}\n}", "github": "", "project": "", "reviewers": "AnonReviewer2;AnonReviewer1;AnonReviewer4;AnonReviewer3", "site": "https://openreview.net/forum?id=N6SmiyDrkR5", "pdf_size": 0, "rating": "3;5;6;8", "confidence": "4;3;4;3", "wc_review": "395;469;716;670", "wc_reply_reviewers": "249;0;230;697", "wc_reply_authors": "1033;685;1136;1884", "reply_reviewers": "1;0;1;1", "reply_authors": "2;1;2;3", "rating_avg": [ 5.5, 1.8027756377319946 ], "confidence_avg": [ 3.5, 0.5 ], "wc_review_avg": [ 562.5, 134.0867256666371 ], "wc_reply_reviewers_avg": [ 294.0, 252.47079038970034 ], "wc_reply_authors_avg": [ 1184.5, 437.0654985239627 ], "reply_reviewers_avg": [ 0.75, 0.4330127018922193 ], "reply_authors_avg": [ 2.0, 0.7071067811865476 ], "replies_avg": [ 16, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": -0.5547001962252291, "gs_citation": 9, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=10213675563986212565&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 9 }, { "id": "N9oPAFcuYWX", "title": "Understanding and Mitigating Accuracy Disparity in Regression", "track": "main", "status": "Reject", "tldr": "", "abstract": "With the widespread deployment of large-scale prediction systems in high-stakes domains, e.g., face recognition, criminal justice, etc., disparity on prediction accuracy between different demographic subgroups has called for fundamental understanding on the source of such disparity and algorithmic intervention to mitigate it. In this paper, we study the accuracy disparity problem in regression. 
To begin with, we first propose an error decomposition theorem, which decomposes the accuracy disparity into the distance between label populations and the distance between conditional representations, to help explain why such accuracy disparity appears in practice. Motivated by this error decomposition and the general idea of distribution alignment with statistical distances, we then propose an algorithm to reduce this disparity, and analyze its game-theoretic optima of the proposed objective function. We conduct experiments on four real-world datasets. The experimental results suggest that our proposed algorithms can effectively mitigate accuracy disparity while maintaining the predictive power of the regression models.", "keywords": "Algorithmic Fairness;Representation Learning", "primary_area": "", "supplementary_material": "/attachment/d2f501723ac243b8746d2f12590dfe1124a47df0.zip", "author": "Jianfeng Chi;Han Zhao;Geoff Gordon;Yuan Tian", "authorids": "~Jianfeng_Chi1;~Han_Zhao1;~Geoff_Gordon2;~Yuan_Tian2", "gender": "M;M;F;M", "homepage": "https://jfchi.github.io/;https://hanzhaoml.github.io/;https://www.ytian.info/;http://www.cs.cmu.edu/~ggordon/", "dblp": "231/6028.html;03/3520-2;;41/487", "google_scholar": "S_7a_B4AAAAJ;x942ipYAAAAJ;;8LcYFjEAAAAJ", "orcid": ";0000-0002-8579-1600;;0009-0007-9960-2114", "linkedin": ";;;", "or_profile": "~Jianfeng_Chi1;~Han_Zhao1;~Yuan_Tian2;~Geoff_Gordon1", "aff": "University of Virginia;University of Illinois, Urbana Champaign;University of Virginia;Microsoft", "aff_domain": "virginia.edu;illinois.edu;virginia.edu;microsoft.com", "position": "PhD student;Assistant Professor;Assistant Professor;Partner Researcher", "bibtex": "@misc{\nchi2021understanding,\ntitle={Understanding and Mitigating Accuracy Disparity in Regression},\nauthor={Jianfeng Chi and Han Zhao and Geoff Gordon and Yuan Tian},\nyear={2021},\nurl={https://openreview.net/forum?id=N9oPAFcuYWX}\n}", "github": "", "project": "", "reviewers": "AnonReviewer4;AnonReviewer3;AnonReviewer2;AnonReviewer1", "site": "https://openreview.net/forum?id=N9oPAFcuYWX", "pdf_size": 0, "rating": "4;6;6;7", "confidence": "3;3;3;3", "wc_review": "713;342;156;285", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "1049;619;107;399", "reply_reviewers": "0;0;0;0", "reply_authors": "2;1;1;1", "rating_avg": [ 5.75, 1.0897247358851685 ], "confidence_avg": [ 3.0, 0.0 ], "wc_review_avg": [ 374.0, 206.9963767798847 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 543.5, 343.7451817844142 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.25, 0.4330127018922193 ], "replies_avg": [ 11, 0 ], "authors#_avg": [ 4, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 30, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=9962646376890451048&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 9, "aff_unique_index": "0;1;0;2", "aff_unique_norm": "University of Virginia;University of Illinois Urbana-Champaign;Microsoft", "aff_unique_dep": ";;Microsoft Corporation", "aff_unique_url": "https://www.virginia.edu;https://illinois.edu;https://www.microsoft.com", "aff_unique_abbr": "UVA;UIUC;Microsoft", "aff_campus_unique_index": "1", "aff_campus_unique": ";Urbana-Champaign", "aff_country_unique_index": "0;0;0;0", "aff_country_unique": "United States" }, { "title": "Fidelity-based Deep Adiabatic Scheduling", "status": "Spotlight", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/3327", "id": "NECTfffOvn1", "poster": "", "openreview": "https://openreview.net/forum?id=NECTfffOvn1", 
"slides": "https://iclr.cc/virtual/2021/poster/3327", "video": "https://iclr.cc/virtual/2021/poster/3327", "author_site": "Eli Ovits, Lior Wolf", "tldr": "", "abstract": "Adiabatic quantum computation is a form of computation that acts by slowly interpolating a quantum system between an easy to prepare initial state and a final state that represents a solution to a given computational problem. The choice of the interpolation schedule is critical to the performance: if at a certain time point, the evolution is too rapid, the system has a high probability to transfer to a higher energy state, which does not represent a solution to the problem. On the other hand, an evolution that is too slow leads to a loss of computation time and increases the probability of failure due to decoherence. In this work, we train deep neural models to produce optimal schedules that are conditioned on the problem at hand. We consider two types of problem representation: the Hamiltonian form, and the Quadratic Unconstrained Binary Optimization (QUBO) form. A novel loss function that scores schedules according to their approximated success probability is introduced. We benchmark our approach on random QUBO problems, Grover search, 3-SAT, and MAX-CUT problems and show that our approach outperforms, by a sizable margin, the linear schedules as well as alternative approaches that were very recently proposed.", "keywords": "", "primary_area": "", "supplementary_material": "", "author": "Eli Ovits;Lior Wolf", "authorids": "eliovify@gmail.com;~Lior_Wolf1", "gender": ";M", "homepage": ";http://www.cs.tau.ac.il/~wolf", "dblp": ";83/4103", "google_scholar": ";UbFrXTsAAAAJ", "orcid": ";0000-0001-5578-8892", "linkedin": ";", "or_profile": "eliovify@gmail.com;~Lior_Wolf1", "aff": ";Tel Aviv University", "aff_domain": ";tau.ac.il", "position": ";Full Professor", "bibtex": "@inproceedings{\novits2021fidelitybased,\ntitle={Fidelity-based Deep Adiabatic Scheduling},\nauthor={Eli Ovits and Lior Wolf},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=NECTfffOvn1}\n}", "github": "", "project": "", "reviewers": "AnonReviewer3;AnonReviewer1;AnonReviewer2;AnonReviewer4", "pdf_size": 0, "rating": "6;6;8;9", "confidence": "4;4;4;5", "wc_review": "525;272;464;251", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "791;348;347;8", "reply_reviewers": "0;0;0;0", "reply_authors": "1;1;1;1", "rating_avg": [ 7.25, 1.299038105676658 ], "confidence_avg": [ 4.25, 0.4330127018922193 ], "wc_review_avg": [ 378.0, 118.71183597266112 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 373.5, 278.05080471021836 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 10, 0 ], "authors#_avg": [ 2, 0 ], "corr_rating_confidence": 0.7777777777777777, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:onFJPxHkb1IJ:scholar.google.com/&scioq=Fidelity-based+Deep+Adiabatic+Scheduling&hl=en&as_sdt=0,5", "gs_version_total": 2, "pdf": "https://openreview.net/pdf?id=NECTfffOvn1", "email": ";tau.ac.il", "author_num": 2, "aff_unique_index": "0", "aff_unique_norm": "Tel Aviv University", "aff_unique_dep": "", "aff_unique_url": "https://www.tau.ac.il", "aff_unique_abbr": "TAU", "aff_country_unique_index": "0", "aff_country_unique": "Israel" }, { "id": "NGBY716p1VR", "title": "Towards Understanding Fast Adversarial Training", "track": "main", "status": "Reject", "tldr": "", "abstract": "Current neural-network-based 
classifiers are susceptible to adversarial examples. The most empirically successful approach to defending against such adversarial examples is adversarial training, which incorporates a strong self-attack during training to enhance its robustness. This approach, however, is computationally expensive and hence is hard to scale up. A recent work, called fast adversarial training, has shown that it is possible to markedly reduce computation time without sacrificing significant performance. This approach incorporates simple self-attacks, yet it can only run for a limited number of training epochs, resulting in sub-optimal performance. In this paper, we conduct experiments to understand the behavior of fast adversarial training and show the key to its success is the ability to recover from overfitting to weak attacks. We then extend our findings to improve fast adversarial training, demonstrating superior robust accuracy to strong adversarial training, with much-reduced training time.", "keywords": "fast adversarial training;adversarial examples", "primary_area": "", "supplementary_material": "/attachment/d8c0a898f567a766e633d87435938af3901c7b90.zip", "author": "Bai Li;Shiqi Wang;Suman Jana;Lawrence Carin", "authorids": "~Bai_Li1;~Shiqi_Wang2;~Suman_Jana1;~Lawrence_Carin2", "gender": ";M;M;M", "homepage": ";https://shiqi-wang.github.io;http://sumanj.info;https://people.ee.duke.edu/~lcarin/", "dblp": "93/3383;58/9145-2;74/28;", "google_scholar": ";u_MzXeMAAAAJ;https://scholar.google.com.tw/citations?user=SDY9FwUAAAAJ;yuxwFscAAAAJ", "orcid": ";0000-0002-6338-1432;;", "linkedin": ";tcwangshiqi/;;", "or_profile": "~Bai_Li1;~Shiqi_Wang2;~Suman_Jana1;~Lawrence_Carin2", "aff": "Duke University;Columbia University;, Columbia University;Duke University", "aff_domain": "duke.edu;columbia.edu;cs.columbia.edu;duke.edu", "position": "PhD student;PhD student;Associate Professor;Full Professor", "bibtex": "@misc{\nli2021towards,\ntitle={Towards Understanding Fast Adversarial Training},\nauthor={Bai Li and Shiqi Wang and Suman Jana and Lawrence Carin},\nyear={2021},\nurl={https://openreview.net/forum?id=NGBY716p1VR}\n}", "github": "", "project": "", "reviewers": "AnonReviewer4;AnonReviewer3;AnonReviewer1;AnonReviewer2", "site": "https://openreview.net/forum?id=NGBY716p1VR", "pdf_size": 0, "rating": "5;5;5;7", "confidence": "5;4;4;5", "wc_review": "206;216;499;1059", "wc_reply_reviewers": "0;0;0;356", "wc_reply_authors": "268;234;489;836", "reply_reviewers": "0;0;0;3", "reply_authors": "1;1;1;4", "rating_avg": [ 5.5, 0.8660254037844386 ], "confidence_avg": [ 4.5, 0.5 ], "wc_review_avg": [ 495.0, 346.2203055859087 ], "wc_reply_reviewers_avg": [ 89.0, 154.1525218736301 ], "wc_reply_authors_avg": [ 456.75, 239.85138627908742 ], "reply_reviewers_avg": [ 0.75, 1.299038105676658 ], "reply_authors_avg": [ 1.75, 1.299038105676658 ], "replies_avg": [ 15, 0 ], "authors#_avg": [ 4, 0 ], "corr_rating_confidence": 0.5773502691896257, "gs_citation": 56, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=375916478360005736&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 8, "aff_unique_index": "0;1;1;0", "aff_unique_norm": "Duke University;Columbia University", "aff_unique_dep": ";", "aff_unique_url": "https://www.duke.edu;https://www.columbia.edu", "aff_unique_abbr": "Duke;Columbia", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0;0;0", "aff_country_unique": "United States" }, { "id": "NHysmWivrHP", "title": "Single Image Depth Estimation Based on Spectral Consistency 
and Predicted View", "track": "main", "status": "Desk Reject", "tldr": "", "abstract": "Single image depth estimation is a critical issue for robot vision, augmented reality, and many other applications when an image sequence is not available. Self-supervised single image depth estimation models target at predicting accurate disparity map just from one single image without ground truth supervision or stereo image pair during real applications. Compared with direct single image depth estimation, single image stereo algorithm can generate the depth from different camera perspectives. In this paper, we propose a novel architecture to infer accurate disparity by leveraging both spectral-consistency based learning model and view-prediction based stereo reconstruction algorithm. Direct spectral-consistency based method can avoid false positive matching in smooth regions. Single image stereo can preserve more distinct boundaries from another camera perspective. By learning confidence map and designing a fusion strategy, the two disparities from two approaches are able to be effectively fused to produce the refined disparity. Extensive experiments indicate that our method exploits both advantages of spectral consistency and view prediction, especially in constraining boundaries and correcting wrong predicting regions.", "keywords": "Single Image Depth Estimation;Stereo Matching;View Prediction;Unsupervised Learning", "primary_area": "", "supplementary_material": "", "author": "Anonymous", "authorids": "ICLR.cc/2021/Conference/Paper2916/Authors", "gender": "", "homepage": "", "dblp": "", "google_scholar": "", "orcid": "", "linkedin": "", "or_profile": "", "aff": "", "aff_domain": "", "position": "", "bibtex": "@inproceedings{\nanonymous2021single,\ntitle={Single Image Depth Estimation Based on Spectral Consistency and Predicted View},\nauthor={Anonymous},\nbooktitle={Submitted to International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=NHysmWivrHP},\nnote={under review}\n}", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer2;AnonReviewer4", "site": "https://openreview.net/forum?id=NHysmWivrHP", "pdf_size": 0, "rating": "3;4;4", "confidence": "4;4;5", "wc_review": "253;597;697", "wc_reply_reviewers": "0;0;0", "wc_reply_authors": "292;626;228", "reply_reviewers": "0;0;0", "reply_authors": "1;1;1", "rating_avg": [ 3.6666666666666665, 0.4714045207910317 ], "confidence_avg": [ 4.333333333333333, 0.4714045207910317 ], "wc_review_avg": [ 515.6666666666666, 190.16717791342322 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 382.0, 174.50119388321292 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 7, 0 ], "authors#_avg": [ 1, 0 ], "corr_rating_confidence": 0.4999999999999999, "gs_citation": -1, "gs_cited_by_link": "", "gs_version_total": -1 }, { "id": "NL40q_yuavv", "title": "Training Data Generating Networks: Linking 3D Shapes and Few-Shot Classification", "track": "main", "status": "Withdraw", "tldr": "", "abstract": "We propose a novel 3d shape representation for 3d shape reconstruction from a single image. Rather than predicting a shape directly, we train a network to generate a training set which will be feed into another learning algorithm to define the shape.\nTraining data generating networks establish a link between few-shot learning and 3d shape analysis. We propose a novel meta-learning framework to jointly train the data generating network and other components. 
We improve upon recent work on standard benchmarks for 3d shape reconstruction, but our novel shape representation has many applications.", "keywords": "shape representation;single image 3d reconstruction;few-shot learning;meta learning", "primary_area": "", "supplementary_material": "/attachment/2d18794133c0de29ca27e280f223e625fc5876b0.zip", "author": "Biao Zhang;Peter Wonka", "authorids": "~Biao_Zhang5;~Peter_Wonka1", "gender": ";M", "homepage": "https://1zb.github.io;http://peterwonka.net", "dblp": "83/3266-5;98/5522", "google_scholar": "h5KukxEAAAAJ;https://scholar.google.com.tw/citations?user=0EKXSXgAAAAJ", "orcid": ";0000-0003-0627-9746", "linkedin": ";", "or_profile": "~Biao_Zhang5;~Peter_Wonka1", "aff": "KAUST;KAUST", "aff_domain": "kaust.edu.sa;kaust.edu.sa", "position": "PhD student;Full Professor", "bibtex": "", "github": "", "project": "", "reviewers": "AnonReviewer2;AnonReviewer1;AnonReviewer3;AnonReviewer4", "site": "https://openreview.net/forum?id=NL40q_yuavv", "pdf_size": 0, "rating": "3;4;5;6", "confidence": "4;4;4;4", "wc_review": "914;637;678;536", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "69;206;61;84", "reply_reviewers": "0;0;0;0", "reply_authors": "1;1;1;1", "rating_avg": [ 4.5, 1.118033988749895 ], "confidence_avg": [ 4.0, 0.0 ], "wc_review_avg": [ 691.25, 138.59901695178073 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 105.0, 58.89397252690635 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 9, 0 ], "authors#_avg": [ 2, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 2, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=7193053860761549587&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 3, "aff_unique_index": "0;0", "aff_unique_norm": "King Abdullah University of Science and Technology", "aff_unique_dep": "", "aff_unique_url": "https://www.kaust.edu.sa", "aff_unique_abbr": "KAUST", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0", "aff_country_unique": "Saudi Arabia" }, { "id": "NLuOUSp9zZd", "title": "DO-GAN: A Double Oracle Framework for Generative Adversarial Networks", "track": "main", "status": "Reject", "tldr": "", "abstract": "In this paper, we propose a new approach to train Generative Adversarial Networks (GAN) where we deploy a double-oracle framework using the generator and discriminator oracles. GAN is essentially a two-player zero-sum game between the generator and the discriminator. Training GANs is challenging as a pure Nash equilibrium may not exist and even finding the mixed Nash equilibrium is difficult as GANs have a large-scale strategy space. In DO-GAN, we extend the double oracle framework to GANs. We first generalize the player strategies as the trained models of generator and discriminator from the best response oracles. We then compute the meta-strategies using a linear program. Next, we prune the weakly-dominated player strategies to keep the oracles from becoming intractable. We apply our framework to established architectures such as vanilla GAN, Deep Convolutional GAN, Spectral Normalization GAN and Stacked GAN. 
Finally, we conduct evaluations on MNIST, CIFAR-10 and CelebA datasets and show that DO-GAN variants have significant improvements in both subjective qualitative evaluation and quantitative metrics, compared with their respective GAN architectures.", "keywords": "GAN;Generative Models;Adversarial Networks;Game Theory", "primary_area": "", "supplementary_material": "", "author": "Aye Phyu Phyu Aung;Xinrun Wang;Runsheng Yu;Bo An;Senthilnath Jayavelu;Xiaoli Li", "authorids": "~Aye_Phyu_Phyu_Aung1;~Xinrun_Wang1;~Runsheng_Yu2;~Bo_An2;j_senthilnath@i2r.a-star.edu.sg;~Xiaoli_Li1", "gender": "F;M;Not Specified;M;;M", "homepage": "https://scholar.google.com/citations?user=CGf-zXkAAAAJ&hl=en&oi=ao;https://rainwangphy.github.io/;https://www.linkedin.com/in/runsheng-yu-560696127/;https://personal.ntu.edu.sg/boan/;;https://personal.ntu.edu.sg/xlli/", "dblp": "266/4628;199/6413;210/2646.html?q=runsheng%20yu;42/6178-1.html;;l/XiaoliLi.html", "google_scholar": ";ROANfPUAAAAJ;;PEEpuNwAAAAJ;;E3yQKloAAAAJ", "orcid": ";;0000-0003-0053-1234;0000-0002-7064-7438;;0000-0002-0762-6562", "linkedin": ";;;;;li-xiaoli-41027ba/", "or_profile": "~Aye_Phyu_Phyu_Aung1;~Xinrun_Wang1;~Runsheng_Yu2;~Bo_An2;j_senthilnath@i2r.a-star.edu.sg;~Xiaoli_Li1", "aff": "Nanyang Technological University;Nanyang Technological University;Hong Kong University of Science and Technology;Nanyang Technological University;;A*STAR", "aff_domain": "ntu.edu.sg;ntu.edu.sg;ust.hk;ntu.edu.sg;;a-star.edu.sg", "position": "PhD student;Postdoc;PhD student;Full Professor;;Principal Researcher", "bibtex": "@misc{\naung2021dogan,\ntitle={{\\{}DO{\\}}-{\\{}GAN{\\}}: A Double Oracle Framework for Generative Adversarial Networks},\nauthor={Aye Phyu Phyu Aung and Xinrun Wang and Runsheng Yu and Bo An and Senthilnath Jayavelu and Xiaoli Li},\nyear={2021},\nurl={https://openreview.net/forum?id=NLuOUSp9zZd}\n}", "github": "", "project": "", "reviewers": "AnonReviewer4;AnonReviewer3;AnonReviewer1;AnonReviewer2", "site": "https://openreview.net/forum?id=NLuOUSp9zZd", "pdf_size": 0, "rating": "3;4;6;6", "confidence": "4;4;4;3", "wc_review": "495;381;618;322", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "578;532;292;375", "reply_reviewers": "0;0;0;0", "reply_authors": "1;1;1;1", "rating_avg": [ 4.75, 1.299038105676658 ], "confidence_avg": [ 3.75, 0.4330127018922193 ], "wc_review_avg": [ 454.0, 113.28062499827585 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 444.25, 115.72029856511777 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 10, 0 ], "authors#_avg": [ 6, 0 ], "corr_rating_confidence": -0.5555555555555555, "gs_citation": 6, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=4659782895734926685&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 8, "aff_unique_index": "0;0;1;0;2", "aff_unique_norm": "Nanyang Technological University;Hong Kong University of Science and Technology;Agency for Science, Technology and Research", "aff_unique_dep": ";;", "aff_unique_url": "https://www.ntu.edu.sg;https://www.ust.hk;https://www.a-star.edu.sg", "aff_unique_abbr": "NTU;HKUST;A*STAR", "aff_campus_unique_index": "1", "aff_campus_unique": ";Hong Kong SAR", "aff_country_unique_index": "0;0;1;0;0", "aff_country_unique": "Singapore;China" }, { "id": "NMgB4CVnMh", "title": "Acoustic Neighbor Embeddings", "track": "main", "status": "Reject", "tldr": "", "abstract": "This paper proposes a novel acoustic word embedding called Acoustic Neighbor Embeddings where speech or text of arbitrary length are mapped 
to a vector space of fixed, reduced dimensions by adapting stochastic neighbor embedding (SNE) to sequential inputs. The Euclidean distance between coordinates in the embedding space reflects the phonetic confusability between their corresponding sequences. Two encoder neural networks are trained: an acoustic encoder that accepts speech signals in the form of frame-wise subword posterior probabilities obtained from an acoustic model and a text encoder that accepts text in the form of subword transcriptions. Compared to a triplet loss criterion, the proposed method is shown to have more effective gradients for neural network training. Experimentally, it also gives more accurate results with low-dimensional embeddings when the two encoder networks are used in tandem in a word (name) recognition task, and when the text encoder network is used standalone in an approximate phonetic matching task. In particular, in an isolated name recognition task depending solely on Euclidean nearest-neighbor search between the proposed embedding vectors, the recognition accuracy is identical to that of conventional finite state transducer(FST)-based decoding using test data with up to 1 million names in the vocabulary and 40 dimensions in the embeddings.", "keywords": "", "primary_area": "", "supplementary_material": "", "author": "Woojay Jeon", "authorids": "~Woojay_Jeon1", "gender": "", "homepage": "https://sites.google.com/site/woojay", "dblp": "", "google_scholar": "", "orcid": "", "linkedin": "woojay/", "or_profile": "~Woojay_Jeon1", "aff": "Apple", "aff_domain": "apple.com", "position": "Machine Learning Engineer", "bibtex": "@misc{\njeon2021acoustic,\ntitle={Acoustic Neighbor Embeddings},\nauthor={Woojay Jeon},\nyear={2021},\nurl={https://openreview.net/forum?id=NMgB4CVnMh}\n}", "github": "", "project": "", "reviewers": "AnonReviewer2;AnonReviewer4;AnonReviewer5;AnonReviewer1;AnonReviewer6", "site": "https://openreview.net/forum?id=NMgB4CVnMh", "pdf_size": 0, "rating": "6;6;6;6;6", "confidence": "4;4;3;4;4", "wc_review": "1128;523;851;450;807", "wc_reply_reviewers": "51;0;136;0;0", "wc_reply_authors": "1044;715;973;355;647", "reply_reviewers": "1;0;2;0;0", "reply_authors": "3;1;3;1;1", "rating_avg": [ 6.0, 0.0 ], "confidence_avg": [ 3.8, 0.39999999999999997 ], "wc_review_avg": [ 751.8, 244.0683510822327 ], "wc_reply_reviewers_avg": [ 37.4, 53.10969779616525 ], "wc_reply_authors_avg": [ 746.8, 246.5655288153638 ], "reply_reviewers_avg": [ 0.6, 0.7999999999999999 ], "reply_authors_avg": [ 1.8, 0.9797958971132713 ], "replies_avg": [ 19, 0 ], "authors#_avg": [ 1, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 5, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=13843284116423503954&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 3, "aff_unique_index": "0", "aff_unique_norm": "Apple", "aff_unique_dep": "Apple Inc.", "aff_unique_url": "https://www.apple.com", "aff_unique_abbr": "Apple", "aff_country_unique_index": "0", "aff_country_unique": "United States" }, { "id": "NNd0J677PN", "title": "Voting-based Approaches For Differentially Private Federated Learning", "track": "main", "status": "Reject", "tldr": "", "abstract": "While federated learning (FL) enables distributed agents to collaboratively train a centralized model without sharing data with each other, it fails to protect users against inference attacks that mine private information from the centralized model. Thus, facilitating federated learning methods with differential privacy (DPFL) becomes attractive. 
Existing algorithms based on privately aggregating clipped gradients require many rounds of communication, which may not converge, and cannot scale up to large-capacity models due to explicit dimension-dependence in its added noise. In this paper, we adopt the knowledge transfer model of private learning pioneered by Papernot et al. (2017; 2018) and extend their algorithm PATE, as well as the recent alternative PrivateKNN (Zhu et al., 2020) to the federated learning setting. The key difference is that our method privately aggregates the labels from the agents in a voting scheme, instead of aggregating the gradients, hence avoiding the dimension dependence and achieving significant savings in communication cost. Theoretically, we show that when the margins of the voting scores are large, the agents enjoy exponentially higher accuracy and stronger (data-dependent) differential privacy guarantees on both agent-level and instance-level. Extensive experiments show that our approach significantly improves the privacy-utility trade-off over the current state-of-the-art in DPFL.", "keywords": "", "primary_area": "", "supplementary_material": "/attachment/a742ddec82d7b6a22f2e3cf8a8f71b9301f7822e.zip", "author": "Yuqing Zhu;Xiang Yu;Yi-Hsuan Tsai;Francesco Pittaluga;Masoud Faraki;Manmohan Chandraker;Yu-Xiang Wang", "authorids": "~Yuqing_Zhu1;~Xiang_Yu1;~Yi-Hsuan_Tsai1;~Francesco_Pittaluga2;~Masoud_Faraki2;~Manmohan_Chandraker3;~Yu-Xiang_Wang1", "gender": "F;M;M;M;;M;", "homepage": "https://jeremy43.github.io/;https://sites.google.com/site/xiangyurutgers/;https://sites.google.com/site/yihsuantsai/home;https://www.francescopittaluga.com/;http://www.cs.ucsb.edu/~yuxiangw/publications.html;http://cseweb.ucsd.edu/~mkchandraker/;https://www.nicta.com.au/category/research/computer-vision/people/mfaraki/", "dblp": ";19/2453-2.html;142/2924;167/5304;62/1637-3.html;79/589;143/9779", "google_scholar": "QmMv9PIAAAAJ;QJbtEKMAAAAJ;https://scholar.google.it/citations?user=zjI51wEAAAAJ;bIeCNNoAAAAJ;HGNZ1fkAAAAJ;oPFCNk4AAAAJ;zEVWJu0AAAAJ", "orcid": ";;;;;;", "linkedin": ";;;;;;", "or_profile": "~Yuqing_Zhu1;~Xiang_Yu1;~Yi-Hsuan_Tsai1;~Francesco_Pittaluga2;~Yu-Xiang_Wang1;~Manmohan_Chandraker2;~Masoud_Faraki1", "aff": "UC Santa Barbara;NEC;NEC-Labs;NEC-Labs;UC Santa Barbara;University of California, San Diego;NEC-Labs", "aff_domain": "ucsb.edu;nec.com;nec-labs.com;nec-labs.com;ucsb.edu;ucsd.edu;nec-labs.com", "position": "PhD student;Researcher;Researcher;Researcher;Assistant Professor;Associate Professor;Researcher", "bibtex": "@misc{\nzhu2021votingbased,\ntitle={Voting-based Approaches For Differentially Private Federated Learning},\nauthor={Yuqing Zhu and Xiang Yu and Yi-Hsuan Tsai and Francesco Pittaluga and Masoud Faraki and Manmohan Chandraker and Yu-Xiang Wang},\nyear={2021},\nurl={https://openreview.net/forum?id=NNd0J677PN}\n}", "github": "", "project": "", "reviewers": "AnonReviewer4;AnonReviewer1;AnonReviewer3;AnonReviewer5", "site": "https://openreview.net/forum?id=NNd0J677PN", "pdf_size": 0, "rating": "4;5;6;6", "confidence": "4;4;2;2", "wc_review": "229;548;343;641", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "552;717;324;492", "reply_reviewers": "0;0;0;0", "reply_authors": "1;1;1;1", "rating_avg": [ 5.25, 0.82915619758885 ], "confidence_avg": [ 3.0, 1.0 ], "wc_review_avg": [ 440.25, 162.7841745993756 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 521.25, 140.55848426900454 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 10, 0 ],
"authors#_avg": [ 7, 0 ], "corr_rating_confidence": -0.9045340337332909, "gs_citation": 22, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=6355964122382479553&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 6, "aff_unique_index": "0;1;2;2;0;3;2", "aff_unique_norm": "University of California, Santa Barbara;NEC Corporation;NEC Laboratories;University of California, San Diego", "aff_unique_dep": ";;;", "aff_unique_url": "https://www.ucsb.edu;https://www.nec.com;https://www.nec-labs.com;https://www.ucsd.edu", "aff_unique_abbr": "UCSB;NEC;NEC-Labs;UCSD", "aff_campus_unique_index": "0;0;2", "aff_campus_unique": "Santa Barbara;;San Diego", "aff_country_unique_index": "0;1;0;0;0;0;0", "aff_country_unique": "United States;Japan" }, { "id": "NPab8GcO5Pw", "title": "On the Landscape of Sparse Linear Networks", "track": "main", "status": "Reject", "tldr": "", "abstract": "Network pruning, or sparse network has a long history and practical significance in modern applications. Although the loss functions of neural networks may yield bad landscape due to non-convexity, we focus on linear activation which already owes benign landscape. With no unrealistic assumption, we conclude the following statements for the squared loss objective of general sparse linear neural networks: 1) every local minimum is a global minimum for scalar output with any sparse structure, or non-intersected sparse first layer and dense other layers with orthogonal training data; 2) sparse linear networks have sub-optimal local-min for only sparse first layer due to low rank constraint, or output larger than three dimensions due to the global minimum of a sub-network. Overall, sparsity breaks the normal structure, cutting out the decreasing path in original fully-connected networks.", "keywords": "theory;sparse network;landscape", "primary_area": "", "supplementary_material": "/attachment/8a699e3414d36b14c4c1df65de0aa42e3d985546.zip", "author": "Dachao Lin;Ruoyu Sun;Zhihua Zhang", "authorids": "~Dachao_Lin1;~Ruoyu_Sun1;~Zhihua_Zhang1", "gender": "M;;M", "homepage": ";https://ruoyus.github.io/;http://www.math.pku.edu.cn/teachers/zhzhang/", "dblp": "76/8488;30/9879-1;52/5331", "google_scholar": ";PsfzbCMAAAAJ;", "orcid": ";;", "linkedin": ";;", "or_profile": "~Dachao_Lin1;~Ruoyu_Sun1;~Zhihua_Zhang1", "aff": "Peking University;University of Illinois, Urbana-Champaign;Peking University", "aff_domain": "pku.edu.cn;uiuc.edu;pku.edu.cn", "position": "PhD student;Assistant Professor;Full Professor", "bibtex": "@misc{\nlin2021on,\ntitle={On the Landscape of Sparse Linear Networks},\nauthor={Dachao Lin and Ruoyu Sun and Zhihua Zhang},\nyear={2021},\nurl={https://openreview.net/forum?id=NPab8GcO5Pw}\n}", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer3;AnonReviewer2;AnonReviewer4", "site": "https://openreview.net/forum?id=NPab8GcO5Pw", "pdf_size": 0, "rating": "4;4;5;7", "confidence": "4;4;4;5", "wc_review": "422;498;855;499", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "0;0;0;0", "reply_reviewers": "0;0;0;0", "reply_authors": "0;0;0;0", "rating_avg": [ 5.0, 1.224744871391589 ], "confidence_avg": [ 4.25, 0.4330127018922193 ], "wc_review_avg": [ 568.5, 168.33374587408196 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 0, 0 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 0, 0 ], "replies_avg": [ 5, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": 0.9428090415820632, "gs_citation": 0, "gs_cited_by_link": 
"https://scholar.google.com/scholar?q=related:_JCUwSziVxsJ:scholar.google.com/&scioq=On+the+Landscape+of+Sparse+Linear+Networks&hl=en&as_sdt=0,5", "gs_version_total": 0, "aff_unique_index": "0;1;0", "aff_unique_norm": "Peking University;University of Illinois", "aff_unique_dep": ";", "aff_unique_url": "http://www.pku.edu.cn;https://illinois.edu", "aff_unique_abbr": "Peking U;UIUC", "aff_campus_unique_index": "1", "aff_campus_unique": ";Urbana-Champaign", "aff_country_unique_index": "0;1;0", "aff_country_unique": "China;United States" }, { "title": "On the Impossibility of Global Convergence in Multi-Loss Optimization", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/2794", "id": "NQbnPjPYaG6", "poster": "", "openreview": "https://openreview.net/forum?id=NQbnPjPYaG6", "slides": "https://iclr.cc/virtual/2021/poster/2794", "video": "https://iclr.cc/virtual/2021/poster/2794", "tldr": "", "abstract": "Under mild regularity conditions, gradient-based methods converge globally to a critical point in the single-loss setting. This is known to break down for vanilla gradient descent when moving to multi-loss optimization, but can we hope to build some algorithm with global guarantees? We negatively resolve this open problem by proving that desirable convergence properties cannot simultaneously hold for any algorithm. Our result has more to do with the existence of games with no satisfactory outcomes, than with algorithms per se. More explicitly we construct a two-player game with zero-sum interactions whose losses are both coercive and analytic, but whose only simultaneous critical point is a strict maximum. Any 'reasonable' algorithm, defined to avoid strict maxima, will therefore fail to converge. This is fundamentally different from single losses, where coercivity implies existence of a global minimum. Moreover, we prove that a wide range of existing gradient-based methods almost surely have bounded but non-convergent iterates in a constructed zero-sum game for suitably small learning rates. 
It nonetheless remains an open question whether such behavior can arise in high-dimensional games of interest to ML practitioners, such as GANs or multi-agent RL.", "keywords": "impossibility;global;convergence;optimization;multi-loss;multi-player;multi-agent;gradient;descent", "primary_area": "", "supplementary_material": "/attachment/946d9fbabab67c8a629490d201c814ac3a15dc7f.zip", "author": "Alistair Letcher", "authorids": "~Alistair_Letcher1", "gender": "M", "homepage": "https://aletcher.github.io", "dblp": "", "google_scholar": "o28w0mwAAAAJ", "orcid": "", "linkedin": "", "or_profile": "~Alistair_Letcher1", "aff": "", "aff_domain": "", "position": "", "bibtex": "@inproceedings{\nletcher2021on,\ntitle={On the Impossibility of Global Convergence in Multi-Loss Optimization},\nauthor={Alistair Letcher},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=NQbnPjPYaG6}\n}", "github": "[![github](/images/github_icon.svg) aletcher/impossibility-global-convergence](https://github.com/aletcher/impossibility-global-convergence)", "project": "", "reviewers": "AnonReviewer4;AnonReviewer2;AnonReviewer3;AnonReviewer1", "pdf_size": 0, "rating": "4;6;7;8", "confidence": "4;4;4;5", "wc_review": "245;296;583;77", "wc_reply_reviewers": "178;0;0;0", "wc_reply_authors": "287;203;465;19", "reply_reviewers": "1;0;0;0", "reply_authors": "1;1;1;1", "rating_avg": [ 6.25, 1.479019945774904 ], "confidence_avg": [ 4.25, 0.4330127018922193 ], "wc_review_avg": [ 300.25, 182.2489711905118 ], "wc_reply_reviewers_avg": [ 44.5, 77.07626093681505 ], "wc_reply_authors_avg": [ 243.5, 160.464170455588 ], "reply_reviewers_avg": [ 0.25, 0.4330127018922193 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 10, 0 ], "authors#_avg": [ 1, 0 ], "corr_rating_confidence": 0.6831300510639732, "gs_citation": 36, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=12737917021502438759&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 3, "pdf": "https://openreview.net/pdf?id=NQbnPjPYaG6", "email": "", "author_num": 1 }, { "title": "Degree-Quant: Quantization-Aware Training for Graph Neural Networks", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/2990", "id": "NSBrFgJAHg", "poster": "", "openreview": "https://openreview.net/forum?id=NSBrFgJAHg", "slides": "https://iclr.cc/virtual/2021/poster/2990", "video": "https://iclr.cc/virtual/2021/poster/2990", "author_site": "Shyam Tailor, Javier Fernandez-Marques, Nicholas Lane", "tldr": "", "abstract": "Graph neural networks (GNNs) have demonstrated strong performance on a wide variety of tasks due to their ability to model non-uniform structured data. Despite their promise, there exists little research exploring methods to make them more efficient at inference time. In this work, we explore the viability of training quantized GNNs, enabling the usage of low precision integer arithmetic during inference. For GNNs seemingly unimportant choices in quantization implementation cause dramatic changes in performance. We identify the sources of error that uniquely arise when attempting to quantize GNNs, and propose an architecturally-agnostic and stable method, Degree-Quant, to improve performance over existing quantization-aware training baselines commonly used on other architectures, such as CNNs. We validate our method on six datasets and show, unlike previous quantization attempts, that models generalize to unseen graphs. 
Models trained with Degree-Quant for INT8 quantization perform as well as FP32 models in most cases; for INT4 models, we obtain up to 26% gains over the baselines. Our work enables up to 4.7x speedups on CPU when using INT8 arithmetic.", "keywords": "Graph neural networks;quantization;benchmark", "primary_area": "", "supplementary_material": "/attachment/c9fa62fb6ee45741e84d2d448cca5ba41566bd26.zip", "author": "Shyam Anil Tailor;Javier Fernandez-Marques;Nicholas Donald Lane", "authorids": "~Shyam_Anil_Tailor1;~Javier_Fernandez-Marques1;~Nicholas_Donald_Lane1", "gender": "M;M;M", "homepage": "https://www.shyamt.com;;http://niclane.org", "dblp": "256/9384;171/7908;03/2663.html", "google_scholar": "aJVp0DsAAAAJ;Htu1YhIAAAAJ;https://scholar.google.co.uk/citations?hl=en", "orcid": ";;0000-0002-2728-8273", "linkedin": ";jafermarq/;niclane", "or_profile": "~Shyam_Anil_Tailor1;~Javier_Fern\u00e1ndez_Marqu\u00e9s1;~Nic_Lane2", "aff": "Computer Laboratory;University of Oxford;Samsung", "aff_domain": "cl.cam.ac.uk;ox.ac.uk;samsung.com", "position": "PhD student;PhD student;Laboratory Director", "bibtex": "@inproceedings{\ntailor2021degreequant,\ntitle={Degree-Quant: Quantization-Aware Training for Graph Neural Networks},\nauthor={Shyam Anil Tailor and Javier Fernandez-Marques and Nicholas Donald Lane},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=NSBrFgJAHg}\n}", "github": "", "project": "", "reviewers": "AnonReviewer4;AnonReviewer5;AnonReviewer1", "pdf_size": 0, "rating": "6;6;7", "confidence": "2;2;4", "wc_review": "299;541;187", "wc_reply_reviewers": "0;205;0", "wc_reply_authors": "606;758;92", "reply_reviewers": "0;1;0", "reply_authors": "1;2;1", "rating_avg": [ 6.333333333333333, 0.4714045207910317 ], "confidence_avg": [ 2.6666666666666665, 0.9428090415820634 ], "wc_review_avg": [ 342.3333333333333, 147.7324909723277 ], "wc_reply_reviewers_avg": [ 68.33333333333333, 96.6379267621615 ], "wc_reply_authors_avg": [ 485.3333333333333, 284.967054626008 ], "reply_reviewers_avg": [ 0.3333333333333333, 0.4714045207910317 ], "reply_authors_avg": [ 1.3333333333333333, 0.4714045207910317 ], "replies_avg": [ 12, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": 1.0, "gs_citation": 219, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=4730819138390684730&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 6, "pdf": "https://openreview.net/pdf?id=NSBrFgJAHg", "email": "cl.cam.ac.uk;ox.ac.uk;samsung.com", "author_num": 3, "aff_unique_index": "0;1;2", "aff_unique_norm": "University of Cambridge;University of Oxford;Samsung", "aff_unique_dep": "Computer Laboratory;;Samsung", "aff_unique_url": "https://www.cl.cam.ac.uk;https://www.ox.ac.uk;https://www.samsung.com", "aff_unique_abbr": "CL;Oxford;Samsung", "aff_campus_unique_index": "0", "aff_campus_unique": "Cambridge;", "aff_country_unique_index": "0;0;1", "aff_country_unique": "United Kingdom;South Korea" }, { "id": "NTElq-Fo-F4", "title": "Analysing Features Learned Using Unsupervised Models on Program Embeddings", "track": "main", "status": "Withdraw", "tldr": "", "abstract": "In this paper, we propose a novel approach for analyzing and evaluating how a deep neural network is autonomously learning different features related to programs on different input representations.\nWe trained a simple autoencoder having 5 hidden layers on a dataset containing Java programs, and we tested the ability of each of its neurons in detecting different program features using only 
unlabeled data for the training phase. For doing that, we designed two binary classification problems having different scopes: while the first one is based on the program cyclomatic complexity, the other one is defined starting from the identifiers chosen by the programmers, making it more related to the functionality (and thus, to some extent, to the semantic) of the program than to its structure. Using different program vector representations as input, we performed experiments considering the two problems, showing how some neurons can be effectively used as classifiers for programs on different binary tasks. We also discuss how the program representation chosen as input affects the classification performance, stating that new and customized program embeddings could be designed in order to obtain models able to solve different tasks guided by the proposed benchmarking approach.", "keywords": "Source code embedding;Unsupervised learning", "primary_area": "", "supplementary_material": "", "author": "Martina Saletta;Claudio Ferretti", "authorids": "~Martina_Saletta1;claudio.ferretti@unimib.it", "gender": "F;", "homepage": ";", "dblp": "248/0645.html;", "google_scholar": "VJkSZvIAAAAJ;", "orcid": ";", "linkedin": ";", "or_profile": "~Martina_Saletta1;claudio.ferretti@unimib.it", "aff": "University of Milan-Bicocca;", "aff_domain": "unimib.it;", "position": "PhD student;", "bibtex": "", "github": "", "project": "", "reviewers": "AnonReviewer3;AnonReviewer1;AnonReviewer4;AnonReviewer2", "site": "https://openreview.net/forum?id=NTElq-Fo-F4", "pdf_size": 0, "rating": "2;3;4;5", "confidence": "4;4;5;3", "wc_review": "335;512;724;376", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "0;0;0;0", "reply_reviewers": "0;0;0;0", "reply_authors": "0;0;0;0", "rating_avg": [ 3.5, 1.118033988749895 ], "confidence_avg": [ 4.0, 0.7071067811865476 ], "wc_review_avg": [ 486.75, 151.83770118122837 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 0, 0 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 0, 0 ], "replies_avg": [ 6, 0 ], "authors#_avg": [ 2, 0 ], "corr_rating_confidence": -0.3162277660168379, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:W1eX5F9F1-QJ:scholar.google.com/&scioq=Analysing+Features+Learned+Using+Unsupervised+Models+on+Program+Embeddings&hl=en&as_sdt=0,5", "gs_version_total": 0, "aff_unique_index": "0", "aff_unique_norm": "University of Milan-Bicocca", "aff_unique_dep": "", "aff_unique_url": "https://www.unimib.it", "aff_unique_abbr": "UNIMIB", "aff_country_unique_index": "0", "aff_country_unique": "Italy" }, { "title": "Distilling Knowledge from Reader to Retriever for Question Answering", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/2774", "id": "NTEz-6wysdb", "poster": "", "openreview": "https://openreview.net/forum?id=NTEz-6wysdb", "slides": "https://iclr.cc/virtual/2021/poster/2774", "video": "https://iclr.cc/virtual/2021/poster/2774", "author_site": "Gautier Izacard, Edouard Grave", "tldr": "", "abstract": "The task of information retrieval is an important component of many natural language processing systems, such as open domain question answering. While traditional methods were based on hand-crafted features, continuous representations based on neural networks recently obtained competitive results. A challenge of using such methods is to obtain supervised data to train the retriever model, corresponding to pairs of query and support documents. 
In this paper, we propose a technique to learn retriever models for downstream tasks, inspired by knowledge distillation, and which does not require annotated pairs of query and documents. Our approach leverages attention scores of a reader model, used to solve the task based on retrieved documents, to obtain synthetic labels for the retriever. We evaluate our method on question answering, obtaining state-of-the-art results.", "keywords": "question answering;information retrieval", "primary_area": "", "supplementary_material": "", "author": "Gautier Izacard;Edouard Grave", "authorids": "~Gautier_Izacard1;~Edouard_Grave1", "gender": "Unspecified;", "homepage": ";", "dblp": "222/3621;50/10261", "google_scholar": "https://scholar.google.com/citations?view_op=list_works;7UV4ET4AAAAJ", "orcid": ";", "linkedin": ";edouard-grave-63099823/", "or_profile": "~Gautier_Izacard1;~Edouard_Grave1", "aff": "Meta Facebook;Meta Facebook", "aff_domain": "fb.com;fb.com", "position": "PhD student;Research Scientist", "bibtex": "@inproceedings{\nizacard2021distilling,\ntitle={Distilling Knowledge from Reader to Retriever for Question Answering},\nauthor={Gautier Izacard and Edouard Grave},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=NTEz-6wysdb}\n}", "github": "[![github](/images/github_icon.svg) facebookresearch/FiD](https://github.com/facebookresearch/FiD) + [![Papers with Code](/images/pwc_icon.svg) 3 community implementations](https://paperswithcode.com/paper/?openreview=NTEz-6wysdb)", "project": "", "reviewers": "AnonReviewer1;AnonReviewer2;AnonReviewer3;AnonReviewer4", "pdf_size": 0, "rating": "6;7;7;7", "confidence": "4;3;4;4", "wc_review": "236;366;259;293", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "117;231;907;616", "reply_reviewers": "0;0;0;0", "reply_authors": "1;1;2;1", "rating_avg": [ 6.75, 0.4330127018922193 ], "confidence_avg": [ 3.75, 0.4330127018922193 ], "wc_review_avg": [ 288.5, 49.1248409666637 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 467.75, 313.84500553617227 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.25, 0.4330127018922193 ], "replies_avg": [ 10, 0 ], "authors#_avg": [ 2, 0 ], "corr_rating_confidence": -0.3333333333333333, "gs_citation": 269, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=18188741483036284668&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 5, "pdf": "https://openreview.net/pdf?id=NTEz-6wysdb", "email": "fb.com;fb.com", "author_num": 2, "aff_unique_index": "0;0", "aff_unique_norm": "Meta", "aff_unique_dep": "Meta Platforms, Inc.", "aff_unique_url": "https://meta.com", "aff_unique_abbr": "Meta", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0", "aff_country_unique": "United States" }, { "id": "NTP9OdaT6nm", "title": "Formal Language Constrained Markov Decision Processes", "track": "main", "status": "Reject", "tldr": "", "abstract": "In order to satisfy safety conditions, an agent may be constrained from acting freely. A safe controller can be designed a priori if an environment is well understood, but not when learning is employed. In particular, reinforcement learned (RL) controllers require exploration, which can be hazardous in safety critical situations. We study the benefits of giving structure to the constraints of a constrained Markov decision process by specifying them in formal languages as a step towards using safety methods from software engineering and controller synthesis. 
We instantiate these constraints as finite automata to efficiently recognise constraint violations. Constraint states are then used to augment the underlying MDP state and to learn a dense cost function, easing the problem of quickly learning joint MDP/constraint dynamics. We empirically evaluate the effect of these methods on training a variety of RL algorithms over several constraints specified in Safety Gym, MuJoCo, and Atari environments.", "keywords": "safe reinforcement learning;formal languages;constrained Markov decision process;safety gym;safety", "primary_area": "", "supplementary_material": "/attachment/eafc6a33d141c356320b069de7bef68e1b0482e6.zip", "author": "Eleanor Quint;Dong Xu;Samuel W Flint;Stephen D Scott;Matthew Dwyer", "authorids": "~Eleanor_Quint1;~Dong_Xu1;~Samuel_W_Flint1;~Stephen_D_Scott1;~Matthew_Dwyer1", "gender": "F;M;M;M;", "homepage": ";;;http://cse.unl.edu/~sscott;", "dblp": ";09/3493;;;", "google_scholar": ";;;;", "orcid": ";;0000-0002-8023-9710;;", "linkedin": ";;;;", "or_profile": "~Eleanor_Quint1;~Dong_Xu1;~Samuel_W_Flint1;~Stephen_D_Scott1;~Matthew_Dwyer1", "aff": "University of Nebraska, Lincoln;University of Virginia;University of Nebraska, Lincoln;;University of Nebraska-Lincoln", "aff_domain": "unl.edu;virginia.edu;unl.edu;;", "position": "PhD student;PhD student;PhD student;;", "bibtex": "@misc{\nquint2021formal,\ntitle={Formal Language Constrained Markov Decision Processes},\nauthor={Eleanor Quint and Dong Xu and Samuel W Flint and Stephen D Scott and Matthew Dwyer},\nyear={2021},\nurl={https://openreview.net/forum?id=NTP9OdaT6nm}\n}", "github": "", "project": "", "reviewers": "AnonReviewer3;AnonReviewer4;AnonReviewer2;AnonReviewer1", "site": "https://openreview.net/forum?id=NTP9OdaT6nm", "pdf_size": 0, "rating": "5;6;6;6", "confidence": "5;3;3;5", "wc_review": "744;1342;489;466", "wc_reply_reviewers": "0;61;0;0", "wc_reply_authors": "1027;1663;347;650", "reply_reviewers": "0;1;0;0", "reply_authors": "2;3;1;1", "rating_avg": [ 5.75, 0.4330127018922193 ], "confidence_avg": [ 4.0, 1.0 ], "wc_review_avg": [ 760.25, 353.14895936417537 ], "wc_reply_reviewers_avg": [ 15.25, 26.413774815425377 ], "wc_reply_authors_avg": [ 921.75, 491.0994680306628 ], "reply_reviewers_avg": [ 0.25, 0.4330127018922193 ], "reply_authors_avg": [ 1.75, 0.82915619758885 ], "replies_avg": [ 14, 0 ], "authors#_avg": [ 5, 0 ], "corr_rating_confidence": -0.5773502691896257, "gs_citation": 8, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=6378119731164746320&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 4, "aff_unique_index": "0;1;0;2", "aff_unique_norm": "University of Nebraska;University of Virginia;University of Nebraska-Lincoln", "aff_unique_dep": ";;", "aff_unique_url": "https://www.unl.edu;https://www.virginia.edu;https://www.unl.edu", "aff_unique_abbr": "UNL;UVA;UNL", "aff_campus_unique_index": "0;0", "aff_campus_unique": "Lincoln;", "aff_country_unique_index": "0;0;0;0", "aff_country_unique": "United States" }, { "id": "NUCZeoVlAe", "title": "Empirical Studies on the Convergence of Feature Spaces in Deep Learning", "track": "main", "status": "Reject", "tldr": "", "abstract": "While deep learning is effective to learn features/representations from data, the distributions of samples in feature spaces learned by various architectures for different training tasks (e.g., latent layers of AEs and feature vectors in CNN classifiers) have not been well-studied or compared. 
We hypothesize that the feature spaces of networks trained by various architectures (AEs or CNNs) and tasks (supervised, unsupervised, or self-supervised learning) share some common subspaces, no matter what types of DNN architectures or whether the labels have been used in feature learning. To test our hypothesis, through Singular Value Decomposition (SVD) of feature vectors, we demonstrate that one could linearly project the feature vectors of the same group of samples to a similar distribution, where the distribution is represented as the top left singular vector (i.e., principal subspace of feature vectors), namely $\\mathcal{P}$-vectors. We further assess the convergence of feature space learning using angles between $\\mathcal{P}$-vectors obtained from the well-trained model and its checkpoint per epoch during the learning procedure, where a quasi-monotonic trend of convergence to small angles has been observed. Finally, we carry out case studies to connect $\\mathcal{P}$-vectors to the data distribution, and generalization performance. Extensive experiments with practically-used MLP, AE and CNN architectures for classification, image reconstruction, and self-supervised learning tasks on MNIST, CIFAR-10 and CIFAR-100 datasets have been done to support our claims with solid evidences.", "keywords": "", "primary_area": "", "supplementary_material": "", "author": "Haoran Liu;Haoyi Xiong;Yaqing Wang;Haozhe An;Dongrui Wu;Dejing Dou", "authorids": "~Haoran_Liu2;~Haoyi_Xiong1;~Yaqing_Wang2;~Haozhe_An1;~Dongrui_Wu1;~Dejing_Dou1", "gender": "M;F;;M;M;", "homepage": "https://sites.google.com/site/haoyixiongshomepage/;http://www.cse.ust.hk/~ywangcy/;https://haozhe-an.github.io;https://sites.google.com/site/drwuhust/home;https://ix.cs.uoregon.edu/~dou/;", "dblp": "06/2700;147/1393-2;263/7358;;26/2854.html;", "google_scholar": "f_Kcie0AAAAJ;https://scholar.google.com/citations?hl=en;;UYGzCPEAAAAJ;qBHsQ04AAAAJ;", "orcid": ";0000-0003-1457-1114;;0000-0002-7153-9703;;", "linkedin": ";;;;;%E6%B5%A9%E7%84%B6-%E5%88%98-3096bbb5/", "or_profile": "~Haoyi_Xiong1;~Yaqing_Wang2;~Haozhe_An1;~Dongrui_Wu1;~Dejing_Dou4;~Haoran_Liu3", "aff": "Baidu;Baidu Research;Apple;Huazhong University of Science and Technology;University of Oregon;", "aff_domain": "baidu.com;baidu.com;apple.com;hust.edu.cn;uoregon.edu;", "position": "Principal Researcher;Researcher;Intern;Full Professor;Full Professor;", "bibtex": "@misc{\nliu2021empirical,\ntitle={Empirical Studies on the Convergence of Feature Spaces in Deep Learning},\nauthor={Haoran Liu and Haoyi Xiong and Yaqing Wang and Haozhe An and Dongrui Wu and Dejing Dou},\nyear={2021},\nurl={https://openreview.net/forum?id=NUCZeoVlAe}\n}", "github": "", "project": "", "reviewers": "AnonReviewer4;AnonReviewer3;AnonReviewer1", "site": "https://openreview.net/forum?id=NUCZeoVlAe", "pdf_size": 0, "rating": "3;5;6", "confidence": "4;3;4", "wc_review": "549;389;379", "wc_reply_reviewers": "250;0;0", "wc_reply_authors": "1270;1138;437", "reply_reviewers": "1;0;0", "reply_authors": "3;2;1", "rating_avg": [ 4.666666666666667, 1.247219128924647 ], "confidence_avg": [ 3.6666666666666665, 0.4714045207910317 ], "wc_review_avg": [ 439.0, 77.88880963698615 ], "wc_reply_reviewers_avg": [ 83.33333333333333, 117.85113019775793 ], "wc_reply_authors_avg": [ 948.3333333333334, 365.5610604111013 ], "reply_reviewers_avg": [ 0.3333333333333333, 0.4714045207910317 ], "reply_authors_avg": [ 2.0, 0.816496580927726 ], "replies_avg": [ 13, 0 ], "authors#_avg": [ 6, 0 ], "corr_rating_confidence": 
-0.18898223650461363, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:351umnJgfGUJ:scholar.google.com/&scioq=Empirical+Studies+on+the+Convergence+of+Feature+Spaces+in+Deep+Learning&hl=en&as_sdt=0,5", "gs_version_total": 0, "aff_unique_index": "0;0;1;2;3", "aff_unique_norm": "Baidu;Apple;Huazhong University of Science and Technology;University of Oregon", "aff_unique_dep": "Baidu, Inc.;Apple Inc.;;", "aff_unique_url": "https://www.baidu.com;https://www.apple.com;http://www.hust.edu.cn;https://www.uoregon.edu", "aff_unique_abbr": "Baidu;Apple;HUST;UO", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0;1;0;1", "aff_country_unique": "China;United States" }, { "title": "Learning Value Functions in Deep Policy Gradients using Residual Variance", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/3319", "id": "NX1He-aFO_F", "poster": "", "openreview": "https://openreview.net/forum?id=NX1He-aFO_F", "slides": "https://iclr.cc/virtual/2021/poster/3319", "video": "https://iclr.cc/virtual/2021/poster/3319", "author_site": "Yannis Flet-Berliac, reda ouhamma, odalric-ambrym maillard, philippe preux", "tldr": "", "abstract": "Policy gradient algorithms have proven to be successful in diverse decision making and control tasks. However, these methods suffer from high sample complexity and instability issues. In this paper, we address these challenges by providing a different approach for training the critic in the actor-critic framework. Our work builds on recent studies indicating that traditional actor-critic algorithms do not succeed in fitting the true value function, calling for the need to identify a better objective for the critic. In our method, the critic uses a new state-value (resp. state-action-value) function approximation that learns the value of the states (resp. state-action pairs) relative to their mean value rather than the absolute value as in conventional actor-critic. We prove the theoretical consistency of the new gradient estimator and observe dramatic empirical improvement across a variety of continuous control tasks and algorithms. 
Furthermore, we validate our method in tasks with sparse rewards, where we provide experimental evidence and theoretical insights.", "keywords": "", "primary_area": "", "supplementary_material": "", "author": "Yannis Flet-Berliac;reda ouhamma;odalric-ambrym maillard;Philippe Preux", "authorids": "~Yannis_Flet-Berliac1;~reda_ouhamma1;~odalric-ambrym_maillard1;~Philippe_Preux1", "gender": ";M;;M", "homepage": "https://ynns.io/;https://redaouhamma.github.io/;http://odalricambrymmaillard.neowordpress.fr/;https://philippe-preux.codeberg.page", "dblp": "239/5247;276/1574;83/7401;16/4835", "google_scholar": "https://scholar.google.fr/citations?user=qclRKHoAAAAJ;DYe2NmQAAAAJ;https://scholar.google.fr/citations?hl=fr;JTXxmeAAAAAJ", "orcid": ";;;0000-0002-2067-2838", "linkedin": ";reda-ouhamma/;;", "or_profile": "~Yannis_Flet-Berliac1;~reda_ouhamma1;~odalric-ambrym_maillard1;~Philippe_Preux1", "aff": "INRIA (SequeL team);Universit\u00e9 de Lille;inria;Universit\u00e9 de Lille", "aff_domain": "inria.fr;univ-lille.fr;inria.fr;univ-lille.fr", "position": "PhD student;PhD student;Assistant Professor;Full Professor", "bibtex": "@inproceedings{\nflet-berliac2021learning,\ntitle={Learning Value Functions in Deep Policy Gradients using Residual Variance},\nauthor={Yannis Flet-Berliac and reda ouhamma and odalric-ambrym maillard and Philippe Preux},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=NX1He-aFO_F}\n}", "github": "", "project": "", "reviewers": "AnonReviewer3;AnonReviewer2;AnonReviewer1", "pdf_size": 0, "rating": "5;7;8", "confidence": "3;4;3", "wc_review": "271;510;469", "wc_reply_reviewers": "101;169;391", "wc_reply_authors": "544;795;301", "reply_reviewers": "1;1;1", "reply_authors": "2;2;2", "rating_avg": [ 6.666666666666667, 1.247219128924647 ], "confidence_avg": [ 3.3333333333333335, 0.4714045207910317 ], "wc_review_avg": [ 416.6666666666667, 104.35303328392307 ], "wc_reply_reviewers_avg": [ 220.33333333333334, 123.8314266340424 ], "wc_reply_authors_avg": [ 546.6666666666666, 201.68347037430266 ], "reply_reviewers_avg": [ 1.0, 0.0 ], "reply_authors_avg": [ 2.0, 0.0 ], "replies_avg": [ 21, 0 ], "authors#_avg": [ 4, 0 ], "corr_rating_confidence": 0.18898223650461363, "gs_citation": 24, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=5423556606747540395&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 11, "pdf": "https://openreview.net/pdf?id=NX1He-aFO_F", "email": "inria.fr;univ-lille.fr;inria.fr;univ-lille.fr", "author_num": 4, "aff_unique_index": "0;1;0;1", "aff_unique_norm": "INRIA;Universit\u00e9 de Lille", "aff_unique_dep": "SequeL team;", "aff_unique_url": "https://www.inria.fr;https://www.univ-lille.fr", "aff_unique_abbr": "INRIA;UdeL", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0;0;0", "aff_country_unique": "France" }, { "id": "NYLvNv8q4i", "title": "Intragroup sparsity for efficient inference", "track": "main", "status": "Withdraw", "tldr": "", "abstract": "This work studies intragroup sparsity, a fine-grained structural constraint on network weight parameters. It eliminates the computational inefficiency of fine-grained sparsity due to irregular dataflow, while at the same time achieving high inference accuracy. We present theoretical analysis on how weight group sizes affect sparsification error, and on how the performance of pruned networks changes with sparsity level. 
Further, we analyze inference-time I/O cost of two different strategies for achieving intragroup sparsity and how the choice of strategies affects I/O cost under mild assumptions on accelerator architecture. Moreover, we present a novel training algorithm that yields models of improved accuracy over the standard training approach under the intragroup sparsity constraint.", "keywords": "Deep Learning;Model compression;Neural Network Pruning;High Performance Computation", "primary_area": "", "supplementary_material": "", "author": "Zilin Yu;Chao Wang;Xin Wang;Yong Zhao;Xundong Wu", "authorids": "~Zilin_Yu1;~Chao_Wang1;~Xin_Wang2;~Yong_Zhao4;~Xundong_Wu3", "gender": "M;M;M;M;M", "homepage": "https://scholar.google.com/schhp?hl=zh-CN;;;http://www.ece.pku.edu.cn/2014/wdz_gtdz_0415/39.html;", "dblp": ";188/7759;;;164/5876", "google_scholar": "https://scholar.google.com/schhp?hl=zh-CN;;8mICcqAAAAAJ;;", "orcid": ";;0000-0003-3270-3782;;0000-0002-6643-4384", "linkedin": ";;neuromorphic/;;", "or_profile": "~Zilin_Yu1;~Chao_Wang1;~Xin_Wang2;~Yong_Zhao4;~xundong_wu1", "aff": "Baidu;;Cerebras Systems, Inc;Peking University Shenzhen Graduate School (PKU Shenzhen) ;Hangzhou Dianzi University", "aff_domain": "baidu.com;;cerebras.net;pkusz.edu.cn;hdu.edu.cn", "position": "researcher and developer;;Principal Researcher;Associate Professor;Associate Investigator", "bibtex": "", "github": "", "project": "", "reviewers": "AnonReviewer2;AnonReviewer1;AnonReviewer3;AnonReviewer4", "site": "https://openreview.net/forum?id=NYLvNv8q4i", "pdf_size": 0, "rating": "4;4;5;6", "confidence": "5;4;3;3", "wc_review": "682;183;312;208", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "0;0;0;0", "reply_reviewers": "0;0;0;0", "reply_authors": "0;0;0;0", "rating_avg": [ 4.75, 0.82915619758885 ], "confidence_avg": [ 3.75, 0.82915619758885 ], "wc_review_avg": [ 346.25, 199.79035887649835 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 0, 0 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 0, 0 ], "replies_avg": [ 5, 0 ], "authors#_avg": [ 5, 0 ], "corr_rating_confidence": -0.8181818181818182, "gs_citation": 1, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=13177210562734569450&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 5, "aff_unique_index": "0;1;2;3", "aff_unique_norm": "Baidu;Cerebras Systems;Peking University;Hangzhou Dianzi University", "aff_unique_dep": "Baidu, Inc.;;;", "aff_unique_url": "https://www.baidu.com;https://www.cerebras.com;http://www.pku.edu.cn;http://www.hdu.edu.cn/", "aff_unique_abbr": "Baidu;Cerebras;PKU;HGHDU", "aff_campus_unique_index": "1", "aff_campus_unique": ";Shenzhen", "aff_country_unique_index": "0;1;0;0", "aff_country_unique": "China;United States" }, { "id": "NZj7TnMr01", "title": "Improving Neural Network Accuracy and Calibration Under Distributional Shift with Prior Augmented Data", "track": "main", "status": "Reject", "tldr": "", "abstract": "Neural networks have proven successful at learning from complex data distributions by acting as universal function approximators. However, neural networks are often overconfident in their predictions, which leads to inaccurate and miscalibrated probabilistic predictions. The problem of overconfidence becomes especially apparent in cases where the test-time data distribution differs from that which was seen during training.
We propose a solution to this problem by seeking out regions in arbitrary feature space where the model is unjustifiably overconfident, and conditionally raising the entropy of those predictions towards that of the Bayesian prior on the distribution of the labels. Our method results in a better calibrated network and is agnostic to the underlying model structure, so it can be applied to any neural network which produces a probability density as an output. We demonstrate the effectiveness of our method and validate its performance on both classification and regression problems by applying it to the training of recent state-of-the-art neural network models.", "keywords": "Bayesian;Calibration", "primary_area": "", "supplementary_material": "/attachment/4ff284695f95b07a5212f23d135e1bcd016cd892.zip", "author": "Jeffrey Ryan Willette;Juho Lee;Sung Ju Hwang", "authorids": "~Jeffrey_Ryan_Willette1;~Juho_Lee2;~Sung_Ju_Hwang1", "gender": "M;M;", "homepage": "https://jeffwillette.github.io;https://juho.lee.github.io;", "dblp": "286/0937;55/3410-1;", "google_scholar": "https://scholar.google.com/citations?hl=en;Py4URJUAAAAJ;", "orcid": ";;", "linkedin": ";;", "or_profile": "~Jeffrey_Ryan_Willette1;~Juho_Lee2;~Sung_Ju_Hwang1", "aff": "Korea Advanced Institute of Science & Technology;Korea Advanced Institute of Science & Technology;", "aff_domain": "kaist.ac.kr;kaist.ac.kr;", "position": "Student;Assistant Professor;", "bibtex": "@misc{\nwillette2021improving,\ntitle={Improving Neural Network Accuracy and Calibration Under Distributional Shift with Prior Augmented Data },\nauthor={Jeffrey Ryan Willette and Juho Lee and Sung Ju Hwang},\nyear={2021},\nurl={https://openreview.net/forum?id=NZj7TnMr01}\n}", "github": "", "project": "", "reviewers": "AnonReviewer4;AnonReviewer3;AnonReviewer1;AnonReviewer2", "site": "https://openreview.net/forum?id=NZj7TnMr01", "pdf_size": 0, "rating": "3;5;6;6", "confidence": "4;3;3;3", "wc_review": "610;618;431;1351", "wc_reply_reviewers": "0;57;69;76", "wc_reply_authors": "1764;1479;1228;1020", "reply_reviewers": "0;1;1;1", "reply_authors": "4;4;3;3", "rating_avg": [ 5.0, 1.224744871391589 ], "confidence_avg": [ 3.25, 0.4330127018922193 ], "wc_review_avg": [ 752.5, 353.5396017421528 ], "wc_reply_reviewers_avg": [ 50.5, 29.937434759845406 ], "wc_reply_authors_avg": [ 1372.75, 278.2762790824974 ], "reply_reviewers_avg": [ 0.75, 0.4330127018922193 ], "reply_authors_avg": [ 3.5, 0.5 ], "replies_avg": [ 27, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": -0.9428090415820632, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:08JIWtEUs0UJ:scholar.google.com/&scioq=Improving+Neural+Network+Accuracy+and+Calibration+Under+Distributional+Shift+with+Prior+Augmented+Data&hl=en&as_sdt=0,33", "gs_version_total": 0, "aff_unique_index": "0;0", "aff_unique_norm": "Korea Advanced Institute of Science and Technology", "aff_unique_dep": "", "aff_unique_url": "https://www.kaist.ac.kr", "aff_unique_abbr": "KAIST", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0", "aff_country_unique": "South Korea" }, { "title": "Representation Learning for Sequence Data with Deep Autoencoding Predictive Components", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/2825", "id": "Naqw7EHIfrv", "poster": "", "openreview": "https://openreview.net/forum?id=Naqw7EHIfrv", "slides": "https://iclr.cc/virtual/2021/poster/2825", "video": "https://iclr.cc/virtual/2021/poster/2825", "author_site": "Junwen 
Bai, Weiran Wang, Yingbo Zhou, Caiming Xiong", "tldr": "", "abstract": "We propose Deep Autoencoding Predictive Components (DAPC) -- a self-supervised representation learning method for sequence data, based on the intuition that useful representations of sequence data should exhibit a simple structure in the latent space. We encourage this latent structure by maximizing an estimate of \\emph{predictive information} of latent feature sequences, which is the mutual information between the past and future windows at each time step. In contrast to the mutual information lower bound commonly used by contrastive learning, the estimate of predictive information we adopt is exact under a Gaussian assumption. Additionally, it can be computed without negative sampling. To reduce the degeneracy of the latent space extracted by powerful encoders and keep useful information from the inputs, we regularize predictive information learning with a challenging masked reconstruction loss. We demonstrate that our method recovers the latent space of noisy dynamical systems, extracts predictive features for forecasting tasks, and improves automatic speech recognition when used to pretrain the encoder on large amounts of unlabeled data.", "keywords": "Mutual Information;Unsupervised Learning;Sequence Data;Masked Reconstruction", "primary_area": "", "supplementary_material": "", "author": "Junwen Bai;Weiran Wang;Yingbo Zhou;Caiming Xiong", "authorids": "~Junwen_Bai1;~Weiran_Wang1;~Yingbo_Zhou1;~Caiming_Xiong1", "gender": "M;M;;M", "homepage": "http://www.cs.cornell.edu/~junwen/;https://sites.google.com/corp/ttic.edu/weiranwang/home;;http://cmxiong.com/", "dblp": "188/6479;;72/8614;80/7282", "google_scholar": "JD7wLV4AAAAJ;O9djN1AAAAAJ;H_6RQ7oAAAAJ;vaSdahkAAAAJ", "orcid": "0000-0001-7939-4927;;;", "linkedin": "junwen-bai-7ba354155/;weiran-wang-12ab8b16b/;yingbozhou/;caiming-xiong-150a1417", "or_profile": "~Junwen_Bai1;~Weiran_Wang1;~Yingbo_Zhou1;~Caiming_Xiong1", "aff": "Cornell University;Google;Salesforce Research;Salesforce Research", "aff_domain": "cornell.edu;google.com;salesforce.com;salesforce.com", "position": "PhD student;Researcher;Research Scientist;Research Scientist", "bibtex": "@inproceedings{\nbai2021representation,\ntitle={Representation Learning for Sequence Data with Deep Autoencoding Predictive Components},\nauthor={Junwen Bai and Weiran Wang and Yingbo Zhou and Caiming Xiong},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=Naqw7EHIfrv}\n}", "github": "[![github](/images/github_icon.svg) JunwenBai/DAPC](https://github.com/JunwenBai/DAPC) + [![Papers with Code](/images/pwc_icon.svg) 1 community implementation](https://paperswithcode.com/paper/?openreview=Naqw7EHIfrv)", "project": "", "reviewers": "AnonReviewer4;AnonReviewer2;AnonReviewer3;AnonReviewer1", "pdf_size": 0, "rating": "5;5;6;7", "confidence": "3;4;4;3", "wc_review": "465;406;340;244", "wc_reply_reviewers": "303;0;0;0", "wc_reply_authors": "711;353;272;220", "reply_reviewers": "2;0;0;0", "reply_authors": "2;1;1;1", "rating_avg": [ 5.75, 0.82915619758885 ], "confidence_avg": [ 3.5, 0.5 ], "wc_review_avg": [ 363.75, 82.06818811208153 ], "wc_reply_reviewers_avg": [ 75.75, 131.20284867334246 ], "wc_reply_authors_avg": [ 389.0, 191.85280816292473 ], "reply_reviewers_avg": [ 0.5, 0.8660254037844386 ], "reply_authors_avg": [ 1.25, 0.4330127018922193 ], "replies_avg": [ 13, 0 ], "authors#_avg": [ 4, 0 ], "corr_rating_confidence": -0.30151134457776363, "gs_citation": 20, 
"gs_cited_by_link": "https://scholar.google.com/scholar?cites=14928540791785699734&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 8, "pdf": "https://openreview.net/pdf?id=Naqw7EHIfrv", "email": "cornell.edu;google.com;salesforce.com;salesforce.com", "author_num": 4, "aff_unique_index": "0;1;2;2", "aff_unique_norm": "Cornell University;Google;Salesforce", "aff_unique_dep": ";Google;Salesforce Research", "aff_unique_url": "https://www.cornell.edu;https://www.google.com;https://research.salesforce.com", "aff_unique_abbr": "Cornell;Google;Salesforce", "aff_campus_unique_index": "1", "aff_campus_unique": ";Mountain View", "aff_country_unique_index": "0;0;0;0", "aff_country_unique": "United States" }, { "title": "Extracting Strong Policies for Robotics Tasks from Zero-Order Trajectory Optimizers", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/3209", "id": "Nc3TJqbcl3", "poster": "", "openreview": "https://openreview.net/forum?id=Nc3TJqbcl3", "slides": "https://iclr.cc/virtual/2021/poster/3209", "video": "https://iclr.cc/virtual/2021/poster/3209", "author_site": "Cristina Pinneri, Shambhuraj Sawant, Sebastian Blaes, Georg Martius", "tldr": "", "abstract": "Solving high-dimensional, continuous robotic tasks is a challenging optimization problem. Model-based methods that rely on zero-order optimizers like the cross-entropy method (CEM) have so far shown strong performance and are considered state-of-the-art in the model-based reinforcement learning community. However, this success comes at the cost of high computational complexity, being therefore not suitable for real-time control. In this paper, we propose a technique to jointly optimize the trajectory and distill a policy, which is essential for fast execution in real robotic systems. Our method builds upon standard approaches, like guidance cost and dataset aggregation, and introduces a novel adaptive factor which prevents the optimizer from collapsing to the learner's behavior at the beginning of the training. 
The extracted policies reach unprecedented performance on challenging tasks as making a humanoid stand up and opening a door without reward shaping", "keywords": "reinforcement learning;zero-order optimization;policy learning;model-based learning;robotics;model predictive control", "primary_area": "", "supplementary_material": "", "author": "Cristina Pinneri;Shambhuraj Sawant;Sebastian Blaes;Georg Martius", "authorids": "~Cristina_Pinneri1;shambhuraj.sawant@tuebingen.mpg.de;sebastian.blaes@tuebingen.mpg.de;~Georg_Martius1", "gender": "F;;;M", "homepage": "https://www.is.mpg.de/person/cpinneri;;;https://uni-tuebingen.de/de/264672", "dblp": ";;;47/2706", "google_scholar": ";;;https://scholar.google.de/citations?user=b-JF-UIAAAAJ", "orcid": ";;;", "linkedin": ";;;", "or_profile": "~Cristina_Pinneri1;shambhuraj.sawant@tuebingen.mpg.de;sebastian.blaes@tuebingen.mpg.de;~Georg_Martius1", "aff": "Swiss Federal Institute of Technology;;;Max Planck Institute for Intelligent Systems", "aff_domain": "ethz.ch;;;tuebingen.mpg.de", "position": "PhD student;;;Assistant Professor", "bibtex": "@inproceedings{\npinneri2021extracting,\ntitle={Extracting Strong Policies for Robotics Tasks from Zero-Order Trajectory Optimizers},\nauthor={Cristina Pinneri and Shambhuraj Sawant and Sebastian Blaes and Georg Martius},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=Nc3TJqbcl3}\n}", "github": "", "project": "", "reviewers": "AnonReviewer4;AnonReviewer1;AnonReviewer2;AnonReviewer3", "pdf_size": 0, "rating": "5;6;6;6", "confidence": "3;3;3;2", "wc_review": "720;295;854;182", "wc_reply_reviewers": "47;0;0;0", "wc_reply_authors": "640;505;431;318", "reply_reviewers": "1;0;0;0", "reply_authors": "2;1;1;1", "rating_avg": [ 5.75, 0.4330127018922193 ], "confidence_avg": [ 2.75, 0.4330127018922193 ], "wc_review_avg": [ 512.75, 281.16487600694364 ], "wc_reply_reviewers_avg": [ 11.75, 20.351596988934308 ], "wc_reply_authors_avg": [ 473.5, 116.94122455319167 ], "reply_reviewers_avg": [ 0.25, 0.4330127018922193 ], "reply_authors_avg": [ 1.25, 0.4330127018922193 ], "replies_avg": [ 12, 0 ], "authors#_avg": [ 4, 0 ], "corr_rating_confidence": -0.3333333333333333, "gs_citation": 13, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=17139202965750801158&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 3, "pdf": "https://openreview.net/pdf?id=Nc3TJqbcl3", "email": "ethz.ch;;;tuebingen.mpg.de", "author_num": 4, "aff_unique_index": "0;1", "aff_unique_norm": "Swiss Federal Institute of Technology;Max Planck Institute for Intelligent Systems", "aff_unique_dep": ";Intelligent Systems", "aff_unique_url": "https://www.ethz.ch;https://www.mpi-is.mpg.de", "aff_unique_abbr": "ETH Zurich;MPI-IS", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;1", "aff_country_unique": "Switzerland;Germany" }, { "title": "Shape or Texture: Understanding Discriminative Features in CNNs", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/3180", "id": "NcFEZOi-rLa", "poster": "", "openreview": "https://openreview.net/forum?id=NcFEZOi-rLa", "slides": "https://iclr.cc/virtual/2021/poster/3180", "video": "https://iclr.cc/virtual/2021/poster/3180", "author_site": "Md Amirul Islam, Matthew Kowal, Patrick Esser, Sen Jia, Bj\u00f6rn Ommer, Kosta Derpanis, Neil Bruce", "tldr": "", "abstract": "Contrasting the previous evidence that neurons in the later layers of a Convolutional Neural Network (CNN) respond to complex 
object shapes, recent studies have shown that CNNs actually exhibit a 'texture bias': given an image with both texture and shape cues (e.g., a stylized image), a CNN is biased towards predicting the category corresponding to the texture. However, these previous studies conduct experiments on the final classification output of the network, and fail to robustly evaluate the bias contained (i) in the latent representations, and (ii) on a per-pixel level. In this paper, we design a series of experiments that overcome these issues. We do this with the goal of better understanding what type of shape information contained in the network is discriminative, where shape information is encoded, as well as when the network learns about object shape during training. We show that a network learns the majority of overall shape information at the first few epochs of training and that this information is largely encoded in the last few layers of a CNN. Finally, we show that the encoding of shape does not imply the encoding of localized per-pixel semantic information. The experimental results and findings provide a more accurate understanding of the behaviour of current CNNs, thus helping to inform future design choices.", "keywords": "Shape;Texture;Shape Bias;Texture Bias;Shape Encoding;Mutual Information", "primary_area": "", "supplementary_material": "", "author": "Md Amirul Islam;Matthew Kowal;Patrick Esser;Sen Jia;Bj\u00f6rn Ommer;Konstantinos G. Derpanis;Neil Bruce", "authorids": "~Md_Amirul_Islam1;~Matthew_Kowal1;~Patrick_Esser1;~Sen_Jia1;~Bj\u00f6rn_Ommer2;~Konstantinos_G._Derpanis1;~Neil_Bruce1", "gender": "M;M;M;;M;;M", "homepage": "http://www.scs.ryerson.ca/~amirul/;https://mkowal2.github.io/;;;http://socs.uoguelph.ca/~brucen/;https://ommer-lab.com/people/ommer/;https://csprofkgd.github.io/", "dblp": ";247/6389;184/1547;35/3232;https://dblp.uni-trier.de/pers/hd/b/Bruce:Neil_D=_B=;11/4098;39/253", "google_scholar": "https://scholar.google.ca/citations?user=AeibrqUAAAAJ;FCg8QxUAAAAJ;;;Gnezf-4AAAAJ;zWbvIUcAAAAJ;https://scholar.google.ca/citations?user=3Br8x_gAAAAJ", "orcid": ";;;;0000-0002-5710-1107;;", "linkedin": ";mkowal2/;;;;;", "or_profile": "~Md_Amirul_Islam1;~Matthew_Kowal1;~Patrick_Esser1;~Sen_Jia1;~Neil_Bruce1;~Bjorn_Ommer1;~Kosta_Derpanis1", "aff": "Ryerson University;York University;Heidelberg University;University of Waterloo;University of Guelph;Ruprecht-Karls-Universit\u00e4t Heidelberg;Samsung", "aff_domain": "ryerson.ca;yorku.ca;uni-heidelberg.de;uwaterloo.ca;uoguelph.ca;uni-heidelberg.de;samsung.com", "position": "PhD student;PhD student;PhD student;Postdoc;Associate Professor;Full Professor;Researcher", "bibtex": "@inproceedings{\nislam2021shape,\ntitle={Shape or Texture: Understanding Discriminative Features in {\\{}CNN{\\}}s},\nauthor={Md Amirul Islam and Matthew Kowal and Patrick Esser and Sen Jia and Bj{\\\"o}rn Ommer and Konstantinos G. 
Derpanis and Neil Bruce},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=NcFEZOi-rLa}\n}", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer4;AnonReviewer2;AnonReviewer3", "pdf_size": 0, "rating": "4;4;7;8", "confidence": "4;4;4;3", "wc_review": "427;416;643;364", "wc_reply_reviewers": "0;282;159;99", "wc_reply_authors": "857;1693;1206;411", "reply_reviewers": "0;1;1;1", "reply_authors": "2;3;3;3", "rating_avg": [ 5.75, 1.7853571071357126 ], "confidence_avg": [ 3.75, 0.4330127018922193 ], "wc_review_avg": [ 462.5, 106.89363872560425 ], "wc_reply_reviewers_avg": [ 135.0, 102.11023455070506 ], "wc_reply_authors_avg": [ 1041.75, 469.86241337225516 ], "reply_reviewers_avg": [ 0.75, 0.4330127018922193 ], "reply_authors_avg": [ 2.75, 0.4330127018922193 ], "replies_avg": [ 21, 0 ], "authors#_avg": [ 7, 0 ], "corr_rating_confidence": -0.7276068751089989, "gs_citation": 89, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=8107758455919475865&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 3, "pdf": "https://openreview.net/pdf?id=NcFEZOi-rLa", "email": "ryerson.ca;yorku.ca;uni-heidelberg.de;uwaterloo.ca;uoguelph.ca;uni-heidelberg.de;samsung.com", "author_num": 7, "aff_unique_index": "0;1;2;3;4;5;6", "aff_unique_norm": "Ryerson University;York University;Heidelberg University;University of Waterloo;University of Guelph;Ruprecht-Karls-Universit\u00e4t Heidelberg;Samsung", "aff_unique_dep": ";;;;;;Samsung", "aff_unique_url": "https://www.ryerson.ca;https://www.yorku.ca;https://www.uni-heidelberg.de;https://uwaterloo.ca;https://www.uoguelph.ca;https://www.uni-heidelberg.de/;https://www.samsung.com", "aff_unique_abbr": "Ryerson;York U;Uni Heidelberg;UW;U of G;Uni Heidelberg;Samsung", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0;1;0;0;1;2", "aff_country_unique": "Canada;Germany;South Korea" }, { "title": "On Self-Supervised Image Representations for GAN Evaluation", "status": "Spotlight", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/3329", "id": "NeRdBeTionN", "poster": "", "openreview": "https://openreview.net/forum?id=NeRdBeTionN", "slides": "https://iclr.cc/virtual/2021/poster/3329", "video": "https://iclr.cc/virtual/2021/poster/3329", "author_site": "Stanislav Morozov, Andrey Voynov, Artem Babenko", "tldr": "", "abstract": "The embeddings from CNNs pretrained on Imagenet classification are de-facto standard image representations for assessing GANs via FID, Precision and Recall measures. Despite broad previous criticism of their usage for non-Imagenet domains, these embeddings are still the top choice in most of the GAN literature.\n\nIn this paper, we advocate the usage of the state-of-the-art self-supervised representations to evaluate GANs on the established non-Imagenet benchmarks. These representations, typically obtained via contrastive learning, are shown to provide better transfer to new tasks and domains, therefore, can serve as more universal embeddings of natural images. 
With extensive comparison of the recent GANs on the common datasets, we show that self-supervised representations produce a more reasonable ranking of models in terms of FID/Precision/Recall, while the ranking with classification-pretrained embeddings often can be misleading.", "keywords": "GAN;evaluation;embedding", "primary_area": "", "supplementary_material": "", "author": "Stanislav Morozov;Andrey Voynov;Artem Babenko", "authorids": "~Stanislav_Morozov1;~Andrey_Voynov1;~Artem_Babenko1", "gender": "M;M;M", "homepage": ";https://anvoynov.github.io/anvoynov/;", "dblp": "231/7636;255/6107;117/4834", "google_scholar": ";imBjSgUAAAAJ;q885d1wAAAAJ", "orcid": ";;0000-0002-1830-8252", "linkedin": ";;", "or_profile": "~Stanislav_Morozov1;~Andrey_Voynov1;~Artem_Babenko1", "aff": "Yandex;Yandex;Yandex", "aff_domain": "yandex-team.ru;yandex-team.ru;yandex-team.ru", "position": "Researcher;Researcher;Researcher", "bibtex": "@inproceedings{\nmorozov2021on,\ntitle={On Self-Supervised Image Representations for {\\{}GAN{\\}} Evaluation},\nauthor={Stanislav Morozov and Andrey Voynov and Artem Babenko},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=NeRdBeTionN}\n}", "github": "", "project": "", "reviewers": "AnonReviewer3;AnonReviewer1;AnonReviewer4;AnonReviewer2", "pdf_size": 0, "rating": "7;7;7;7", "confidence": "4;4;4;4", "wc_review": "354;310;438;317", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "317;302;230;215", "reply_reviewers": "0;0;0;0", "reply_authors": "1;1;1;1", "rating_avg": [ 7.0, 0.0 ], "confidence_avg": [ 4.0, 0.0 ], "wc_review_avg": [ 354.75, 50.88897228280406 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 266.0, 44.141816908686486 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 12, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 48, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=10324304924199306059&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 0, "pdf": "https://openreview.net/pdf?id=NeRdBeTionN", "email": "yandex-team.ru;yandex-team.ru;yandex-team.ru", "author_num": 3, "aff_unique_index": "0;0;0", "aff_unique_norm": "Yandex", "aff_unique_dep": "", "aff_unique_url": "https://yandex.com", "aff_unique_abbr": "Yandex", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0;0", "aff_country_unique": "Russian Federation" }, { "id": "NfZ6g2OmXEk", "title": "Prioritized Level Replay", "track": "main", "status": "Reject", "tldr": "", "abstract": "Simulated environments with procedurally generated content have become popular benchmarks for testing systematic generalization of reinforcement learning agents. Every level in such an environment is algorithmically created, thereby exhibiting a unique configuration of underlying factors of variation, such as layout, positions of entities, asset appearances, or even the rules governing environment transitions. Fixed sets of training levels can be determined to aid comparison and reproducibility, and test levels can be held out to evaluate the generalization and robustness of agents. While prior work samples training levels in a direct way (e.g.~uniformly) for the agent to learn from, we investigate the hypothesis that different levels provide different learning progress for an agent at specific times during training. 
We introduce Prioritized Level Replay, a general framework for estimating the future learning potential of a level given the current state of the agent's policy. We find that temporal-difference (TD) errors, while previously used to selectively sample past transitions, also prove effective for scoring a level's future learning potential when the agent replays (that is, revisits) that level to generate entirely new episodes of experiences from it. We report significantly improved sample-efficiency and generalization on the majority of Procgen Benchmark environments as well as two challenging MiniGrid environments. Lastly, we present a qualitative analysis showing that Prioritized Level Replay induces an implicit curriculum, taking the agent gradually from easier to harder levels.", "keywords": "Reinforcement Learning;Procedurally Generated Environments;Curriculum Learning;Procgen Benchmark", "primary_area": "", "supplementary_material": "", "author": "Minqi Jiang;Edward Grefenstette;Tim Rockt\u00e4schel", "authorids": "~Minqi_Jiang1;~Edward_Grefenstette1;~Tim_Rockt\u00e4schel1", "gender": "M;M;M", "homepage": "https://twitter.com/minqijiang;http://egrefen.com/;http://rockt.ai", "dblp": "270/7949;http://dblp.uni-trier.de/pers/hd/g/Grefenstette:Edward;43/11537", "google_scholar": ";https://scholar.google.co.uk/citations?user=ezllEwMAAAAJ;https://scholar.google.co.uk/citations?user=mWBY8aIAAAAJ", "orcid": ";;", "linkedin": "minqi-jiang-585a6536/;;rockt/", "or_profile": "~Minqi_Jiang1;~Edward_Grefenstette1;~Tim_Rocktaeschel1", "aff": "University College London;Meta Facebook;Department of Computer Science, University College London", "aff_domain": "ucl.ac.uk;fb.com;cs.ucl.ac.uk", "position": "PhD;Research Scientist;Assistant Professor", "bibtex": "@misc{\njiang2021prioritized,\ntitle={Prioritized Level Replay},\nauthor={Minqi Jiang and Edward Grefenstette and Tim Rockt{\\\"a}schel},\nyear={2021},\nurl={https://openreview.net/forum?id=NfZ6g2OmXEk}\n}", "github": "", "project": "", "reviewers": "AnonReviewer4;AnonReviewer2;AnonReviewer1;AnonReviewer3", "site": "https://openreview.net/forum?id=NfZ6g2OmXEk", "pdf_size": 0, "rating": "5;6;7;7", "confidence": "3;4;3;3", "wc_review": "647;551;718;929", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "782;913;927;1183", "reply_reviewers": "0;0;0;0", "reply_authors": "1;2;2;2", "rating_avg": [ 6.25, 0.82915619758885 ], "confidence_avg": [ 3.25, 0.4330127018922193 ], "wc_review_avg": [ 711.25, 138.98628529462897 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 951.25, 145.26247794940028 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.75, 0.4330127018922193 ], "replies_avg": [ 16, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": -0.17407765595569782, "gs_citation": 193, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=18011658212512846682&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 7, "aff_unique_index": "0;1;0", "aff_unique_norm": "University College London;Meta", "aff_unique_dep": ";Meta Platforms, Inc.", "aff_unique_url": "https://www.ucl.ac.uk;https://meta.com", "aff_unique_abbr": "UCL;Meta", "aff_campus_unique_index": "1", "aff_campus_unique": ";London", "aff_country_unique_index": "0;1;0", "aff_country_unique": "United Kingdom;United States" }, { "id": "NgZKCRKaY3J", "title": "Mitigating bias in calibration error estimation", "track": "main", "status": "Reject", "tldr": "", "abstract": "Building reliable machine learning systems requires that we correctly understand their level of 
confidence. Calibration focuses on measuring the degree of accuracy in a model's confidence and most research in calibration focuses on techniques to improve an empirical estimate of calibration error, $\\mathrm{ECE}_\\mathrm{BIN}$. Using simulation, we show that $\\mathrm{ECE}_\\mathrm{BIN}$ can systematically underestimate or overestimate the true calibration error depending on the nature of model miscalibration, the size of the evaluation data set, and the number of bins. Critically, $\\mathrm{ECE}_\\mathrm{BIN}$ is more strongly biased for perfectly calibrated models. We propose a simple alternative calibration error metric, $\\mathrm{ECE}_\\mathrm{SWEEP}$, in which the number of bins is chosen to be as large as possible while preserving monotonicity in the calibration function. Evaluating our measure on distributions fit to neural network confidence scores on CIFAR-10, CIFAR-100, and ImageNet, we show that $\\mathrm{ECE}_\\mathrm{SWEEP}$ produces a less biased estimator of calibration error and therefore should be used by any researcher wishing to evaluate the calibration of models trained on similar datasets.", "keywords": "calibration error;uncertainty estimation;statistical bias", "primary_area": "", "supplementary_material": "", "author": "Rebecca Roelofs;Nicholas Cain;Jonathon Shlens;Michael Curtis Mozer", "authorids": "~Rebecca_Roelofs1;~Nicholas_Cain1;~Jonathon_Shlens1;~Michael_Curtis_Mozer1", "gender": "F;;;M", "homepage": ";;;https://www.cs.colorado.edu/~mozer", "dblp": "145/2224;;;m/MichaelCMozer", "google_scholar": ";tqhLlfoAAAAJ;;lmjR_qMAAAAJ", "orcid": ";;;", "linkedin": ";;;", "or_profile": "~Rebecca_Roelofs1;~Nicholas_Cain1;~Jonathon_Shlens1;~Michael_Curtis_Mozer1", "aff": "Google;;;Google DeepMind", "aff_domain": "google.com;;;google.com", "position": "Research scientist;;;Research Scientist", "bibtex": "@misc{\nroelofs2021mitigating,\ntitle={Mitigating bias in calibration error estimation},\nauthor={Rebecca Roelofs and Nicholas Cain and Jonathon Shlens and Michael Curtis Mozer},\nyear={2021},\nurl={https://openreview.net/forum?id=NgZKCRKaY3J}\n}", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer4;AnonReviewer2;AnonReviewer3", "site": "https://openreview.net/forum?id=NgZKCRKaY3J", "pdf_size": 0, "rating": "4;4;6;7", "confidence": "4;4;3;3", "wc_review": "225;264;769;1897", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "354;612;670;1567", "reply_reviewers": "0;0;0;0", "reply_authors": "2;1;1;5", "rating_avg": [ 5.25, 1.299038105676658 ], "confidence_avg": [ 3.5, 0.5 ], "wc_review_avg": [ 788.75, 674.8675332982021 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 800.75, 458.1066333289663 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 2.25, 1.6393596310755 ], "replies_avg": [ 18, 0 ], "authors#_avg": [ 4, 0 ], "corr_rating_confidence": -0.9622504486493761, "gs_citation": 113, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=6296264821294857539&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 6, "aff_unique_index": "0;0", "aff_unique_norm": "Google", "aff_unique_dep": "Google", "aff_unique_url": "https://www.google.com", "aff_unique_abbr": "Google", "aff_campus_unique_index": "0", "aff_campus_unique": "Mountain View;", "aff_country_unique_index": "0;1", "aff_country_unique": "United States;United Kingdom" }, { "id": "NibHms070zC", "title": "Faster Federated Learning with Decaying Number of Local SGD Steps", "track": "main", "status": "Withdraw", "tldr": "", "abstract": "In Federated Learning (FL), 
client devices collaboratively train a model without sharing the private data present on the devices. Federated Stochastic Gradient Descent (FedSGD) is a recent generalisation of the popular Federated Averaging algorithm. Recent works show that when client data is distributed heterogeneously, the loss function minimised by FedSGD differs from the 'true' loss that would be minimised by centralised training. Previous works propose decaying the client learning rate, $\\gamma$, to allow FedSGD to minimise the true loss. We propose instead decaying the number of local SGD steps, $K$, that clients perform during training rounds to allow minimisation of the true loss. Decaying $K$ has the added benefit of reducing the total computation that clients perform during FedSGD. Real-world applications of FL use large numbers of low-powered smartphone or Internet-of-Things clients, so reduction of computation would provide significant savings in terms of energy and time. In this work, we prove for quadratic objectives that annealing $K$ allows FedSGD to approach the true minimiser. We then perform thorough experimentation on three benchmark FL datasets to show that decaying $K$ can achieve the same generalisation performance as decaying $\\gamma$, but with up to $3.8\\times$ less total steps of SGD performed by clients.\n", "keywords": "Distributed Machine Learning;Federated Learning", "primary_area": "", "supplementary_material": "", "author": "Jed Mills;Jia Hu;Geyong Min", "authorids": "~Jed_Mills1;j.hu@exeter.ac.uk;g.min@exeter.ac.uk", "gender": "M;;", "homepage": ";;", "dblp": ";;", "google_scholar": ";;", "orcid": "0000-0001-6344-9364;;", "linkedin": ";;", "or_profile": "~Jed_Mills1;j.hu@exeter.ac.uk;g.min@exeter.ac.uk", "aff": "University of Exeter;;", "aff_domain": "exeter.ac.uk;;", "position": "PhD student;;", "bibtex": "", "github": "", "project": "", "reviewers": "AnonReviewer3;AnonReviewer1;AnonReviewer4", "site": "https://openreview.net/forum?id=NibHms070zC", "pdf_size": 0, "rating": "4;4;5", "confidence": "5;4;4", "wc_review": "698;305;469", "wc_reply_reviewers": "0;0;0", "wc_reply_authors": "0;0;0", "reply_reviewers": "0;0;0", "reply_authors": "0;0;0", "rating_avg": [ 4.333333333333333, 0.4714045207910317 ], "confidence_avg": [ 4.333333333333333, 0.4714045207910317 ], "wc_review_avg": [ 490.6666666666667, 161.17140634188877 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 0, 0 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 0, 0 ], "replies_avg": [ 4, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": -0.4999999999999999, "gs_citation": 14, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=5213521870922916831&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 8, "aff_unique_index": "0", "aff_unique_norm": "University of Exeter", "aff_unique_dep": "", "aff_unique_url": "https://www.exeter.ac.uk", "aff_unique_abbr": "Exeter", "aff_country_unique_index": "0", "aff_country_unique": "United Kingdom" }, { "id": "Nj8EIrSu5O", "title": "Divide-and-Conquer Monte Carlo Tree Search", "track": "main", "status": "Reject", "tldr": "", "abstract": "Standard planners for sequential decision making (including Monte Carlo planning, tree search, dynamic programming, etc.) 
are constrained by an implicit sequential planning assumption: The order in which a plan is constructed is the same in which it is executed.\nWe consider alternatives to this assumption for the class of goal-directed Reinforcement Learning (RL) problems.\nInstead of an environment transition model, we assume an imperfect, goal-directed policy.\nThis low-level policy can be improved by a plan, consisting of an appropriate sequence of sub-goals that guide it from the start to the goal state. We propose a planning algorithm, Divide-and-Conquer Monte Carlo Tree Search (DC-MCTS), for approximating the optimal plan by means of proposing intermediate sub-goals which hierarchically partition the initial tasks into simpler ones that are then solved independently and recursively. The algorithm critically makes use of a learned sub-goal proposal for finding appropriate partitions trees of new tasks based on prior experience.\nDifferent strategies for learning sub-goal proposals give rise to different planning strategies that strictly generalize sequential planning.\nWe show that this algorithmic flexibility over planning order leads to improved results in navigation tasks in grid-worlds as well as in challenging continuous control environments.", "keywords": "MCTS;planning;goal-directed planning;divide and conquer", "primary_area": "", "supplementary_material": "/attachment/f57b0478173347e9bf5863611c5f854e5d2ad77c.zip", "author": "Giambattista Parascandolo;Lars Holger Buesing;Josh Merel;Leonard Hasenclever;John Aslanides;Jessica B Hamrick;Nicolas Heess;Alexander Neitz;Theophane Weber", "authorids": "~Giambattista_Parascandolo1;~Lars_Holger_Buesing1;~Josh_Merel1;~Leonard_Hasenclever1;~John_Aslanides1;~Jessica_B_Hamrick1;~Nicolas_Heess1;~Alexander_Neitz1;~Theophane_Weber1", "gender": "M;;M;M;F;;;M;M", "homepage": ";;;;http://www.jesshamrick.com;;;http://www.thphn.com/;https://sites.google.com/view/giambattista-parascandolo/home", "dblp": "https://dblp.uni-trier.de/pers/hd/b/Buesing:Lars;139/1361;150/1667;198/1386;155/1885;76/9181;180/8340;;179/2714", "google_scholar": "1h_mxPMAAAAJ;https://scholar.google.co.uk/citations?user=K4OcFXUAAAAJ;https://scholar.google.co.uk/citations?user=dD-3S4QAAAAJ;;2ylcZSsAAAAJ;79k7bGEAAAAJ;;LZxqcX4AAAAJ;https://scholar.google.it/citations?user=1zCDX_UAAAAJ", "orcid": ";;;;;;;;", "linkedin": ";;;;;;;;", "or_profile": "~Lars_Holger_Buesing1;~Josh_Merel1;~Leonard_Hasenclever1;~John_Aslanides1;~Jessica_B_Hamrick1;~Nicolas_Heess1;~Alexander_Neitz1;~Theophane_Weber1;~Giambattista_Parascandolo2", "aff": "Deepmind;Google DeepMind;Google DeepMind;Google DeepMind;Google DeepMind;Google DeepMind;Max Planck Institute for Intelligent Systems, Max-Planck Institute;;Department of Computer Science, ETHZ - ETH Zurich", "aff_domain": "google.com;google.com;google.com;google.com;google.com;google.com;tuebingen.mpg.de;;inf.ethz.ch", "position": "Postdoc;Research Scientist;Research Scientist;Research Engineer;Research Scientist;Research Scientist;PhD student;;PhD student", "bibtex": "@misc{\nparascandolo2021divideandconquer,\ntitle={Divide-and-Conquer Monte Carlo Tree Search},\nauthor={Giambattista Parascandolo and Lars Holger Buesing and Josh Merel and Leonard Hasenclever and John Aslanides and Jessica B Hamrick and Nicolas Heess and Alexander Neitz and Theophane Weber},\nyear={2021},\nurl={https://openreview.net/forum?id=Nj8EIrSu5O}\n}", "github": "", "project": "", "reviewers": "AnonReviewer2;AnonReviewer3;AnonReviewer4;AnonReviewer1", "site": "https://openreview.net/forum?id=Nj8EIrSu5O", 
"pdf_size": 0, "rating": "5;5;7;8", "confidence": "4;3;3;4", "wc_review": "474;647;1370;745", "wc_reply_reviewers": "0;0;0;101", "wc_reply_authors": "944;270;861;342", "reply_reviewers": "0;0;0;1", "reply_authors": "2;1;2;2", "rating_avg": [ 6.25, 1.299038105676658 ], "confidence_avg": [ 3.5, 0.5 ], "wc_review_avg": [ 809.0, 338.11462553400435 ], "wc_reply_reviewers_avg": [ 25.25, 43.73428289111415 ], "wc_reply_authors_avg": [ 604.25, 300.76932606234965 ], "reply_reviewers_avg": [ 0.25, 0.4330127018922193 ], "reply_authors_avg": [ 1.75, 0.4330127018922193 ], "replies_avg": [ 14, 0 ], "authors#_avg": [ 9, 0 ], "corr_rating_confidence": 0.19245008972987526, "gs_citation": 1, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=9929482154442259035&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 0, "aff_unique_index": "0;1;1;1;1;1;2;3", "aff_unique_norm": "DeepMind;Google;Max Planck Institute for Intelligent Systems;ETH Zurich", "aff_unique_dep": ";Google DeepMind;Intelligent Systems;Department of Computer Science", "aff_unique_url": "https://deepmind.com;https://deepmind.com;https://www.mpi-is.mpg.de;https://www.ethz.ch", "aff_unique_abbr": "DeepMind;DeepMind;MPI-IS;ETHZ", "aff_campus_unique_index": "1", "aff_campus_unique": ";Zurich", "aff_country_unique_index": "0;0;0;0;0;0;1;2", "aff_country_unique": "United Kingdom;Germany;Switzerland" }, { "title": "Learning the Pareto Front with Hypernetworks", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/2593", "id": "NjF772F4ZZR", "poster": "", "openreview": "https://openreview.net/forum?id=NjF772F4ZZR", "slides": "https://iclr.cc/virtual/2021/poster/2593", "video": "https://iclr.cc/virtual/2021/poster/2593", "author_site": "Aviv Navon, Aviv Shamsian, Ethan Fetaya, Gal Chechik", "tldr": "", "abstract": "Multi-objective optimization (MOO) problems are prevalent in machine learning. These problems have a set of optimal solutions, called the Pareto front, where each point on the front represents a different trade-off between possibly conflicting objectives. Recent MOO methods can target a specific desired ray in loss space however, most approaches still face two grave limitations: (i) A separate model has to be trained for each point on the front; and (ii) The exact trade-off must be known before the optimization process. Here, we tackle the problem of learning the entire Pareto front, with the capability of selecting a desired operating point on the front after training. We call this new setup Pareto-Front Learning (PFL).\n\nWe describe an approach to PFL implemented using HyperNetworks, which we term Pareto HyperNetworks (PHNs). PHN learns the entire Pareto front simultaneously using a single hypernetwork, which receives as input a desired preference vector and returns a Pareto-optimal model whose loss vector is in the desired ray. The unified model is runtime efficient compared to training multiple models and generalizes to new operating points not used during training. We evaluate our method on a wide set of problems, from multi-task regression and classification to fairness. PHNs learn the entire Pareto front at roughly the same time as learning a single point on the front and at the same time reach a better solution set. 
PFL opens the door to new applications where models are selected based on preferences that are only available at run time.", "keywords": "Multi-objective optimization;multi-task learning", "primary_area": "", "supplementary_material": "", "author": "Aviv Navon;Aviv Shamsian;Ethan Fetaya;Gal Chechik", "authorids": "~Aviv_Navon1;~Aviv_Shamsian1;~Ethan_Fetaya1;~Gal_Chechik1", "gender": "M;M;M;", "homepage": "https://avivnavon.github.io/;;http://www.cs.toronto.edu/~ethanf/;https://chechiklab.biu.ac.il/~gal/", "dblp": "269/9785;261/9492;01/10046;c/GalChechik", "google_scholar": "https://scholar.google.co.il/citations?user=N-sME4wAAAAJ;;zLuqh-0AAAAJ;Wk2gAZUAAAAJ", "orcid": ";;0000-0003-3125-1665;0000-0001-9164-5303", "linkedin": ";aviv-shamsian/;;", "or_profile": "~Aviv_Navon1;~Aviv_Shamsian1;~Ethan_Fetaya1;~Gal_Chechik1", "aff": "Bar Ilan University, Israel;Bar Ilan University;Bar Ilan University;NVIDIA", "aff_domain": "biu.ac.il;biu.ac.il;biu.ac.il;nvidia.com", "position": "PhD student;MS student;Assistant Professor;Principal Researcher", "bibtex": "@inproceedings{\nnavon2021learning,\ntitle={Learning the Pareto Front with Hypernetworks},\nauthor={Aviv Navon and Aviv Shamsian and Ethan Fetaya and Gal Chechik},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=NjF772F4ZZR}\n}", "github": "[![github](/images/github_icon.svg) AvivNavon/pareto-hypernetworks](https://github.com/AvivNavon/pareto-hypernetworks)", "project": "", "reviewers": "AnonReviewer1;AnonReviewer3;AnonReviewer4;AnonReviewer2", "pdf_size": 0, "rating": "6;6;6;7", "confidence": "4;3;3;4", "wc_review": "262;272;678;428", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "124;292;923;276", "reply_reviewers": "0;0;0;0", "reply_authors": "1;1;2;1", "rating_avg": [ 6.25, 0.4330127018922193 ], "confidence_avg": [ 3.5, 0.5 ], "wc_review_avg": [ 410.0, 168.14874367654372 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 403.75, 306.87487270873123 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.25, 0.4330127018922193 ], "replies_avg": [ 11, 0 ], "authors#_avg": [ 4, 0 ], "corr_rating_confidence": 0.5773502691896257, "gs_citation": 162, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=13675122104724715473&as_sdt=400005&sciodt=0,14&hl=en", "gs_version_total": 4, "pdf": "https://openreview.net/pdf?id=NjF772F4ZZR", "email": "biu.ac.il;biu.ac.il;biu.ac.il;nvidia.com", "author_num": 4, "aff_unique_index": "0;0;0;1", "aff_unique_norm": "Bar-Ilan University;NVIDIA", "aff_unique_dep": ";NVIDIA Corporation", "aff_unique_url": "https://www.biu.ac.il;https://www.nvidia.com", "aff_unique_abbr": "BIU;NVIDIA", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0;0;1", "aff_country_unique": "Israel;United States" }, { "id": "NjpEx8XzDvm", "title": "Gradient penalty from a maximum margin perspective", "track": "main", "status": "Withdraw", "tldr": "", "abstract": "A popular heuristic for improved performance in Generative adversarial networks (GANs) is to use some form of gradient penalty on the discriminator. This gradient penalty was originally motivated by a Wasserstein distance formulation. However, the use of gradient penalty in other GAN formulations is not well motivated. We present a unifying framework of expected margin maximization and show that a wide range of gradient-penalized GANs (e.g., Wasserstein, Standard, Least-Squares, and Hinge GANs) can be derived from this framework. 
Our results imply that employing gradient penalties induces a large-margin classifier (thus, a large-margin discriminator in GANs). We describe how expected margin maximization helps reduce vanishing gradients at fake (generated) samples, a known problem in GANs. From this framework, we derive a new $L^\\infty$ gradient norm penalty with Hinge loss which generally produces equally good (or better) generated output in GANs than $L^2$-norm penalties (based on the Fr\u00e9chet Inception Distance).", "keywords": "GAN;large margin;SVM", "primary_area": "", "supplementary_material": "/attachment/086c072de48cd241f0b26c22069251fe8e97406d.zip", "author": "Alexia Jolicoeur-Martineau;Ioannis Mitliagkas", "authorids": "~Alexia_Jolicoeur-Martineau1;~Ioannis_Mitliagkas1", "gender": "F;M", "homepage": "https://ajolicoeur.wordpress.com;http://mitliagkas.github.io/", "dblp": "223/4753;83/8757", "google_scholar": "0qytQ1oAAAAJ;K757SxgAAAAJ", "orcid": "0000-0003-2169-4008;", "linkedin": ";", "or_profile": "~Alexia_Jolicoeur-Martineau1;~Ioannis_Mitliagkas1", "aff": "University of Montreal;University of Montreal", "aff_domain": "umontreal.ca;umontreal.ca", "position": "PhD student;Assistant Professor", "bibtex": "", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer3;AnonReviewer4;AnonReviewer2", "site": "https://openreview.net/forum?id=NjpEx8XzDvm", "pdf_size": 0, "rating": "4;5;5;6", "confidence": "4;3;2;4", "wc_review": "421;404;357;313", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "0;0;0;0", "reply_reviewers": "0;0;0;0", "reply_authors": "0;0;0;0", "rating_avg": [ 5.0, 0.7071067811865476 ], "confidence_avg": [ 3.25, 0.82915619758885 ], "wc_review_avg": [ 373.75, 42.18634257671551 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 0, 0 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 0, 0 ], "replies_avg": [ 5, 0 ], "authors#_avg": [ 2, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 15, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=15160229357133669426&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 3, "aff_unique_index": "0;0", "aff_unique_norm": "University of Montreal", "aff_unique_dep": "", "aff_unique_url": "https://wwwumontreal.ca", "aff_unique_abbr": "UM", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0", "aff_country_unique": "Canada" }, { "id": "NlrFDOgRRH", "title": "Distributed Associative Memory Network with Association Reinforcing Loss", "track": "main", "status": "Reject", "tldr": "", "abstract": "Despite recent progress in memory augmented neural network research, associative memory networks with a single external memory still show limited performance on complex relational reasoning tasks. The main reason for this problem comes from the lossy representation of a content-based addressing memory and its insufficient associating performance for long temporal sequence data. To address these problems, here we introduce a novel Distributed Associative Memory architecture (DAM) with Association Reinforcing Loss (ARL) function which enhances the relation reasoning performance of memory augmented neural network. In this framework, instead of relying on a single large external memory, we form a set of multiple smaller associative memory blocks and update these sub-memory blocks simultaneously and independently with the content-based addressing mechanism. 
Based on DAM architecture, we can effectively retrieve complex relational information by integrating diverse representations distributed across multiple sub-memory blocks with an attention mechanism. Moreover, to further enhance the relation modeling performance of memory network, we propose ARL which assists a task's target objective while learning relational information exist in data. ARL enables the memory augmented neural network to reinforce an association between input data and task objective by reproducing stochastically sampled input data from stored memory contents. With this content reproducing task, it enriches the representations with relational information. In experiments, we apply our two main approaches to Differential Neural Computer (DNC), which is one of the representative content-based addressing memory model and achieves state-of-the-art performance on both memorization and relational reasoning tasks.", "keywords": "memory augmented neural network;distributed memory;memorization;relational reasoning", "primary_area": "", "supplementary_material": "/attachment/cbb104bc9e5435cc930b65cf482ca6e920c10c73.zip", "author": "Taewon Park;Inchul Choi;Minho Lee", "authorids": "~Taewon_Park1;~Inchul_Choi1;mholee@gmail.com", "gender": "M;M;", "homepage": ";;", "dblp": "82/10595;;", "google_scholar": "https://scholar.google.co.kr/citations?hl=ko;JUEWM6QAAAAJ;", "orcid": ";;", "linkedin": "taewon-park-755394169/;;", "or_profile": "~Taewon_Park1;~Inchul_Choi1;mholee@gmail.com", "aff": "Kyungpook National University;;", "aff_domain": "knu.ac.kr;;", "position": "MS student;;", "bibtex": "@misc{\npark2021distributed,\ntitle={Distributed Associative Memory Network with Association Reinforcing Loss},\nauthor={Taewon Park and Inchul Choi and Minho Lee},\nyear={2021},\nurl={https://openreview.net/forum?id=NlrFDOgRRH}\n}", "github": "", "project": "", "reviewers": "AnonReviewer2;AnonReviewer1;AnonReviewer5;AnonReviewer4;AnonReviewer3", "site": "https://openreview.net/forum?id=NlrFDOgRRH", "pdf_size": 0, "rating": "4;5;5;6;8", "confidence": "4;5;3;4;4", "wc_review": "854;411;338;149;231", "wc_reply_reviewers": "156;0;0;0;0", "wc_reply_authors": "3001;1794;1702;359;294", "reply_reviewers": "2;0;0;0;0", "reply_authors": "6;4;4;1;1", "rating_avg": [ 5.6, 1.3564659966250536 ], "confidence_avg": [ 4.0, 0.6324555320336759 ], "wc_review_avg": [ 396.6, 245.5952768275481 ], "wc_reply_reviewers_avg": [ 31.2, 62.39999999999999 ], "wc_reply_authors_avg": [ 1430.0, 1011.1437088762408 ], "reply_reviewers_avg": [ 0.4, 0.8000000000000002 ], "reply_authors_avg": [ 3.2, 1.9390719429665315 ], "replies_avg": [ 25, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:lTcHSczQxbUJ:scholar.google.com/&scioq=Distributed+Associative+Memory+Network+with+Association+Reinforcing+Loss&hl=en&as_sdt=0,33", "gs_version_total": 0, "aff_unique_index": "0", "aff_unique_norm": "Kyungpook National University", "aff_unique_dep": "", "aff_unique_url": "https://www.knu.ac.kr", "aff_unique_abbr": "KNU", "aff_country_unique_index": "0", "aff_country_unique": "South Korea" }, { "title": "Improving Transformation Invariance in Contrastive Representation Learning", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/2855", "id": "NomEDgIEBwE", "poster": "", "openreview": "https://openreview.net/forum?id=NomEDgIEBwE", "slides": "https://iclr.cc/virtual/2021/poster/2855", "video": 
"https://iclr.cc/virtual/2021/poster/2855", "author_site": "Adam Foster, Rattana Pukdee, Tom Rainforth", "tldr": "", "abstract": "We propose methods to strengthen the invariance properties of representations obtained by contrastive learning. While existing approaches implicitly induce a degree of invariance as representations are learned, we look to more directly enforce invariance in the encoding process. To this end, we first introduce a training objective for contrastive learning that uses a novel regularizer to control how the representation changes under transformation. We show that representations trained with this objective perform better on downstream tasks and are more robust to the introduction of nuisance transformations at test time. Second, we propose a change to how test time representations are generated by introducing a feature averaging approach that combines encodings from multiple transformations of the original input, finding that this leads to across the board performance gains. Finally, we introduce the novel Spirograph dataset to explore our ideas in the context of a differentiable generative process with multiple downstream tasks, showing that our techniques for learning invariance are highly beneficial.", "keywords": "contrastive learning;representation learning;transformation invariance", "primary_area": "", "supplementary_material": "/attachment/7719fa440718a00944a83b0887d6e5d339e9cab5.zip", "author": "Adam Foster;Rattana Pukdee;Tom Rainforth", "authorids": "~Adam_Foster1;~Rattana_Pukdee1;~Tom_Rainforth1", "gender": "M;M;M", "homepage": "https://ae-foster.github.io;;http://www.robots.ox.ac.uk/~twgr", "dblp": "223/5765;;166/1198", "google_scholar": "1MsXZJ0AAAAJ;KhnQ8zoAAAAJ;https://scholar.google.co.uk/citations?user=ieLRNKMAAAAJ", "orcid": ";;", "linkedin": "adamefoster;rattana-pukdee/;", "or_profile": "~Adam_Foster1;~Rattana_Pukdee1;~Tom_Rainforth1", "aff": "University of Oxford;Carnegie Mellon University;", "aff_domain": "ox.ac.uk;andrew.cmu.edu;ox.ac.uk", "position": "PhD student;PhD student;Postdoc", "bibtex": "@inproceedings{\nfoster2021improving,\ntitle={Improving Transformation Invariance in Contrastive Representation Learning},\nauthor={Adam Foster and Rattana Pukdee and Tom Rainforth},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=NomEDgIEBwE}\n}", "github": "[![Papers with Code](/images/pwc_icon.svg) 2 community implementations](https://paperswithcode.com/paper/?openreview=NomEDgIEBwE)", "project": "", "reviewers": "AnonReviewer2;AnonReviewer4;AnonReviewer1", "pdf_size": 0, "rating": "6;7;7", "confidence": "4;4;4", "wc_review": "242;239;409", "wc_reply_reviewers": "0;0;0", "wc_reply_authors": "801;425;539", "reply_reviewers": "0;0;0", "reply_authors": "1;1;1", "rating_avg": [ 6.666666666666667, 0.4714045207910317 ], "confidence_avg": [ 4.0, 0.0 ], "wc_review_avg": [ 296.6666666666667, 79.44110327084393 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 588.3333333333334, 157.4152329209456 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 9, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 27, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=11621005501488646335&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 4, "pdf": "https://openreview.net/pdf?id=NomEDgIEBwE", "email": "ox.ac.uk;andrew.cmu.edu;ox.ac.uk", "author_num": 3, "aff_unique_index": "0;1", "aff_unique_norm": "University of 
Oxford;Carnegie Mellon University", "aff_unique_dep": ";", "aff_unique_url": "https://www.ox.ac.uk;https://www.cmu.edu", "aff_unique_abbr": "Oxford;CMU", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;1", "aff_country_unique": "United Kingdom;United States" }, { "id": "Nq5zyAUD65", "title": "Smooth Activations and Reproducibility in Deep Networks", "track": "main", "status": "Reject", "tldr": "", "abstract": "Deep networks are gradually penetrating almost every domain in our lives due to their amazing success. However, with substantive performance accuracy improvements comes the price of irreproducibility. Two identical models, trained on the exact same training dataset may exhibit large differences in predictions on individual examples even when average accuracy is similar, especially when trained on highly distributed parallel systems. The popular Rectified Linear Unit (ReLU) activation has been key to recent success of deep networks. We demonstrate, however, that ReLU is also a catalyzer to irreproducibility in deep networks. We show that not only can activations smoother than ReLU provide better accuracy, but they can also provide better accuracy-reproducibility tradeoffs. We propose a new family of activations; Smooth ReLU (SmeLU), designed to give such better tradeoffs, while also keeping the mathematical expression simple, and thus implementation cheap. SmeLU is monotonic, mimics ReLU, while providing continuous gradients, yielding better reproducibility. We generalize SmeLU to give even more flexibility and then demonstrate that SmeLU and its generalized form are special cases of a more general methodology of REctified Smooth Continuous Unit (RESCU) activations. \nEmpirical results demonstrate the superior accuracy-reproducibility tradeoffs with smooth activations, SmeLU in particular.", "keywords": "Deep networks;activation functions;reproducibility", "primary_area": "", "supplementary_material": "", "author": "Gil I Shamir;Dong Lin;Lorenzo Coviello", "authorids": "gshamir@google.com;dongl@google.com;~Lorenzo_Coviello1", "gender": ";;", "homepage": ";;", "dblp": ";;", "google_scholar": ";;VXLA6xkAAAAJ", "orcid": ";;", "linkedin": ";;lorenzocoviello", "or_profile": "gshamir@google.com;dongl@google.com;~Lorenzo_Coviello1", "aff": ";;", "aff_domain": ";;", "position": ";;", "bibtex": "@misc{\nshamir2021smooth,\ntitle={Smooth Activations and Reproducibility in Deep Networks},\nauthor={Gil I Shamir and Dong Lin and Lorenzo Coviello},\nyear={2021},\nurl={https://openreview.net/forum?id=Nq5zyAUD65}\n}", "github": "", "project": "", "reviewers": "AnonReviewer2;AnonReviewer3;AnonReviewer1;AnonReviewer4", "site": "https://openreview.net/forum?id=Nq5zyAUD65", "pdf_size": 0, "rating": "2;4;4;5", "confidence": "5;3;2;3", "wc_review": "2096;371;247;723", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "1184;560;910;952", "reply_reviewers": "0;0;0;0", "reply_authors": "3;1;2;2", "rating_avg": [ 3.75, 1.0897247358851685 ], "confidence_avg": [ 3.25, 1.0897247358851685 ], "wc_review_avg": [ 859.25, 735.076994266587 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 901.5, 223.0756598107467 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 2.0, 0.7071067811865476 ], "replies_avg": [ 15, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": -0.7894736842105263, "gs_citation": 18, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=3530254283875980790&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 3 }, { "id": 
"NqJw2sVJbC8", "title": "Adversarial Attacks on Machine Learning Systems for High-Frequency Trading", "track": "main", "status": "Withdraw", "tldr": "", "abstract": "Algorithmic trading systems are often completely automated, and deep learning is increasingly receiving attention in this domain. Nonetheless, little is known about the robustness properties of these models. We study valuation models for algorithmic trading from the perspective of adversarial machine learning. We introduce new attacks specific to this domain with size constraints that minimize attack costs. We further discuss how these attacks can be used as an analysis tool to study and evaluate the robustness properties of financial models. Finally, we investigate the feasibility of realistic adversarial attacks in which an adversarial trader fools automated trading systems into making inaccurate predictions.", "keywords": "Adversarial;robustness;trading;finance;security", "primary_area": "", "supplementary_material": "/attachment/ee8833462ffc35e31aa8161bf49cf988a6f80b6f.zip", "author": "Micah Goldblum;Avi Schwarzschild;Ankit Patel;Tom Goldstein", "authorids": "~Micah_Goldblum1;~Avi_Schwarzschild1;~Ankit_Patel1;~Tom_Goldstein1", "gender": ";M;;M", "homepage": ";https://cs.umd.edu/~avi1;http://ankitlab.co/;https://www.cs.umd.edu/~tomg/", "dblp": "241/7231;249/9334.html;99/646;25/8184", "google_scholar": "pGDKzuUAAAAJ;WNvQ7AcAAAAJ;Gbe5UncAAAAJ;KmSuVtgAAAAJ", "orcid": ";;;", "linkedin": ";;;", "or_profile": "~Micah_Goldblum1;~Avi_Schwarzschild1;~Ankit_Patel1;~Tom_Goldstein1", "aff": "University of Maryland, College Park;University of Maryland, College Park;Rice University;University of Maryland, College Park", "aff_domain": "umd.edu;umd.edu;rice.edu;umd.edu", "position": "Postdoc;PhD student;Assistant Professor;Associate Professor", "bibtex": "", "github": "", "project": "", "reviewers": "AnonReviewer2;AnonReviewer3;AnonReviewer5", "site": "https://openreview.net/forum?id=NqJw2sVJbC8", "pdf_size": 0, "rating": "3;3;4", "confidence": "4;4;4", "wc_review": "439;862;866", "wc_reply_reviewers": "0;0;0", "wc_reply_authors": "0;0;0", "reply_reviewers": "0;0;0", "reply_authors": "0;0;0", "rating_avg": [ 3.3333333333333335, 0.4714045207910317 ], "confidence_avg": [ 4.0, 0.0 ], "wc_review_avg": [ 722.3333333333334, 200.35357634830368 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 0, 0 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 0, 0 ], "replies_avg": [ 4, 0 ], "authors#_avg": [ 4, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 30, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=12647956316232672706&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 7, "aff_unique_index": "0;0;1;0", "aff_unique_norm": "University of Maryland;Rice University", "aff_unique_dep": ";", "aff_unique_url": "https://www/umd.edu;https://www.rice.edu", "aff_unique_abbr": "UMD;Rice", "aff_campus_unique_index": "0;0;0", "aff_campus_unique": "College Park;", "aff_country_unique_index": "0;0;0;0", "aff_country_unique": "United States" }, { "id": "NqPW1ZJjXDJ", "title": "NASOA: Towards Faster Task-oriented Online Fine-tuning", "track": "main", "status": "Reject", "tldr": "", "abstract": "Fine-tuning from pre-trained ImageNet models has been a simple, effective, and popular approach for various computer vision tasks. The common practice of fine-tuning is to adopt a default hyperparameter setting with a fixed pre-trained model, while both of them are not optimized for specific tasks and time constraints. 
Moreover, in cloud computing or GPU clusters where the tasks arrive sequentially in a stream, faster online fine-tuning is a more desired and realistic strategy for saving money, energy consumption, and CO2 emission. In this paper, we propose a joint Neural Architecture Search and Online Adaption framework named NASOA towards a faster task-oriented fine-tuning upon the request of users. Specifically, NASOA first adopts an offline NAS to identify a group of training-efficient networks to form a pretrained model zoo. We propose a novel joint block and macro-level search space to enable a flexible and efficient search. Then, by estimating fine-tuning performance via an adaptive model by accumulating experience from the past tasks, an online schedule generator is proposed to pick up the most suitable model and generate a personalized training regime with respect to each desired task in a one-shot fashion. The resulting model zoo is more training efficient than SOTA NAS models, e.g. 6x faster than RegNetY-16GF, and 1.7x faster than EfficientNetB3. Experiments on multiple datasets also show that NASOA achieves much better fine-tuning results, i.e. improving around 2.1% accuracy than the best performance in RegNet series under various time constraints and tasks; 40x faster compared to the BOHB method.", "keywords": "Fine-tuning;AutoML;NAS", "primary_area": "", "supplementary_material": "", "author": "Hang Xu;Ning Kang;Gengwei Zhang;Xiaodan Liang;Zhenguo Li", "authorids": "~Hang_Xu1;kang.ning2@huawei.com;~Gengwei_Zhang1;~Xiaodan_Liang2;~Zhenguo_Li1", "gender": "M;;M;F;M", "homepage": ";;https://gengdavid.github.io/;https://www.sysu-hcp.net/;http://www.ee.columbia.edu/~zgli/", "dblp": ";;226/6522;;23/6479", "google_scholar": "https://scholar.google.com.hk/citations?user=J_8TX6sAAAAJ;;YcikIekAAAAJ;voxznZAAAAAJ;XboZC1AAAAAJ", "orcid": "0000-0003-3645-8972;;0000-0003-1823-502X;;", "linkedin": ";;;;", "or_profile": "~Hang_Xu1;kang.ning2@huawei.com;~Gengwei_Zhang1;~Xiaodan_Liang2;~Zhenguo_Li1", "aff": "Huawei Noah\u2018s Ark Lab;;University of Technology Sydney;SUN YAT-SEN UNIVERSITY;Huawei Noah's Ark Lab", "aff_domain": "huawei.com;;student.uts.edu.au;sysu.edu.cn;huawei.com", "position": "Researcher;;PhD student;Associate Professor;Principal Researcher", "bibtex": "@misc{\nxu2021nasoa,\ntitle={{\\{}NASOA{\\}}: Towards Faster Task-oriented Online Fine-tuning},\nauthor={Hang Xu and Ning Kang and Gengwei Zhang and Xiaodan Liang and Zhenguo Li},\nyear={2021},\nurl={https://openreview.net/forum?id=NqPW1ZJjXDJ}\n}", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer4;AnonReviewer2;AnonReviewer3", "site": "https://openreview.net/forum?id=NqPW1ZJjXDJ", "pdf_size": 0, "rating": "3;6;7;7", "confidence": "3;3;3;4", "wc_review": "100;645;251;304", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "292;737;369;376", "reply_reviewers": "0;0;0;0", "reply_authors": "1;2;1;1", "rating_avg": [ 5.75, 1.6393596310755 ], "confidence_avg": [ 3.25, 0.4330127018922193 ], "wc_review_avg": [ 325.0, 199.33765324193018 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 443.5, 172.62748912035997 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.25, 0.4330127018922193 ], "replies_avg": [ 12, 0 ], "authors#_avg": [ 5, 0 ], "corr_rating_confidence": 0.44022545316281186, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:jkXlF9Z_APMJ:scholar.google.com/&scioq=NASOA:+Towards+Faster+Task-oriented+Online+Fine-tuning&hl=en&as_sdt=0,33", "gs_version_total": 0, 
"aff_unique_index": "0;1;2;0", "aff_unique_norm": "Huawei;University of Technology Sydney;Sun Yat-sen University", "aff_unique_dep": "Noah's Ark Lab;;", "aff_unique_url": "https://www.huawei.com;https://www.uts.edu.au;http://www.sysu.edu.cn", "aff_unique_abbr": "Huawei;UTS;SYSU", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;1;0;0", "aff_country_unique": "China;Australia" }, { "id": "NqWY3s0SILo", "title": "Differentiable Graph Optimization for Neural Architecture Search", "track": "main", "status": "Reject", "tldr": "", "abstract": "In this paper, we propose Graph Optimized Neural Architecture Learning (GOAL), a novel gradient-based method for Neural Architecture Search (NAS), to find better architectures with fewer evaluated samples. Popular NAS methods usually employ black-box optimization based approaches like reinforcement learning, evolution algorithm or Bayesian optimization, which may be inefficient when having huge combinatorial NAS search spaces. In contrast, we aim to explicitly model the NAS search space as graphs, and then perform gradient-based optimization to learn graph structure with efficient exploitation. To this end, we learn a differentiable graph neural network as a surrogate model to rank candidate architectures, which enable us to obtain gradient w.r.t the input architectures. To cope with the difficulty in gradient-based optimization on the discrete graph structures, we propose to leverage proximal gradient descent to find potentially better architectures.\nOur empirical results show that GOAL outperforms mainstream black-box methods on existing NAS benchmarks in terms of search efficiency.", "keywords": "Neural Architecture Search;Graph Structure Learning", "primary_area": "", "supplementary_material": "/attachment/6bdb887f7baeea211180d54d84da67d61b6a85f5.zip", "author": "Chengyue Huang;Lingfei Wu;Yadong Ding;Siliang Tang;Fangli Xu;Chang Zong;Chilie Tan;Yueting Zhuang", "authorids": "~Chengyue_Huang1;~Lingfei_Wu1;~Yadong_Ding1;~Siliang_Tang1;lili@yixue.us;zongchang@zju.edu.cn;chilie.tan@tongdun.net;~Yueting_Zhuang1", "gender": ";;M;M;;;;M", "homepage": "https://hcyue.me;https://sites.google.com/view/teddy-lfwu/;http://dydcoding.cn;https://person.zju.edu.cn/en/siliang;;;;https://person.zju.edu.cn/yzhuang", "dblp": ";27/9060;;44/5693;;;;", "google_scholar": ";https://scholar.google.com/citations?hl=en;;8e7H3PcAAAAJ;;;;1RD7UJAAAAAJ", "orcid": ";;;0000-0002-7356-9711;;;;", "linkedin": ";;;siliang-tang-4734272a/;;;;", "or_profile": "~Chengyue_Huang1;~Lingfei_Wu1;~Yadong_Ding1;~Siliang_Tang1;lili@yixue.us;zongchang@zju.edu.cn;chilie.tan@tongdun.net;~Yueting_Zhuang1", "aff": "Zhejiang University;International Business Machines;Zhejiang University;Zhejiang University;;;;Zhejiang University", "aff_domain": "zju.edu.cn;ibm.com;zju.edu.cn;zju.edu.cn;;;;zju.edu.cn", "position": "MS student;Research Staff Member;MS student;Associate Professor;;;;Full Professor", "bibtex": "@misc{\nhuang2021differentiable,\ntitle={Differentiable Graph Optimization for Neural Architecture Search},\nauthor={Chengyue Huang and Lingfei Wu and Yadong Ding and Siliang Tang and Fangli Xu and Chang Zong and Chilie Tan and Yueting Zhuang},\nyear={2021},\nurl={https://openreview.net/forum?id=NqWY3s0SILo}\n}", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer3;AnonReviewer4", "site": "https://openreview.net/forum?id=NqWY3s0SILo", "pdf_size": 0, "rating": "4;5;6", "confidence": "4;4;3", "wc_review": "280;354;338", "wc_reply_reviewers": "0;0;0", 
"wc_reply_authors": "599;455;278", "reply_reviewers": "0;0;0", "reply_authors": "1;1;1", "rating_avg": [ 5.0, 0.816496580927726 ], "confidence_avg": [ 3.6666666666666665, 0.4714045207910317 ], "wc_review_avg": [ 324.0, 31.790984046843636 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 444.0, 131.27833027579229 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 7, 0 ], "authors#_avg": [ 8, 0 ], "corr_rating_confidence": -0.8660254037844385, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:si8QXS2SHy8J:scholar.google.com/&scioq=Differentiable+Graph+Optimization+for+Neural+Architecture+Search&hl=en&as_sdt=0,33", "gs_version_total": 0, "aff_unique_index": "0;1;0;0;0", "aff_unique_norm": "Zhejiang University;International Business Machines Corporation", "aff_unique_dep": ";", "aff_unique_url": "https://www.zju.edu.cn;https://www.ibm.com", "aff_unique_abbr": "ZJU;IBM", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;1;0;0;0", "aff_country_unique": "China;United States" }, { "id": "NrN8XarA2Iz", "title": "Learning to Dynamically Select Between Reward Shaping Signals", "track": "main", "status": "Reject", "tldr": "", "abstract": "Reinforcement learning (RL) algorithms often have the limitation of sample complexity. Previous research has shown that the reliance on large amounts of experience can be mitigated through the presence of additional feedback. Automatic reward shaping is one approach to solving this problem, using automatic identification and modulation of shaping reward signals that are more informative about how agents should behave in any given scenario to learn and adapt faster. However, automatic reward shaping is still very challenging. To better study it, we break it down into two separate sub-problems: learning shaping reward signals in an application and learning how the signals can be adaptively used to provide a single reward feedback in the RL learning process. This paper focuses on the latter sub-problem. Unlike existing research, which tries to learn one shaping reward function from shaping signals, the proposed method learns to dynamically select the right reward signal to apply at each state, which is considerably more flexible. We further show that using an online strategy that seeks to match the learned shaping feedback with optimal value differences can lead to effective reward shaping and accelerated learning. 
The proposed ideas are verified through experiments in a variety of environments using different shaping reward paradigms.", "keywords": "selection;automatic;reward;shaping;reinforcement learning", "primary_area": "", "supplementary_material": "", "author": "Alexander Politowicz;Bing Liu", "authorids": "~Alexander_Politowicz1;~Bing_Liu1", "gender": "M;M", "homepage": ";https://www.cs.uic.edu/~liub/", "dblp": "259/3040;l/BingLiu1.html", "google_scholar": "x3ycem8AAAAJ;Kt1bjZoAAAAJ", "orcid": "0000-0001-6096-2031;", "linkedin": "alexander-politowicz-900244107/;", "or_profile": "~Alexander_Politowicz1;~Bing_Liu1", "aff": "University of Illinois, Chicago;University of Illinois at Chicago", "aff_domain": "uic.edu;uic.edu", "position": "PhD student;Full Professor", "bibtex": "@misc{\npolitowicz2021learning,\ntitle={Learning to Dynamically Select Between Reward Shaping Signals},\nauthor={Alexander Politowicz and Bing Liu},\nyear={2021},\nurl={https://openreview.net/forum?id=NrN8XarA2Iz}\n}", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer2;AnonReviewer4;AnonReviewer3", "site": "https://openreview.net/forum?id=NrN8XarA2Iz", "pdf_size": 0, "rating": "2;4;4;5", "confidence": "5;4;4;3", "wc_review": "232;293;1335;522", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "0;0;0;0", "reply_reviewers": "0;0;0;0", "reply_authors": "0;0;0;0", "rating_avg": [ 3.75, 1.0897247358851685 ], "confidence_avg": [ 4.0, 0.7071067811865476 ], "wc_review_avg": [ 595.5, 440.4262140245514 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 0, 0 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 0, 0 ], "replies_avg": [ 5, 0 ], "authors#_avg": [ 2, 0 ], "corr_rating_confidence": -0.9733285267845754, "gs_citation": 2, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=13165808335341139805&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 0, "aff_unique_index": "0;0", "aff_unique_norm": "University of Illinois at Chicago", "aff_unique_dep": "", "aff_unique_url": "https://www.uic.edu", "aff_unique_abbr": "UIC", "aff_campus_unique_index": "0;0", "aff_campus_unique": "Chicago", "aff_country_unique_index": "0;0", "aff_country_unique": "United States" }, { "id": "Ns8v4jHGyAV", "title": "Matrix Shuffle-Exchange Networks for Hard 2D Tasks", "track": "main", "status": "Reject", "tldr": "", "abstract": "Convolutional neural networks have become the main tools for processing two-dimensional data. They work well for images, yet convolutions have a limited receptive field that prevents its applications to more complex 2D tasks. We propose a new neural model, called Matrix Shuffle-Exchange network, that can efficiently exploit long-range dependencies in 2D data and has comparable speed to a convolutional neural network. It is derived from Neural Shuffle-Exchange network and has $\\mathcal{O}( \\log{n})$ layers and $\\mathcal{O}( n^2 \\log{n})$ total time and space complexity for processing a $n \\times n$ data matrix. We show that the Matrix Shuffle-Exchange network is well-suited for algorithmic and logical reasoning tasks on matrices and dense graphs, exceeding convolutional and graph neural network baselines. 
Its distinct advantage is the capability of retaining full long-range dependency modelling when generalizing to larger instances -- much larger than could be processed with models equipped with a dense attention mechanism.", "keywords": "", "primary_area": "", "supplementary_material": "/attachment/e51f1742a905fe90b11598ad614e70a1c5113c2e.zip", "author": "Em\u012bls Ozoli\u0146\u0161;Karlis Freivalds;Agris \u0160ostaks", "authorids": "~Em\u012bls_Ozoli\u0146\u01611;~Karlis_Freivalds1;agris.sostaks@lumii.lv", "gender": "M;;", "homepage": ";;", "dblp": "245/2796.html;;", "google_scholar": "eKUsjigAAAAJ;https://scholar.google.com/citations?hl=en;", "orcid": "0000-0002-8899-2651;;", "linkedin": "ozolinsemils/;;", "or_profile": "~Em\u012bls_Ozoli\u0146\u01611;~Karlis_Freivalds1;agris.sostaks@lumii.lv", "aff": "Institute of Mathematics and Computer Science, University of Latvia;Institute of Mathematics and Computer Science;", "aff_domain": "lumii.lv;lumii.lv;", "position": "Research Assistant;Principal Researcher;", "bibtex": "@misc{\nozoli{\\c{n}}{\\v{s}}2021matrix,\ntitle={Matrix Shuffle-Exchange Networks for Hard 2D Tasks},\nauthor={Em{\\={i}}ls Ozoli{\\c{n}}{\\v{s}} and Karlis Freivalds and Agris {\\v{S}}ostaks},\nyear={2021},\nurl={https://openreview.net/forum?id=Ns8v4jHGyAV}\n}", "github": "", "project": "", "reviewers": "AnonReviewer3;AnonReviewer5;AnonReviewer1", "site": "https://openreview.net/forum?id=Ns8v4jHGyAV", "pdf_size": 0, "rating": "4;4;8", "confidence": "3;3;3", "wc_review": "690;276;450", "wc_reply_reviewers": "258;0;0", "wc_reply_authors": "400;126;120", "reply_reviewers": "1;0;0", "reply_authors": "1;1;1", "rating_avg": [ 5.333333333333333, 1.8856180831641267 ], "confidence_avg": [ 3.0, 0.0 ], "wc_review_avg": [ 472.0, 169.72919607421701 ], "wc_reply_reviewers_avg": [ 86.0, 121.62236636408618 ], "wc_reply_authors_avg": [ 215.33333333333334, 130.60202482691028 ], "reply_reviewers_avg": [ 0.3333333333333333, 0.4714045207910317 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 9, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 1, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=10197384453587797244&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 4, "aff_unique_index": "0;1", "aff_unique_norm": "University of Latvia;Institute of Mathematics and Computer Science", "aff_unique_dep": "Institute of Mathematics and Computer Science;Mathematics and Computer Science", "aff_unique_url": "https://www.lu.lv;", "aff_unique_abbr": ";", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0", "aff_country_unique": "Latvia;" }, { "title": "WaveGrad: Estimating Gradients for Waveform Generation", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/3220", "id": "NsMLjcFaO8O", "poster": "", "openreview": "https://openreview.net/forum?id=NsMLjcFaO8O", "slides": "https://iclr.cc/virtual/2021/poster/3220", "video": "https://iclr.cc/virtual/2021/poster/3220", "author_site": "Nanxin Chen, Yu Zhang, Heiga Zen, Ron Weiss, Mohammad Norouzi, William Chan", "tldr": "", "abstract": "This paper introduces WaveGrad, a conditional model for waveform generation which estimates gradients of the data density. The model is built on prior work on score matching and diffusion probabilistic models. 
It starts from a Gaussian white noise signal and iteratively refines the signal via a gradient-based sampler conditioned on the mel-spectrogram.\nWaveGrad offers a natural way to trade inference speed for sample quality by adjusting the number of refinement steps, and bridges the gap between non-autoregressive and autoregressive models in terms of audio quality.\nWe find that it can generate high fidelity audio samples using as few as six iterations.\nExperiments reveal WaveGrad to generate high fidelity audio, outperforming adversarial non-autoregressive baselines and matching a strong likelihood-based autoregressive baseline using fewer sequential operations. Audio samples are available at https://wavegrad.github.io/.", "keywords": "vocoder;diffusion;score matching;text-to-speech;gradient estimation;waveform generation", "primary_area": "", "supplementary_material": "", "author": "Nanxin Chen;Yu Zhang;Heiga Zen;Ron J Weiss;Mohammad Norouzi;William Chan", "authorids": "~Nanxin_Chen1;~Yu_Zhang2;~Heiga_Zen1;~Ron_J_Weiss1;~Mohammad_Norouzi1;~William_Chan1", "gender": "M;M;M;M;;", "homepage": ";;https://research.google/people/heigazen;https://norouzi.github.io/;http://williamchan.ca;http://ronw.net", "dblp": ";50/671-33;42/7014;https://dblp.org/pers/hd/n/Norouzi_0002:Mohammad;58/2301;03/8052", "google_scholar": ";;z3IRvDwAAAAJ;Lncr-VoAAAAJ;Nla9qfUAAAAJ;_VhMIOIAAAAJ", "orcid": ";;0000-0002-8959-5471;;;0000-0003-2010-4053", "linkedin": ";;heiga-zen-b1a64b3;;;", "or_profile": "~Nanxin_Chen1;~Yu_Zhang2;~Heiga_Zen1;~Mohammad_Norouzi1;~William_Chan1;~Ron_Weiss1", "aff": "Johns Hopkins University;Google;Google;Google Brain;Google Brain;Google", "aff_domain": "jhu.edu;google.com;google.com;google.com;google.com;google.com", "position": "PhD student;Research Scientist;Researcher;Research Scientist;Research Scientist;Software Engineer", "bibtex": "@inproceedings{\nchen2021wavegrad,\ntitle={WaveGrad: Estimating Gradients for Waveform Generation},\nauthor={Nanxin Chen and Yu Zhang and Heiga Zen and Ron J Weiss and Mohammad Norouzi and William Chan},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=NsMLjcFaO8O}\n}", "github": "[![Papers with Code](/images/pwc_icon.svg) 7 community implementations](https://paperswithcode.com/paper/?openreview=NsMLjcFaO8O)", "project": "", "reviewers": "AnonReviewer3;AnonReviewer1;AnonReviewer2;AnonReviewer4", "pdf_size": 0, "rating": "5;6;7;8", "confidence": "4;5;3;4", "wc_review": "333;274;252;169", "wc_reply_reviewers": "0;103;0;0", "wc_reply_authors": "603;904;205;179", "reply_reviewers": "0;1;0;0", "reply_authors": "2;3;1;1", "rating_avg": [ 6.5, 1.118033988749895 ], "confidence_avg": [ 4.0, 0.7071067811865476 ], "wc_review_avg": [ 257.0, 58.80901291468851 ], "wc_reply_reviewers_avg": [ 25.75, 44.60030829489859 ], "wc_reply_authors_avg": [ 472.75, 300.3834008396602 ], "reply_reviewers_avg": [ 0.25, 0.4330127018922193 ], "reply_authors_avg": [ 1.75, 0.82915619758885 ], "replies_avg": [ 14, 0 ], "authors#_avg": [ 6, 0 ], "corr_rating_confidence": -0.3162277660168379, "gs_citation": 894, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=9166479714962885889&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 6, "pdf": "https://openreview.net/pdf?id=NsMLjcFaO8O", "email": "jhu.edu;google.com;google.com;google.com;google.com;google.com", "author_num": 6, "aff_unique_index": "0;1;1;1;1;1", "aff_unique_norm": "Johns Hopkins University;Google", "aff_unique_dep": ";Google", "aff_unique_url": 
"https://www.jhu.edu;https://www.google.com", "aff_unique_abbr": "JHU;Google", "aff_campus_unique_index": "1;1;1;1;1", "aff_campus_unique": ";Mountain View", "aff_country_unique_index": "0;0;0;0;0;0", "aff_country_unique": "United States" }, { "id": "NyQedovJwAS", "title": "PANDA - Adapting Pretrained Features for Anomaly Detection", "track": "main", "status": "Withdraw", "tldr": "", "abstract": "Anomaly detection methods require high-quality features. One way of obtaining strong features is to adapt pre-trained features to anomaly detection on the target distribution. Unfortunately, simple adaptation methods often result in catastrophic collapse (feature deterioration) and reduce performance. DeepSVDD combats collapse by removing biases from architectures, but this limits the adaptation performance gain. In this work, we propose two methods for combating collapse: i) a variant of early stopping that dynamically learns the stopping iteration ii) elastic regularization inspired by continual learning. In addition, we conduct a thorough investigation of Imagenet-pretrained features for one-class anomaly detection. Our method, PANDA, outperforms the state-of-the-art in the one-class and outlier exposure settings (CIFAR10: 96.2% vs. 90.1% and 98.9% vs. 95.6%) .", "keywords": "anomaly detection", "primary_area": "", "supplementary_material": "", "author": "Tal Reiss;Niv Cohen;Liron Bergman;Yedid Hoshen", "authorids": "tal.reiss@mail.huji.ac.il;~Niv_Cohen1;~Liron_Bergman1;~Yedid_Hoshen3", "gender": ";M;F;M", "homepage": ";https://www.cs.huji.ac.il/w~nivc/;;https://www.cs.huji.ac.il/~ydidh/", "dblp": ";259/2291;259/2667;136/0280", "google_scholar": ";https://scholar.google.co.il/citations?user=ZMdC3OQAAAAJ;YQ5czYAAAAAJ;https://scholar.google.co.il/citations?user=6y1-qS4AAAAJ", "orcid": ";;;", "linkedin": ";niv-cohen-39b49521/;;", "or_profile": "tal.reiss@mail.huji.ac.il;~Niv_Cohen1;~Liron_Bergman1;~Yedid_Hoshen3", "aff": ";Hebrew University of Jerusalem;Hebrew University of Jerusalem;Hebrew University of Jerusalem", "aff_domain": ";huji.ac.il;huji.ac.il;huji.ac.il", "position": ";PhD student;MS student;Assistant Professor", "bibtex": "", "github": "", "project": "", "reviewers": "AnonReviewer3;AnonReviewer2;AnonReviewer4;AnonReviewer1", "site": "https://openreview.net/forum?id=NyQedovJwAS", "pdf_size": 0, "rating": "4;4;5;7", "confidence": "4;3;3;4", "wc_review": "92;478;189;485", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "462;650;724;250", "reply_reviewers": "0;0;0;0", "reply_authors": "1;1;1;1", "rating_avg": [ 5.0, 1.224744871391589 ], "confidence_avg": [ 3.5, 0.5 ], "wc_review_avg": [ 311.0, 173.93245815545757 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 521.5, 183.55584981144023 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 10, 0 ], "authors#_avg": [ 4, 0 ], "corr_rating_confidence": 0.40824829046386296, "gs_citation": 10, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=13364084515677052767&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 0, "aff_unique_index": "0;0;0", "aff_unique_norm": "Hebrew University of Jerusalem", "aff_unique_dep": "", "aff_unique_url": "https://www.huji.ac.il", "aff_unique_abbr": "HUJI", "aff_campus_unique_index": "0;0;0", "aff_campus_unique": "Jerusalem", "aff_country_unique_index": "0;0;0", "aff_country_unique": "Israel" }, { "title": "EigenGame: PCA as a Nash Equilibrium", "status": "Oral", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/3125", "id": 
"NzTU59SYbNq", "poster": "", "openreview": "https://openreview.net/forum?id=NzTU59SYbNq", "slides": "https://iclr.cc/virtual/2021/poster/3125", "video": "https://iclr.cc/virtual/2021/poster/3125", "author_site": "Ian Gemp, Brian McWilliams, Claire Vernade, Thore Graepel", "tldr": "", "abstract": "We present a novel view on principal components analysis as a competitive game in which each approximate eigenvector is controlled by a player whose goal is to maximize their own utility function. We analyze the properties of this PCA game and the behavior of its gradient based updates. The resulting algorithm---which combines elements from Oja's rule with a generalized Gram-Schmidt orthogonalization---is naturally decentralized and hence parallelizable through message passing. We demonstrate the scalability of the algorithm with experiments on large image datasets and neural network activations. We discuss how this new view of PCA as a differentiable game can lead to further algorithmic developments and insights.", "keywords": "pca;principal components analysis;nash;games;eigendecomposition;svd;singular value decomposition", "primary_area": "", "supplementary_material": "/attachment/1de52c13bf873c8388e295d31bed2315fb89f032.zip", "author": "Ian Gemp;Brian McWilliams;Claire Vernade;Thore Graepel", "authorids": "~Ian_Gemp1;~Brian_McWilliams2;~Claire_Vernade1;~Thore_Graepel1", "gender": "M;M;F;", "homepage": "https://imgemp.github.io/;https://sites.google.com/view/mcbrian/;https://www.cvernade.com;", "dblp": "66/10996;;168/8721;g/ThoreGraepel", "google_scholar": "5vo3MeEAAAAJ;https://scholar.google.ch/citations?user=IS4VSXAAAAAJ;tE2hCaYAAAAJ;", "orcid": ";;;", "linkedin": ";;;", "or_profile": "~Ian_Gemp1;~Brian_McWilliams2;~Claire_Vernade1;~Thore_Graepel1", "aff": "Google DeepMind;Deepmind;Google;", "aff_domain": "google.com;google.com;google.com;", "position": "Research Scientist;Research Scientist;Research scientist;", "bibtex": "@inproceedings{\ngemp2021eigengame,\ntitle={EigenGame: {\\{}PCA{\\}} as a Nash Equilibrium},\nauthor={Ian Gemp and Brian McWilliams and Claire Vernade and Thore Graepel},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=NzTU59SYbNq}\n}", "github": "[![Papers with Code](/images/pwc_icon.svg) 2 community implementations](https://paperswithcode.com/paper/?openreview=NzTU59SYbNq)", "project": "", "reviewers": "AnonReviewer1;AnonReviewer3;AnonReviewer4", "pdf_size": 0, "rating": "7;8;8", "confidence": "3;3;3", "wc_review": "304;380;904", "wc_reply_reviewers": "0;0;258", "wc_reply_authors": "437;148;1543", "reply_reviewers": "0;0;1", "reply_authors": "1;1;3", "rating_avg": [ 7.666666666666667, 0.4714045207910317 ], "confidence_avg": [ 3.0, 0.0 ], "wc_review_avg": [ 529.3333333333334, 266.73998991943864 ], "wc_reply_reviewers_avg": [ 86.0, 121.62236636408618 ], "wc_reply_authors_avg": [ 709.3333333333334, 601.182353551917 ], "reply_reviewers_avg": [ 0.3333333333333333, 0.4714045207910317 ], "reply_authors_avg": [ 1.6666666666666667, 0.9428090415820634 ], "replies_avg": [ 11, 0 ], "authors#_avg": [ 4, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 60, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=9951528772873805135&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 6, "pdf": "https://openreview.net/pdf?id=NzTU59SYbNq", "email": "google.com;google.com;google.com;", "author_num": 4, "aff_unique_index": "0;1;0", "aff_unique_norm": "Google;DeepMind", "aff_unique_dep": "Google DeepMind;", 
"aff_unique_url": "https://deepmind.com;https://deepmind.com", "aff_unique_abbr": "DeepMind;DeepMind", "aff_campus_unique_index": "1", "aff_campus_unique": ";Mountain View", "aff_country_unique_index": "0;0;1", "aff_country_unique": "United Kingdom;United States" }, { "title": "Deep Networks and the Multiple Manifold Problem", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/2530", "id": "O-6Pm_d_Q-", "poster": "", "openreview": "https://openreview.net/forum?id=O-6Pm_d_Q-", "slides": "https://iclr.cc/virtual/2021/poster/2530", "video": "https://iclr.cc/virtual/2021/poster/2530", "author_site": "Sam Buchanan, Dar Gilboa, John Wright", "tldr": "", "abstract": "We study the multiple manifold problem, a binary classification task modeled on applications in machine vision, in which a deep fully-connected neural network is trained to separate two low-dimensional submanifolds of the unit sphere. We provide an analysis of the one-dimensional case, proving for a simple manifold configuration that when the network depth $L$ is large relative to certain geometric and statistical properties of the data, the network width $n$ grows as a sufficiently large polynomial in $L$, and the number of i.i.d. samples from the manifolds is polynomial in $L$, randomly-initialized gradient descent rapidly learns to classify the two manifolds perfectly with high probability. Our analysis demonstrates concrete benefits of depth and width in the context of a practically-motivated model problem: the depth acts as a fitting resource, with larger depths corresponding to smoother networks that can more readily separate the class manifolds, and the width acts as a statistical resource, enabling concentration of the randomly-initialized network and its gradients. The argument centers around the \"neural tangent kernel\" of Jacot et al. and its role in the nonasymptotic analysis of training overparameterized neural networks; to this literature, we contribute essentially optimal rates of concentration for the neural tangent kernel of deep fully-connected ReLU networks, requiring width $n \\geq L\\,\\mathrm{poly}(d_0)$ to achieve uniform concentration of the initial kernel over a $d_0$-dimensional submanifold of the unit sphere $\\mathbb{S}^{n_0-1}$, and a nonasymptotic framework for establishing generalization of networks trained in the \"NTK regime\" with structured data. The proof makes heavy use of martingale concentration to optimally treat statistical dependencies across layers of the initial random network. 
This approach should be of use in establishing similar results for other network architectures.", "keywords": "deep learning;overparameterized neural networks;low-dimensional structure", "primary_area": "", "supplementary_material": "", "author": "Sam Buchanan;Dar Gilboa;John Wright", "authorids": "~Sam_Buchanan1;~Dar_Gilboa1;~John_Wright1", "gender": "M;;", "homepage": "http://sdbuchanan.com;;http://www.columbia.edu/~jw2966", "dblp": "226/5790;203/4469;", "google_scholar": "5WT38A0AAAAJ;;nujTx04AAAAJ", "orcid": ";;", "linkedin": ";;", "or_profile": "~Sam_Buchanan1;~Dar_Gilboa1;~John_Wright1", "aff": "Columbia University;Harvard University;Columbia University", "aff_domain": "columbia.edu;harvard.edu;columbia.edu", "position": "PhD student;Swartz Fellow;Associate Professor", "bibtex": "@inproceedings{\nbuchanan2021deep,\ntitle={Deep Networks and the Multiple Manifold Problem},\nauthor={Sam Buchanan and Dar Gilboa and John Wright},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=O-6Pm_d_Q-}\n}", "github": "", "project": "", "reviewers": "AnonReviewer4;AnonReviewer1;AnonReviewer2;AnonReviewer3", "pdf_size": 0, "rating": "5;6;7;8", "confidence": "1;2;4;2", "wc_review": "102;291;410;123", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "271;348;391;150", "reply_reviewers": "0;0;0;0", "reply_authors": "1;1;1;1", "rating_avg": [ 6.5, 1.118033988749895 ], "confidence_avg": [ 2.25, 1.0897247358851685 ], "wc_review_avg": [ 231.5, 126.43674307731911 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 290.0, 91.55053249435527 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 11, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": 0.5129891760425771, "gs_citation": 53, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=17700820663970251557&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 11, "pdf": "https://openreview.net/pdf?id=O-6Pm_d_Q-", "email": "columbia.edu;harvard.edu;columbia.edu", "author_num": 3, "aff_unique_index": "0;1;0", "aff_unique_norm": "Columbia University;Harvard University", "aff_unique_dep": ";", "aff_unique_url": "https://www.columbia.edu;https://www.harvard.edu", "aff_unique_abbr": "Columbia;Harvard", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0;0", "aff_country_unique": "United States" }, { "title": "Minimum Width for Universal Approximation", "status": "Spotlight", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/2920", "id": "O-XJwyoIF-k", "poster": "", "openreview": "https://openreview.net/forum?id=O-XJwyoIF-k", "slides": "https://iclr.cc/virtual/2021/poster/2920", "video": "https://iclr.cc/virtual/2021/poster/2920", "author_site": "Sejun Park, Chulhee Yun, Jaeho Lee, Jinwoo Shin", "tldr": "", "abstract": "The universal approximation property of width-bounded networks has been studied as a dual of classical universal approximation results on depth-bounded networks. However, the critical width enabling the universal approximation has not been exactly characterized in terms of the input dimension $d_x$ and the output dimension $d_y$. In this work, we provide the first definitive result in this direction for networks using the ReLU activation functions: The minimum width required for the universal approximation of the $L^p$ functions is exactly $\\max\\{d_x+1,d_y\\}$. 
We also prove that the same conclusion does not hold for the uniform approximation with ReLU, but does hold with an additional threshold activation function. Our proof technique can be also used to derive a tighter upper bound on the minimum width required for the universal approximation using networks with general activation functions.", "keywords": "universal approximation;neural networks", "primary_area": "", "supplementary_material": "", "author": "Sejun Park;Chulhee Yun;Jaeho Lee;Jinwoo Shin", "authorids": "~Sejun_Park1;~Chulhee_Yun1;~Jaeho_Lee3;~Jinwoo_Shin1", "gender": ";M;M;M", "homepage": ";https://chulheeyun.github.io/;https://jaeho-lee.github.io;https://sites.google.com/site/mijirim/", "dblp": "155/9882;138/0148.html;78/6080-1;31/7062", "google_scholar": ";Ukl64ggAAAAJ;t91zoQMAAAAJ;https://scholar.google.com.tw/citations?user=m3eDp7kAAAAJ", "orcid": ";;;", "linkedin": ";;;", "or_profile": "~Sejun_Park1;~Chulhee_Yun1;~Jaeho_Lee3;~Jinwoo_Shin1", "aff": "Korea University;Massachusetts Institute of Technology;Korea Advanced Institute of Science & Technology;Korea Advanced Institute of Science & Technology", "aff_domain": "korea.ac.kr;mit.edu;kaist.ac.kr;kaist.ac.kr", "position": "Assistant Professor;PhD student;Postdoc;Associate Professor", "bibtex": "@inproceedings{\npark2021minimum,\ntitle={Minimum Width for Universal Approximation},\nauthor={Sejun Park and Chulhee Yun and Jaeho Lee and Jinwoo Shin},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=O-XJwyoIF-k}\n}", "github": "", "project": "", "reviewers": "AnonReviewer3;AnonReviewer2;AnonReviewer4;AnonReviewer1", "pdf_size": 0, "rating": "7;7;7;8", "confidence": "3;4;4;4", "wc_review": "731;361;197;555", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "485;103;664;411", "reply_reviewers": "0;0;0;0", "reply_authors": "1;1;1;1", "rating_avg": [ 7.25, 0.4330127018922193 ], "confidence_avg": [ 3.75, 0.4330127018922193 ], "wc_review_avg": [ 461.0, 200.89300634915094 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 415.75, 202.64423875353575 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 10, 0 ], "authors#_avg": [ 4, 0 ], "corr_rating_confidence": 0.3333333333333333, "gs_citation": 171, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=11949099437647058190&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 4, "pdf": "https://openreview.net/pdf?id=O-XJwyoIF-k", "email": "korea.ac.kr;mit.edu;kaist.ac.kr;kaist.ac.kr", "author_num": 4, "aff_unique_index": "0;1;2;2", "aff_unique_norm": "Korea University;Massachusetts Institute of Technology;Korea Advanced Institute of Science and Technology", "aff_unique_dep": ";;", "aff_unique_url": "https://www.korea.ac.kr;https://web.mit.edu;https://www.kaist.ac.kr", "aff_unique_abbr": "KU;MIT;KAIST", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;1;0;0", "aff_country_unique": "South Korea;United States" }, { "id": "O1GEH9X8848", "title": "A Point Cloud Generative Model Based on Nonequilibrium Thermodynamics", "track": "main", "status": "Withdraw", "tldr": "", "abstract": "We present a probabilistic model for point cloud generation, which is critical for various 3D vision tasks such as shape completion, upsampling, synthesis and data augmentation. 
Inspired by the diffusion process in non-equilibrium thermodynamics, we view points in point clouds as particles in a thermodynamic system in contact with a heat bath, which diffuse from the original distribution to a noise distribution. Point cloud generation thus amounts to learning the reverse diffusion process that transforms the noise distribution to the distribution of a desired shape. Specifically, we propose to model the reverse diffusion process for point clouds as a Markov chain conditioned on certain shape latent. We derive the variational bound in closed form for training and provide implementations of the model. Experimental results demonstrate that our model achieves the state-of-the-art performance in point cloud generation and auto-encoding.", "keywords": "Point cloud;Generation;Generative model", "primary_area": "", "supplementary_material": "", "author": "Shitong Luo;Wei Hu", "authorids": "~Shitong_Luo1;~Wei_Hu6", "gender": ";F", "homepage": "https://luost.me;http://www.wict.pku.edu.cn/huwei/", "dblp": "271/0339;52/173-3.html", "google_scholar": "z1BrjyIAAAAJ;https://scholar.google.com.hk/citations?user=5oFf8Q4AAAAJ", "orcid": ";0000-0002-9860-0922", "linkedin": ";", "or_profile": "~Shitong_Luo1;~Wei_Hu6", "aff": "Peking University;", "aff_domain": "pku.edu.cn;", "position": "Undergrad student;", "bibtex": "", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer3;AnonReviewer4", "site": "https://openreview.net/forum?id=O1GEH9X8848", "pdf_size": 0, "rating": "4;6;7", "confidence": "5;5;3", "wc_review": "813;325;374", "wc_reply_reviewers": "0;0;0", "wc_reply_authors": "0;0;0", "reply_reviewers": "0;0;0", "reply_authors": "0;0;0", "rating_avg": [ 5.666666666666667, 1.247219128924647 ], "confidence_avg": [ 4.333333333333333, 0.9428090415820634 ], "wc_review_avg": [ 504.0, 219.40981442648973 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 0, 0 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 0, 0 ], "replies_avg": [ 4, 0 ], "authors#_avg": [ 2, 0 ], "corr_rating_confidence": -0.7559289460184545, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:cukyC16xf_IJ:scholar.google.com/&scioq=A+Point+Cloud+Generative+Model+Based+on+Nonequilibrium+Thermodynamics&hl=en&as_sdt=0,14", "gs_version_total": 0, "aff_unique_index": "0", "aff_unique_norm": "Peking University", "aff_unique_dep": "", "aff_unique_url": "http://www.pku.edu.cn", "aff_unique_abbr": "Peking U", "aff_country_unique_index": "0", "aff_country_unique": "China" }, { "id": "O1pkU_4yWEt", "title": "Distantly supervised end-to-end medical entity extraction from electronic health records with human-level quality", "track": "main", "status": "Reject", "tldr": "", "abstract": "Medical entity extraction (EE) is a standard procedure used as a first stage in medical text processing. Usually, medical EE is a two-step process: named entity recognition (NER) and named entity normalization (NEN). We propose a novel method of doing medical EE from electronic health records (EHR) as a single-step multi-label classification task by fine-tuning a transformer model pretrained on a large EHR dataset. Our model is trained end-to-end in a distantly supervised manner using targets automatically extracted from a medical knowledge base.
We show that our model learns to generalize for entities that are present frequently enough, achieving human-level classification quality for the most frequent entities. Our work demonstrates that medical entity extraction can be done end-to-end without human supervision and with human quality given the availability of a large enough amount of unlabeled EHR and a medical knowledge base.", "keywords": "entity extraction;medical entity extraction;named entity recognition;named entity normalization;electronic health records;unsupervised learning;distant supervision", "primary_area": "", "supplementary_material": "", "author": "Alexander Nesterov;Dmitry Umerenkov", "authorids": "ainesterov@sberbank.ru;~Dmitry_Umerenkov1", "gender": ";M", "homepage": ";", "dblp": ";250/3872", "google_scholar": ";https://scholar.google.com/citations?hl=en", "orcid": ";0000-0003-0413-7170", "linkedin": ";dumerenkov", "or_profile": "ainesterov@sberbank.ru;~Dmitry_Umerenkov1", "aff": ";", "aff_domain": ";", "position": ";", "bibtex": "@misc{\nnesterov2021distantly,\ntitle={Distantly supervised end-to-end medical entity extraction from electronic health records with human-level quality},\nauthor={Alexander Nesterov and Dmitry Umerenkov},\nyear={2021},\nurl={https://openreview.net/forum?id=O1pkU_4yWEt}\n}", "github": "", "project": "", "reviewers": "AnonReviewer4;AnonReviewer1;AnonReviewer3;AnonReviewer2", "site": "https://openreview.net/forum?id=O1pkU_4yWEt", "pdf_size": 0, "rating": "3;4;4;5", "confidence": "4;4;4;5", "wc_review": "228;325;666;337", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "0;0;0;0", "reply_reviewers": "0;0;0;0", "reply_authors": "0;0;0;0", "rating_avg": [ 4.0, 0.7071067811865476 ], "confidence_avg": [ 4.25, 0.4330127018922193 ], "wc_review_avg": [ 389.0, 165.41614189673268 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 0, 0 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 0, 0 ], "replies_avg": [ 5, 0 ], "authors#_avg": [ 2, 0 ], "corr_rating_confidence": 0.816496580927726, "gs_citation": 6, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=7516800579730625482&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 4 }, { "id": "O358nrve1W", "title": "Neurally Guided Genetic Programming for Turing Complete Programming by Example", "track": "main", "status": "Reject", "tldr": "", "abstract": "The ability to synthesise source code from input/output examples allows nonexperts to generate programs, and experts to abstract away a wide range of simple programming tasks. Current research in this area has explored neural synthesis, SMT solvers, and genetic programming; each of these approaches is limited, however, often using highly specialised target languages for synthesis. In this paper we present a novel hybrid approach using neural networks to guide genetic programming (GP), which allows us to successfully synthesise code from just ten I/O examples in a generalised Turing complete target language, up to and including a sorting algorithm. We show that GP by itself is able to synthesise a set of simple programs, and show which hints (suggested lines of code for inclusion) are of most utility to GP in solving harder problems. Using a form of unstructured curriculum learning, we then demonstrate that neural networks can be used to determine when to make use of these high-utility hints for specific I/O problems and thus enable complex functions to be successfully synthesised.
We apply our approach to two different problem sets: common array-to-array programs (including sorting), and a canvas drawing problem set inspired by So & Oh (2018).", "keywords": "Code Synthesis;Neural Code Synthesis;Genetic Programming;Programming By Example", "primary_area": "", "supplementary_material": "", "author": "Alexander Newton Wild;Barry Porter", "authorids": "~Alexander_Newton_Wild1;~Barry_Porter1", "gender": "M;", "homepage": ";", "dblp": ";", "google_scholar": ";", "orcid": ";0000-0001-8376-736X", "linkedin": ";", "or_profile": "~Alexander_Newton_Wild1;~Barry_Porter1", "aff": ";Lancaster University", "aff_domain": ";lancaster.ac.uk", "position": ";Full Professor", "bibtex": "@misc{\nwild2021neurally,\ntitle={Neurally Guided Genetic Programming for Turing Complete Programming by Example},\nauthor={Alexander Newton Wild and Barry Porter},\nyear={2021},\nurl={https://openreview.net/forum?id=O358nrve1W}\n}", "github": "", "project": "", "reviewers": "AnonReviewer2;AnonReviewer1;AnonReviewer4", "site": "https://openreview.net/forum?id=O358nrve1W", "pdf_size": 0, "rating": "4;5;5", "confidence": "4;4;5", "wc_review": "498;477;788", "wc_reply_reviewers": "33;0;0", "wc_reply_authors": "540;490;370", "reply_reviewers": "1;0;0", "reply_authors": "2;1;1", "rating_avg": [ 4.666666666666667, 0.4714045207910317 ], "confidence_avg": [ 4.333333333333333, 0.4714045207910317 ], "wc_review_avg": [ 587.6666666666666, 141.91625073338932 ], "wc_reply_reviewers_avg": [ 11.0, 15.556349186104045 ], "wc_reply_authors_avg": [ 466.6666666666667, 71.336448530109 ], "reply_reviewers_avg": [ 0.3333333333333333, 0.4714045207910317 ], "reply_authors_avg": [ 1.3333333333333333, 0.4714045207910317 ], "replies_avg": [ 9, 0 ], "authors#_avg": [ 2, 0 ], "corr_rating_confidence": 0.4999999999999999, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:1zOAduAwuAwJ:scholar.google.com/&scioq=Neurally+Guided+Genetic+Programming+for+Turing+Complete+Programming+by+Example&hl=en&as_sdt=0,5", "gs_version_total": 0, "aff_unique_index": "0", "aff_unique_norm": "Lancaster University", "aff_unique_dep": "", "aff_unique_url": "https://www.lancaster.ac.uk", "aff_unique_abbr": "Lancaster", "aff_country_unique_index": "0", "aff_country_unique": "United Kingdom" }, { "title": "Self-training For Few-shot Transfer Across Extreme Task Differences", "status": "Oral", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/3168", "id": "O3Y56aqpChA", "poster": "", "openreview": "https://openreview.net/forum?id=O3Y56aqpChA", "slides": "https://iclr.cc/virtual/2021/poster/3168", "video": "https://iclr.cc/virtual/2021/poster/3168", "author_site": "Cheng Perng Phoo, Bharath Hariharan", "tldr": "", "abstract": "Most few-shot learning techniques are pre-trained on a large, labeled \u201cbase dataset\u201d. In problem domains where such large labeled datasets are not available for pre-training (e.g., X-ray, satellite images), one must resort to pre-training in a different \u201csource\u201d problem domain (e.g., ImageNet), which can be very different from the desired target task. Traditional few-shot and transfer learning techniques fail in the presence of such extreme differences between the source and target tasks. In this paper, we present a simple and effective solution to tackle this extreme domain gap: self-training a source domain representation on unlabeled data from the target domain. 
We show that this improves one-shot performance on the target domain by 2.9 points on average on the challenging BSCD-FSL benchmark consisting of datasets from multiple domains.", "keywords": "few-shot learning;self-training;cross-domain few-shot learning", "primary_area": "", "supplementary_material": "", "author": "Cheng Perng Phoo;Bharath Hariharan", "authorids": "~Cheng_Perng_Phoo1;~Bharath_Hariharan3", "gender": "M;M", "homepage": "https://cpphoo.github.io/;http://home.bharathh.info", "dblp": "226/0521;05/8412", "google_scholar": "kt9D2usAAAAJ;TpglobcAAAAJ", "orcid": ";", "linkedin": ";", "or_profile": "~Cheng_Perng_Phoo1;~Bharath_Hariharan2", "aff": "International Business Machines;Cornell University", "aff_domain": "ibm.com;cornell.edu", "position": "Intern;Assistant Professor", "bibtex": "@inproceedings{\nphoo2021selftraining,\ntitle={Self-training For Few-shot Transfer Across Extreme Task Differences},\nauthor={Cheng Perng Phoo and Bharath Hariharan},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=O3Y56aqpChA}\n}", "github": "[![github](/images/github_icon.svg) cpphoo/STARTUP](https://github.com/cpphoo/STARTUP)", "project": "", "reviewers": "AnonReviewer1;AnonReviewer2;AnonReviewer4;AnonReviewer3", "pdf_size": 0, "rating": "6;7;8;8", "confidence": "4;4;5;5", "wc_review": "487;335;822;672", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "0;0;0;0", "reply_reviewers": "0;0;0;0", "reply_authors": "0;0;0;0", "rating_avg": [ 7.25, 0.82915619758885 ], "confidence_avg": [ 4.5, 0.5 ], "wc_review_avg": [ 579.0, 184.18604724571296 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 0, 0 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 0, 0 ], "replies_avg": [ 5, 0 ], "authors#_avg": [ 2, 0 ], "corr_rating_confidence": 0.9045340337332909, "gs_citation": 157, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=13876494869867170602&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 2, "pdf": "https://openreview.net/pdf?id=O3Y56aqpChA", "email": "ibm.com;cornell.edu", "author_num": 2, "aff_unique_index": "0;1", "aff_unique_norm": "International Business Machines Corporation;Cornell University", "aff_unique_dep": ";", "aff_unique_url": "https://www.ibm.com;https://www.cornell.edu", "aff_unique_abbr": "IBM;Cornell", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0", "aff_country_unique": "United States" }, { "title": "PSTNet: Point Spatio-Temporal Convolution on Point Cloud Sequences", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/3101", "id": "O3bqkf_Puys", "poster": "", "openreview": "https://openreview.net/forum?id=O3bqkf_Puys", "slides": "https://iclr.cc/virtual/2021/poster/3101", "video": "https://iclr.cc/virtual/2021/poster/3101", "author_site": "Hehe Fan, Xin Yu, Yuhang Ding, Yi Yang, Mohan Kankanhalli", "tldr": "", "abstract": "Point cloud sequences are irregular and unordered in the spatial dimension while exhibiting regularities and order in the temporal dimension. Therefore, existing grid based convolutions for conventional video processing cannot be directly applied to spatio-temporal modeling of raw point cloud sequences. In this paper, we propose a point spatio-temporal (PST) convolution to achieve informative representations of point cloud sequences. The proposed PST convolution first disentangles space and time in point cloud sequences. 
Then, a spatial convolution is employed to capture the local structure of points in the 3D space, and a temporal convolution is used to model the dynamics of the spatial regions along the time dimension. Furthermore, we incorporate the proposed PST convolution into a deep network, namely PSTNet, to extract features of point cloud sequences in a hierarchical manner. Extensive experiments on widely-used 3D action recognition and 4D semantic segmentation datasets demonstrate the effectiveness of PSTNet to model point cloud sequences.", "keywords": "Point cloud;spatio-temporal modeling;video analysis;action recognition;semantic segmentation;convolutional neural network", "primary_area": "", "supplementary_material": "", "author": "Hehe Fan;Xin Yu;Yuhang Ding;Yi Yang;Mohan Kankanhalli", "authorids": "~Hehe_Fan1;~Xin_Yu1;~Yuhang_Ding1;~Yi_Yang4;~Mohan_Kankanhalli1", "gender": "M;M;M;M;M", "homepage": "https://hehefan.github.io;https://sites.google.com/view/xinyus-homepage/Home;;http://reler.net/;https://www.comp.nus.edu.sg/~mohan", "dblp": "184/5722.html;54/1184-2;244/9493;;09/3613.html", "google_scholar": "hVuflMQAAAAJ;oxdtuSEAAAAJ;2zbnTq8AAAAJ;https://scholar.google.com.au/citations?user=RMSuNFwAAAAJ;6Lx_eowAAAAJ", "orcid": "0000-0001-9572-2345;0000-0002-0269-5649;;;0000-0002-4846-2015", "linkedin": ";;;;mohan-kankanhalli-583417221", "or_profile": "~Hehe_Fan1;~Xin_Yu1;~Yuhang_Ding1;~Yi_Yang4;~Mohan_Kankanhalli1", "aff": "National University of Singapore;University of Technology Sydney;University of Technology Sydney;Zhejiang University;National University of Singapore", "aff_domain": "nus.edu.sg;uts.edu.au;uts.edu.au;zju.edu.cn;nus.edu.sg", "position": "Postdoc;Lecturer;PhD student;Full Professor;Full Professor", "bibtex": "@inproceedings{\nfan2021pstnet,\ntitle={{PSTN}et: Point Spatio-Temporal Convolution on Point Cloud Sequences},\nauthor={Hehe Fan and Xin Yu and Yuhang Ding and Yi Yang and Mohan Kankanhalli},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=O3bqkf_Puys}\n}", "github": "[![github](/images/github_icon.svg) hehefan/Point-Spatio-Temporal-Convolution](https://github.com/hehefan/Point-Spatio-Temporal-Convolution)", "project": "", "reviewers": "AnonReviewer3;AnonReviewer2;AnonReviewer1", "pdf_size": 0, "rating": "5;7;7", "confidence": "5;3;4", "wc_review": "324;303;887", "wc_reply_reviewers": "0;0;0", "wc_reply_authors": "1249;577;1694", "reply_reviewers": "0;0;0", "reply_authors": "2;1;2", "rating_avg": [ 6.333333333333333, 0.9428090415820634 ], "confidence_avg": [ 4.0, 0.816496580927726 ], "wc_review_avg": [ 504.6666666666667, 270.486393167732 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 1173.3333333333333, 459.14146936308083 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.6666666666666667, 0.4714045207910317 ], "replies_avg": [ 9, 0 ], "authors#_avg": [ 5, 0 ], "corr_rating_confidence": -0.8660254037844385, "gs_citation": 155, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=2334272316624788650&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 9, "pdf": "https://openreview.net/pdf?id=O3bqkf_Puys", "email": "nus.edu.sg;uts.edu.au;uts.edu.au;zju.edu.cn;nus.edu.sg", "author_num": 5, "aff_unique_index": "0;1;1;2;0", "aff_unique_norm": "National University of Singapore;University of Technology Sydney;Zhejiang University", "aff_unique_dep": ";;", "aff_unique_url": "https://www.nus.edu.sg;https://www.uts.edu.au;https://www.zju.edu.cn", "aff_unique_abbr": "NUS;UTS;ZJU", 
"aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;1;1;2;0", "aff_country_unique": "Singapore;Australia;China" }, { "title": "INT: An Inequality Benchmark for Evaluating Generalization in Theorem Proving", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/2689", "id": "O6LPudowNQm", "poster": "", "openreview": "https://openreview.net/forum?id=O6LPudowNQm", "slides": "https://iclr.cc/virtual/2021/poster/2689", "video": "https://iclr.cc/virtual/2021/poster/2689", "author_site": "Yuhuai Wu, Albert Jiang, Jimmy Ba, Roger Grosse", "tldr": "", "abstract": "In learning-assisted theorem proving, one of the most critical challenges is to generalize to theorems unlike those seen at training time. In this paper, we introduce INT, an INequality Theorem proving benchmark designed to test agents\u2019 generalization ability. INT is based on a theorem generator, which provides theoretically infinite data and allows us to measure 6 different types of generalization, each reflecting a distinct challenge, characteristic of automated theorem proving. In addition, provides a fast theorem proving environment with sequence-based and graph-based interfaces, conducive to performing learning-based research. We introduce base-lines with architectures including transformers and graph neural networks (GNNs)for INT. Using INT, we find that transformer-based agents achieve stronger test performance for most of the generalization tasks, despite having much larger out-of-distribution generalization gaps than GNNs. We further find that the addition of Monte Carlo Tree Search (MCTS) at test time helps to prove new theorems.", "keywords": "Theorem proving;Synthetic benchmark dataset;Generalization;Transformers;Graph neural networks;Monte Carlo Tree Search", "primary_area": "", "supplementary_material": "/attachment/5f58254353efdd68ec0f09f8291d780923819a95.zip", "author": "Yuhuai Wu;Albert Jiang;Jimmy Ba;Roger Baker Grosse", "authorids": "~Yuhuai_Wu1;~Albert_Jiang1;~Jimmy_Ba1;~Roger_Baker_Grosse1", "gender": "M;;M;M", "homepage": "http://www.cs.toronto.edu/~ywu/;https://albertqjiang.github.io/;http://jimmylba.github.io;http://www.cs.toronto.edu/~rgrosse/", "dblp": ";321/1049;https://dblp.org/pers/b/Ba:Jimmy.html;26/7058", "google_scholar": "https://scholar.google.ca/citations?user=bOQGfFIAAAAJ;Fe_RBHMAAAAJ;https://scholar.google.ca/citations?user=ymzxRhAAAAAJ;xgQd1qgAAAAJ", "orcid": ";;;", "linkedin": ";;;", "or_profile": "~Yuhuai_Wu1;~Albert_Jiang1;~Jimmy_Ba1;~Roger_Baker_Grosse1", "aff": "Department of Computer Science, University of Toronto;University of Oxford;Department of Computer Science, University of Toronto;Department of Computer Science, University of Toronto", "aff_domain": "cs.toronto.edu;ox.ac.uk;cs.toronto.edu;cs.toronto.edu", "position": "PhD student;MS student;Assistant Professor;Assistant Professor", "bibtex": "@inproceedings{\nwu2021int,\ntitle={{\\{}INT{\\}}: An Inequality Benchmark for Evaluating Generalization in Theorem Proving},\nauthor={Yuhuai Wu and Albert Jiang and Jimmy Ba and Roger Baker Grosse},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=O6LPudowNQm}\n}", "github": "[![github](/images/github_icon.svg) albertqjiang/INT](https://github.com/albertqjiang/INT)", "project": "", "reviewers": "AnonReviewer1;AnonReviewer4;AnonReviewer2;AnonReviewer3", "pdf_size": 0, "rating": "6;6;7;8", "confidence": "2;4;2;4", "wc_review": "222;1444;189;690", 
"wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "785;1664;82;1486", "reply_reviewers": "0;0;0;0", "reply_authors": "2;4;1;2", "rating_avg": [ 6.75, 0.82915619758885 ], "confidence_avg": [ 3.0, 1.0 ], "wc_review_avg": [ 636.25, 506.7012803417809 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 1004.25, 625.6933653955426 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 2.25, 1.0897247358851685 ], "replies_avg": [ 15, 0 ], "authors#_avg": [ 4, 0 ], "corr_rating_confidence": 0.30151134457776363, "gs_citation": 59, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=2622676809142200746&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 4, "pdf": "https://openreview.net/pdf?id=O6LPudowNQm", "email": "cs.toronto.edu;ox.ac.uk;cs.toronto.edu;cs.toronto.edu", "author_num": 4, "aff_unique_index": "0;1;0;0", "aff_unique_norm": "University of Toronto;University of Oxford", "aff_unique_dep": "Department of Computer Science;", "aff_unique_url": "https://www.utoronto.ca;https://www.ox.ac.uk", "aff_unique_abbr": "U of T;Oxford", "aff_campus_unique_index": "0;0;0", "aff_campus_unique": "Toronto;", "aff_country_unique_index": "0;1;0;0", "aff_country_unique": "Canada;United Kingdom" }, { "title": "Disentangled Recurrent Wasserstein Autoencoder", "status": "Spotlight", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/3257", "id": "O7ms4LFdsX", "poster": "", "openreview": "https://openreview.net/forum?id=O7ms4LFdsX", "slides": "https://iclr.cc/virtual/2021/poster/3257", "video": "https://iclr.cc/virtual/2021/poster/3257", "author_site": "Jun Han, Martin Min, Ligong Han, Erran Li, Xuan Zhang", "tldr": "", "abstract": "Learning disentangled representations leads to interpretable models and facilitates data generation with style transfer, which has been extensively studied on static data such as images in an unsupervised learning framework. However, only a few works have explored unsupervised disentangled sequential representation learning due to challenges of generating sequential data. In this paper, we propose recurrent Wasserstein Autoencoder (R-WAE), a new framework for generative modeling of sequential data. R-WAE disentangles the representation of an input sequence into static and dynamic factors (i.e., time-invariant and time-varying parts). Our theoretical analysis shows that, R-WAE minimizes an upper bound of a penalized form of the Wasserstein distance between model distribution and sequential data distribution, and simultaneously maximizes the mutual information between input data and different disentangled latent factors, respectively. This is superior to (recurrent) VAE which does not explicitly enforce mutual information maximization between input data and disentangled latent representations. When the number of actions in sequential data is available as weak supervision information, R-WAE is extended to learn a categorical latent representation of actions to improve its disentanglement. 
Experiments on a variety of datasets show that our models outperform other baselines with the same settings in terms of disentanglement and unconditional video generation both quantitatively and qualitatively.", "keywords": "Sequential Representation Learning;Disentanglement;Recurrent Generative Model", "primary_area": "", "supplementary_material": "/attachment/1b9c51bc439fc7b733288fdf7ea59cfc77bdfcae.zip", "author": "Jun Han;Martin Renqiang Min;Ligong Han;Li Erran Li;Xuan Zhang", "authorids": "~Jun_Han4;~Martin_Renqiang_Min1;~Ligong_Han1;~Li_Erran_Li1;~Xuan_Zhang3", "gender": ";M;M;;M", "homepage": ";http://www.cs.toronto.edu/~cuty;https://phymhan.github.io;http://www.cs.columbia.edu/~lierranli/;https://github.com/floatlazer", "dblp": "02/3721-4;29/7048;187/1675;l/ErranLLi.html;", "google_scholar": ";T2M4JjEAAAAJ;n2v43R4AAAAJ;GkMfzy4AAAAJ;https://scholar.google.com/citations?view_op=list_works", "orcid": ";0000-0002-8563-6133;0000-0003-3166-0848;;", "linkedin": ";martin-renqiang-min-955a8766;ligongh/;;", "or_profile": "~Jun_Han4;~Martin_Renqiang_Min1;~Ligong_Han1;~Li_Erran_Li1;~Xuan_Zhang3", "aff": "PCG, Tencent;NEC Laboratories America;Rutgers University;Columbia University;Texas A&M", "aff_domain": "tencent.com;nec-labs.com;rutgers.edu;columbia.edu;tamu.edu", "position": "Senior Researcher;Researcher;PhD student;Adjunct Professor;PhD student", "bibtex": "@inproceedings{\nhan2021disentangled,\ntitle={Disentangled Recurrent Wasserstein Autoencoder },\nauthor={Jun Han and Martin Renqiang Min and Ligong Han and Li Erran Li and Xuan Zhang},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=O7ms4LFdsX}\n}", "github": "", "project": "", "reviewers": "AnonReviewer3;AnonReviewer1;AnonReviewer4", "pdf_size": 0, "rating": "7;7;7", "confidence": "4;4;3", "wc_review": "398;347;398", "wc_reply_reviewers": "0;0;0", "wc_reply_authors": "612;531;603", "reply_reviewers": "0;0;0", "reply_authors": "1;1;1", "rating_avg": [ 7.0, 0.0 ], "confidence_avg": [ 3.6666666666666665, 0.4714045207910317 ], "wc_review_avg": [ 381.0, 24.041630560342615 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 582.0, 36.24913792078372 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 8, 0 ], "authors#_avg": [ 5, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 39, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=11829381062149315974&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 3, "pdf": "https://openreview.net/pdf?id=O7ms4LFdsX", "email": "tencent.com;nec-labs.com;rutgers.edu;columbia.edu;tamu.edu", "author_num": 5, "aff_unique_index": "0;1;2;3;4", "aff_unique_norm": "Tencent;NEC Laboratories America;Rutgers University;Columbia University;Texas A&M University", "aff_unique_dep": "PCG;;;;", "aff_unique_url": "https://www.tencent.com;https://www.nec-labs.com;https://www.rutgers.edu;https://www.columbia.edu;https://www.tamu.edu", "aff_unique_abbr": "Tencent;NEC Labs America;Rutgers;Columbia;TAMU", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;1;1;1;1", "aff_country_unique": "China;United States" }, { "id": "O9NAKC_MqMx", "title": "Knapsack Pruning with Inner Distillation", "track": "main", "status": "Withdraw", "tldr": "", "abstract": "Neural network pruning reduces the computational cost of an over-parameterized network to improve its efficiency. Popular methods vary from $\\ell_1$-norm sparsification to Neural Architecture Search (NAS). 
In this work, we propose a novel pruning method that optimizes the final accuracy of the pruned network and distills knowledge from the over-parameterized parent network's inner layers. To enable this approach, we formulate the network pruning as a Knapsack Problem which optimizes the trade-off between the importance of neurons and their associated computational cost. Then we prune the network channels while maintaining the high-level structure of the network. The pruned network is fine-tuned under the supervision of the parent network using its inner network knowledge, a technique we refer to as the {\\it Inner Knowledge Distillation}. Our method leads to state-of-the-art pruning results on ImageNet, CIFAR-10 and CIFAR-100 using ResNet backbones. \nTo prune complex network structures such as convolutions with skip-links and depth-wise convolutions, we propose a block grouping approach to cope with these structures.\nThrough this we produce compact architectures with the same FLOPs as EfficientNet-B0 and MobileNetV3 but with higher accuracy, by $1\\%$ and $0.3\\%$ respectively on ImageNet, and faster runtime on GPU.", "keywords": "", "primary_area": "", "supplementary_material": "/attachment/a8fff2a3c61482b7018cb40cb8c5c884bb6286d5.zip", "author": "Yonathan Aflalo;Itamar Friedman;Asaf Noy;Lihi Zelnik-Manor;Ming Lin", "authorids": "~Yonathan_Aflalo2;itamar.friedman@alibaba-inc.com;asaf.noy@alibaba-inc.com;~Lihi_Zelnik-Manor1;~Ming_Lin4", "gender": "M;;;F;M", "homepage": ";;;https://lihi.net.technion.ac.il/;https://minglin-home.github.io/", "dblp": "52/10673;;;z/LihiZelnikManor;", "google_scholar": ";;;https://scholar.google.com.tw/citations?user=E_ejWvYAAAAJ;https://scholar.google.com/citations?hl=en", "orcid": ";;;;", "linkedin": ";;;;", "or_profile": "~Yonathan_Aflalo2;itamar.friedman@alibaba-inc.com;asaf.noy@alibaba-inc.com;~Lihi_Zelnik-Manor1;~Ming_Lin4", "aff": ";;;Technion - Israel Institute of Technology, Technion;Alibaba Group", "aff_domain": ";;;technion.ac.il;alibaba-inc.com", "position": ";;;Associate Professor;Algorithm Engineer", "bibtex": "", "github": "", "project": "", "reviewers": "AnonReviewer4;AnonReviewer2;AnonReviewer3;AnonReviewer1", "site": "https://openreview.net/forum?id=O9NAKC_MqMx", "pdf_size": 0, "rating": "4;4;4;5", "confidence": "4;5;5;4", "wc_review": "409;259;503;364", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "0;0;0;0", "reply_reviewers": "0;0;0;0", "reply_authors": "0;0;0;0", "rating_avg": [ 4.25, 0.4330127018922193 ], "confidence_avg": [ 4.5, 0.5 ], "wc_review_avg": [ 383.75, 87.76495599041795 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 0, 0 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 0, 0 ], "replies_avg": [ 5, 0 ], "authors#_avg": [ 5, 0 ], "corr_rating_confidence": -0.5773502691896257, "gs_citation": 44, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=7172223920880342866&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 3, "aff_unique_index": "0;1", "aff_unique_norm": "Technion - Israel Institute of Technology;Alibaba Group", "aff_unique_dep": ";", "aff_unique_url": "https://www.technion.ac.il;https://www.alibaba.com", "aff_unique_abbr": "Technion;Alibaba", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;1", "aff_country_unique": "Israel;China" }, { "title": "Implicit Under-Parameterization Inhibits Data-Efficient Deep Reinforcement Learning", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/3301", "id": "O9bnihsFfXU", 
"poster": "", "openreview": "https://openreview.net/forum?id=O9bnihsFfXU", "slides": "https://iclr.cc/virtual/2021/poster/3301", "video": "https://iclr.cc/virtual/2021/poster/3301", "author_site": "Aviral Kumar, Rishabh Agarwal, Dibya Ghosh, Sergey Levine", "tldr": "", "abstract": "We identify an implicit under-parameterization phenomenon in value-based deep RL methods that use bootstrapping: when value functions, approximated using deep neural networks, are trained with gradient descent using iterated regression onto target values generated by previous instances of the value network, more gradient updates decrease the expressivity of the current value network. We char- acterize this loss of expressivity via a drop in the rank of the learned value net- work features, and show that this typically corresponds to a performance drop. We demonstrate this phenomenon on Atari and Gym benchmarks, in both offline and online RL settings. We formally analyze this phenomenon and show that it results from a pathological interaction between bootstrapping and gradient-based optimization. We further show that mitigating implicit under-parameterization by controlling rank collapse can improve performance.", "keywords": "deep Q-learning;data-efficient RL;rank-collapse;offline RL", "primary_area": "", "supplementary_material": "", "author": "Aviral Kumar;Rishabh Agarwal;Dibya Ghosh;Sergey Levine", "authorids": "~Aviral_Kumar2;~Rishabh_Agarwal2;~Dibya_Ghosh1;~Sergey_Levine1", "gender": "M;M;M;M", "homepage": "https://aviralkumar2907.github.io/;https://agarwl.github.io;https://dibyaghosh.com;https://people.eecs.berkeley.edu/~svlevine/", "dblp": "202/7961;;210/2547;80/7594", "google_scholar": ";https://scholar.google.ca/citations?user=aH8AJu4AAAAJ;znnl0kwAAAAJ;8R35rCwAAAAJ", "orcid": ";;;", "linkedin": ";;;", "or_profile": "~Aviral_Kumar2;~Rishabh_Agarwal2;~Dibya_Ghosh1;~Sergey_Levine1", "aff": "University of California, Berkeley;Google DeepMind;University of California, Berkeley;Google", "aff_domain": "berkeley.edu;google.com;berkeley.edu;google.com", "position": "PhD student;Research Scientist;PhD student;Research Scientist", "bibtex": "@inproceedings{\nkumar2021implicit,\ntitle={Implicit Under-Parameterization Inhibits Data-Efficient Deep Reinforcement Learning},\nauthor={Aviral Kumar and Rishabh Agarwal and Dibya Ghosh and Sergey Levine},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=O9bnihsFfXU}\n}", "github": "", "project": "", "reviewers": "AnonReviewer2;AnonReviewer4;AnonReviewer1;AnonReviewer3", "pdf_size": 0, "rating": "5;6;7;8", "confidence": "4;2;3;4", "wc_review": "796;311;329;1135", "wc_reply_reviewers": "0;0;37;1189", "wc_reply_authors": "1368;235;778;1982", "reply_reviewers": "0;0;1;3", "reply_authors": "3;1;1;6", "rating_avg": [ 6.5, 1.118033988749895 ], "confidence_avg": [ 3.25, 0.82915619758885 ], "wc_review_avg": [ 642.75, 344.34457669607633 ], "wc_reply_reviewers_avg": [ 306.5, 509.7354706119636 ], "wc_reply_authors_avg": [ 1090.75, 652.1722836030369 ], "reply_reviewers_avg": [ 1.0, 1.224744871391589 ], "reply_authors_avg": [ 2.75, 2.0463381929681126 ], "replies_avg": [ 21, 0 ], "authors#_avg": [ 4, 0 ], "corr_rating_confidence": 0.1348399724926484, "gs_citation": 132, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=9947899191992669932&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 4, "pdf": "https://openreview.net/pdf?id=O9bnihsFfXU", "email": "berkeley.edu;google.com;berkeley.edu;google.com", 
"author_num": 4, "aff_unique_index": "0;1;0;1", "aff_unique_norm": "University of California, Berkeley;Google", "aff_unique_dep": ";Google DeepMind", "aff_unique_url": "https://www.berkeley.edu;https://deepmind.com", "aff_unique_abbr": "UC Berkeley;DeepMind", "aff_campus_unique_index": "0;0;2", "aff_campus_unique": "Berkeley;;Mountain View", "aff_country_unique_index": "0;1;0;0", "aff_country_unique": "United States;United Kingdom" }, { "id": "OAdGsaptOXy", "title": "Pretrain Knowledge-Aware Language Models", "track": "main", "status": "Reject", "tldr": "", "abstract": "How much knowledge do pretrained language models hold? Recent research observed that pretrained transformers are adept at modeling semantics but it is unclear to what degree they grasp human knowledge, or how to ensure they do so. In this paper we incorporate knowledge-awareness in language model pretraining without changing the transformer architecture, inserting explicit knowledge layers, or adding external storage of semantic information. Rather, we simply signal the existence of entities to the input of the transformer in pretraining, with an entity-extended tokenizer; and at the output, with an additional entity prediction task. Our experiments show that solely by adding these entity signals in pretraining, significantly more knowledge is packed into the transformer parameters: we observe improved language modeling accuracy, factual correctness in LAMA knowledge probing tasks, and semantics in the hidden representations through edge probing. We also show that our knowledge-aware language model (\\kalm{}) can serve as a drop-in replacement for GPT-2 models, significantly improving downstream tasks like zero-shot question-answering with no task-related training. ", "keywords": "Pretraining;Natural Language Generation;GPT-2;QA;Knowledge Graph", "primary_area": "", "supplementary_material": "", "author": "Corbin L Rosset;Chenyan Xiong;Minh Phan;Xia Song;Paul N. Bennett;saurabh tiwary", "authorids": "~Corbin_L_Rosset1;~Chenyan_Xiong1;phan.minh@microsoft.com;~Xia_Song1;~Paul_N._Bennett1;~saurabh_tiwary1", "gender": "M;M;;M;;M", "homepage": "http://corbyrosset.com/;https://www.cs.cmu.edu/~cx/;;;https://www.microsoft.com/en-us/research/people/pauben/publications/;", "dblp": ";18/10886;;165/6299;33/6188;", "google_scholar": "Y2YBgCsAAAAJ;E9BaEBYAAAAJ;;0aPSv9kAAAAJ;AIncPrIAAAAJ;", "orcid": ";;;;0009-0006-7852-9651;", "linkedin": ";;;xiaso/;paulnbennett/;", "or_profile": "~Corbin_L_Rosset1;~Chenyan_Xiong1;phan.minh@microsoft.com;~Xia_Song1;~Paul_N._Bennett1;~saurabh_tiwary1", "aff": "Microsoft Research;Microsoft Research;;Microsoft;Microsoft;", "aff_domain": "research.microsoft.com;research.microsoft.com;;microsoft.com;microsoft.com;", "position": "Researcher;Senior Researcher;;Researcher;Researcher;", "bibtex": "@misc{\nrosset2021pretrain,\ntitle={Pretrain Knowledge-Aware Language Models},\nauthor={Corbin L Rosset and Chenyan Xiong and Minh Phan and Xia Song and Paul N. 
Bennett and saurabh tiwary},\nyear={2021},\nurl={https://openreview.net/forum?id=OAdGsaptOXy}\n}", "github": "", "project": "", "reviewers": "AnonReviewer2;AnonReviewer1;AnonReviewer3;AnonReviewer4", "site": "https://openreview.net/forum?id=OAdGsaptOXy", "pdf_size": 0, "rating": "4;5;6;7", "confidence": "4;2;2;3", "wc_review": "846;468;214;570", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "829;642;847;442", "reply_reviewers": "0;0;0;0", "reply_authors": "1;1;1;1", "rating_avg": [ 5.5, 1.118033988749895 ], "confidence_avg": [ 2.75, 0.82915619758885 ], "wc_review_avg": [ 524.5, 226.4039531456993 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 690.0, 164.1477992542087 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 9, 0 ], "authors#_avg": [ 6, 0 ], "corr_rating_confidence": -0.40451991747794525, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:ksMfa3NxCQ4J:scholar.google.com/&scioq=Pretrain+Knowledge-Aware+Language+Models&hl=en&as_sdt=0,5", "gs_version_total": 0, "aff_unique_index": "0;0;0;0", "aff_unique_norm": "Microsoft", "aff_unique_dep": "Microsoft Research", "aff_unique_url": "https://www.microsoft.com/en-us/research", "aff_unique_abbr": "MSR", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0;0;0", "aff_country_unique": "United States" }, { "id": "OBI5QuStBz3", "title": "Improved Communication Lower Bounds for Distributed Optimisation", "track": "main", "status": "Reject", "tldr": "", "abstract": "Motivated by the interest in communication-efficient methods for distributed machine learning, we consider the communication complexity of minimising a sum of $d$-dimensional functions $\\sum_{i = 1}^N f_i (x)$, where each function $f_i$ is held by one of the $N$ different machines. Such tasks arise naturally in large-scale optimisation, where a standard solution is to apply variants of (stochastic) gradient descent. As our main result, we show that $\\Omega( Nd \\log d / \\varepsilon)$ bits in total need to be communicated between the machines to find an additive $\\varepsilon$-approximation to the minimum of $\\sum_{i = 1}^N f_i (x)$. The result holds for deterministic algorithms, and for randomised algorithms under some restrictions on the parameter values. Importantly, our lower bounds require no assumptions on the structure of the algorithm, and are matched within constant factors for strongly convex objectives by a new variant of quantised gradient descent. The lower bounds are obtained by bringing over tools from communication complexity to distributed optimisation, an approach we hope will find further use in the future.\n", "keywords": "distributed optimization;lower bounds;upper bounds;communication complexity", "primary_area": "", "supplementary_material": "", "author": "Janne H.
Korhonen;Dan Alistarh", "authorids": "~Janne_H._Korhonen2;~Dan_Alistarh7", "gender": ";M", "homepage": "https://pub.ist.ac.at/~jkorhone/;http://people.csail.mit.edu/alistarh/", "dblp": "115/4379;36/3251.html", "google_scholar": "NvQiBAsAAAAJ;https://scholar.google.com.tw/citations?user=75q-6ZQAAAAJ", "orcid": ";", "linkedin": ";", "or_profile": "~Janne_H._Korhonen2;~Dan_Alistarh1", "aff": "Institute of Science and Technology Austria;Institute of Science and Technology Austria", "aff_domain": "ist.ac.at;ist.ac.at", "position": "Postdoc;Assistant Professor", "bibtex": "@misc{\nkorhonen2021improved,\ntitle={Improved Communication Lower Bounds for Distributed Optimisation},\nauthor={Janne H. Korhonen and Dan Alistarh},\nyear={2021},\nurl={https://openreview.net/forum?id=OBI5QuStBz3}\n}", "github": "", "project": "", "reviewers": "AnonReviewer3;AnonReviewer2;AnonReviewer4", "site": "https://openreview.net/forum?id=OBI5QuStBz3", "pdf_size": 0, "rating": "5;5;6", "confidence": "3;2;3", "wc_review": "225;419;796", "wc_reply_reviewers": "0;0;0", "wc_reply_authors": "443;288;893", "reply_reviewers": "0;0;0", "reply_authors": "1;1;2", "rating_avg": [ 5.333333333333333, 0.4714045207910317 ], "confidence_avg": [ 2.6666666666666665, 0.4714045207910317 ], "wc_review_avg": [ 480.0, 237.06679790022613 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 541.3333333333334, 256.59089790732816 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.3333333333333333, 0.4714045207910317 ], "replies_avg": [ 8, 0 ], "authors#_avg": [ 2, 0 ], "corr_rating_confidence": 0.4999999999999999, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:uGzOBW4v6XUJ:scholar.google.com/&scioq=Improved+Communication+Lower+Bounds+for+Distributed+Optimisation&hl=en&as_sdt=0,5", "gs_version_total": 0, "aff_unique_index": "0;0", "aff_unique_norm": "Institute of Science and Technology Austria", "aff_unique_dep": "", "aff_unique_url": "https://www.ist.ac.at", "aff_unique_abbr": "IST Austria", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0", "aff_country_unique": "Austria" }, { "id": "OCRKCul3eKN", "title": "Addressing Extrapolation Error in Deep Offline Reinforcement Learning", "track": "main", "status": "Reject", "tldr": "", "abstract": "Reinforcement learning (RL) encompasses both online and offline regimes. Unlike its online counterpart, offline RL agents are trained using logged-data only, without interaction with the environment. Therefore, offline RL is a promising direction for real-world applications, such as healthcare, where repeated interaction with environments is prohibitive. However, since offline RL losses often involve evaluating state-action pairs not well-covered by training data, they can suffer due to the errors introduced when the function approximator attempts to extrapolate those pairs' value. These errors can be compounded by bootstrapping when the function approximator overestimates, leading the value function to *grow unbounded*, thereby crippling learning. In this paper, we introduce a three-part solution to combat extrapolation errors: (i) behavior value estimation, (ii) ranking regularization, and (iii) reparametrization of the value function. We provide ample empirical evidence on the effectiveness of our method, showing state of the art performance on the RL Unplugged (RLU) ATARI dataset. 
Furthermore, we introduce new datasets for bsuite as well as partially observable DeepMind Lab environments, on which our method outperforms state of the art offline RL algorithms. \n", "keywords": "Addressing Extrapolation Error in Deep Offline Reinforcement Learning", "primary_area": "", "supplementary_material": "", "author": "Caglar Gulcehre;Sergio G\u00f3mez Colmenarejo;ziyu wang;Jakub Sygnowski;Thomas Paine;Konrad Zolna;Yutian Chen;Matthew Hoffman;Razvan Pascanu;Nando de Freitas", "authorids": "~Caglar_Gulcehre1;~Sergio_G\u00f3mez_Colmenarejo1;~ziyu_wang1;~Jakub_Sygnowski1;~Thomas_Paine1;~Konrad_Zolna1;~Yutian_Chen1;~Matthew_Hoffman1;~Razvan_Pascanu1;~Nando_de_Freitas1", "gender": "M;M;M;M;M;Unspecified;;;M;M", "homepage": "http://caglarg.com;https://www.researchgate.net/profile/Sergio_Gomez35;;;http://tomlepaine.github.io;;http://yutianchen.com/;;https://razp.info;", "dblp": "125/2132;172/1121.html;;180/5787;139/1033;;95/7441-1;92/794;65/8368.html;http://dblp.uni-trier.de/pers/hd/f/Freitas:Nando_de", "google_scholar": "https://scholar.google.ca/citations?user=7hwJ2ckAAAAJ;https://scholar.google.co.uk/citations?user=0Dkf68EAAAAJ;;_Iz9Z0sAAAAJ;oFIvUSQAAAAJ;https://scholar.google.ca/citations?user=Kg_f9PwAAAAJ;fAWKizAAAAAJ;https://scholar.google.co.uk/citations?user=n2osZaoAAAAJ;https://scholar.google.ca/citations?user=eSPY8LwAAAAJ;nzEluBwAAAAJ", "orcid": ";0000-0002-2699-9858;;;;;;;;", "linkedin": ";sergio-g%C3%B3mez-colmenarejo-10666131/;;;;http://linkedin.com/in/konradzolna;;;;", "or_profile": "~Caglar_Gulcehre1;~Sergio_G\u00f3mez_Colmenarejo1;~ziyu_wang1;~Jakub_Sygnowski1;~Thomas_Paine1;~Konrad_Zolna1;~Yutian_Chen1;~Matthew_Hoffman1;~Razvan_Pascanu1;~Nando_de_Freitas1", "aff": "Deepmind;;Google;Google DeepMind;Google/DeepMind;Jagiellonian University;Google DeepMind;;Google DeepMind;Google DeepMind", "aff_domain": "google.com;;google.com;deepmind.com;google.com;uj.edu.pl;google.com;;google.com;google.com", "position": "Research Scientist;;Research Scientist;Research engineer;Research Scientist;PhD student;Research Scientist;;Research Scientist;Principal Scientist", "bibtex": "@misc{\ngulcehre2021addressing,\ntitle={Addressing Extrapolation Error in Deep Offline Reinforcement Learning},\nauthor={Caglar Gulcehre and Sergio G{\\'o}mez Colmenarejo and ziyu wang and Jakub Sygnowski and Thomas Paine and Konrad Zolna and Yutian Chen and Matthew Hoffman and Razvan Pascanu and Nando de Freitas},\nyear={2021},\nurl={https://openreview.net/forum?id=OCRKCul3eKN}\n}", "github": "", "project": "", "reviewers": "AnonReviewer4;AnonReviewer2;AnonReviewer1", "site": "https://openreview.net/forum?id=OCRKCul3eKN", "pdf_size": 0, "rating": "3;4;4", "confidence": "4;4;4", "wc_review": "586;550;778", "wc_reply_reviewers": "0;141;0", "wc_reply_authors": "420;902;1882", "reply_reviewers": "0;1;0", "reply_authors": "1;2;3", "rating_avg": [ 3.6666666666666665, 0.4714045207910317 ], "confidence_avg": [ 4.0, 0.0 ], "wc_review_avg": [ 638.0, 100.07996802557443 ], "wc_reply_reviewers_avg": [ 47.0, 66.46803743153546 ], "wc_reply_authors_avg": [ 1068.0, 608.291596741782 ], "reply_reviewers_avg": [ 0.3333333333333333, 0.4714045207910317 ], "reply_authors_avg": [ 2.0, 0.816496580927726 ], "replies_avg": [ 12, 0 ], "authors#_avg": [ 10, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 6, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=3527427003222205650&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 2, "aff_unique_index": "0;1;1;1;2;1;1;1", "aff_unique_norm": "DeepMind;Google;Jagiellonian 
University", "aff_unique_dep": ";Google;", "aff_unique_url": "https://deepmind.com;https://www.google.com;https://www.uj.edu.pl", "aff_unique_abbr": "DeepMind;Google;UJ", "aff_campus_unique_index": "1", "aff_campus_unique": ";Mountain View", "aff_country_unique_index": "0;1;0;0;2;0;0;0", "aff_country_unique": "United Kingdom;United States;Poland" }, { "id": "OCm0rwa1lx1", "title": "Addressing Some Limitations of Transformers with Feedback Memory", "track": "main", "status": "Reject", "tldr": "", "abstract": "Transformers have been successfully applied to sequential tasks despite being feedforward networks. Unlike recurrent neural networks, Transformers use attention to capture temporal relations while processing input tokens in parallel. While this parallelization makes them computationally efficient, it restricts the model from fully exploiting the sequential nature of the input. The representation at a given layer can only access representations from lower layers, rather than the higher level representations already available. In this work, we propose the Feedback Transformer architecture that exposes all previous representations to all future representations, meaning the lowest representation of the current timestep is formed from the highest-level abstract representation of the past. We demonstrate on a variety of benchmarks in language modeling, machine translation, and reinforcement learning that the increased representation capacity can create small, shallow models with much stronger performance than comparable Transformers.", "keywords": "Feedback;Memory;Transformers", "primary_area": "", "supplementary_material": "/attachment/8bc2fb75eb8052be1bc1c05315b6bfbf6a440c1a.zip", "author": "Angela Fan;Thibaut Lavril;Edouard Grave;Armand Joulin;Sainbayar Sukhbaatar", "authorids": "~Angela_Fan2;thibautlav@fb.com;~Edouard_Grave1;~Armand_Joulin1;~Sainbayar_Sukhbaatar1", "gender": ";;;;M", "homepage": ";;;;", "dblp": "192/1872;;50/10261;68/8653;56/10550", "google_scholar": "TLZR9zgAAAAJ;;7UV4ET4AAAAJ;kRJkDakAAAAJ;ri1sE34AAAAJ", "orcid": ";;;;", "linkedin": ";;edouard-grave-63099823/;;", "or_profile": "~Angela_Fan2;thibautlav@fb.com;~Edouard_Grave1;~Armand_Joulin1;~Sainbayar_Sukhbaatar1", "aff": "Meta Facebook;;Meta Facebook;Meta Facebook;Meta Facebook", "aff_domain": "facebook.com;;fb.com;fb.com;fb.com", "position": "Research Engineer;;Research Scientist;Full Professor;Research Scientist", "bibtex": "@misc{\nfan2021addressing,\ntitle={Addressing Some Limitations of Transformers with Feedback Memory},\nauthor={Angela Fan and Thibaut Lavril and Edouard Grave and Armand Joulin and Sainbayar Sukhbaatar},\nyear={2021},\nurl={https://openreview.net/forum?id=OCm0rwa1lx1}\n}", "github": "", "project": "", "reviewers": "AnonReviewer4;AnonReviewer3;AnonReviewer2;AnonReviewer1", "site": "https://openreview.net/forum?id=OCm0rwa1lx1", "pdf_size": 0, "rating": "5;6;6;7", "confidence": "5;3;5;5", "wc_review": "468;269;1097;653", "wc_reply_reviewers": "322;0;919;146", "wc_reply_authors": "1271;492;3118;693", "reply_reviewers": "1;0;4;1", "reply_authors": "3;2;6;2", "rating_avg": [ 6.0, 0.7071067811865476 ], "confidence_avg": [ 4.5, 0.8660254037844386 ], "wc_review_avg": [ 621.75, 306.1497795197638 ], "wc_reply_reviewers_avg": [ 346.75, 349.5063482971375 ], "wc_reply_authors_avg": [ 1393.5, 1035.8944202958137 ], "reply_reviewers_avg": [ 1.5, 1.5 ], "reply_authors_avg": [ 3.25, 1.6393596310755 ], "replies_avg": [ 25, 0 ], "authors#_avg": [ 5, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 69, 
"gs_cited_by_link": "https://scholar.google.com/scholar?cites=7494329547045277871&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 3, "aff_unique_index": "0;0;0;0", "aff_unique_norm": "Meta", "aff_unique_dep": "Meta Platforms, Inc.", "aff_unique_url": "https://meta.com", "aff_unique_abbr": "Meta", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0;0;0", "aff_country_unique": "United States" }, { "id": "ODKwX19UjOj", "title": "Unsupervised Hierarchical Concept Learning", "track": "main", "status": "Reject", "tldr": "", "abstract": "Discovering concepts (or temporal abstractions) in an unsupervised manner from demonstration data in the absence of an environment is an important problem. Organizing these discovered concepts hierarchically at different levels of abstraction is useful in discovering patterns, building ontologies, and generating tutorials from demonstration data. However, recent work to discover such concepts without access to any environment does not discover relationships (or a hierarchy) between these discovered concepts. In this paper, we present a Transformer-based concept abstraction architecture UNHCLE (pronounced uncle) that extracts a hierarchy of concepts in an unsupervised way from demonstration data. We empirically demonstrate how UNHCLE discovers meaningful hierarchies using datasets from Chess and Cooking domains. Finally, we show how UNHCLE learns meaningful language labels for concepts by using demonstration data augmented with natural language for cooking and chess. All of our code is available at https://github.com/UNHCLE/UNHCLE\n", "keywords": "hierarchical learning;unsupervised learning;unsupervised hierarchical learning;video representation learning;learning from demonstrations", "primary_area": "", "supplementary_material": "", "author": "Sumegh Roychowdhury;Sumedh Anand Sontakke;Mausoom Sarkar;Nikaash Puri;Milan Aggarwal;Pinkesh Badjatiya;Balaji Krishnamurthy;Laurent Itti", "authorids": "~Sumegh_Roychowdhury1;~Sumedh_Anand_Sontakke1;~Mausoom_Sarkar1;~Nikaash_Puri1;~Milan_Aggarwal1;~Pinkesh_Badjatiya1;~Balaji_Krishnamurthy1;~Laurent_Itti1", "gender": "M;M;M;M;M;M;M;M", "homepage": ";https://sumedh7.github.io/;;;;http://pinkeshbadjatiya.github.io/;;http://ilab.usc.edu", "dblp": "246/0200;276/0127;43/6264;;206/6244.html;198/5418;79/1076;31/3256", "google_scholar": "8T4DcYIAAAAJ;https://scholar.google.com/citations?hl=en;N6J7J4IAAAAJ;;YiMNG_QAAAAJ;https://scholar.google.co.in/citations?user=9ICSXBsAAAAJ;n8iUBg8AAAAJ;xhUvqK8AAAAJ", "orcid": ";;;;;;0000-0002-0366-2427;0000-0002-0168-2977", "linkedin": ";sumedh-sontakke-0ab24210a/;;;milan-aggarwal-31a954b5/;;balaji-krishnamurthy-4241695/;", "or_profile": "~Sumegh_Roychowdhury1;~Sumedh_Anand_Sontakke1;~Mausoom_Sarkar1;~Nikaash_Puri1;~Milan_Aggarwal1;~Pinkesh_Badjatiya1;~Balaji_Krishnamurthy1;~Laurent_Itti1", "aff": "Indian Institute of Technology Kharagpur;University of Southern California;Adobe;;Adobe Systems;Adobe;Adobe Systems;University of Southern California", "aff_domain": "iitkgp.ac.in;usc.edu;adobe.com;;adobe.com;adobe.com;adobe.com;usc.edu", "position": "Undergrad student;PhD student;Principal Researcher;;Researcher;Machine Learning Researcher and Engineer 2;Principal Scientist;Professor", "bibtex": "@misc{\nroychowdhury2021unsupervised,\ntitle={Unsupervised Hierarchical Concept Learning},\nauthor={Sumegh Roychowdhury and Sumedh Anand Sontakke and Mausoom Sarkar and Nikaash Puri and Milan Aggarwal and Pinkesh Badjatiya and Balaji Krishnamurthy and Laurent 
Itti},\nyear={2021},\nurl={https://openreview.net/forum?id=ODKwX19UjOj}\n}", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer2;AnonReviewer3;AnonReviewer4", "site": "https://openreview.net/forum?id=ODKwX19UjOj", "pdf_size": 0, "rating": "4;4;5;6", "confidence": "4;4;3;3", "wc_review": "752;309;189;293", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "1862;929;1014;535", "reply_reviewers": "0;0;0;0", "reply_authors": "4;2;2;1", "rating_avg": [ 4.75, 0.82915619758885 ], "confidence_avg": [ 3.5, 0.5 ], "wc_review_avg": [ 385.75, 216.4155435730068 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 1085.0, 483.6336423368416 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 2.25, 1.0897247358851685 ], "replies_avg": [ 14, 0 ], "authors#_avg": [ 8, 0 ], "corr_rating_confidence": -0.9045340337332909, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:kyEFQfPVpcYJ:scholar.google.com/&scioq=Unsupervised+Hierarchical+Concept+Learning&hl=en&as_sdt=0,5", "gs_version_total": 2, "aff_unique_index": "0;1;2;2;2;2;1", "aff_unique_norm": "Indian Institute of Technology Kharagpur;University of Southern California;Adobe", "aff_unique_dep": ";;Adobe Inc.", "aff_unique_url": "https://www.iitkgp.ac.in;https://www.usc.edu;https://www.adobe.com", "aff_unique_abbr": "IIT Kharagpur;USC;Adobe", "aff_campus_unique_index": "0;1;1", "aff_campus_unique": "Kharagpur;Los Angeles;", "aff_country_unique_index": "0;1;1;1;1;1;1", "aff_country_unique": "India;United States" }, { "id": "OEgDatKuz2O", "title": "EMTL: A Generative Domain Adaptation Approach", "track": "main", "status": "Reject", "tldr": "", "abstract": "We propose an unsupervised domain adaptation approach based on generative models. We show that when the source probability density function can be learned, one-step Expectation\u2013Maximization iteration plus an additional marginal density function constraint will produce a proper mediator probability density function to bridge the gap between the source and target domains. The breakthrough is based on modern generative models (autoregressive mixture density nets) that are competitive to discriminative models on moderate-dimensional classification problems. By decoupling the source density estimation from the adaption steps, we can design a domain adaptation approach where the source data is locked away after being processed only once, opening the door to transfer when data security or privacy concerns impede the use of traditional domain adaptation. 
We demonstrate that our approach can achieve state-of-the-art performance on synthetic and real data sets, without accessing the source data at the adaptation phase.", "keywords": "unsupervised domain adaptation;EM;generative model;density estimation;deep learning;transfer learning", "primary_area": "", "supplementary_material": "", "author": "Jianfeng Zhang;Illyyne Saffar;Aladin Virmaux;Bal\u00e1zs K\u00e9gl", "authorids": "~Jianfeng_Zhang2;~Illyyne_Saffar1;~Aladin_Virmaux1;~Bal\u00e1zs_K\u00e9gl2", "gender": "M;F;;M", "homepage": ";;https://avirmaux.github.io;https://scholar.google.com/citations?user=s0njcGgAAAAJ&hl=en&oi=ao", "dblp": "74/5065;;192/8303;k/BalazsKegl.html", "google_scholar": "_Wzsb6YAAAAJ;https://scholar.google.com/citations?hl=fr;5FxvLvwAAAAJ;s0njcGgAAAAJ", "orcid": ";;;", "linkedin": ";illyynesaffar/;;balazskegl", "or_profile": "~Jianfeng_Zhang2;~Illyyne_Saffar1;~Aladin_Virmaux1;~Balazs_Kegl1", "aff": "Huawei Technologies Ltd.;Huawei Technologies Ltd.;Huawei Technologies Ltd.;CNRS (on leave)", "aff_domain": "huawei.com;huawei.com;huawei.com;in2p3.fr", "position": "Researcher;Researcher;Researcher;Principal Researcher", "bibtex": "@misc{\nzhang2021emtl,\ntitle={{\\{}EMTL{\\}}: A Generative Domain Adaptation Approach},\nauthor={Jianfeng Zhang and Illyyne Saffar and Aladin Virmaux and Bal{\\'a}zs K{\\'e}gl},\nyear={2021},\nurl={https://openreview.net/forum?id=OEgDatKuz2O}\n}", "github": "", "project": "", "reviewers": "AnonReviewer2;AnonReviewer1;AnonReviewer4;AnonReviewer3", "site": "https://openreview.net/forum?id=OEgDatKuz2O", "pdf_size": 0, "rating": "3;3;4;5", "confidence": "5;5;3;4", "wc_review": "545;492;735;386", "wc_reply_reviewers": "0;122;0;0", "wc_reply_authors": "718;478;416;176", "reply_reviewers": "0;1;0;0", "reply_authors": "1;1;1;1", "rating_avg": [ 3.75, 0.82915619758885 ], "confidence_avg": [ 4.25, 0.82915619758885 ], "wc_review_avg": [ 539.5, 126.55927465026022 ], "wc_reply_reviewers_avg": [ 30.5, 52.827549630850754 ], "wc_reply_authors_avg": [ 447.0, 192.87560758167425 ], "reply_reviewers_avg": [ 0.25, 0.4330127018922193 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 10, 0 ], "authors#_avg": [ 4, 0 ], "corr_rating_confidence": -0.6363636363636364, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:Yh_HYlxIz6AJ:scholar.google.com/&scioq=EMTL:+A+Generative+Domain+Adaptation+Approach&hl=en&as_sdt=0,33", "gs_version_total": 0, "aff_unique_index": "0;0;0;1", "aff_unique_norm": "Huawei;CNRS", "aff_unique_dep": "Huawei Technologies;", "aff_unique_url": "https://www.huawei.com;https://www.cnrs.fr", "aff_unique_abbr": "Huawei;CNRS", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0;0;1", "aff_country_unique": "China;France" }, { "title": "Training independent subnetworks for robust prediction", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/3062", "id": "OGg9XnKxFAH", "poster": "", "openreview": "https://openreview.net/forum?id=OGg9XnKxFAH", "slides": "https://iclr.cc/virtual/2021/poster/3062", "video": "https://iclr.cc/virtual/2021/poster/3062", "author_site": "Marton Havasi, Rodolphe Jenatton, Stanislav Fort, Jeremiah Zhe Liu, Jasper Snoek, Balaji Lakshminarayanan, Andrew Dai, Dustin Tran", "tldr": "", "abstract": "Recent approaches to efficiently ensemble neural networks have shown that strong robustness and uncertainty performance can be achieved with a negligible gain in parameters over the original network. 
However, these methods still require multiple forward passes for prediction, leading to a significant runtime cost. In this work, we show a surprising result:\nthe benefits of using multiple predictions can be achieved 'for free' under a single model's forward pass. In particular, we show that, using a multi-input multi-output (MIMO) configuration, one can utilize a single model's capacity to train multiple subnetworks that independently learn the task at hand. By ensembling the predictions made by the subnetworks, we improve model robustness without increasing compute. We observe a significant improvement in negative log-likelihood, accuracy, and calibration error on CIFAR10, CIFAR100, ImageNet, and their out-of-distribution variants compared to previous methods.", "keywords": "Efficient ensembles;robustness", "primary_area": "", "supplementary_material": "/attachment/d7e50ad0607378a41e238b0c3870f1d3c4f15066.zip", "author": "Marton Havasi;Rodolphe Jenatton;Stanislav Fort;Jeremiah Zhe Liu;Jasper Snoek;Balaji Lakshminarayanan;Andrew Mingbo Dai;Dustin Tran", "authorids": "~Marton_Havasi1;~Rodolphe_Jenatton3;~Stanislav_Fort1;~Jeremiah_Zhe_Liu1;~Jasper_Snoek1;~Balaji_Lakshminarayanan1;~Andrew_Mingbo_Dai1;~Dustin_Tran1", "gender": "M;M;M;M;M;M;M;", "homepage": "https://mhavasi.github.io/;http://rodolphejenatton.com/;http://stanford.edu/~sfort1/;;;http://www.gatsby.ucl.ac.uk/~balaji/;;http://dustintran.com", "dblp": "222/3332;68/8398;205/3072;199/2301;95/6097;71/8324;59/9736;", "google_scholar": "EaYZfmoAAAAJ;QIR6rygAAAAJ;https://scholar.google.cz/citations?user=eu2Kzn0AAAAJ;9jrmcG4AAAAJ;FM2DTXwAAAAJ;QYn8RbgAAAAJ;2r2NuDAAAAAJ;wVazIm8AAAAJ", "orcid": ";;;;;;;", "linkedin": "marton-havasi/;;stanislav-fort-38199a58/;;;;andrewdai/;", "or_profile": "~Marton_Havasi1;~Rodolphe_Jenatton3;~Stanislav_Fort1;~Jeremiah_Zhe_Liu1;~Jasper_Snoek1;~Balaji_Lakshminarayanan1;~Andrew_Mingbo_Dai1;~Dustin_Tran1", "aff": "University of Cambridge;Google;Stanford University;Harvard University;Google;Google Brain;Google;Google", "aff_domain": "cam.ac.uk;google.com;stanford.edu;harvard.edu;google.com;google.com;google.com;google.com", "position": "PhD student;Senior research scientist;PhD student;Visiting Scientist;Research Scientist;Research Scientist;Software Engineer;Research Scientist", "bibtex": "@inproceedings{\nhavasi2021training,\ntitle={Training independent subnetworks for robust prediction},\nauthor={Marton Havasi and Rodolphe Jenatton and Stanislav Fort and Jeremiah Zhe Liu and Jasper Snoek and Balaji Lakshminarayanan and Andrew Mingbo Dai and Dustin Tran},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=OGg9XnKxFAH}\n}", "github": "[![github](/images/github_icon.svg) google/uncertainty-baselines](https://github.com/google/uncertainty-baselines) + [![Papers with Code](/images/pwc_icon.svg) 1 community implementation](https://paperswithcode.com/paper/?openreview=OGg9XnKxFAH)", "project": "", "reviewers": "AnonReviewer3;AnonReviewer2;AnonReviewer1;AnonReviewer4", "pdf_size": 0, "rating": "6;6;7;8", "confidence": "4;4;3;4", "wc_review": "528;264;276;148", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "611;227;254;141", "reply_reviewers": "0;0;0;0", "reply_authors": "2;1;1;1", "rating_avg": [ 6.75, 0.82915619758885 ], "confidence_avg": [ 3.75, 0.4330127018922193 ], "wc_review_avg": [ 304.0, 138.65064009949612 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 308.25, 179.7044448532089 ], "reply_reviewers_avg": [ 0, 0 ], 
"reply_authors_avg": [ 1.25, 0.4330127018922193 ], "replies_avg": [ 11, 0 ], "authors#_avg": [ 8, 0 ], "corr_rating_confidence": -0.17407765595569782, "gs_citation": 252, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=9264084238315698016&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 4, "pdf": "https://openreview.net/pdf?id=OGg9XnKxFAH", "email": "cam.ac.uk;google.com;stanford.edu;harvard.edu;google.com;google.com;google.com;google.com", "author_num": 8, "aff_unique_index": "0;1;2;3;1;1;1;1", "aff_unique_norm": "University of Cambridge;Google;Stanford University;Harvard University", "aff_unique_dep": ";Google;;", "aff_unique_url": "https://www.cam.ac.uk;https://www.google.com;https://www.stanford.edu;https://www.harvard.edu", "aff_unique_abbr": "Cambridge;Google;Stanford;Harvard", "aff_campus_unique_index": "0;1;2;1;1;1;1", "aff_campus_unique": "Cambridge;Mountain View;Stanford;", "aff_country_unique_index": "0;1;1;1;1;1;1;1", "aff_country_unique": "United Kingdom;United States" }, { "title": "Efficient Wasserstein Natural Gradients for Reinforcement Learning", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/2903", "id": "OHgnfSrn2jv", "poster": "", "openreview": "https://openreview.net/forum?id=OHgnfSrn2jv", "slides": "https://iclr.cc/virtual/2021/poster/2903", "video": "https://iclr.cc/virtual/2021/poster/2903", "author_site": "Ted Moskovitz, Michael Arbel, Ferenc Huszar, Arthur Gretton", "tldr": "", "abstract": "A novel optimization approach is proposed for application to policy gradient methods and evolution strategies for reinforcement learning (RL). The procedure uses a computationally efficient \\emph{Wasserstein natural gradient} (WNG) descent that takes advantage of the geometry induced by a Wasserstein penalty to speed optimization. This method follows the recent theme in RL of including divergence penalties in the objective to establish trust regions. Experiments on challenging tasks demonstrate improvements in both computational cost and performance over advanced baselines. 
\n", "keywords": "reinforcement learning;optimization", "primary_area": "", "supplementary_material": "/attachment/d61488838b20822f933deae31ba59fc558d11393.zip", "author": "Ted Moskovitz;Michael Arbel;Ferenc Huszar;Arthur Gretton", "authorids": "~Ted_Moskovitz1;~Michael_Arbel1;~Ferenc_Huszar1;~Arthur_Gretton1", "gender": "M;M;M;M", "homepage": "https://tedmoskovitz.github.io/;https://michaelarbel.github.io/;;http://www.gatsby.ucl.ac.uk/~gretton/", "dblp": ";200/8609;http://dblp.uni-trier.de/pers/hd/h/Huszar:Ferenc;56/2574", "google_scholar": "pPVXrTYAAAAJ;NsOqVtkAAAAJ;https://scholar.google.co.uk/citations?user=koQCVT4AAAAJ;OUv7J6QAAAAJ", "orcid": ";;;", "linkedin": ";michael-arbel-0a38a655/;;", "or_profile": "~Ted_Moskovitz1;~Michael_Arbel1;~Ferenc_Huszar1;~Arthur_Gretton1", "aff": "Gatsby Computational Neuroscience Unit;University College London;University of Cambridge;University College London", "aff_domain": "gatsby.ucl.ac.uk;ucl.ac.uk;cam.ac.uk;ucl.ac.uk", "position": "PhD student;PhD student;Associate Professor;Professor", "bibtex": "@inproceedings{\nmoskovitz2021efficient,\ntitle={Efficient Wasserstein Natural Gradients for Reinforcement Learning},\nauthor={Ted Moskovitz and Michael Arbel and Ferenc Huszar and Arthur Gretton},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=OHgnfSrn2jv}\n}", "github": "[![github](/images/github_icon.svg) tedmoskovitz/WNPG](https://github.com/tedmoskovitz/WNPG)", "project": "", "reviewers": "AnonReviewer5;AnonReviewer1;AnonReviewer2", "pdf_size": 0, "rating": "5;6;8", "confidence": "2;4;4", "wc_review": "164;635;291", "wc_reply_reviewers": "0;387;0", "wc_reply_authors": "686;1322;333", "reply_reviewers": "0;2;0", "reply_authors": "1;3;1", "rating_avg": [ 6.333333333333333, 1.247219128924647 ], "confidence_avg": [ 3.3333333333333335, 0.9428090415820634 ], "wc_review_avg": [ 363.3333333333333, 198.97124303666487 ], "wc_reply_reviewers_avg": [ 129.0, 182.43354954612926 ], "wc_reply_authors_avg": [ 780.3333333333334, 409.2304430947869 ], "reply_reviewers_avg": [ 0.6666666666666666, 0.9428090415820634 ], "reply_authors_avg": [ 1.6666666666666667, 0.9428090415820634 ], "replies_avg": [ 17, 0 ], "authors#_avg": [ 4, 0 ], "corr_rating_confidence": 0.7559289460184546, "gs_citation": 27, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=18097668228161879279&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 6, "pdf": "https://openreview.net/pdf?id=OHgnfSrn2jv", "email": "gatsby.ucl.ac.uk;ucl.ac.uk;cam.ac.uk;ucl.ac.uk", "author_num": 4, "aff_unique_index": "0;0;1;0", "aff_unique_norm": "University College London;University of Cambridge", "aff_unique_dep": "Gatsby Computational Neuroscience Unit;", "aff_unique_url": "https://www.ucl.ac.uk;https://www.cam.ac.uk", "aff_unique_abbr": "UCL;Cambridge", "aff_campus_unique_index": "1", "aff_campus_unique": ";Cambridge", "aff_country_unique_index": "0;0;0;0", "aff_country_unique": "United Kingdom" }, { "id": "OItp-Avs6Iy", "title": "Concentric Spherical GNN for 3D Representation Learning", "track": "main", "status": "Reject", "tldr": "", "abstract": "Learning 3D representations that generalize well to arbitrarily oriented inputs is a challenge of practical importance in applications varying from computer vision to physics and chemistry.\nWe propose a novel multi-resolution convolutional architecture for learning over concentric spherical feature maps, of which the single sphere representation is a special case.\nOur hierarchical 
architecture is based on alternatively learning to incorporate both intra-sphere and inter-sphere information. \nWe show the applicability of our method for two different types of 3D inputs, mesh objects, which can be regularly sampled, and point clouds, which are irregularly distributed. \nWe also propose an efficient mapping of point clouds to concentric spherical images using radial basis functions, thereby bridging spherical convolutions on grids with general point clouds.\nWe demonstrate the effectiveness of our approach in achieving state-of-the-art performance on 3D classification tasks with rotated data.", "keywords": "spherical cnn;GNN;graph convolution;rotation equivariance;3D", "primary_area": "", "supplementary_material": "/attachment/cdcf033b48d57483e9c016ec7be86e4d015a8c42.zip", "author": "James S Fox;Bo Zhao;Sivasankaran Rajamanickam;Rampi Ramprasad;Le Song", "authorids": "~James_S_Fox1;bzhao68@gatech.edu;srajama@sandia.gov;rampi.ramprasad@mse.gatech;~Le_Song1", "gender": "M;;;;M", "homepage": ";;;;http://www.cc.gatech.edu/~lsong", "dblp": ";;;;94/3481", "google_scholar": "gYa-FTQAAAAJ;;;;Xl4E0CsAAAAJ", "orcid": ";;;;", "linkedin": ";;;;", "or_profile": "~James_S_Fox1;bzhao68@gatech.edu;srajama@sandia.gov;rampi.ramprasad@mse.gatech;~Le_Song1", "aff": "University of California, Berkeley;;;;College of Computing, Georgia Institute of Technology", "aff_domain": "berkeley.edu;;;;cc.gatech.edu", "position": "Undergrad student;;;;Associate Professor", "bibtex": "@misc{\nfox2021concentric,\ntitle={Concentric Spherical {\\{}GNN{\\}} for 3D Representation Learning},\nauthor={James S Fox and Bo Zhao and Sivasankaran Rajamanickam and Rampi Ramprasad and Le Song},\nyear={2021},\nurl={https://openreview.net/forum?id=OItp-Avs6Iy}\n}", "github": "", "project": "", "reviewers": "AnonReviewer3;AnonReviewer1;AnonReviewer4;AnonReviewer2", "site": "https://openreview.net/forum?id=OItp-Avs6Iy", "pdf_size": 0, "rating": "5;5;6;6", "confidence": "4;3;3;4", "wc_review": "355;212;334;416", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "546;411;411;471", "reply_reviewers": "0;0;0;0", "reply_authors": "1;1;1;1", "rating_avg": [ 5.5, 0.5 ], "confidence_avg": [ 3.5, 0.5 ], "wc_review_avg": [ 329.25, 74.09242538883446 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 459.75, 55.494932201057786 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 10, 0 ], "authors#_avg": [ 5, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 3, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=15715060486956336768&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 6, "aff_unique_index": "0;1", "aff_unique_norm": "University of California, Berkeley;Georgia Institute of Technology", "aff_unique_dep": ";College of Computing", "aff_unique_url": "https://www.berkeley.edu;https://www.gatech.edu", "aff_unique_abbr": "UC Berkeley;Georgia Tech", "aff_campus_unique_index": "0;1", "aff_campus_unique": "Berkeley;Atlanta", "aff_country_unique_index": "0;0", "aff_country_unique": "United States" }, { "id": "OJiM1R3jAtZ", "title": "AWAC: Accelerating Online Reinforcement Learning with Offline Datasets", "track": "main", "status": "Reject", "tldr": "", "abstract": "Reinforcement learning provides an appealing formalism for learning control policies from experience. However, the classic active formulation of reinforcement learning necessitates a lengthy active exploration process for each behavior, making it difficult to apply in real-world settings. 
If we can instead allow reinforcement learning to effectively use previously collected data to aid the online learning process, where the data could be expert demonstrations or more generally any prior experience, we could make reinforcement learning a substantially more practical tool. While a number of recent methods have sought to learn offline from previously collected data, it remains exceptionally difficult to train a policy with offline data and improve it further with online reinforcement learning. In this paper we systematically analyze why this problem is so challenging, and propose an algorithm that combines sample-efficient dynamic programming with maximum likelihood policy updates, providing a simple and effective framework that is able to leverage large amounts of offline data and then quickly perform online fine-tuning of reinforcement learning policies. We show that our method enables rapid learning of skills with a combination of prior demonstration data and online experience across a suite of difficult dexterous manipulation and benchmark tasks.", "keywords": "reinforcement learning", "primary_area": "", "supplementary_material": "", "author": "Ashvin Nair;Murtaza Dalal;Abhishek Gupta;Sergey Levine", "authorids": "~Ashvin_Nair1;~Murtaza_Dalal1;~Abhishek_Gupta1;~Sergey_Levine1", "gender": "M;M;M;M", "homepage": "http://ashvin.me/;https://mihdalal.github.io/;https://homes.cs.washington.edu/~abhgupta/;https://people.eecs.berkeley.edu/~svlevine/", "dblp": "182/2436;215/5516;18/6404-4;80/7594", "google_scholar": "BsOkXDsAAAAJ;5dBp2f4AAAAJ;1wLVDP4AAAAJ;8R35rCwAAAAJ", "orcid": ";;;", "linkedin": ";murtaza-dalal-9b397a89/;;", "or_profile": "~Ashvin_Nair1;~Murtaza_Dalal1;~Abhishek_Gupta1;~Sergey_Levine1", "aff": "University of California, Berkeley;Carnegie Mellon University;University of California, Berkeley;Google", "aff_domain": "berkeley.edu;cmu.edu;berkeley.edu;google.com", "position": "PhD student;PhD student;PhD student;Research Scientist", "bibtex": "@misc{\nnair2021awac,\ntitle={{\\{}AWAC{\\}}: Accelerating Online Reinforcement Learning with Offline Datasets},\nauthor={Ashvin Nair and Murtaza Dalal and Abhishek Gupta and Sergey Levine},\nyear={2021},\nurl={https://openreview.net/forum?id=OJiM1R3jAtZ}\n}", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer5;AnonReviewer4;AnonReviewer2;AnonReviewer3", "site": "https://openreview.net/forum?id=OJiM1R3jAtZ", "pdf_size": 0, "rating": "3;4;6;6;6", "confidence": "4;5;4;4;3", "wc_review": "193;566;327;278;341", "wc_reply_reviewers": "0;0;0;47;0", "wc_reply_authors": "316;619;322;300;315", "reply_reviewers": "0;0;0;1;0", "reply_authors": "3;3;1;2;1", "rating_avg": [ 5.0, 1.2649110640673518 ], "confidence_avg": [ 4.0, 0.6324555320336759 ], "wc_review_avg": [ 341.0, 123.84990916427836 ], "wc_reply_reviewers_avg": [ 9.4, 18.8 ], "wc_reply_authors_avg": [ 374.4, 122.51465218495296 ], "reply_reviewers_avg": [ 0.2, 0.4 ], "reply_authors_avg": [ 2.0, 0.8944271909999159 ], "replies_avg": [ 18, 0 ], "authors#_avg": [ 4, 0 ], "corr_rating_confidence": -0.49999999999999994, "gs_citation": 697, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=17761014529068732377&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 8, "aff_unique_index": "0;1;0;2", "aff_unique_norm": "University of California, Berkeley;Carnegie Mellon University;Google", "aff_unique_dep": ";;Google", "aff_unique_url": "https://www.berkeley.edu;https://www.cmu.edu;https://www.google.com", "aff_unique_abbr": "UC Berkeley;CMU;Google", 
"aff_campus_unique_index": "0;0;2", "aff_campus_unique": "Berkeley;;Mountain View", "aff_country_unique_index": "0;0;0;0", "aff_country_unique": "United States" }, { "id": "OLOr1K5zbDu", "title": "Triple-Search: Differentiable Joint-Search of Networks, Precision, and Accelerators", "track": "main", "status": "Reject", "tldr": "", "abstract": "The record-breaking performance and prohibitive complexity of deep neural networks (DNNs) have ignited a substantial need for customized DNN accelerators which have the potential to boost DNN acceleration efficiency by orders-of-magnitude. While it has been recognized that maximizing DNNs' acceleration efficiency requires a joint design/search for three different yet highly coupled aspects, including the networks, adopted precision, and their accelerators, the challenges associated with such a joint search have not yet been fully discussed and addressed. First, to jointly search for a network and its precision via differentiable search, there exists a dilemma of whether to explode the memory consumption or achieve sub-optimal designs. Second, a generic and differentiable joint search of the networks and their accelerators is non-trivial due to (1) the discrete nature of the accelerator space and (2) the difficulty of obtaining operation-wise hardware cost penalties because some accelerator parameters are determined by the whole network. To this end, we propose a Triple-Search (TRIPS) framework to address the aforementioned challenges towards jointly searching for the network structure, precision, and accelerator in a differentiable manner, to efficiently and effectively explore the huge joint search space. Our TRIPS addresses the first challenge above via a heterogeneous sampling strategy to achieve unbiased search with constant memory consumption, and tackles the latter one using a novel co-search pipeline that integrates a generic differentiable accelerator search engine. Extensive experiments and ablation studies validate that both TRIPS generated networks and accelerators consistently outperform state-of-the-art (SOTA) designs (including co-search/exploration techniques, hardware-aware NAS methods, and DNN accelerators), in terms of search time, task accuracy, and accelerator efficiency. 
All codes will be released upon acceptance.", "keywords": "neural architecture search;network hardware co-design", "primary_area": "", "supplementary_material": "", "author": "Yonggan Fu;Yongan Zhang;Haoran You;Yingyan Lin", "authorids": "~Yonggan_Fu1;~Yongan_Zhang1;~Haoran_You1;~Yingyan_Lin1", "gender": "M;M;M;F", "homepage": "https://www.yongganfu.com/;;http://haoranyou.com/;https://eiclab.scs.gatech.edu/", "dblp": "244/8166;137/8349;230/4247;120/6981", "google_scholar": "https://scholar.google.com/citations?hl=en;s3Qbrl0AAAAJ;z5Eku1sAAAAJ;dio8IesAAAAJ", "orcid": ";;0000-0002-2873-2153;", "linkedin": "yonggan-fu-b211831b0;yongan-zhang-141a71136/;haoran-you-b4b958165/;yingyan-celine-lin-a281211a/", "or_profile": "~Yonggan_Fu1;~Yongan_Zhang1;~Haoran_You1;~Yingyan_Lin1", "aff": "Rice University;Rice University;Rice University;Rice University", "aff_domain": "rice.edu;rice.edu;rice.edu;rice.edu", "position": "PhD student;MS student;PhD student;Assistant Professor", "bibtex": "@misc{\nfu2021triplesearch,\ntitle={Triple-Search: Differentiable Joint-Search of Networks, Precision, and Accelerators},\nauthor={Yonggan Fu and Yongan Zhang and Haoran You and Yingyan Lin},\nyear={2021},\nurl={https://openreview.net/forum?id=OLOr1K5zbDu}\n}", "github": "", "project": "", "reviewers": "AnonReviewer3;AnonReviewer1;AnonReviewer4;AnonReviewer2", "site": "https://openreview.net/forum?id=OLOr1K5zbDu", "pdf_size": 0, "rating": "5;5;6;6", "confidence": "5;4;2;3", "wc_review": "428;322;814;215", "wc_reply_reviewers": "0;0;0;177", "wc_reply_authors": "1424;906;497;1015", "reply_reviewers": "0;0;0;2", "reply_authors": "2;2;1;3", "rating_avg": [ 5.5, 0.5 ], "confidence_avg": [ 3.5, 1.118033988749895 ], "wc_review_avg": [ 444.75, 226.09663310186642 ], "wc_reply_reviewers_avg": [ 44.25, 76.64324823492282 ], "wc_reply_authors_avg": [ 960.5, 330.00189393395914 ], "reply_reviewers_avg": [ 0.5, 0.8660254037844386 ], "reply_authors_avg": [ 2.0, 0.7071067811865476 ], "replies_avg": [ 15, 0 ], "authors#_avg": [ 4, 0 ], "corr_rating_confidence": -0.8944271909999159, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:I8gpyrP4Bq8J:scholar.google.com/&scioq=Triple-Search:+Differentiable+Joint-Search+of+Networks,+Precision,+and+Accelerators&hl=en&as_sdt=0,5", "gs_version_total": 0, "aff_unique_index": "0;0;0;0", "aff_unique_norm": "Rice University", "aff_unique_dep": "", "aff_unique_url": "https://www.rice.edu", "aff_unique_abbr": "Rice", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0;0;0", "aff_country_unique": "United States" }, { "id": "OLrVttqVt2", "title": "Model-Targeted Poisoning Attacks with Provable Convergence", "track": "main", "status": "Reject", "tldr": "", "abstract": "In a poisoning attack, an adversary with control over a small fraction of the training data attempts to select that data in a way that induces a model that misbehaves in a particular way desired by the adversary, such as misclassifying certain inputs. We propose an efficient poisoning attack that can target a desired model based on online convex optimization. Unlike previous model-targeted poisoning attacks, our attack comes with provable convergence to any achievable target classifier. The distance from the induced classifier to the target classifier is inversely proportional to the square root of the number of poisoning points. We also provide a lower bound on the minimum number of poisoning points needed to achieve a given target classifier. 
Our attack is the first model-targeted poisoning attack that provides provable convergence, and in our experiments it either exceeds or matches the best state-of-the-art attacks in terms of attack success rate and distance to the target model. In addition, as an online attack our attack can incrementally determine nearly optimal poisoning points. ", "keywords": "adversarial machine learning;data poisoning attack;convergence", "primary_area": "", "supplementary_material": "/attachment/f0130968943d7a2b6010ce8413878612dbd83676.zip", "author": "Fnu Suya;Saeed Mahloujifar;David Evans;Yuan Tian", "authorids": "~Fnu_Suya1;~Saeed_Mahloujifar1;~David_Evans1;~Yuan_Tian2", "gender": "M;M;Not Specified;F", "homepage": "https://fsuya.org;https://www.cs.virginia.edu/~sm5fd/;https://www.cs.virginia.edu/evans/;https://www.ytian.info/", "dblp": "211/7696;208/0825;https://dblp.uni-trier.de/pid/e/DavidEvans;", "google_scholar": "OmLIG8EAAAAJ;kW-hl3YAAAAJ;DsR4PucAAAAJ;", "orcid": ";;;", "linkedin": ";;;", "or_profile": "~Fnu_Suya1;~Saeed_Mahloujifar1;~David_Evans1;~Yuan_Tian2", "aff": "University of Virginia;Princeton University;University of Virginia;University of Virginia", "aff_domain": "virginia.edu;princeton.edu;virginia.edu;virginia.edu", "position": "PhD student;Postdoc;Professor;Assistant Professor", "bibtex": "@misc{\nsuya2021modeltargeted,\ntitle={Model-Targeted Poisoning Attacks with Provable Convergence},\nauthor={Fnu Suya and Saeed Mahloujifar and David Evans and Yuan Tian},\nyear={2021},\nurl={https://openreview.net/forum?id=OLrVttqVt2}\n}", "github": "", "project": "", "reviewers": "AnonReviewer4;AnonReviewer2;AnonReviewer3;AnonReviewer1", "site": "https://openreview.net/forum?id=OLrVttqVt2", "pdf_size": 0, "rating": "3;5;6;7", "confidence": "4;4;3;5", "wc_review": "348;521;483;430", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "1458;813;651;379", "reply_reviewers": "0;0;0;0", "reply_authors": "2;1;1;1", "rating_avg": [ 5.25, 1.479019945774904 ], "confidence_avg": [ 4.0, 0.7071067811865476 ], "wc_review_avg": [ 445.5, 64.90955245570562 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 825.25, 396.8704921003828 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.25, 0.4330127018922193 ], "replies_avg": [ 12, 0 ], "authors#_avg": [ 4, 0 ], "corr_rating_confidence": 0.23904572186687872, "gs_citation": 53, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=1651990358981165914&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 10, "aff_unique_index": "0;1;0;0", "aff_unique_norm": "University of Virginia;Princeton University", "aff_unique_dep": ";", "aff_unique_url": "https://www.virginia.edu;https://www.princeton.edu", "aff_unique_abbr": "UVA;Princeton", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0;0;0", "aff_country_unique": "United States" }, { "title": "Model-Based Offline Planning", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/3240", "id": "OMNB1G5xzd4", "poster": "", "openreview": "https://openreview.net/forum?id=OMNB1G5xzd4", "slides": "https://iclr.cc/virtual/2021/poster/3240", "video": "https://iclr.cc/virtual/2021/poster/3240", "author_site": "Arthur Argenson, Gabriel Dulac-Arnold", "tldr": "", "abstract": "Offline learning is a key part of making reinforcement learning (RL) useable in real systems. Offline RL looks at scenarios where there is data from a system's operation, but no direct access to the system when learning a policy. 
Recent work on training RL policies from offline data has shown results both with model-free policies learned directly from the data, and with planning on top of learnt models of the data. Model-free policies tend to be more performant, but are more opaque, harder to command externally, and less easy to integrate into larger systems. We propose an offline learner that generates a model that can be used to control the system directly through planning. This allows us to have easily controllable policies directly from data, without ever interacting with the system. We show the performance of our algorithm, Model-Based Offline Planning (MBOP), on a series of robotics-inspired tasks, and demonstrate its ability to leverage planning to respect environmental constraints. We are able to find near-optimal policies for certain simulated systems from as little as 50 seconds of real-time system interaction, and create zero-shot goal-conditioned policies on a series of environments.", "keywords": "off-line reinforcement learning;model-based reinforcement learning;model-based control;reinforcement learning;model predictive control;robotics", "primary_area": "", "supplementary_material": "/attachment/452d20f08756e74fb292f0375b666cfd30aa6062.zip", "author": "Arthur Argenson;Gabriel Dulac-Arnold", "authorids": "aarg@google.com;~Gabriel_Dulac-Arnold1", "gender": ";M", "homepage": ";http://gabe.squirrelsoup.net", "dblp": ";58/9457", "google_scholar": ";https://scholar.google.fr/citations?user=KxaYraAAAAAJ", "orcid": ";", "linkedin": ";", "or_profile": "aarg@google.com;~Gabriel_Dulac-Arnold1", "aff": ";Google Research", "aff_domain": ";google.com", "position": ";Researcher", "bibtex": "@inproceedings{\nargenson2021modelbased,\ntitle={Model-Based Offline Planning},\nauthor={Arthur Argenson and Gabriel Dulac-Arnold},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=OMNB1G5xzd4}\n}", "github": "", "project": "", "reviewers": "AnonReviewer2;AnonReviewer3;AnonReviewer1;AnonReviewer4", "pdf_size": 0, "rating": "5;5;7;8", "confidence": "5;3;4;4", "wc_review": "829;363;626;307", "wc_reply_reviewers": "0;0;49;0", "wc_reply_authors": "487;0;2032;126", "reply_reviewers": "0;0;1;0", "reply_authors": "1;0;4;1", "rating_avg": [ 6.25, 1.299038105676658 ], "confidence_avg": [ 4.0, 0.7071067811865476 ], "wc_review_avg": [ 531.25, 209.89804072453845 ], "wc_reply_reviewers_avg": [ 12.25, 21.21762239271875 ], "wc_reply_authors_avg": [ 661.25, 811.3357427723741 ], "reply_reviewers_avg": [ 0.25, 0.4330127018922193 ], "reply_authors_avg": [ 1.5, 1.5 ], "replies_avg": [ 13, 0 ], "authors#_avg": [ 2, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 170, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=8262040116610023194&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 4, "pdf": "https://openreview.net/pdf?id=OMNB1G5xzd4", "email": ";google.com", "author_num": 2, "aff_unique_index": "0", "aff_unique_norm": "Google", "aff_unique_dep": "Google Research", "aff_unique_url": "https://research.google", "aff_unique_abbr": "Google Research", "aff_campus_unique_index": "0", "aff_campus_unique": "Mountain View", "aff_country_unique_index": "0", "aff_country_unique": "United States" }, { "title": "Active Contrastive Learning of Audio-Visual Video Representations", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/2900", "id": "OMizHuea_HB", "poster": "", "openreview": "https://openreview.net/forum?id=OMizHuea_HB", "slides": 
"https://iclr.cc/virtual/2021/poster/2900", "video": "https://iclr.cc/virtual/2021/poster/2900", "author_site": "Shuang Ma, Zhaoyang Zeng, Daniel McDuff, Yale Song", "tldr": "", "abstract": "Contrastive learning has been shown to produce generalizable representations of audio and visual data by maximizing the lower bound on the mutual information (MI) between different views of an instance. However, obtaining a tight lower bound requires a sample size exponential in MI and thus a large set of negative samples. We can incorporate more samples by building a large queue-based dictionary, but there are theoretical limits to performance improvements even with a large number of negative samples. We hypothesize that random negative sampling leads to a highly redundant dictionary that results in suboptimal representations for downstream tasks. In this paper, we propose an active contrastive learning approach that builds an actively sampled dictionary with diverse and informative items, which improves the quality of negative samples and improves performances on tasks where there is high mutual information in the data, e.g., video classification. Our model achieves state-of-the-art performance on challenging audio and visual downstream benchmarks including UCF101, HMDB51 and ESC50. ", "keywords": "self-supervised learning;contrastive representation learning;active learning;audio-visual representation;video recognition", "primary_area": "", "supplementary_material": "", "author": "Shuang Ma;Zhaoyang Zeng;Daniel McDuff;Yale Song", "authorids": "~Shuang_Ma3;~Zhaoyang_Zeng1;~Daniel_McDuff1;~Yale_Song1", "gender": "M;M;M;F", "homepage": ";http://alumni.media.mit.edu/~djmcduff/;https://people.csail.mit.edu/yalesong;https://www.shuangma.me/", "dblp": ";63/9606;31/9606.html;98/3906", "google_scholar": ";m7Jr-b4AAAAJ;dNHNpxoAAAAJ;IHPRZuMAAAAJ", "orcid": ";;;", "linkedin": "%E5%85%86%E9%98%B3-%E6%9B%BE-1a505291/;;;", "or_profile": "~Zhaoyang_Zeng1;~Daniel_McDuff1;~Yale_Song1;~shuang_ma1", "aff": "SUN YAT-SEN UNIVERSITY;Microsoft;Microsoft Research;Microsoft", "aff_domain": "sysu.edu.cn;microsoft.com;microsoft.com;microsoft.com", "position": "PhD student;Principal Researcer;Researcher;Senior Research Scientist", "bibtex": "@inproceedings{\nma2021active,\ntitle={Active Contrastive Learning of Audio-Visual Video Representations},\nauthor={Shuang Ma and Zhaoyang Zeng and Daniel McDuff and Yale Song},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=OMizHuea_HB}\n}", "github": "[![github](/images/github_icon.svg) yunyikristy/CM-ACC](https://github.com/yunyikristy/CM-ACC)", "project": "", "reviewers": "AnonReviewer3;AnonReviewer2;AnonReviewer4;AnonReviewer1", "pdf_size": 0, "rating": "6;7;7;7", "confidence": "4;3;5;3", "wc_review": "283;420;898;335", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "161;287;157;108", "reply_reviewers": "0;0;0;0", "reply_authors": "1;1;1;1", "rating_avg": [ 6.75, 0.4330127018922193 ], "confidence_avg": [ 3.75, 0.82915619758885 ], "wc_review_avg": [ 484.0, 243.97438390126123 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 178.25, 66.16409524810265 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 11, 0 ], "authors#_avg": [ 4, 0 ], "corr_rating_confidence": -0.17407765595569782, "gs_citation": 122, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=1763906632624707840&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 3, "pdf": 
"https://openreview.net/pdf?id=OMizHuea_HB", "email": "sysu.edu.cn;microsoft.com;microsoft.com;microsoft.com", "author_num": 4, "aff_unique_index": "0;1;1;1", "aff_unique_norm": "Sun Yat-sen University;Microsoft", "aff_unique_dep": ";Microsoft Corporation", "aff_unique_url": "http://www.sysu.edu.cn;https://www.microsoft.com", "aff_unique_abbr": "SYSU;Microsoft", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;1;1;1", "aff_country_unique": "China;United States" }, { "title": "Temporally-Extended \u03b5-Greedy Exploration", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/3210", "id": "ONBPHFZ7zG4", "poster": "", "openreview": "https://openreview.net/forum?id=ONBPHFZ7zG4", "slides": "https://iclr.cc/virtual/2021/poster/3210", "video": "https://iclr.cc/virtual/2021/poster/3210", "author_site": "Will Dabney, Georg Ostrovski, Andre Barreto", "tldr": "", "abstract": "Recent work on exploration in reinforcement learning (RL) has led to a series of increasingly complex solutions to the problem. This increase in complexity often comes at the expense of generality. Recent empirical studies suggest that, when applied to a broader set of domains, some sophisticated exploration methods are outperformed by simpler counterparts, such as \u03b5-greedy. In this paper we propose an exploration algorithm that retains the simplicity of \u03b5-greedy while reducing dithering. We build on a simple hypothesis: the main limitation of \u03b5-greedy exploration is its lack of temporal persistence, which limits its ability to escape local optima. We propose a temporally extended form of \u03b5-greedy that simply repeats the sampled action for a random duration. It turns out that, for many duration distributions, this suffices to improve exploration on a large set of domains. 
Interestingly, a class of distributions inspired by ecological models of animal foraging behaviour yields particularly strong performance.", "keywords": "reinforcement learning;exploration", "primary_area": "", "supplementary_material": "", "author": "Will Dabney;Georg Ostrovski;Andre Barreto", "authorids": "~Will_Dabney1;~Georg_Ostrovski1;~Andre_Barreto1", "gender": "M;M;M", "homepage": ";http://ostrovski.co.uk/;https://sites.google.com/corp/view/andrebarreto/about", "dblp": "https://dblp.uni-trier.de/pers/hd/d/Dabney:Will;133/8425;72/953", "google_scholar": "https://scholar.google.co.uk/citations?user=dR-7QW8AAAAJ;;https://scholar.google.co.uk/citations?user=H-xtdV4AAAAJ", "orcid": ";0000-0001-7707-2633;", "linkedin": ";georg-ostrovski-5690a538;", "or_profile": "~Will_Dabney1;~Georg_Ostrovski1;~Andre_Barreto1", "aff": "Google DeepMind;Google DeepMind;Google DeepMind", "aff_domain": "google.com;deepmind.com;google.com", "position": "Research Scientist;Researcher;Research Scientist", "bibtex": "@inproceedings{\ndabney2021temporallyextended,\ntitle={Temporally-Extended {\\ensuremath{\\varepsilon}}-Greedy Exploration},\nauthor={Will Dabney and Georg Ostrovski and Andre Barreto},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=ONBPHFZ7zG4}\n}", "github": "", "project": "", "reviewers": "AnonReviewer4;AnonReviewer1;AnonReviewer3;AnonReviewer2;AnonReviewer5", "pdf_size": 0, "rating": "5;5;6;8;8", "confidence": "4;4;4;5;4", "wc_review": "396;156;359;622;638", "wc_reply_reviewers": "0;0;143;67;0", "wc_reply_authors": "419;229;978;921;204", "reply_reviewers": "0;0;2;1;0", "reply_authors": "1;1;3;2;1", "rating_avg": [ 6.4, 1.3564659966250536 ], "confidence_avg": [ 4.2, 0.39999999999999997 ], "wc_review_avg": [ 434.2, 179.61781648823148 ], "wc_reply_reviewers_avg": [ 42.0, 56.776755807284374 ], "wc_reply_authors_avg": [ 550.2, 334.8858910136407 ], "reply_reviewers_avg": [ 0.6, 0.7999999999999999 ], "reply_authors_avg": [ 1.6, 0.8 ], "replies_avg": [ 17, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": 0.5897678246195885, "gs_citation": -1, "gs_cited_by_link": "", "gs_version_total": -1, "pdf": "https://openreview.net/pdf?id=ONBPHFZ7zG4", "email": "google.com;deepmind.com;google.com", "author_num": 3, "aff_unique_index": "0;0;0", "aff_unique_norm": "Google", "aff_unique_dep": "Google DeepMind", "aff_unique_url": "https://deepmind.com", "aff_unique_abbr": "DeepMind", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0;0", "aff_country_unique": "United Kingdom" }, { "title": "Trusted Multi-View Classification", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/2908", "id": "OOsR8BzCnl5", "poster": "", "openreview": "https://openreview.net/forum?id=OOsR8BzCnl5", "slides": "https://iclr.cc/virtual/2021/poster/2908", "video": "https://iclr.cc/virtual/2021/poster/2908", "author_site": "Zongbo Han, Changqing Zhang, Huazhu FU, Joey T Zhou", "tldr": "", "abstract": "Multi-view classification (MVC) generally focuses on improving classification accuracy by using information from different views, typically integrating them into a unified comprehensive representation for downstream tasks. However, it is also crucial to dynamically assess the quality of a view for different samples in order to provide reliable uncertainty estimations, which indicate whether predictions can be trusted. 
To this end, we propose a novel multi-view classification method, termed trusted multi-view classification, which provides a new paradigm for multi-view learning by dynamically integrating different views at an evidence level. The algorithm jointly utilizes multiple views to promote both classification reliability (uncertainty estimation during testing) and robustness (out-of-distribution-awareness during training) by integrating evidence from each view. To achieve this, the Dirichlet distribution is used to model the distribution of the class probabilities, parameterized with evidence from different views and integrated with the Dempster-Shafer theory. The unified learning framework induces accurate uncertainty and accordingly endows the model with both reliability and robustness for out-of-distribution samples. Extensive experimental results validate the effectiveness of the proposed model in accuracy, reliability and robustness.", "keywords": "Multi-Modal Learning;Multi-View Learning;Uncertainty Machine Learning", "primary_area": "", "supplementary_material": "/attachment/340df893388838e9a70e995b0d7067166c7641a0.zip", "author": "Zongbo Han;Changqing Zhang;Huazhu Fu;Joey Tianyi Zhou", "authorids": "~Zongbo_Han1;~Changqing_Zhang1;~Huazhu_Fu4;~Joey_Tianyi_Zhou1", "gender": "M;M;M;M", "homepage": "https://zongbo-han.github.io/;http://cic.tju.edu.cn/faculty/zhangchangqing/index.html;https://hzfu.github.io;https://joeyzhouty.github.io/", "dblp": "255/6965;78/2668;63/7767;123/5110", "google_scholar": "F2BBkQEAAAAJ;yJGhdykAAAAJ;https://scholar.google.com/citations?hl=en;https://scholar.google.com.sg/citations?user=cYNqDokAAAAJ", "orcid": ";;0000-0002-9702-5524;0000-0002-4675-7055", "linkedin": ";;;", "or_profile": "~Zongbo_Han1;~Changqing_Zhang1;~Huazhu_Fu4;~Joey_Tianyi_Zhou1", "aff": "Tianjin University;Tianjin University;Inception Institute of Artificial Intelligence;A*STAR Centre for Frontier AI Research", "aff_domain": "tju.edu.cn;tju.edu.cn;inceptioniai.org;cfar.a-star.edu.sg", "position": "MS student;Associate Professor;Senior Scientist;Principal Researcher", "bibtex": "@inproceedings{\nhan2021trusted,\ntitle={Trusted Multi-View Classification},\nauthor={Zongbo Han and Changqing Zhang and Huazhu Fu and Joey Tianyi Zhou},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=OOsR8BzCnl5}\n}", "github": "[![github](/images/github_icon.svg) hanmenghan/TMC](https://github.com/hanmenghan/TMC) + [![Papers with Code](/images/pwc_icon.svg) 4 community implementations](https://paperswithcode.com/paper/?openreview=OOsR8BzCnl5)", "project": "", "reviewers": "AnonReviewer1;AnonReviewer3;AnonReviewer2", "pdf_size": 0, "rating": "4;7;8", "confidence": "5;3;5", "wc_review": "1487;170;244", "wc_reply_reviewers": "0;0;0", "wc_reply_authors": "1584;95;211", "reply_reviewers": "0;0;0", "reply_authors": "3;1;1", "rating_avg": [ 6.333333333333333, 1.699673171197595 ], "confidence_avg": [ 4.333333333333333, 0.9428090415820634 ], "wc_review_avg": [ 633.6666666666666, 604.1535860652948 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 630.0, 676.2400954296238 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.6666666666666667, 0.9428090415820634 ], "replies_avg": [ 12, 0 ], "authors#_avg": [ 4, 0 ], "corr_rating_confidence": -0.2773500981126145, "gs_citation": -1, "gs_cited_by_link": "", "gs_version_total": -1, "pdf": "https://openreview.net/pdf?id=OOsR8BzCnl5", "email": 
"tju.edu.cn;tju.edu.cn;inceptioniai.org;cfar.a-star.edu.sg", "author_num": 4, "aff_unique_index": "0;0;1;2", "aff_unique_norm": "Tianjin University;Inception Institute of Artificial Intelligence;A*STAR", "aff_unique_dep": ";;Centre for Frontier AI Research", "aff_unique_url": "http://www.tju.edu.cn;https://www.inceptioniai.org;https://www.a-star.edu.sg", "aff_unique_abbr": "TJU;;A*STAR", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0;1;2", "aff_country_unique": "China;United Arab Emirates;Singapore" }, { "title": "Shapley explainability on the data manifold", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/2856", "id": "OPyWRrcjVQw", "poster": "", "openreview": "https://openreview.net/forum?id=OPyWRrcjVQw", "slides": "https://iclr.cc/virtual/2021/poster/2856", "video": "https://iclr.cc/virtual/2021/poster/2856", "author_site": "Christopher Frye, Damien De Mijolla, Tom Begley, Laurence Cowton, Megan Stanley, Ilya Feige", "tldr": "", "abstract": "Explainability in AI is crucial for model development, compliance with regulation, and providing operational nuance to predictions. The Shapley framework for explainability attributes a model\u2019s predictions to its input features in a mathematically principled and model-agnostic way. However, general implementations of Shapley explainability make an untenable assumption: that the model\u2019s features are uncorrelated. In this work, we demonstrate unambiguous drawbacks of this assumption and develop two solutions to Shapley explainability that respect the data manifold. One solution, based on generative modelling, provides flexible access to data imputations; the other directly learns the Shapley value-function, providing performance and stability at the cost of flexibility. 
While \u201coff-manifold\u201d Shapley values can (i) give rise to incorrect explanations, (ii) hide implicit model dependence on sensitive attributes, and (iii) lead to unintelligible explanations in higher-dimensional data, on-manifold explainability overcomes these problems.\n", "keywords": "", "primary_area": "", "supplementary_material": "", "author": "Christopher Frye;Damien de Mijolla;Tom Begley;Laurence Cowton;Megan Stanley;Ilya Feige", "authorids": "~Christopher_Frye1;damiendemijolla@gmail.com;~Tom_Begley1;laurence.c@faculty.ai;t-mestan@microsoft.com;~Ilya_Feige1", "gender": ";;M;;;", "homepage": ";;https://tcbegley.com;;;", "dblp": ";;;;;222/3226", "google_scholar": ";;;;;", "orcid": ";;;;;", "linkedin": "christopher-frye/;;https://linkedin.com/in/tcbegley;;;", "or_profile": "~Christopher_Frye1;damiendemijolla@gmail.com;~Tom_Begley1;laurence.c@faculty.ai;t-mestan@microsoft.com;~Ilya_Feige1", "aff": "Faculty;;Faculty;;;University College London", "aff_domain": "faculty.ai;;faculty.ai;;;ucl.ac.uk", "position": "Head of R&D;;R&D Lead;;;Postdoc", "bibtex": "@inproceedings{\nfrye2021shapley,\ntitle={Shapley explainability on the data manifold},\nauthor={Christopher Frye and Damien de Mijolla and Tom Begley and Laurence Cowton and Megan Stanley and Ilya Feige},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=OPyWRrcjVQw}\n}", "github": "", "project": "", "reviewers": "AnonReviewer2;AnonReviewer4;AnonReviewer1;AnonReviewer3", "pdf_size": 0, "rating": "6;7;7;8", "confidence": "3;5;4;3", "wc_review": "681;965;354;133", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "717;727;613;47", "reply_reviewers": "0;0;0;0", "reply_authors": "1;2;1;1", "rating_avg": [ 7.0, 0.7071067811865476 ], "confidence_avg": [ 3.75, 0.82915619758885 ], "wc_review_avg": [ 533.25, 316.4525043351688 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 526.0, 280.1303268123607 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.25, 0.4330127018922193 ], "replies_avg": [ 12, 0 ], "authors#_avg": [ 6, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 177, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=8067615666994052842&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 3, "pdf": "https://openreview.net/pdf?id=OPyWRrcjVQw", "email": "faculty.ai;;faculty.ai;;;ucl.ac.uk", "author_num": 6, "aff_unique_index": "1", "aff_unique_norm": ";University College London", "aff_unique_dep": "Faculty;", "aff_unique_url": ";https://www.ucl.ac.uk", "aff_unique_abbr": ";UCL", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "1", "aff_country_unique": ";United Kingdom" }, { "title": "Better Fine-Tuning by Reducing Representational Collapse", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/3061", "id": "OQ08SN70M1V", "poster": "", "openreview": "https://openreview.net/forum?id=OQ08SN70M1V", "slides": "https://iclr.cc/virtual/2021/poster/3061", "video": "https://iclr.cc/virtual/2021/poster/3061", "author_site": "Armen Aghajanyan, Akshat Shrivastava, Anchit Gupta, Naman Goyal, Luke Zettlemoyer, Sonal Gupta", "tldr": "", "abstract": "Although widely adopted, existing approaches for fine-tuning pre-trained language models have been shown to be unstable across hyper-parameter settings, motivating recent work on trust region methods. 
In this paper, we present a simplified and efficient method rooted in trust region theory that replaces previously used adversarial objectives with parametric noise (sampling from either a normal or uniform distribution), thereby discouraging representation change during fine-tuning when possible without hurting performance. We also introduce a new analysis to motivate the use of trust region methods more generally, by studying representational collapse; the degradation of generalizable representations from pre-trained models as they are fine-tuned for a specific end task. Extensive experiments show that our fine-tuning method matches or exceeds the performance of previous trust region methods on a range of understanding and generation tasks (including DailyMail/CNN, Gigaword, Reddit TIFU, and the GLUE benchmark), while also being much faster. We also show that it is less prone to representation collapse; the pre-trained models maintain more generalizable representations every time they are fine-tuned.", "keywords": "finetuning;nlp;representational learning;glue", "primary_area": "", "supplementary_material": "", "author": "Armen Aghajanyan;Akshat Shrivastava;Anchit Gupta;Naman Goyal;Luke Zettlemoyer;Sonal Gupta", "authorids": "~Armen_Aghajanyan1;akshats@fb.com;anchit@fb.com;~Naman_Goyal1;~Luke_Zettlemoyer1;sonalgupta@fb.com", "gender": ";;;M;M;", "homepage": ";;;;https://www.cs.washington.edu/people/faculty/lsz/;", "dblp": ";;;183/1418;21/6793;", "google_scholar": ";;;CRbM_P4AAAAJ;https://scholar.google.com.tw/citations?user=UjpbO6IAAAAJ;", "orcid": ";;;;;", "linkedin": ";;;ngoyal2707/;luke-zettlemoyer-a0109b226/;", "or_profile": "~Armen_Aghajanyan1;akshats@fb.com;anchit@fb.com;~Naman_Goyal1;~Luke_Zettlemoyer1;sonalgupta@fb.com", "aff": ";;;Meta Facebook;Meta;", "aff_domain": ";;;fb.com;meta.com;", "position": ";;;Researcher;Researcher;", "bibtex": "@inproceedings{\naghajanyan2021better,\ntitle={Better Fine-Tuning by Reducing Representational Collapse},\nauthor={Armen Aghajanyan and Akshat Shrivastava and Anchit Gupta and Naman Goyal and Luke Zettlemoyer and Sonal Gupta},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=OQ08SN70M1V}\n}", "github": "[![github](/images/github_icon.svg) pytorch/fairseq](https://github.com/pytorch/fairseq/tree/master/examples/rxf) + [![Papers with Code](/images/pwc_icon.svg) 2 community implementations](https://paperswithcode.com/paper/?openreview=OQ08SN70M1V)", "project": "", "reviewers": "AnonReviewer1;AnonReviewer3;AnonReviewer2;AnonReviewer4", "pdf_size": 0, "rating": "6;6;6;7", "confidence": "4;4;3;3", "wc_review": "393;231;569;345", "wc_reply_reviewers": "0;0;0;143", "wc_reply_authors": "379;367;331;180", "reply_reviewers": "0;0;0;1", "reply_authors": "1;1;1;2", "rating_avg": [ 6.25, 0.4330127018922193 ], "confidence_avg": [ 3.5, 0.5 ], "wc_review_avg": [ 384.5, 121.69120757063757 ], "wc_reply_reviewers_avg": [ 35.75, 61.92081637058736 ], "wc_reply_authors_avg": [ 314.25, 79.49646218543313 ], "reply_reviewers_avg": [ 0.25, 0.4330127018922193 ], "reply_authors_avg": [ 1.25, 0.4330127018922193 ], "replies_avg": [ 11, 0 ], "authors#_avg": [ 6, 0 ], "corr_rating_confidence": -0.5773502691896257, "gs_citation": 261, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=17258197096487538069&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 3, "pdf": "https://openreview.net/pdf?id=OQ08SN70M1V", "email": ";;;fb.com;meta.com;", "author_num": 6, "aff_unique_index": "0;0", 
"aff_unique_norm": "Meta", "aff_unique_dep": "Meta Platforms, Inc.", "aff_unique_url": "https://meta.com", "aff_unique_abbr": "Meta", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0", "aff_country_unique": "United States" }, { "id": "OSynkDOWbk2", "title": "First-Order Optimization Algorithms via Discretization of Finite-Time Convergent Flows", "track": "main", "status": "Reject", "tldr": "", "abstract": "In this paper, we investigate the performance of several discretization algorithms for two first-order finite-time optimization flows. These flows are, namely, the rescaled-gradient flow (RGF) and the signed-gradient flow (SGF), and consist of non-Lipscthiz or discontinuous dynamical systems that converge locally in finite time to the minima of gradient-dominated functions. We introduce three discretization methods for these first-order finite-time flows, and provide convergence guarantees. We then apply the proposed algorithms in training neural networks and empirically test their performances on three standard datasets, namely, CIFAR10, SVHN, and MNIST. Our results show that our schemes demonstrate faster convergences against standard optimization alternatives, while achieving equivalent or better accuracy.", "keywords": "Finite-time optimization;dynamical systems;deep neural networks optimization", "primary_area": "", "supplementary_material": "/attachment/661eda620ebdd5565cbb3a307ee8fa5dcb575615.zip", "author": "Mouhacine Benosman;Orlando Romero;Anoop Cherian", "authorids": "~Mouhacine_Benosman1;orlando.rodrigues.romero@gmail.com;~Anoop_Cherian1", "gender": "M;;", "homepage": ";;", "dblp": ";;", "google_scholar": "cs7AJxcAAAAJ;;", "orcid": ";;", "linkedin": ";;", "or_profile": "~Mouhacine_Benosman1;orlando.rodrigues.romero@gmail.com;~Anoop_Cherian1", "aff": "Mitsubishi Electric Research Labs;;", "aff_domain": "merl.com;;", "position": "Researcher;;", "bibtex": "@misc{\nbenosman2021firstorder,\ntitle={First-Order Optimization Algorithms via Discretization of Finite-Time Convergent Flows},\nauthor={Mouhacine Benosman and Orlando Romero and Anoop Cherian},\nyear={2021},\nurl={https://openreview.net/forum?id=OSynkDOWbk2}\n}", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer4;AnonReviewer2;AnonReviewer3", "site": "https://openreview.net/forum?id=OSynkDOWbk2", "pdf_size": 0, "rating": "4;4;6;6", "confidence": "4;3;4;3", "wc_review": "454;425;533;216", "wc_reply_reviewers": "254;0;0;0", "wc_reply_authors": "829;740;787;448", "reply_reviewers": "1;0;0;0", "reply_authors": "2;1;1;1", "rating_avg": [ 5.0, 1.0 ], "confidence_avg": [ 3.5, 0.5 ], "wc_review_avg": [ 407.0, 117.14307491268957 ], "wc_reply_reviewers_avg": [ 63.5, 109.9852262806237 ], "wc_reply_authors_avg": [ 701.0, 149.4238936716615 ], "reply_reviewers_avg": [ 0.25, 0.4330127018922193 ], "reply_authors_avg": [ 1.25, 0.4330127018922193 ], "replies_avg": [ 11, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:baSKbJ-fO8sJ:scholar.google.com/&scioq=First-Order+Optimization+Algorithms+via+Discretization+of+Finite-Time+Convergent+Flows&hl=en&as_sdt=0,5", "gs_version_total": 0, "aff_unique_index": "0", "aff_unique_norm": "Mitsubishi Electric Research Laboratories", "aff_unique_dep": "", "aff_unique_url": "https://www.merl.com", "aff_unique_abbr": "MERL", "aff_country_unique_index": "0", "aff_country_unique": "United States" }, { "id": "OZgVHzdKicb", "title": "Reinforcement Learning 
with Bayesian Classifiers: Efficient Skill Learning from Outcome Examples", "track": "main", "status": "Reject", "tldr": "", "abstract": "Exploration in reinforcement learning is, in general, a challenging problem. In this work, we study a more tractable class of reinforcement learning problems defined by data that provides examples of successful outcome states. In this case, the reward function can be obtained automatically by training a classifier to classify states as successful or not. We argue that, with appropriate representation and regularization, such a classifier can guide a reinforcement learning algorithm to an effective solution. However, as we will show, this requires the classifier to make uncertainty-aware predictions that are very difficult with standard deep networks. To address this, we propose a novel mechanism for obtaining calibrated uncertainty based on an amortized technique for computing the normalized maximum likelihood distribution. We show that the resulting algorithm has a number of intriguing connections to both count-based exploration methods and prior algorithms for learning reward functions from data, while being able to guide algorithms towards the specified goal more effectively. We show how using amortized normalized maximum likelihood for reward inference is able to provide effective reward guidance for solving a number of challenging navigation and robotic manipulation tasks which prove difficult for other algorithms.", "keywords": "Reinforcement Learning;Goal Reaching;Bayesian Classification;Reward Inference", "primary_area": "", "supplementary_material": "", "author": "Kevin Li;Abhishek Gupta;Vitchyr H. Pong;Ashwin Reddy;Aurick Zhou;Justin Yu;Sergey Levine", "authorids": "kevintli@berkeley.edu;~Abhishek_Gupta1;~Vitchyr_H._Pong1;~Ashwin_Reddy1;~Aurick_Zhou1;justinvyu@berkeley.edu;~Sergey_Levine1", "gender": ";M;;M;;;M", "homepage": ";https://homes.cs.washington.edu/~abhgupta/;;;;;https://people.eecs.berkeley.edu/~svlevine/", "dblp": ";18/6404-4;;;213/7312;;80/7594", "google_scholar": ";1wLVDP4AAAAJ;;;1O83J5MAAAAJ;;8R35rCwAAAAJ", "orcid": ";;;;;;", "linkedin": ";;;;;;", "or_profile": "kevintli@berkeley.edu;~Abhishek_Gupta1;~Vitchyr_H._Pong1;~Ashwin_Reddy1;~Aurick_Zhou1;justinvyu@berkeley.edu;~Sergey_Levine1", "aff": ";University of California, Berkeley;;University of California, Berkeley;University of California, Berkeley;;Google", "aff_domain": ";berkeley.edu;;berkeley.edu;berkeley.edu;;google.com", "position": ";PhD student;;Undergrad student;PhD student;;Research Scientist", "bibtex": "@misc{\nli2021reinforcement,\ntitle={Reinforcement Learning with Bayesian Classifiers: Efficient Skill Learning from Outcome Examples},\nauthor={Kevin Li and Abhishek Gupta and Vitchyr H. 
Pong and Ashwin Reddy and Aurick Zhou and Justin Yu and Sergey Levine},\nyear={2021},\nurl={https://openreview.net/forum?id=OZgVHzdKicb}\n}", "github": "", "project": "", "reviewers": "AnonReviewer4;AnonReviewer3;AnonReviewer1;AnonReviewer2", "site": "https://openreview.net/forum?id=OZgVHzdKicb", "pdf_size": 0, "rating": "4;5;5;5", "confidence": "4;3;3;4", "wc_review": "550;449;1095;562", "wc_reply_reviewers": "44;644;193;0", "wc_reply_authors": "1214;1764;1716;1231", "reply_reviewers": "1;3;2;0", "reply_authors": "3;5;3;3", "rating_avg": [ 4.75, 0.4330127018922193 ], "confidence_avg": [ 3.5, 0.5 ], "wc_review_avg": [ 664.0, 252.67864967187077 ], "wc_reply_reviewers_avg": [ 220.25, 254.89250185127062 ], "wc_reply_authors_avg": [ 1481.25, 259.3755722885253 ], "reply_reviewers_avg": [ 1.5, 1.118033988749895 ], "reply_authors_avg": [ 3.5, 0.8660254037844386 ], "replies_avg": [ 26, 0 ], "authors#_avg": [ 7, 0 ], "corr_rating_confidence": -0.5773502691896257, "gs_citation": 1, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=2683267320191076059&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 0, "aff_unique_index": "0;0;0;1", "aff_unique_norm": "University of California, Berkeley;Google", "aff_unique_dep": ";Google", "aff_unique_url": "https://www.berkeley.edu;https://www.google.com", "aff_unique_abbr": "UC Berkeley;Google", "aff_campus_unique_index": "0;0;0;1", "aff_campus_unique": "Berkeley;Mountain View", "aff_country_unique_index": "0;0;0;0", "aff_country_unique": "United States" }, { "id": "O_PZRnYcUCm", "title": "Blank", "track": "main", "status": "Withdraw", "tldr": "", "abstract": "Blank", "keywords": "Blank", "primary_area": "", "supplementary_material": "/attachment/260fab9fb7cb463cf0d434ada6c1b7a9d38030d3.zip", "author": "Moitreya Chatterjee;Anoop Cherian;Narendra Ahuja", "authorids": "~Moitreya_Chatterjee1;~Anoop_Cherian1;~Narendra_Ahuja1", "gender": "M;;M", "homepage": "http://sites.google.com/site/metrosmiles;http://vision.ai.illinois.edu/ahuja.html;http://users.cecs.anu.edu.au/~cherian/", "dblp": "124/2773.html;;44/7734", "google_scholar": "https://scholar.google.co.in/citations?user=CSxgi6AAAAAJ;dY7OSl0AAAAJ;https://scholar.google.com.au/citations?hl=en", "orcid": ";;0000-0002-5566-0351", "linkedin": "moitreya-chatterjee-3937b863;;anoop-cherian-4678a04/", "or_profile": "~Moitreya_Chatterjee1;~Narendra_Ahuja1;~Anoop_Cherian2", "aff": "University of Illinois, Urbana Champaign;University of Illinois, Urbana Champaign;Mitsubishi Electric Research Labs", "aff_domain": "illinois.edu;illinois.edu;merl.com", "position": "PhD student;Research Professor;Researcher", "bibtex": "", "github": "", "project": "", "reviewers": "AnonReviewer3;AnonReviewer1;AnonReviewer4;AnonReviewer2", "site": "https://openreview.net/forum?id=O_PZRnYcUCm", "pdf_size": 0, "rating": "4;5;5;6", "confidence": "4;5;4;4", "wc_review": "618;334;380;512", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "0;0;0;0", "reply_reviewers": "0;0;0;0", "reply_authors": "0;0;0;0", "rating_avg": [ 5.0, 0.7071067811865476 ], "confidence_avg": [ 4.25, 0.4330127018922193 ], "wc_review_avg": [ 461.0, 111.73629669896886 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 0, 0 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 0, 0 ], "replies_avg": [ 5, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": 0.0, "gs_citation": -1, "gs_cited_by_link": "", "gs_version_total": -1, "aff_unique_index": "0;0;1", "aff_unique_norm": "University of Illinois Urbana-Champaign;Mitsubishi Electric Research 
Laboratories", "aff_unique_dep": ";", "aff_unique_url": "https://illinois.edu;https://www.merl.com", "aff_unique_abbr": "UIUC;MERL", "aff_campus_unique_index": "0;0", "aff_campus_unique": "Urbana-Champaign;", "aff_country_unique_index": "0;0;0", "aff_country_unique": "United States" }, { "id": "Oc-Aedbjq0", "title": "Model Compression via Hyper-Structure Network", "track": "main", "status": "Reject", "tldr": "", "abstract": "In this paper, we propose a novel channel pruning method to solve the problem of compression and acceleration of Convolutional Neural Networks (CNNs). Previous channel pruning methods usually ignore the relationships between channels and layers. Many of them parameterize each channel independently by using gates or similar concepts. To fill this gap, a hyper-structure network is proposed to generate the architecture of the main network. Like the existing hypernet, our hyper-structure network can be optimized by regular backpropagation. Moreover, we use a regularization term to specify the computational resource of the compact network. Usually, FLOPs is used as the criterion of computational resource. However, if FLOPs is used in the regularization, it may over penalize early layers. To address this issue, we further introduce learnable layer-wise scaling factors to balance the gradients from different terms, and they can be optimized by hyper-gradient descent. Extensive experimental results on CIFAR-10 and ImageNet show that our method is competitive with state-of-the-art methods. ", "keywords": "", "primary_area": "", "supplementary_material": "/attachment/b07bc664bfb239e28b82128b5a1fd8926e9238fe.zip", "author": "Shangqian Gao;Feihu Huang;Heng Huang", "authorids": "~Shangqian_Gao1;~Feihu_Huang1;~Heng_Huang1", "gender": ";M;M", "homepage": ";;https://www.cs.umd.edu/~heng/", "dblp": "195/2523;169/6247;03/281", "google_scholar": "9mNI83oAAAAJ;tRQwlHUAAAAJ;4OqLaDwAAAAJ", "orcid": ";0000-0003-0806-6074;", "linkedin": ";;", "or_profile": "~Shangqian_Gao1;~Feihu_Huang1;~Heng_Huang1", "aff": "University of Pittsburgh;University of Pittsburgh;University of Pittsburgh", "aff_domain": "pitt.edu;pitt.edu;pitt.edu", "position": "PhD student;Senior Postdoc;Full Professor", "bibtex": "@misc{\ngao2021model,\ntitle={Model Compression via Hyper-Structure Network},\nauthor={Shangqian Gao and Feihu Huang and Heng Huang},\nyear={2021},\nurl={https://openreview.net/forum?id=Oc-Aedbjq0}\n}", "github": "", "project": "", "reviewers": "AnonReviewer4;AnonReviewer2;AnonReviewer3;AnonReviewer1", "site": "https://openreview.net/forum?id=Oc-Aedbjq0", "pdf_size": 0, "rating": "4;5;5;6", "confidence": "5;5;3;5", "wc_review": "212;453;386;1011", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "198;313;270;576", "reply_reviewers": "0;0;0;0", "reply_authors": "1;1;1;1", "rating_avg": [ 5.0, 0.7071067811865476 ], "confidence_avg": [ 4.5, 0.8660254037844386 ], "wc_review_avg": [ 515.5, 299.29458732158855 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 339.25, 142.72942058314396 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 9, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 1, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=8837222199826622787&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 0, "aff_unique_index": "0;0;0", "aff_unique_norm": "University of Pittsburgh", "aff_unique_dep": "", "aff_unique_url": "https://www.pitt.edu", "aff_unique_abbr": "Pitt", "aff_campus_unique_index": "", 
"aff_campus_unique": "", "aff_country_unique_index": "0;0;0", "aff_country_unique": "United States" }, { "id": "OcTUl1kc_00", "title": "Are Graph Convolutional Networks Fully Exploiting the Graph Structure?", "track": "main", "status": "Reject", "tldr": "", "abstract": "Graph Convolutional Networks (GCNs) represent the state-of-the-art for many graph related tasks. At every layer, GCNs rely on the graph structure to define an aggregation strategy where each node updates its representation by combining information from its neighbours. A known limitation of GCNs is their inability to infer long-range dependencies. In fact, as the number of layers increases, information gets smoothed and node embeddings become indistinguishable, negatively affecting performance. In this paper we formalize four levels of injection of graph structural information, and use them to analyze the importance of long-range dependencies. We then propose a novel regularization technique based on random walks with restart, called RWRReg, which encourages the network to encode long-range information into node embeddings. RWRReg does not require additional operations at inference time, is model-agnostic, and is further supported by our theoretical analysis connecting it to the Weisfeiler-Leman algorithm. Our experimental analysis, on both transductive and inductive tasks, shows that the lack of long-range structural information greatly affects the performance of state-of-the-art models, and that the long-range information exploited by RWRReg leads to an average accuracy improvement of more than $5\\%$ on all considered tasks.", "keywords": "Graph Representation Learning;Graph Neural Networks;Random Walks", "primary_area": "", "supplementary_material": "/attachment/e927e1104568caccd1e21eec05b8fbe4846a5b3d.zip", "author": "Davide Buffelli;Fabio Vandin", "authorids": "~Davide_Buffelli1;~Fabio_Vandin2", "gender": "M;", "homepage": "https://davidebuffelli.github.io;", "dblp": "267/1651;62/5172", "google_scholar": "v28My7wAAAAJ;", "orcid": "0000-0001-5565-1634;", "linkedin": "davide-buffelli/;", "or_profile": "~Davide_Buffelli1;~Fabio_Vandin2", "aff": "Samsung;", "aff_domain": "samsung.com;", "position": "Research Intern;", "bibtex": "@misc{\nbuffelli2021are,\ntitle={Are Graph Convolutional Networks Fully Exploiting the Graph Structure?},\nauthor={Davide Buffelli and Fabio Vandin},\nyear={2021},\nurl={https://openreview.net/forum?id=OcTUl1kc_00}\n}", "github": "", "project": "", "reviewers": "AnonReviewer2;AnonReviewer1;AnonReviewer3;AnonReviewer4", "site": "https://openreview.net/forum?id=OcTUl1kc_00", "pdf_size": 0, "rating": "4;4;5;6", "confidence": "4;4;4;4", "wc_review": "481;200;382;346", "wc_reply_reviewers": "150;0;0;0", "wc_reply_authors": "1272;488;366;385", "reply_reviewers": "1;0;0;0", "reply_authors": "2;1;1;1", "rating_avg": [ 4.75, 0.82915619758885 ], "confidence_avg": [ 4.0, 0.0 ], "wc_review_avg": [ 352.25, 100.8473475109782 ], "wc_reply_reviewers_avg": [ 37.5, 64.9519052838329 ], "wc_reply_authors_avg": [ 627.75, 374.8428837526464 ], "reply_reviewers_avg": [ 0.25, 0.4330127018922193 ], "reply_authors_avg": [ 1.25, 0.4330127018922193 ], "replies_avg": [ 13, 0 ], "authors#_avg": [ 2, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:h46zAw6RF-0J:scholar.google.com/&scioq=Are+Graph+Convolutional+Networks+Fully+Exploiting+the+Graph+Structure%3F&hl=en&as_sdt=0,5", "gs_version_total": 0, "aff_unique_index": "0", "aff_unique_norm": "Samsung", 
"aff_unique_dep": "Samsung", "aff_unique_url": "https://www.samsung.com", "aff_unique_abbr": "Samsung", "aff_country_unique_index": "0", "aff_country_unique": "South Korea" }, { "id": "Oe2XI-Aft-k", "title": "Perturbation Type Categorization for Multiple $\\ell_p$ Bounded Adversarial Robustness", "track": "main", "status": "Reject", "tldr": "", "abstract": "Despite the recent advances in $\\textit{adversarial training}$ based defenses, deep neural networks are still vulnerable to adversarial attacks outside the perturbation type they are trained to be robust against. Recent works have proposed defenses to improve the robustness of a single model against the union of multiple perturbation types. However, when evaluating the model against each individual attack, these methods still suffer significant trade-offs compared to the ones specifically trained to be robust against that perturbation type. In this work, we introduce the problem of categorizing adversarial examples based on their $\\ell_p$ perturbation types. Based on our analysis, we propose $\\textit{PROTECTOR}$, a two-stage pipeline to improve the robustness against multiple perturbation types. Instead of training a single predictor, $\\textit{PROTECTOR}$ first categorizes the perturbation type of the input, and then utilizes a predictor specifically trained against the predicted perturbation type to make the final prediction. We first theoretically show that adversarial examples created by different perturbation types constitute different distributions, which makes it possible to distinguish them. Further, we show that at test time the adversary faces a natural trade-off between fooling the perturbation type classifier and the succeeding predictor optimized with perturbation specific adversarial training. This makes it challenging for an adversary to plant strong attacks against the whole pipeline. In addition, we demonstrate the realization of this trade-off in deep networks by adding random noise to the model input at test time, enabling enhanced robustness against strong adaptive attacks. 
Extensive experiments on MNIST and CIFAR-10 show that $\\textit{PROTECTOR}$ outperforms prior adversarial training based defenses by over $5\\%$, when tested against the union of $\\ell_1, \\ell_2, \\ell_\\infty$ attacks.", "keywords": "adversarial examples;robustness;multiple perturbation types", "primary_area": "", "supplementary_material": "", "author": "Pratyush Maini;Xinyun Chen;Bo Li;Dawn Song", "authorids": "~Pratyush_Maini1;~Xinyun_Chen1;~Bo_Li19;~Dawn_Song1", "gender": "M;F;F;F", "homepage": "https://pratyushmaini.github.io/;http://boli.cs.illinois.edu/;;https://jungyhuk.github.io/", "dblp": "248/8071;50/3402-26;s/DXSong;", "google_scholar": ";K8vJkTcAAAAJ;;d4W1UT0AAAAJ", "orcid": ";;;", "linkedin": ";;;", "or_profile": "~Pratyush_Maini1;~Bo_Li19;~Dawn_Song1;~Xinyun_Chen2", "aff": "Carnegie Mellon University;University of Illinois, Urbana Champaign;University of California, Berkeley;University of California, Berkeley", "aff_domain": "cmu.edu;illinois.edu;berkeley.edu;berkeley.edu", "position": "PhD student;Assistant Professor;Full Professor;PhD student", "bibtex": "@misc{\nmaini2021perturbation,\ntitle={Perturbation Type Categorization for Multiple {\\$}{\\textbackslash}ell{\\_}p{\\$} Bounded Adversarial Robustness},\nauthor={Pratyush Maini and Xinyun Chen and Bo Li and Dawn Song},\nyear={2021},\nurl={https://openreview.net/forum?id=Oe2XI-Aft-k}\n}", "github": "", "project": "", "reviewers": "AnonReviewer4;AnonReviewer2;AnonReviewer3;AnonReviewer1", "site": "https://openreview.net/forum?id=Oe2XI-Aft-k", "pdf_size": 0, "rating": "4;4;6;6", "confidence": "3;4;3;4", "wc_review": "83;261;225;349", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "339;403;684;914", "reply_reviewers": "0;0;0;0", "reply_authors": "1;2;1;2", "rating_avg": [ 5.0, 1.0 ], "confidence_avg": [ 3.5, 0.5 ], "wc_review_avg": [ 229.5, 95.85796784827018 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 585.0, 230.04456090070897 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.5, 0.5 ], "replies_avg": [ 13, 0 ], "authors#_avg": [ 4, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 4, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=7546368976831251055&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 0, "aff_unique_index": "0;1;2;2", "aff_unique_norm": "Carnegie Mellon University;University of Illinois Urbana-Champaign;University of California, Berkeley", "aff_unique_dep": ";;", "aff_unique_url": "https://www.cmu.edu;https://illinois.edu;https://www.berkeley.edu", "aff_unique_abbr": "CMU;UIUC;UC Berkeley", "aff_campus_unique_index": "1;2;2", "aff_campus_unique": ";Urbana-Champaign;Berkeley", "aff_country_unique_index": "0;0;0;0", "aff_country_unique": "United States" }, { "id": "Oecm1tBcguW", "title": "Meta-Learning Bayesian Neural Network Priors Based on PAC-Bayesian Theory", "track": "main", "status": "Reject", "tldr": "", "abstract": "Bayesian deep learning is a promising approach towards improved uncertainty quantification and sample efficiency.\nDue to their complex parameter space, choosing informative priors for Bayesian Neural Networks (BNNs) is challenging. Thus, often a naive, zero-centered Gaussian is used, resulting both in bad generalization and poor uncertainty estimates when training data is scarce. In contrast, meta-learning aims to extract such prior knowledge from a set of related learning tasks. We propose a principled and scalable algorithm for meta-learning BNN priors based on PAC-Bayesian bounds. 
Whereas previous approaches require optimizing the prior and multiple variational posteriors in an interdependent manner, our method does not rely on difficult nested optimization problems and is agnostic to the variational inference method in use. Our experiments show that the proposed method is not only computationally more efficient but also yields better predictions and uncertainty estimates when compared to previous meta-learning methods and BNNs with standard priors.", "keywords": "meta-learning;life-long learning;transfer;bayesian neural networks;prior;few-shot learning;pac-bayes;generalization bound", "primary_area": "", "supplementary_material": "", "author": "Jonas Rothfuss;Martin Josifoski;Andreas Krause", "authorids": "~Jonas_Rothfuss1;martin.josifoski@epfl.ch;~Andreas_Krause1", "gender": "M;;M", "homepage": "https://las.inf.ethz.ch/people/jonas-rothfuss;;https://las.inf.ethz.ch/krausea", "dblp": "213/7319.html;;87/1831-1.html", "google_scholar": "EfLpX8QAAAAJ;;https://scholar.google.ch/citations?user=eDHv58AAAAAJ", "orcid": ";;0000-0001-7260-9673", "linkedin": ";;krausea/", "or_profile": "~Jonas_Rothfuss1;martin.josifoski@epfl.ch;~Andreas_Krause1", "aff": "Swiss Federal Institute of Technology;;ETH Zurich", "aff_domain": "ethz.ch;;ethz.ch", "position": "PhD student;;Full Professor", "bibtex": "@misc{\nrothfuss2021metalearning,\ntitle={Meta-Learning Bayesian Neural Network Priors Based on {\\{}PAC{\\}}-Bayesian Theory},\nauthor={Jonas Rothfuss and Martin Josifoski and Andreas Krause},\nyear={2021},\nurl={https://openreview.net/forum?id=Oecm1tBcguW}\n}", "github": "", "project": "", "reviewers": "AnonReviewer2;AnonReviewer4;AnonReviewer1;AnonReviewer3", "site": "https://openreview.net/forum?id=Oecm1tBcguW", "pdf_size": 0, "rating": "4;6;7;7", "confidence": "4;4;2;4", "wc_review": "494;430;694;760", "wc_reply_reviewers": "171;0;0;91", "wc_reply_authors": "1083;387;449;695", "reply_reviewers": "1;0;0;1", "reply_authors": "2;1;1;1", "rating_avg": [ 6.0, 1.224744871391589 ], "confidence_avg": [ 3.5, 0.8660254037844386 ], "wc_review_avg": [ 594.5, 136.42855272999125 ], "wc_reply_reviewers_avg": [ 65.5, 71.34598797409704 ], "wc_reply_authors_avg": [ 653.5, 273.4204637550013 ], "reply_reviewers_avg": [ 0.5, 0.5 ], "reply_authors_avg": [ 1.25, 0.4330127018922193 ], "replies_avg": [ 15, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": -0.4714045207910316, "gs_citation": 1, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=8622963863107037626&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 2, "aff_unique_index": "0;1", "aff_unique_norm": "Swiss Federal Institute of Technology;ETH Zurich", "aff_unique_dep": ";", "aff_unique_url": "https://www.ethz.ch;https://www.ethz.ch", "aff_unique_abbr": "ETH Zurich;ETHZ", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0", "aff_country_unique": "Switzerland" }, { "id": "Ofjd8AkVExe", "title": "Data Transformer for Anomalous Trajectory Detection", "track": "main", "status": "Desk Reject", "tldr": "", "abstract": "Anomaly detection is an important task in many traffic applications. Methods based on convolutional neural networks reach state-of-the-art accuracy; however, they typically rely on supervised training with large labeled data and the trained network is only applicable to the intersection that the training data are collected from. 
Considering that anomaly data are generally hard to obtain, we present data transformation methods for converting data obtained from one intersection to other intersections to mitigate the effort of training data collection. We demonstrate our methods on the task of anomalous trajectory detection and leverage an unsupervised method that requires only normal trajectories for network training. We propose a general model and a universal model for our transformation methods. The general model focuses on saving data collection effort, while the universal model aims at training a universal network that can be used at other intersections. We evaluated our methods on a dataset of trajectories collected from the GTA V virtual world. The experimental results show that, with a significant reduction in data collection and network training effort, our methods can still achieve state-of-the-art accuracy for anomalous trajectory detection.", "keywords": "anomaly detection;trajectory;variational auto-encoder;data transformation", "primary_area": "", "supplementary_material": "", "author": "Anonymous", "authorids": "ICLR.cc/2021/Conference/Paper2393/Authors", "gender": "", "homepage": "", "dblp": "", "google_scholar": "", "orcid": "", "linkedin": "", "or_profile": "", "aff": "", "aff_domain": "", "position": "", "bibtex": "@inproceedings{\nanonymous2021data,\ntitle={Data Transformer for Anomalous Trajectory Detection},\nauthor={Anonymous},\nbooktitle={Submitted to International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=Ofjd8AkVExe},\nnote={under review}\n}", "github": "", "project": "", "reviewers": "", "site": "https://openreview.net/forum?id=Ofjd8AkVExe", "pdf_size": 0, "rating": "", "confidence": "", "wc_review": "", "wc_reply_reviewers": "", "wc_reply_authors": "", "reply_reviewers": "", "reply_authors": "", "rating_avg": [ 0, 0 ], "confidence_avg": [ 0, 0 ], "wc_review_avg": [ 0, 0 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 0, 0 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 0, 0 ], "replies_avg": [ 1, 0 ], "authors#_avg": [ 1, 0 ], "corr_rating_confidence": 0, "gs_citation": 1, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=1651433207273664253&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 3 }, { "id": "Og7kVwRVStV", "title": "SGD on Neural Networks learns Robust Features before Non-Robust", "track": "main", "status": "Reject", "tldr": "", "abstract": "Neural networks are known to be vulnerable to adversarial attacks - small, imperceptible perturbations that cause the network to misclassify an input. A recent line of work attempts to explain this behavior by positing the existence of non-robust features - well-generalizing but brittle features present in the data distribution that are learned by the network and can be perturbed to cause misclassification.\n\nIn this paper, we look at the dynamics of neural network training through the perspective of robust and non-robust features. We find that there are two very distinct pathways that neural network training can follow, depending on the hyperparameters used. In the first pathway, the network initially learns only predictive, robust features and weakly predictive non-robust features, and subsequently learns predictive, non-robust features. On the other hand, a network trained via the second pathway eschews predictive non-robust features altogether, and rapidly overfits the training data. 
We provide strong empirical evidence to corroborate this hypothesis, as well as theoretical analysis in a simplified setting. Key to our analysis is a better understanding of the relationship between predictive non-robust features and adversarial transferability. We present our findings in light of other recent results on the evolution of inductive biases learned by neural networks over the course of training.\n\nFinally, we digress to show that rather than being quirks of the data distribution, predictive non-robust features might actually occur across datasets with different distributions drawn from independent sources, indicating that they perhaps possess some meaning in terms of human semantics.", "keywords": "neural networks;gradient descent;sgd;adversarial;robustness;features", "primary_area": "", "supplementary_material": "", "author": "Vikram Nitin", "authorids": "~Vikram_Nitin1", "gender": "M", "homepage": "", "dblp": "252/5277", "google_scholar": "FLiz6csAAAAJ", "orcid": "0009-0004-8620-8255", "linkedin": "", "or_profile": "~Vikram_Nitin1", "aff": "Columbia University", "aff_domain": "columbia.edu", "position": "Graduate student", "bibtex": "@misc{\nnitin2021sgd,\ntitle={{\\{}SGD{\\}} on Neural Networks learns Robust Features before Non-Robust},\nauthor={Vikram Nitin},\nyear={2021},\nurl={https://openreview.net/forum?id=Og7kVwRVStV}\n}", "github": "", "project": "", "reviewers": "AnonReviewer4;AnonReviewer2;AnonReviewer3;AnonReviewer1", "site": "https://openreview.net/forum?id=Og7kVwRVStV", "pdf_size": 0, "rating": "4;5;5;5", "confidence": "4;4;4;4", "wc_review": "282;914;753;478", "wc_reply_reviewers": "0;128;96;0", "wc_reply_authors": "346;746;714;237", "reply_reviewers": "0;1;1;0", "reply_authors": "1;3;2;1", "rating_avg": [ 4.75, 0.4330127018922193 ], "confidence_avg": [ 4.0, 0.0 ], "wc_review_avg": [ 606.75, 243.83947075894008 ], "wc_reply_reviewers_avg": [ 56.0, 57.1314274283428 ], "wc_reply_authors_avg": [ 510.75, 222.8983793121879 ], "reply_reviewers_avg": [ 0.5, 0.5 ], "reply_authors_avg": [ 1.75, 0.82915619758885 ], "replies_avg": [ 15, 0 ], "authors#_avg": [ 1, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 2, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=3994932528044674203&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 0, "aff_unique_index": "0", "aff_unique_norm": "Columbia University", "aff_unique_dep": "", "aff_unique_url": "https://www.columbia.edu", "aff_unique_abbr": "Columbia", "aff_country_unique_index": "0", "aff_country_unique": "United States" }, { "title": "FedMix: Approximation of Mixup under Mean Augmented Federated Learning", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/3145", "id": "Ogga20D2HO-", "poster": "", "openreview": "https://openreview.net/forum?id=Ogga20D2HO-", "slides": "https://iclr.cc/virtual/2021/poster/3145", "video": "https://iclr.cc/virtual/2021/poster/3145", "author_site": "Tehrim Yoon, Sumin Shin, Sung Ju Hwang, Eunho Yang", "tldr": "", "abstract": "Federated learning (FL) allows edge devices to collectively learn a model without directly sharing data within each device, thus preserving privacy and eliminating the need to store data globally. While there are promising results under the assumption of independent and identically distributed (iid) local data, current state-of-the-art algorithms suffer a performance degradation as the heterogeneity of local data across clients increases. 
To resolve this issue, we propose a simple framework, \\emph{Mean Augmented Federated Learning (MAFL)}, where clients send and receive \\emph{averaged} local data, subject to the privacy requirements of target applications. Under our framework, we propose a new augmentation algorithm, named \\emph{FedMix}, which is inspired by a phenomenal yet simple data augmentation method, Mixup, but does not require local raw data to be directly shared among devices. Our method shows greatly improved performance in the standard benchmark datasets of FL, under highly non-iid federated settings, compared to conventional algorithms.", "keywords": "federated learning;mixup", "primary_area": "", "supplementary_material": "/attachment/b9cbc090138d58eed6b5ec0184d5232cbe5a2500.zip", "author": "Tehrim Yoon;Sumin Shin;Sung Ju Hwang;Eunho Yang", "authorids": "~Tehrim_Yoon1;sym807@kaist.ac.kr;~Sung_Ju_Hwang1;~Eunho_Yang1", "gender": ";;;M", "homepage": ";;;https://sites.google.com/site/hleehome2/", "dblp": ";;;96/2621", "google_scholar": ";;;", "orcid": "0000-0002-6222-2456;;;", "linkedin": ";;;", "or_profile": "~Tehrim_Yoon1;sym807@kaist.ac.kr;~Sung_Ju_Hwang1;~Eunho_Yang1", "aff": "Korea Advanced Institute of Science & Technology;;;Korea Advanced Institute of Science & Technology", "aff_domain": "kaist.ac.kr;;;kaist.ac.kr", "position": "PhD student;;;Associate Professor", "bibtex": "@inproceedings{\nyoon2021fedmix,\ntitle={FedMix: Approximation of Mixup under Mean Augmented Federated Learning},\nauthor={Tehrim Yoon and Sumin Shin and Sung Ju Hwang and Eunho Yang},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=Ogga20D2HO-}\n}", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer3;AnonReviewer4", "pdf_size": 0, "rating": "6;6;7", "confidence": "4;4;4", "wc_review": "670;382;842", "wc_reply_reviewers": "94;211;170", "wc_reply_authors": "1512;1138;1875", "reply_reviewers": "1;2;2", "reply_authors": "3;4;5", "rating_avg": [ 6.333333333333333, 0.4714045207910317 ], "confidence_avg": [ 4.0, 0.0 ], "wc_review_avg": [ 631.3333333333334, 189.77413475556205 ], "wc_reply_reviewers_avg": [ 158.33333333333334, 48.472214262972926 ], "wc_reply_authors_avg": [ 1508.3333333333333, 300.89016083762004 ], "reply_reviewers_avg": [ 1.6666666666666667, 0.4714045207910317 ], "reply_authors_avg": [ 4.0, 0.816496580927726 ], "replies_avg": [ 24, 0 ], "authors#_avg": [ 4, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 229, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=11940438324071051918&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 5, "pdf": "https://openreview.net/pdf?id=Ogga20D2HO-", "email": "kaist.ac.kr;;;kaist.ac.kr", "author_num": 4, "aff_unique_index": "0;0", "aff_unique_norm": "Korea Advanced Institute of Science and Technology", "aff_unique_dep": "", "aff_unique_url": "https://www.kaist.ac.kr", "aff_unique_abbr": "KAIST", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0", "aff_country_unique": "South Korea" }, { "id": "Oi-Kh379U0", "title": "Generalizing and Tensorizing Subgraph Search in the Supernet", "track": "main", "status": "Reject", "tldr": "", "abstract": "Recently, a special kind of graph, i.e., supernet, which allows two nodes connected by multi-choice edges, has exhibited its power in neural architecture search (NAS) by searching better architectures for computer vision (CV) and natural language processing (NLP) tasks. 
In this paper, we discover that the design of such discrete architectures also appears in many other important learning tasks, e.g., logical chain inference in knowledge graphs (KGs) and meta-path discovery in heterogeneous information networks (HINs). Thus, we are motivated to generalize the supernet search problem on a broader horizon. However, none of the existing works are effective since the supernet's topology is highly task-dependent and diverse. To address this issue, we propose to tensorize the supernet, i.e. unify the subgraph search problems by a tensor formulation and encode the topology inside the supernet by a tensor network. We further propose an efficient algorithm that admits both stochastic and deterministic objectives to solve the search problem. Finally, we perform extensive experiments on diverse learning tasks, i.e., architecture design for CV, logic inference for KG, and meta-path discovery for HIN. Empirical results demonstrate that our method leads to better performance and architectures.\n", "keywords": "deep learning;neural architecture search;tensor decomposition", "primary_area": "", "supplementary_material": "", "author": "Hansi Yang;quanming yao", "authorids": "~Hansi_Yang1;~quanming_yao1", "gender": "M;M", "homepage": "https://www.linkedin.com/in/%E7%80%9A%E6%80%9D-%E6%9D%A8-6463a4a1;https://lars-group.github.io/", "dblp": "252/5354;158/1014", "google_scholar": ";https://scholar.google.com/schhp?hl=en", "orcid": "0000-0002-0479-9898;", "linkedin": "%E7%80%9A%E6%80%9D-%E6%9D%A8-6463a4a1;", "or_profile": "~Hansi_Yang1;~quanming_yao1", "aff": "Tsinghua University;4Paradigm Inc.", "aff_domain": "tsinghua.edu.cn;4paradigm.com", "position": "Undergrad student;Senior Scientist", "bibtex": "@misc{\nyang2021generalizing,\ntitle={Generalizing and Tensorizing Subgraph Search in the Supernet},\nauthor={Hansi Yang and quanming yao},\nyear={2021},\nurl={https://openreview.net/forum?id=Oi-Kh379U0}\n}", "github": "", "project": "", "reviewers": "AnonReviewer3;AnonReviewer2;AnonReviewer4;AnonReviewer1", "site": "https://openreview.net/forum?id=Oi-Kh379U0", "pdf_size": 0, "rating": "4;5;5;5", "confidence": "5;3;3;3", "wc_review": "607;455;281;332", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "442;801;241;523", "reply_reviewers": "0;0;0;0", "reply_authors": "1;1;1;1", "rating_avg": [ 4.75, 0.4330127018922193 ], "confidence_avg": [ 3.5, 0.8660254037844386 ], "wc_review_avg": [ 418.75, 125.75049701691043 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 501.75, 200.97434537771232 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 9, 0 ], "authors#_avg": [ 2, 0 ], "corr_rating_confidence": -1.0, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:4wDArPnGiV8J:scholar.google.com/&scioq=Generalizing+and+Tensorizing+Subgraph+Search+in+the+Supernet&hl=en&as_sdt=0,33", "gs_version_total": 0, "aff_unique_index": "0;1", "aff_unique_norm": "Tsinghua University;4Paradigm", "aff_unique_dep": ";", "aff_unique_url": "https://www.tsinghua.edu.cn;https://www.4paradigm.com/", "aff_unique_abbr": "THU;4Paradigm", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0", "aff_country_unique": "China" }, { "id": "OifRuTHyQU", "title": "Deep Manifold Computing and Visualization Using Elastic Locally Isometric Smoothness", "track": "main", "status": "Withdraw", "tldr": "", "abstract": "The ability to preserve local geometry of highly nonlinear manifolds in high dimensional 
spaces and properly unfold them into lower dimensional hyperplanes is the key to the success of manifold computing, nonlinear dimensionality reduction (NLDR) and visualization. This paper proposes a novel method, called elastic locally isometric smoothness (ELIS), to empower deep neural networks with such an ability. ELIS requires that a desired metric between points should be preserved across layers in order to preserve local geometry; such a smoothness constraint effectively regularizes vector-based transformations to become well-behaved local metric-preserving homeomorphisms. Moreover, ELIS requires that the smoothness should be imposed in a way to render sufficient flexibility for tackling complicated nonlinearity and non-Euclideanity; this is achieved layer-wise via nonlinearity in both the similarity and activation functions. The ELIS method incorporates a class of suitable nonlinear similarity functions into a two-way divergence loss and uses hyperparameter continuation in finding optimal solutions. Extensive experiments, comparisons, and ablation studies demonstrate that ELIS can deliver results not only superior to UMAP and t-SNE for visualization but also better than other leading counterparts of manifold and autoencoder learning for NLDR and manifold data generation.", "keywords": "manifold learning;dimensionality reduction;visualization;data generation", "primary_area": "", "supplementary_material": "/attachment/5307f9e143807076127d2f2d4565935a2f7d033f.zip", "author": "Stan Z. Li;Zelin Zang;Lirong Wu", "authorids": "~Stan_Z._Li2;~Zelin_Zang2;~Lirong_Wu1", "gender": "M;;M", "homepage": ";;https://en.westlake.edu.cn/academics/School_of_Engineering/About/Our_People/Faculty/201912/t20191206_2497.shtml", "dblp": "226/7615;15/10330;l/StanZLi", "google_scholar": "foERjnQAAAAJ;Tk7TrCoAAAAJ;https://scholar.google.com/citations?hl=zh-CN", "orcid": ";;", "linkedin": ";;stan-z-li-%E6%9D%8E%E5%AD%90%E9%9D%92-55753224/", "or_profile": "~Zelin_Zang2;~Lirong_Wu1;~Stan_Z._Li1", "aff": "Westlake University, Zhejiang University, National University of Singapore;Westlake University;Westlake University", "aff_domain": "westlake.edu.cn;westlake.edu.cn;westlake.edu.cn", "position": "PhD student;PhD student;Chair Professor", "bibtex": "", "github": "", "project": "", "reviewers": "AnonReviewer2;AnonReviewer1;AnonReviewer3;AnonReviewer4", "site": "https://openreview.net/forum?id=OifRuTHyQU", "pdf_size": 0, "rating": "3;4;5;5", "confidence": "5;4;3;3", "wc_review": "612;214;908;566", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "0;0;0;0", "reply_reviewers": "0;0;0;0", "reply_authors": "0;0;0;0", "rating_avg": [ 4.25, 0.82915619758885 ], "confidence_avg": [ 3.75, 0.82915619758885 ], "wc_review_avg": [ 575.0, 246.3026593441492 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 0, 0 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 0, 0 ], "replies_avg": [ 6, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": -1.0, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:MEmh-4Gr9OMJ:scholar.google.com/&scioq=Deep+Manifold+Computing+and+Visualization+Using+Elastic+Locally+Isometric+Smoothness&hl=en&as_sdt=0,5", "gs_version_total": 0, "aff_unique_index": "0;0;0", "aff_unique_norm": "Westlake University", "aff_unique_dep": "", "aff_unique_url": "https://www.westlake.edu.cn", "aff_unique_abbr": "WU", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0;0", "aff_country_unique": "China" }, { "id": 
"Oj2hGyJwhwX", "title": "SelfNorm and CrossNorm for Out-of-Distribution Robustness", "track": "main", "status": "Reject", "tldr": "", "abstract": "Normalization techniques are crucial in stabilizing and accelerating the training of deep neural networks. However, they are mainly designed for the independent and identically distributed (IID) data, not satisfying many real-world out-of-distribution (OOD) situations. Unlike most previous works, this paper presents two normalization methods, SelfNorm and CrossNorm, to promote OOD generalization. SelfNorm uses attention to recalibrate statistics (channel-wise mean and variance), while CrossNorm exchanges the statistics between feature maps. SelfNorm and CrossNorm can complement each other in OOD generalization, though exploring different directions in statistics usage. Extensive experiments on different domains (vision and language), tasks (classification and segmentation), and settings (supervised and semi-supervised) show their effectiveness.", "keywords": "", "primary_area": "", "supplementary_material": "", "author": "Zhiqiang Tang;Yunhe Gao;Yi Zhu;Zhi Zhang;Mu Li;Dimitris N. Metaxas", "authorids": "~Zhiqiang_Tang1;~Yunhe_Gao2;~Yi_Zhu1;~Zhi_Zhang4;~Mu_Li1;~Dimitris_N._Metaxas1", "gender": "M;M;M;M;;M", "homepage": "https://sites.google.com/site/zhiqiangtanghomepage/home;https://www.cs.rutgers.edu/people/graduate-students/details/yunhe-gao;https://bryanyzhu.github.io/;https://zhreshold.github.io;;https://www.cs.rutgers.edu/~dnm/", "dblp": "71/10098-1;237/4741;;;36/4526;m/DNMetaxas", "google_scholar": "https://scholar.google.com/citations?view_op=list_works;TOsFPu4AAAAJ;IXw4UiwAAAAJ;nZr0oXQAAAAJ;;https://scholar.google.com.tw/citations?user=a7VNhCIAAAAJ", "orcid": ";;0000-0002-6482-6712;0000-0003-0249-1678;;", "linkedin": ";;yi-zhu-546a437a/;;;dimitris-metaxas-1bb74914/", "or_profile": "~Zhiqiang_Tang1;~Yunhe_Gao2;~Yi_Zhu1;~Zhi_Zhang4;~Mu_Li1;~Dimitris_Metaxas1", "aff": "Rutgers University;Rutgers University;Amazon;Amazon;School of Computer Science;Rutgers University", "aff_domain": "rutgers.edu;rutgers.edu;amazon.com;amazon.com;cs.cmu.edu;cs.rutgers.edu", "position": "PhD student;PhD student;Applied Scientist;Applied Scientist;Researcher;Full Professor", "bibtex": "@misc{\ntang2021selfnorm,\ntitle={SelfNorm and CrossNorm for Out-of-Distribution Robustness},\nauthor={Zhiqiang Tang and Yunhe Gao and Yi Zhu and Zhi Zhang and Mu Li and Dimitris N. 
Metaxas},\nyear={2021},\nurl={https://openreview.net/forum?id=Oj2hGyJwhwX}\n}", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer3;AnonReviewer4;AnonReviewer2", "site": "https://openreview.net/forum?id=Oj2hGyJwhwX", "pdf_size": 0, "rating": "5;6;7;7", "confidence": "4;3;4;4", "wc_review": "975;302;460;264", "wc_reply_reviewers": "551;0;0;0", "wc_reply_authors": "1049;272;918;234", "reply_reviewers": "1;0;0;0", "reply_authors": "2;1;2;1", "rating_avg": [ 6.25, 0.82915619758885 ], "confidence_avg": [ 3.75, 0.4330127018922193 ], "wc_review_avg": [ 500.25, 283.78017460703626 ], "wc_reply_reviewers_avg": [ 137.75, 238.58999874261283 ], "wc_reply_authors_avg": [ 618.25, 368.41985220669096 ], "reply_reviewers_avg": [ 0.25, 0.4330127018922193 ], "reply_authors_avg": [ 1.5, 0.5 ], "replies_avg": [ 14, 0 ], "authors#_avg": [ 6, 0 ], "corr_rating_confidence": 0.17407765595569782, "gs_citation": 22, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=16728158861276119419&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 0, "aff_unique_index": "0;0;1;1;2;0", "aff_unique_norm": "Rutgers University;Amazon;School of Computer Science", "aff_unique_dep": ";Amazon.com, Inc.;Computer Science", "aff_unique_url": "https://www.rutgers.edu;https://www.amazon.com;", "aff_unique_abbr": "Rutgers;Amazon;", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0;0;0;0", "aff_country_unique": "United States;" }, { "id": "OjUsDdCpR5", "title": "Inferring Principal Components in the Simplex with Multinomial Variational Autoencoders", "track": "main", "status": "Reject", "tldr": "", "abstract": "Covariance estimation on high-dimensional data is a central challenge across multiple scientific disciplines. Sparse high-dimensional count data, frequently encountered in biological applications such as DNA sequencing and proteomics, are often well modeled using multinomial logistic normal models. In many cases, these datasets are also compositional, presented item-wise as fractions of a normalized total, due to measurement and instrument constraints. In compositional settings, three key factors limit the ability of these models to estimate covariance: (1) the computational complexity of inverting high-dimensional covariance matrices, (2) the non-exchangeability introduced from the summation constraint on multinomial parameters, and (3) the irreducibility of the component multinomial logistic normal distribution that necessitates the use of parameter augmentation, or similar techniques, during inference. 
We show that a variational autoencoder augmented with a fast isometric log-ratio (ILR) transform can address these issues and accurately estimate principal components from multinomially logistic normal distributed data.\nThis model can be optimized on GPUs and modified to handle mini-batching, with the ability to scale across thousands of dimensions and thousands of samples.", "keywords": "Multinomial variational autoencoders;variational autoencoders;ILR transform;compositional PCA;probabilistic PCA;multinomial logistic normal", "primary_area": "", "supplementary_material": "", "author": "James Morton;Justin Silverman;Gleb Tikhonov;Harri L\u00e4hdesm\u00e4ki;Rich Bonneau", "authorids": "~James_Morton1;jsilve24@gmail.com;gleb.tikhonov@aalto.fi;~Harri_L\u00e4hdesm\u00e4ki1;rbonneau@flatironinstitute.org", "gender": ";;;M;", "homepage": ";;;https://research.cs.aalto.fi/csb/;", "dblp": ";;;85/4466;", "google_scholar": "gwzQvp4AAAAJ;;;https://scholar.google.com/citations?hl=en;", "orcid": ";;;;", "linkedin": ";;;;", "or_profile": "~James_Morton1;jsilve24@gmail.com;gleb.tikhonov@aalto.fi;~Harri_L\u00e4hdesm\u00e4ki1;rbonneau@flatironinstitute.org", "aff": ";;;Aalto University;", "aff_domain": ";;;aalto.fi;", "position": ";;;Associate Professor;", "bibtex": "@misc{\nmorton2021inferring,\ntitle={Inferring Principal Components in the Simplex with Multinomial Variational Autoencoders},\nauthor={James Morton and Justin Silverman and Gleb Tikhonov and Harri L{\\\"a}hdesm{\\\"a}ki and Rich Bonneau},\nyear={2021},\nurl={https://openreview.net/forum?id=OjUsDdCpR5}\n}", "github": "", "project": "", "reviewers": "AnonReviewer3;AnonReviewer1;AnonReviewer4;AnonReviewer2", "site": "https://openreview.net/forum?id=OjUsDdCpR5", "pdf_size": 0, "rating": "4;5;6;7", "confidence": "3;3;3;3", "wc_review": "547;292;296;213", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "284;219;209;35", "reply_reviewers": "0;0;0;0", "reply_authors": "1;1;1;1", "rating_avg": [ 5.5, 1.118033988749895 ], "confidence_avg": [ 3.0, 0.0 ], "wc_review_avg": [ 337.0, 125.68014958616178 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 186.75, 92.22357345060969 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 9, 0 ], "authors#_avg": [ 5, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:EaImJ_82HmwJ:scholar.google.com/&scioq=Inferring+Principal+Components+in+the+Simplex+with+Multinomial+Variational+Autoencoders&hl=en&as_sdt=0,33", "gs_version_total": 0, "aff_unique_index": "0", "aff_unique_norm": "Aalto University", "aff_unique_dep": "", "aff_unique_url": "https://www.aalto.fi", "aff_unique_abbr": "Aalto", "aff_country_unique_index": "0", "aff_country_unique": "Finland" }, { "id": "OkXODFHhfum", "title": "Out-of-Distribution Classification and Clustering", "track": "main", "status": "Withdraw", "tldr": "", "abstract": "One of the long-term goals of machine learning is to develop models which will generalize well enough that they can handle a broader range of circumstances than what they were trained on. In the context of vision, such a model would not get confounded by elements outside of its training distribution. To this end, we propose a new task for training neural networks, in which the goal is to determine if all images in a given set share the same class. We demonstrate that a model trained with this task can classify and cluster samples from out-of-distribution classes. 
This includes left out classes from the same dataset, as well as entire datasets never trained on. Our experiments also reveal an unreported phenomenon, which is that neural networks can overfit their training classes, leading to poorer out-of-distribution performance. It is our belief that mitigating this effect and improving on our task will lead to better out-of-distribution generalization as well as behaviours more resembling those of humans.", "keywords": "Out-of-Distribution Generalization;Out-of-Distribution Classification;Out-of-Distribution Clustering;Class Overfitting", "primary_area": "", "supplementary_material": "", "author": "Gabriele Prato;Sarath Chandar", "authorids": "~Gabriele_Prato1;~Sarath_Chandar1", "gender": ";M", "homepage": ";http://sarathchandar.in/", "dblp": ";45/8542", "google_scholar": ";https://scholar.google.co.in/citations?user=yxWtZLAAAAAJ", "orcid": ";", "linkedin": ";", "or_profile": "~Gabriele_Prato1;~Sarath_Chandar1", "aff": ";\u00c9cole Polytechnique de Montr\u00e9al", "aff_domain": ";polymtl.ca", "position": ";Assistant Professor", "bibtex": "", "github": "", "project": "", "reviewers": "AnonReviewer3;AnonReviewer1;AnonReviewer2;AnonReviewer4", "site": "https://openreview.net/forum?id=OkXODFHhfum", "pdf_size": 0, "rating": "4;4;5;5", "confidence": "4;4;4;4", "wc_review": "714;315;303;324", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "0;0;0;0", "reply_reviewers": "0;0;0;0", "reply_authors": "0;0;0;0", "rating_avg": [ 4.5, 0.5 ], "confidence_avg": [ 4.0, 0.0 ], "wc_review_avg": [ 414.0, 173.36522142575194 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 0, 0 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 0, 0 ], "replies_avg": [ 6, 0 ], "authors#_avg": [ 2, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:Z0vLPPJhR2IJ:scholar.google.com/&scioq=Out-of-Distribution+Classification+and+Clustering&hl=en&as_sdt=0,33", "gs_version_total": 0, "aff_unique_index": "0", "aff_unique_norm": "\u00c9cole Polytechnique de Montr\u00e9al", "aff_unique_dep": "", "aff_unique_url": "https://www.polymtl.ca", "aff_unique_abbr": "Polytechnique Montr\u00e9al", "aff_campus_unique_index": "0", "aff_campus_unique": "Montr\u00e9al", "aff_country_unique_index": "0", "aff_country_unique": "Canada" }, { "title": "Multiplicative Filter Networks", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/3198", "id": "OmtmcPkkhT", "poster": "", "openreview": "https://openreview.net/forum?id=OmtmcPkkhT", "slides": "https://iclr.cc/virtual/2021/poster/3198", "video": "https://iclr.cc/virtual/2021/poster/3198", "author_site": "Rizal Fathony, Anit Kumar Sahu, Devin Willmott, Zico Kolter", "tldr": "", "abstract": "Although deep networks are typically used to approximate functions over high dimensional inputs, recent work has increased interest in neural networks as function approximators for low-dimensional-but-complex functions, such as representing images as a function of pixel coordinates, solving differential equations, or representing signed distance fields or neural radiance fields. Key to these recent successes has been the use of new elements such as sinusoidal nonlinearities, or Fourier features in positional encodings, which vastly outperform simple ReLU networks. In this paper, we propose and empirically demonstrate that an arguably simpler class of function approximators can work just as well for such problems: multiplicative filter networks. 
In these networks, we avoid traditional compositional depth altogether, and simply multiply together (linear functions of) sinusoidal or Gabor wavelet functions applied to the input. This representation has the notable advantage that the entire function can simply be viewed as a linear function approximator over an exponential number of Fourier or Gabor basis functions, respectively. Despite this simplicity, when compared to recent approaches that use Fourier features with ReLU networks or sinusoidal activation networks, we show that these multiplicative filter networks largely outperform or match the performance of these recent approaches on the domains highlighted in these past works.", "keywords": "Deep Architectures;Implicit Neural Representations;Fourier Features", "primary_area": "", "supplementary_material": "/attachment/4bd61cfbf77e25f8703041bd4a4a5adb172afc49.zip", "author": "Rizal Fathony;Anit Kumar Sahu;Devin Willmott;J Zico Kolter", "authorids": "~Rizal_Fathony1;~Anit_Kumar_Sahu1;~Devin_Willmott1;~J_Zico_Kolter1", "gender": "M;;M;", "homepage": "https://rizal.fathony.com/;;;", "dblp": "191/6741;;;", "google_scholar": "_cOHKxkAAAAJ;;WwDT3JEAAAAJ;", "orcid": ";;;", "linkedin": ";;;", "or_profile": "~Rizal_Fathony1;~Anit_Kumar_Sahu1;~Devin_Willmott1;~J_Zico_Kolter1", "aff": "Carnegie Mellon University;;Bosch Center for Artificial Intelligence;", "aff_domain": "cmu.edu;;bosch-ai.com;", "position": "Postdoc;;Research Scientist;", "bibtex": "@inproceedings{\nfathony2021multiplicative,\ntitle={Multiplicative Filter Networks},\nauthor={Rizal Fathony and Anit Kumar Sahu and Devin Willmott and J Zico Kolter},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=OmtmcPkkhT}\n}", "github": "[![github](/images/github_icon.svg) boschresearch/multiplicative-filter-networks](https://github.com/boschresearch/multiplicative-filter-networks) + [![Papers with Code](/images/pwc_icon.svg) 2 community implementations](https://paperswithcode.com/paper/?openreview=OmtmcPkkhT)", "project": "", "reviewers": "AnonReviewer4;AnonReviewer3;AnonReviewer2;AnonReviewer1", "pdf_size": 0, "rating": "6;6;8;9", "confidence": "4;3;3;4", "wc_review": "200;360;383;208", "wc_reply_reviewers": "0;0;106;0", "wc_reply_authors": "204;477;438;150", "reply_reviewers": "0;0;1;0", "reply_authors": "1;1;1;1", "rating_avg": [ 7.25, 1.299038105676658 ], "confidence_avg": [ 3.5, 0.5 ], "wc_review_avg": [ 287.75, 84.19137426126265 ], "wc_reply_reviewers_avg": [ 26.5, 45.89934640057525 ], "wc_reply_authors_avg": [ 317.25, 142.21352783754435 ], "reply_reviewers_avg": [ 0.25, 0.4330127018922193 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 12, 0 ], "authors#_avg": [ 4, 0 ], "corr_rating_confidence": 0.19245008972987526, "gs_citation": 179, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=2172333322940729180&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 7, "pdf": "https://openreview.net/pdf?id=OmtmcPkkhT", "email": "cmu.edu;;bosch-ai.com;", "author_num": 4, "aff_unique_index": "0;1", "aff_unique_norm": "Carnegie Mellon University;Bosch Center for Artificial Intelligence", "aff_unique_dep": ";Center for Artificial Intelligence", "aff_unique_url": "https://www.cmu.edu;https://www.bosch-ai.com", "aff_unique_abbr": "CMU;BCAI", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;1", "aff_country_unique": "United States;Germany" }, { "id": "OodqmQT3fir", "title": "XLVIN: eXecuted Latent Value Iteration Nets", 
"track": "main", "status": "Reject", "tldr": "", "abstract": "Value Iteration Networks (VINs) have emerged as a popular method to perform implicit planning within deep reinforcement learning, enabling performance improvements on tasks requiring long-range reasoning and understanding of environment dynamics. This came with several limitations, however: the model is not explicitly incentivised to perform meaningful planning computations, the underlying state space is assumed to be discrete, and the Markov decision process (MDP) is assumed fixed and known. We propose eXecuted Latent Value Iteration Networks (XLVINs), which combine recent developments across contrastive self-supervised learning, graph representation learning and neural algorithmic reasoning to alleviate all of the above limitations, successfully deploying VIN-style models on generic environments. XLVINs match the performance of VIN-like models when the underlying MDP is discrete, fixed and known, and provide significant improvements to model-free baselines across three general MDP setups.", "keywords": "value iteration;graph neural networks;reinforcement learning", "primary_area": "", "supplementary_material": "", "author": "Andreea Deac;Petar Veli\u010dkovi\u0107;Ognjen Milinkovic;Pierre-Luc Bacon;Jian Tang;Mladen Nikolic", "authorids": "~Andreea_Deac1;~Petar_Veli\u010dkovi\u01071;ognjen7amg@gmail.com;~Pierre-Luc_Bacon1;~Jian_Tang1;mladennik@gmail.com", "gender": "F;M;;;;", "homepage": ";https://petar-v.com;;;http://www.jian-tang.com;", "dblp": "222/3221;184/4786.html;;;181/2667-5;", "google_scholar": "E6zzj8kAAAAJ;https://scholar.google.co.uk/citations?user=kcTK_FAAAAAJ;;;https://scholar.google.ca/citations?user=1ir6WUEAAAAJ;", "orcid": ";0000-0002-2820-4692;;;;", "linkedin": "andreea-ioana-deac-76206510b;petarvelickovic;;;;", "or_profile": "~Andreea_Deac1;~Petar_Veli\u010dkovi\u01071;ognjen7amg@gmail.com;~Pierre-Luc_Bacon1;~Jian_Tang1;mladennik@gmail.com", "aff": "Montreal Institute for Learning Algorithms, University of Montreal, University of Montreal;Google DeepMind;;;Mila, HEC Montreal;", "aff_domain": "mila.umontreal.ca;google.com;;;hec.ca;", "position": "PhD student;Senior Staff Research Scientist;;;Assistant Professor;", "bibtex": "@misc{\ndeac2021xlvin,\ntitle={{\\{}XLVIN{\\}}: eXecuted Latent Value Iteration Nets},\nauthor={Andreea Deac and Petar Veli{\\v{c}}kovi{\\'c} and Ognjen Milinkovic and Pierre-Luc Bacon and Jian Tang and Mladen Nikolic},\nyear={2021},\nurl={https://openreview.net/forum?id=OodqmQT3fir}\n}", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer3;AnonReviewer4;AnonReviewer2", "site": "https://openreview.net/forum?id=OodqmQT3fir", "pdf_size": 0, "rating": "6;6;6;7", "confidence": "2;3;3;4", "wc_review": "263;471;492;423", "wc_reply_reviewers": "0;37;40;102", "wc_reply_authors": "325;699;1079;817", "reply_reviewers": "0;1;1;1", "reply_authors": "2;2;3;3", "rating_avg": [ 6.25, 0.4330127018922193 ], "confidence_avg": [ 3.0, 0.7071067811865476 ], "wc_review_avg": [ 412.25, 89.72562343054518 ], "wc_reply_reviewers_avg": [ 44.75, 36.615399765672365 ], "wc_reply_authors_avg": [ 730.0, 271.2729252984897 ], "reply_reviewers_avg": [ 0.75, 0.4330127018922193 ], "reply_authors_avg": [ 2.5, 0.5 ], "replies_avg": [ 19, 0 ], "authors#_avg": [ 6, 0 ], "corr_rating_confidence": 0.816496580927726, "gs_citation": 22, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=18220109284176804801&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 5, "aff_unique_index": "0;1;2", 
"aff_unique_norm": "University of Montreal;Google;HEC Montreal", "aff_unique_dep": "Montreal Institute for Learning Algorithms;Google DeepMind;HEC Business School", "aff_unique_url": "https://www.umontreal.ca;https://deepmind.com;https://www.hec.ca", "aff_unique_abbr": "UM;DeepMind;HEC", "aff_campus_unique_index": "0;0", "aff_campus_unique": "Montreal;", "aff_country_unique_index": "0;1;0", "aff_country_unique": "Canada;United Kingdom" }, { "title": "Neural Topic Model via Optimal Transport", "status": "Spotlight", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/3340", "id": "Oos98K9Lv-k", "poster": "", "openreview": "https://openreview.net/forum?id=Oos98K9Lv-k", "slides": "https://iclr.cc/virtual/2021/poster/3340", "video": "https://iclr.cc/virtual/2021/poster/3340", "author_site": "He Zhao, Dinh Phung, Viet Huynh, Trung Le, Wray Buntine", "tldr": "", "abstract": "Recently, Neural Topic Models (NTMs) inspired by variational autoencoders have obtained increasingly research interest due to their promising results on text analysis. However, it is usually hard for existing NTMs to achieve good document representation and coherent/diverse topics at the same time. Moreover, they often degrade their performance severely on short documents. The requirement of reparameterisation could also comprise their training quality and model flexibility. To address these shortcomings, we present a new neural topic model via the theory of optimal transport (OT). Specifically, we propose to learn the topic distribution of a document by directly minimising its OT distance to the document's word distributions. Importantly, the cost matrix of the OT distance models the weights between topics and words, which is constructed by the distances between topics and words in an embedding space. Our proposed model can be trained efficiently with a differentiable loss. 
Extensive experiments show that our framework significantly outperforms the state-of-the-art NTMs on discovering more coherent and diverse topics and deriving better document representations for both regular and short texts.", "keywords": "topic modelling;optimal transport;document analysis", "primary_area": "", "supplementary_material": "", "author": "He Zhao;Dinh Phung;Viet Huynh;Trung Le;Wray Buntine", "authorids": "~He_Zhao1;~Dinh_Phung2;~Viet_Huynh1;~Trung_Le2;~Wray_Buntine1", "gender": ";;M;M;M", "homepage": ";;;;https://bayesian-models.org/", "dblp": ";;161/2718;;72/3885", "google_scholar": ";;;https://scholar.google.com/citations?hl=en;J2pGGuAAAAAJ", "orcid": ";;;;0000-0001-9292-1015", "linkedin": ";;;;wray-buntine-07693921a/", "or_profile": "~He_Zhao1;~Dinh_Phung2;~Viet_Huynh1;~Trung_Le2;~Wray_Buntine1", "aff": ";;Monash University;Monash University;Monash University", "aff_domain": ";;monash.edu;monash.edu;monash.edu", "position": ";;Postdoc;Assistant Professor;Full Professor", "bibtex": "@inproceedings{\nzhao2021neural,\ntitle={Neural Topic Model via Optimal Transport},\nauthor={He Zhao and Dinh Phung and Viet Huynh and Trung Le and Wray Buntine},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=Oos98K9Lv-k}\n}", "github": "[![github](/images/github_icon.svg) ethanhezhao/NeuralSinkhornTopicModel](https://github.com/ethanhezhao/NeuralSinkhornTopicModel)", "project": "", "reviewers": "AnonReviewer3;AnonReviewer2;AnonReviewer1;AnonReviewer4", "pdf_size": 0, "rating": "6;7;7;8", "confidence": "4;3;3;4", "wc_review": "338;393;267;413", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "0;0;0;0", "reply_reviewers": "0;0;0;0", "reply_authors": "0;0;0;0", "rating_avg": [ 7.0, 0.7071067811865476 ], "confidence_avg": [ 3.5, 0.5 ], "wc_review_avg": [ 352.75, 56.61437538293609 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 0, 0 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 0, 0 ], "replies_avg": [ 9, 0 ], "authors#_avg": [ 5, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 81, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=689828574745146932&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 5, "pdf": "https://openreview.net/pdf?id=Oos98K9Lv-k", "email": ";;monash.edu;monash.edu;monash.edu", "author_num": 5, "aff_unique_index": "0;0;0", "aff_unique_norm": "Monash University", "aff_unique_dep": "", "aff_unique_url": "https://www.monash.edu", "aff_unique_abbr": "Monash", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0;0", "aff_country_unique": "Australia" }, { "id": "OpUJ46CNv43", "title": "MOFA: Modular Factorial Design for Hyperparameter Optimization", "track": "main", "status": "Withdraw", "tldr": "", "abstract": "Automated hyperparameter optimization (HPO) has shown great power in many machine learning applications. While existing methods suffer from model selection, parallelism, or sample efficiency, this paper presents a new HPO method, MOdular FActorial Design (MOFA), to address these issues simultaneously. The major idea is to use techniques from Experimental Designs to improve sample efficiency of model-free methods. 
Particularly, MOFA runs with four modules in each iteration: (1) an Orthogonal Latin Hypercube (OLH)-based sampler preserving both univariate projection uniformity and orthogonality; (2) a highly parallelized evaluator; (3) a transformer to collapse the OLH performance table into a specified Fractional Factorial Design--Orthogonal Array (OA); (4) an analyzer including Factorial Performance Analysis and Factorial Importance Analysis to narrow down the search space. We theoretically and empirically show that MOFA has great advantages over existing model-based and model-free methods. ", "keywords": "Automated Hyperparameter Optimization;Factorial Analysis;Model-Free;Sample Efficiency;Orthogonal Latin Hypercubes", "primary_area": "", "supplementary_material": "/attachment/7e06075aa4c51b6b8d6f24bc7e367f41383bed2b.zip", "author": "Bo Xiong;Yimin Huang;Steffen Staab;Zhenguo Li", "authorids": "~Bo_Xiong3;~Yimin_Huang2;~Steffen_Staab2;~Zhenguo_Li1", "gender": "M;M;M;M", "homepage": ";;https://www.ki.uni-stuttgart.de/de/institut/team/Staab-00004/;http://www.ee.columbia.edu/~zgli/", "dblp": ";https://dblp.uni-trier.de/pers/hd/h/Huang:Yimin;s/SteffenStaab;23/6479", "google_scholar": "lmBXicIAAAAJ;;https://scholar.google.com/citations?hl=de;XboZC1AAAAAJ", "orcid": ";;0000-0002-0780-4154;", "linkedin": ";;;", "or_profile": "~Bo_Xiong3;~Yimin_Huang2;~Steffen_Staab2;~Zhenguo_Li1", "aff": "University of Stuttgart;Huawei Technologies Ltd.;University of Southampton;Huawei Noah's Ark Lab", "aff_domain": "uni-stuttgart.de;huawei.com;soton.ac.uk;huawei.com", "position": "PhD student;Researcher;Full Professor;Principal Researcher", "bibtex": "", "github": "", "project": "", "reviewers": "AnonReviewer4;AnonReviewer1;AnonReviewer3;AnonReviewer2", "site": "https://openreview.net/forum?id=OpUJ46CNv43", "pdf_size": 0, "rating": "3;4;4;5", "confidence": "5;4;4;4", "wc_review": "583;327;693;201", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "0;0;0;0", "reply_reviewers": "0;0;0;0", "reply_authors": "0;0;0;0", "rating_avg": [ 4.0, 0.7071067811865476 ], "confidence_avg": [ 4.25, 0.4330127018922193 ], "wc_review_avg": [ 451.0, 196.12750954417385 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 0, 0 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 0, 0 ], "replies_avg": [ 5, 0 ], "authors#_avg": [ 4, 0 ], "corr_rating_confidence": -0.816496580927726, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:igczVHVJ688J:scholar.google.com/&scioq=MOFA:+Modular+Factorial+Design+for+Hyperparameter+Optimization&hl=en&as_sdt=0,5", "gs_version_total": 3, "aff_unique_index": "0;1;2;1", "aff_unique_norm": "University of Stuttgart;Huawei;University of Southampton", "aff_unique_dep": ";Huawei Technologies;", "aff_unique_url": "https://www.uni-stuttgart.de;https://www.huawei.com;https://www.southampton.ac.uk", "aff_unique_abbr": "USTuttgart;Huawei;Southampton", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;1;2;1", "aff_country_unique": "Germany;China;United Kingdom" }, { "id": "Oq79NOiZB1H", "title": "On the Importance of Sampling in Training GCNs: Convergence Analysis and Variance Reduction", "track": "main", "status": "Reject", "tldr": "", "abstract": "Graph Convolutional Networks (GCNs) have achieved impressive empirical advancement across a wide variety of graph-related applications. Despite their great success, training GCNs on large graphs suffers from computational and memory issues. 
A potential path to circumvent these obstacles is sampling-based methods, where at each layer a subset of nodes is sampled. Although recent studies have empirically demonstrated the effectiveness of sampling-based methods, these works lack theoretical convergence guarantees under realistic settings and cannot fully leverage the information of evolving parameters during optimization. In this paper, we describe and analyze a general \\textbf{\\textit{doubly variance reduction}} schema that can accelerate any sampling method under the memory budget. The motivating impetus for the proposed schema is a careful analysis for the variance of sampling methods where it is shown that the induced variance can be decomposed into node embedding approximation variance (\\emph{zeroth-order variance}) during forward propagation and layerwise-gradient variance (\\emph{first-order variance}) during backward propagation. We theoretically analyze the convergence of the proposed schema and show that it enjoys an $\\mathcal{O}(1/T)$ convergence rate. We complement our theoretical results by integrating the proposed schema in different sampling methods and applying them to different large real-world graphs.", "keywords": "Graph neural network;large-scale machine learning;convergence analysis", "primary_area": "", "supplementary_material": "/attachment/3da811f4026c84786b8cfef149ef06ff93f54a15.zip", "author": "Weilin Cong;Morteza Ramezani;Mehrdad Mahdavi", "authorids": "~Weilin_Cong1;~Morteza_Ramezani1;~Mehrdad_Mahdavi2", "gender": "M;M;M", "homepage": "https://congweilin.github.io/CongWeilin.io/;http://morteza.me;http://www.cse.psu.edu/~mzm616/", "dblp": "203/8227;149/4523;88/4321", "google_scholar": "yYHxZ6MAAAAJ;;HzxnwocAAAAJ", "orcid": ";;", "linkedin": ";;", "or_profile": "~Weilin_Cong1;~Morteza_Ramezani1;~Mehrdad_Mahdavi2", "aff": "Meta Facebook;Pennsylvania State University;Toyota Technological Institute at Chicago", "aff_domain": "fb.com;psu.edu;ttic.edu", "position": "Intern;PhD student;Researcher", "bibtex": "@misc{\ncong2021on,\ntitle={On the Importance of Sampling in Training {\\{}GCN{\\}}s: Convergence Analysis and Variance Reduction},\nauthor={Weilin Cong and Morteza Ramezani and Mehrdad Mahdavi},\nyear={2021},\nurl={https://openreview.net/forum?id=Oq79NOiZB1H}\n}", "github": "", "project": "", "reviewers": "AnonReviewer4;AnonReviewer1;AnonReviewer2;AnonReviewer3", "site": "https://openreview.net/forum?id=Oq79NOiZB1H", "pdf_size": 0, "rating": "4;4;7;7", "confidence": "4;5;4;3", "wc_review": "368;269;210;458", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "871;724;154;356", "reply_reviewers": "0;0;0;0", "reply_authors": "2;1;1;1", "rating_avg": [ 5.5, 1.5 ], "confidence_avg": [ 4.0, 0.7071067811865476 ], "wc_review_avg": [ 326.25, 94.72691011534157 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 526.25, 285.26862340608017 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.25, 0.4330127018922193 ], "replies_avg": [ 14, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": -0.7071067811865476, "gs_citation": 1, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=3122297942196367119&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 0, "aff_unique_index": "0;1;2", "aff_unique_norm": "Meta;Pennsylvania State University;Toyota Technological Institute at Chicago", "aff_unique_dep": "Meta Platforms, Inc.;;", "aff_unique_url": "https://meta.com;https://www.psu.edu;https://www.tti-chicago.org", "aff_unique_abbr": "Meta;PSU;TTI Chicago", "aff_campus_unique_index": 
"1", "aff_campus_unique": ";Chicago", "aff_country_unique_index": "0;0;0", "aff_country_unique": "United States" }, { "title": "Exploring Balanced Feature Spaces for Representation Learning", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/2977", "id": "OqtLIabPTit", "poster": "", "openreview": "https://openreview.net/forum?id=OqtLIabPTit", "slides": "https://iclr.cc/virtual/2021/poster/2977", "video": "https://iclr.cc/virtual/2021/poster/2977", "author_site": "Bingyi Kang, Yu Li, Sain Xie, Zehuan Yuan, Jiashi Feng", "tldr": "", "abstract": "Existing self-supervised learning (SSL) methods are mostly applied for training representation models from artificially balanced datasets (e.g., ImageNet). It is unclear how well they will perform in the practical scenarios where datasets are often imbalanced w.r.t. the classes. Motivated by this question, we conduct a series of studies on the performance of self-supervised contrastive learning and supervised learning methods over multiple datasets where training instance distributions vary from a balanced one to a long-tailed one. Our findings are quite intriguing. Different from supervised methods with large performance drop, the self-supervised contrastive learning methods perform stably well even when the datasets are heavily imbalanced. This motivates us to explore the balanced feature spaces learned by contrastive learning, where the feature representations present similar linear separability w.r.t. all the classes. Our further experiments reveal that a representation model generating a balanced feature space can generalize better than that yielding an imbalanced one across multiple settings. Inspired by these insights, we develop a novel representation learning method, called $k$-positive contrastive learning. It effectively combines strengths of the supervised method and the contrastive learning method to learn representations that are both discriminative and balanced. Extensive experiments demonstrate its superiority on multiple recognition tasks. Remarkably, it achieves new state-of-the-art on challenging long-tailed recognition benchmarks. 
Code and models will be released.", "keywords": "Representation Learning;Contrastive Learning;Long-Tailed Recognition", "primary_area": "", "supplementary_material": "", "author": "Bingyi Kang;Yu Li;Sa Xie;Zehuan Yuan;Jiashi Feng", "authorids": "~Bingyi_Kang1;~Yu_Li7;~Sa_Xie1;~Zehuan_Yuan1;~Jiashi_Feng1", "gender": ";F;M;M;", "homepage": "https://bingykang.github.io/;;https://shallowyuan.github.io/;https://sites.google.com/site/jshfeng/;https://openreview.net", "dblp": ";;227/3298;56/8278;", "google_scholar": "https://scholar.google.com.sg/citations?user=NmHgX-wAAAAJ;4-1R-bMAAAAJ;;https://scholar.google.com.sg/citations?user=Q8iay0gAAAAJ;", "orcid": ";;;0000-0001-6843-0064;", "linkedin": ";;;;", "or_profile": "~Bingyi_Kang1;~Yu_Li7;~Zehuan_Yuan1;~Jiashi_Feng2;~Sain_Xie2", "aff": "National University of Singapore;;ByteDance Inc.;National University of Singapore;Open Review", "aff_domain": "u.nus.edu;;bytedance.com;nus.edu.sg;openreview.net", "position": "PhD student;;Researcher;Assistant Professor;\u3164 \u3164", "bibtex": "@inproceedings{\nkang2021exploring,\ntitle={Exploring Balanced Feature Spaces for Representation Learning},\nauthor={Bingyi Kang and Yu Li and Sa Xie and Zehuan Yuan and Jiashi Feng},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=OqtLIabPTit}\n}", "github": "", "project": "", "reviewers": "AnonReviewer4;AnonReviewer2;AnonReviewer1", "pdf_size": 0, "rating": "5;5;6", "confidence": "5;5;5", "wc_review": "290;579;214", "wc_reply_reviewers": "0;316;0", "wc_reply_authors": "2104;1626;774", "reply_reviewers": "0;1;0", "reply_authors": "3;3;1", "rating_avg": [ 5.333333333333333, 0.4714045207910317 ], "confidence_avg": [ 5.0, 0.0 ], "wc_review_avg": [ 361.0, 157.24079199325686 ], "wc_reply_reviewers_avg": [ 105.33333333333333, 148.963828569966 ], "wc_reply_authors_avg": [ 1501.3333333333333, 550.0795902008687 ], "reply_reviewers_avg": [ 0.3333333333333333, 0.4714045207910317 ], "reply_authors_avg": [ 2.3333333333333335, 0.9428090415820634 ], "replies_avg": [ 13, 0 ], "authors#_avg": [ 5, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 331, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=10287981258636055243&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 2, "pdf": "https://openreview.net/pdf?id=OqtLIabPTit", "email": "u.nus.edu;;bytedance.com;nus.edu.sg;openreview.net", "author_num": 5, "aff_unique_index": "0;1;0;2", "aff_unique_norm": "National University of Singapore;ByteDance;Open Review", "aff_unique_dep": ";;", "aff_unique_url": "https://www.nus.edu.sg;https://www.bytedance.com;https://openreview.net", "aff_unique_abbr": "NUS;ByteDance;Open Review", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;1;0;2", "aff_country_unique": "Singapore;China;United States" }, { "id": "OtAnbr1OQAW", "title": "Diverse Exploration via InfoMax Options", "track": "main", "status": "Reject", "tldr": "", "abstract": "In this paper, we study the problem of autonomously discovering temporally abstracted actions, or options, for exploration in reinforcement learning. For learning diverse options suitable for exploration, we introduce the infomax termination objective defined as the mutual information between options and their corresponding state transitions. We derive a scalable optimization scheme for maximizing this objective via the termination condition of options, yielding the InfoMax Option Critic (IMOC) algorithm. 
Through illustrative experiments, we empirically show that IMOC learns diverse options and utilizes them for exploration. Moreover, we show that IMOC scales well to continuous control tasks.\n", "keywords": "Reinforcement Learning;Hierarchical Reinforcement Learning;Exploration", "primary_area": "", "supplementary_material": "/attachment/f5afdbac3d909e1490c634d984f1b469d7b113c9.zip", "author": "Yuji Kanagawa;Tomoyuki Kaneko", "authorids": "~Yuji_Kanagawa1;~Tomoyuki_Kaneko1", "gender": "M;M", "homepage": "https://kngwyu.github.io/;", "dblp": ";16/5774", "google_scholar": ";", "orcid": ";0000-0001-8051-2388", "linkedin": ";", "or_profile": "~Yuji_Kanagawa1;~Tomoyuki_Kaneko1", "aff": "The University of Tokyo;The University of Tokyo", "aff_domain": "u-tokyo.ac.jp;u-tokyo.ac.jp", "position": "MS student;Associate Professor", "bibtex": "@misc{\nkanagawa2021diverse,\ntitle={Diverse Exploration via InfoMax Options},\nauthor={Yuji Kanagawa and Tomoyuki Kaneko},\nyear={2021},\nurl={https://openreview.net/forum?id=OtAnbr1OQAW}\n}", "github": "", "project": "", "reviewers": "AnonReviewer2;AnonReviewer4;AnonReviewer1;AnonReviewer3", "site": "https://openreview.net/forum?id=OtAnbr1OQAW", "pdf_size": 0, "rating": "4;4;5;5", "confidence": "4;4;5;3", "wc_review": "423;398;580;432", "wc_reply_reviewers": "0;0;23;0", "wc_reply_authors": "328;195;560;255", "reply_reviewers": "0;0;1;0", "reply_authors": "1;1;2;1", "rating_avg": [ 4.5, 0.5 ], "confidence_avg": [ 4.0, 0.7071067811865476 ], "wc_review_avg": [ 458.25, 71.38758645590983 ], "wc_reply_reviewers_avg": [ 5.75, 9.959292143521045 ], "wc_reply_authors_avg": [ 334.5, 138.44944925856512 ], "reply_reviewers_avg": [ 0.25, 0.4330127018922193 ], "reply_authors_avg": [ 1.25, 0.4330127018922193 ], "replies_avg": [ 14, 0 ], "authors#_avg": [ 2, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 4, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=3302331945569646560&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 0, "aff_unique_index": "0;0", "aff_unique_norm": "University of Tokyo", "aff_unique_dep": "", "aff_unique_url": "https://www.u-tokyo.ac.jp", "aff_unique_abbr": "UTokyo", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0", "aff_country_unique": "Japan" }, { "title": "Mutual Information State Intrinsic Control", "status": "Spotlight", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/2723", "id": "OthEq8I5v1", "poster": "", "openreview": "https://openreview.net/forum?id=OthEq8I5v1", "slides": "https://iclr.cc/virtual/2021/poster/2723", "video": "https://iclr.cc/virtual/2021/poster/2723", "author_site": "Rui Zhao, Yang Gao, Pieter Abbeel, Volker Tresp, Wei Xu", "tldr": "", "abstract": "Reinforcement learning has been shown to be highly successful at many challenging tasks. However, success heavily relies on well-shaped rewards. Intrinsically motivated RL attempts to remove this constraint by defining an intrinsic reward function. Motivated by the self-consciousness concept in psychology, we make a natural assumption that the agent knows what constitutes itself, and propose a new intrinsic objective that encourages the agent to have maximum control on the environment. We mathematically formalize this reward as the mutual information between the agent state and the surrounding state under the current agent policy.
With this new intrinsic motivation, we are able to outperform previous methods, including being able to complete the pick-and-place task for the first time without using any task reward. A video showing experimental results is available at https://youtu.be/AUCwc9RThpk.", "keywords": "Intrinsically Motivated Reinforcement Learning;Intrinsic Reward;Intrinsic Motivation;Deep Reinforcement Learning;Reinforcement Learning", "primary_area": "", "supplementary_material": "/attachment/7348478f3c2722df8bc5f8e3564c797196ce101d.zip", "author": "Rui Zhao;Yang Gao;Pieter Abbeel;Volker Tresp;Wei Xu", "authorids": "~Rui_Zhao1;~Yang_Gao1;~Pieter_Abbeel2;~Volker_Tresp1;~Wei_Xu13", "gender": "M;M;M;M;M", "homepage": "https://ruizhaogit.github.io;http://yang-gao.weebly.com;https://people.eecs.berkeley.edu/~pabbeel/;https://www.dbs.ifi.lmu.de/~tresp/;", "dblp": "26/2578-11;89/4402-29;;t/VolkerTresp;", "google_scholar": "N1yNDnQAAAAJ;https://scholar.google.com/citations?hl=en;https://scholar.google.com.tw/citations?user=vtwH6GkAAAAJ;xIJHTUwAAAAJ;Gxz1fqwAAAAJ", "orcid": ";;;0000-0001-9428-3686;", "linkedin": "rui-zhao-profile/;yang-gao-45245348/;;volker-tresp-8110a118/;", "or_profile": "~Rui_Zhao1;~Yang_Gao1;~Pieter_Abbeel2;~Volker_Tresp1;~Wei_Xu13", "aff": "Tencent AI Lab;Tsinghua University;Covariant;Siemens Corporate Research;Horizon Robotics", "aff_domain": "tencent.com;tsinghua.edu.cn;covariant.ai;siemens.com;horizon.auto", "position": "Researcher;Assistant Professor;Founder;Principal Researcher;Researcher", "bibtex": "@inproceedings{\nzhao2021mutual,\ntitle={Mutual Information State Intrinsic Control},\nauthor={Rui Zhao and Yang Gao and Pieter Abbeel and Volker Tresp and Wei Xu},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=OthEq8I5v1}\n}", "github": "[![github](/images/github_icon.svg) ruizhaogit/music](https://github.com/ruizhaogit/music) + [![Papers with Code](/images/pwc_icon.svg) 1 community implementation](https://paperswithcode.com/paper/?openreview=OthEq8I5v1)", "project": "", "reviewers": "AnonReviewer1;AnonReviewer2;AnonReviewer3;AnonReviewer4", "pdf_size": 0, "rating": "7;7;7;8", "confidence": "5;3;3;5", "wc_review": "723;329;379;545", "wc_reply_reviewers": "123;0;20;0", "wc_reply_authors": "565;370;260;261", "reply_reviewers": "1;0;1;0", "reply_authors": "2;1;2;1", "rating_avg": [ 7.25, 0.4330127018922193 ], "confidence_avg": [ 4.0, 1.0 ], "wc_review_avg": [ 494.0, 154.50889942006577 ], "wc_reply_reviewers_avg": [ 35.75, 51.031240431719866 ], "wc_reply_authors_avg": [ 364.0, 124.36036346038878 ], "reply_reviewers_avg": [ 0.5, 0.5 ], "reply_authors_avg": [ 1.5, 0.5 ], "replies_avg": [ 14, 0 ], "authors#_avg": [ 5, 0 ], "corr_rating_confidence": 0.5773502691896257, "gs_citation": 32, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=17311257108388103067&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 9, "pdf": "https://openreview.net/pdf?id=OthEq8I5v1", "email": "tencent.com;tsinghua.edu.cn;covariant.ai;siemens.com;horizon.auto", "author_num": 5, "aff_unique_index": "0;1;2;3;4", "aff_unique_norm": "Tencent;Tsinghua University;Covariant;Siemens AG;Horizon Robotics", "aff_unique_dep": "Tencent AI Lab;;;Corporate Research;", "aff_unique_url": "https://ai.tencent.com;https://www.tsinghua.edu.cn;;https://www.siemens.com/research;https://www.horizon-robotics.com/", "aff_unique_abbr": "Tencent AI Lab;THU;;Siemens;Horizon Robotics", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": 
"0;0;2;0", "aff_country_unique": "China;;Germany" }, { "title": "Semantic Re-tuning with Contrastive Tension", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/2937", "id": "Ov_sMNau-PF", "poster": "", "openreview": "https://openreview.net/forum?id=Ov_sMNau-PF", "slides": "https://iclr.cc/virtual/2021/poster/2937", "video": "https://iclr.cc/virtual/2021/poster/2937", "author_site": "Fredrik Carlsson, Amaru C Gyllensten, Evangelia Gogoulou, Erik Y Hellqvist, Magnus Sahlgren", "tldr": "", "abstract": "Extracting semantically useful natural language sentence representations from pre-trained deep neural networks such as Transformers remains a challenge. We first demonstrate that pre-training objectives impose a significant task bias onto the final layers of models with a layer-wise survey of the Semantic Textual Similarity (STS) correlations for multiple common Transformer language models. We then propose a new self-supervised method called Contrastive Tension (CT) to counter such biases. CT frames the training objective as a noise-contrastive task between the final layer representations of two independent models, in turn making the final layer representations suitable for feature extraction. Results from multiple common unsupervised and supervised STS tasks indicate that CT outperforms previous State Of The Art (SOTA), and when combining CT with supervised data we improve upon previous SOTA results with large margins. ", "keywords": "Semantic Textual Similarity;Transformers;Language Modelling;Sentence Embeddings;Sentence Representations;Pre-training;Fine-tuning", "primary_area": "", "supplementary_material": "", "author": "Fredrik Carlsson;Amaru Cuba Gyllensten;Evangelia Gogoulou;Erik Ylip\u00e4\u00e4 Hellqvist;Magnus Sahlgren", "authorids": "~Fredrik_Carlsson1;~Amaru_Cuba_Gyllensten2;~Evangelia_Gogoulou1;~Erik_Ylip\u00e4\u00e4_Hellqvist1;~Magnus_Sahlgren1", "gender": "M;M;;M;M", "homepage": "https://www.ri.se/sv/fredrik-carlsson;;;;", "dblp": ";;;;76/3617", "google_scholar": ";nql2ay0AAAAJ;;;Nf2NNVwAAAAJ", "orcid": ";;;0000-0001-5027-1552;0000-0001-5100-0535", "linkedin": ";;;;magnus-sahlgren-0a12b2/", "or_profile": "~Fredrik_Carlsson1;~Amaru_Cuba_Gyllensten2;~Evangelia_Gogoulou1;~Erik_Ylip\u00e4\u00e4_Hellqvist1;~Magnus_Sahlgren1", "aff": "KTH Royal Institute of Technology, Stockholm, Sweden;KTH Royal Institute of Technology, Stockholm, Sweden;;;Research institutes of Sweden", "aff_domain": "kth.se;kth.se;;;ri.se", "position": "MS student;PhD student;;;Research scientist", "bibtex": "@inproceedings{\ncarlsson2021semantic,\ntitle={Semantic Re-tuning with Contrastive Tension},\nauthor={Fredrik Carlsson and Amaru Cuba Gyllensten and Evangelia Gogoulou and Erik Ylip{\\\"a}{\\\"a} Hellqvist and Magnus Sahlgren},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=Ov_sMNau-PF}\n}", "github": "[![github](/images/github_icon.svg) FreddeFrallan/Contrastive-Tension](https://github.com/FreddeFrallan/Contrastive-Tension)", "project": "", "reviewers": "AnonReviewer2;AnonReviewer1;AnonReviewer4;AnonReviewer3", "pdf_size": 0, "rating": "5;6;7;9", "confidence": "4;4;5;5", "wc_review": "421;267;394;392", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "580;159;301;104", "reply_reviewers": "0;0;0;0", "reply_authors": "1;1;1;1", "rating_avg": [ 6.75, 1.479019945774904 ], "confidence_avg": [ 4.5, 0.5 ], "wc_review_avg": [ 368.5, 59.70971445250764 ], "wc_reply_reviewers_avg": [ 0, 0 ], 
"wc_reply_authors_avg": [ 286.0, 184.33257986585008 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 12, 0 ], "authors#_avg": [ 5, 0 ], "corr_rating_confidence": 0.8451542547285166, "gs_citation": 98, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=15789796214682393003&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 4, "pdf": "https://openreview.net/pdf?id=Ov_sMNau-PF", "email": "kth.se;kth.se;;;ri.se", "author_num": 5, "aff_unique_index": "0;0;1", "aff_unique_norm": "KTH Royal Institute of Technology;Research Institutes of Sweden", "aff_unique_dep": ";", "aff_unique_url": "https://www.kth.se;https://www.ri.se/en", "aff_unique_abbr": "KTH;RISE", "aff_campus_unique_index": "0;0", "aff_campus_unique": "Stockholm;", "aff_country_unique_index": "0;0;0", "aff_country_unique": "Sweden" }, { "title": "Negative Data Augmentation", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/2546", "id": "Ovp8dvB8IBH", "poster": "", "openreview": "https://openreview.net/forum?id=Ovp8dvB8IBH", "slides": "https://iclr.cc/virtual/2021/poster/2546", "video": "https://iclr.cc/virtual/2021/poster/2546", "author_site": "Abhishek Sinha, Kumar Ayush, Jiaming Song, Burak Uzkent, Hongxia Jin, Stefano Ermon", "tldr": "", "abstract": "Data augmentation is often used to enlarge datasets with synthetic samples generated in accordance with the underlying data distribution. To enable a wider range of augmentations, we explore negative data augmentation strategies (NDA) that intentionally create out-of-distribution samples. We show that such negative out-of-distribution samples provide information on the support of the data distribution, and can be leveraged for generative modeling and representation learning. We introduce a new GAN training objective where we use NDA as an additional source of synthetic data for the discriminator. We prove that under suitable conditions, optimizing the resulting objective still recovers the true data distribution but can directly bias the generator towards avoiding samples that lack the desired structure. Empirically, models trained with our method achieve improved conditional/unconditional image generation along with improved anomaly detection capabilities. Further, we incorporate the same negative data augmentation strategy in a contrastive learning framework for self-supervised representation learning on images and videos, achieving improved performance on downstream image classification, object detection, and action recognition tasks. 
These results suggest that prior knowledge on what does not constitute valid data is an effective form of weak supervision across a range of unsupervised learning tasks.", "keywords": "generative models;self-supervised learning;data augmentation;anomaly detection", "primary_area": "", "supplementary_material": "", "author": "Abhishek Sinha;Kumar Ayush;Jiaming Song;Burak Uzkent;Hongxia Jin;Stefano Ermon", "authorids": "~Abhishek_Sinha1;~Kumar_Ayush2;~Jiaming_Song1;~Burak_Uzkent1;~Hongxia_Jin1;~Stefano_Ermon1", "gender": "M;M;M;M;;M", "homepage": "https://a7b23.github.io/;https://kmrayush.github.io/;http://tsong.me;https://uzkent.github.io;;http://cs.stanford.edu/~ermon/", "dblp": "47/9175;170/0024;173/5104;73/9345;;47/8135", "google_scholar": "https://scholar.google.com/citations?hl=en;gIlnMF8AAAAJ;;-Es6xrgAAAAJ;;", "orcid": ";0000-0002-9680-2061;;;;", "linkedin": "abhisheksinha94/;kumar-ayush-a19534a5/;jiamings/;;;", "or_profile": "~Abhishek_Sinha1;~Kumar_Ayush2;~Jiaming_Song1;~Burak_Uzkent1;~Hongxia_Jin1;~Stefano_Ermon1", "aff": "Stanford University;Computer Science Department, Stanford University;Computer Science Department, Stanford University;;;Stanford University", "aff_domain": "stanford.edu;cs.stanford.edu;cs.stanford.edu;;;stanford.edu", "position": "MS student;MS student;PhD student;;;Assistant Professor", "bibtex": "@inproceedings{\nsinha2021negative,\ntitle={Negative Data Augmentation },\nauthor={Abhishek Sinha and Kumar Ayush and Jiaming Song and Burak Uzkent and Hongxia Jin and Stefano Ermon},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=Ovp8dvB8IBH}\n}", "github": "[![github](/images/github_icon.svg) ermongroup/NDA](https://github.com/ermongroup/NDA) + [![Papers with Code](/images/pwc_icon.svg) 1 community implementation](https://paperswithcode.com/paper/?openreview=Ovp8dvB8IBH)", "project": "", "reviewers": "AnonReviewer3;AnonReviewer2;AnonReviewer4;AnonReviewer1", "pdf_size": 0, "rating": "5;6;7;9", "confidence": "4;4;4;4", "wc_review": "282;696;255;129", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "532;865;267;249", "reply_reviewers": "0;0;0;0", "reply_authors": "1;2;1;1", "rating_avg": [ 6.75, 1.479019945774904 ], "confidence_avg": [ 4.0, 0.0 ], "wc_review_avg": [ 340.5, 213.2164393286784 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 478.25, 249.82331256309928 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.25, 0.4330127018922193 ], "replies_avg": [ 10, 0 ], "authors#_avg": [ 6, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 102, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=1155111694700482040&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 4, "pdf": "https://openreview.net/pdf?id=Ovp8dvB8IBH", "email": "stanford.edu;cs.stanford.edu;cs.stanford.edu;;;stanford.edu", "author_num": 6, "aff_unique_index": "0;0;0;0", "aff_unique_norm": "Stanford University", "aff_unique_dep": "", "aff_unique_url": "https://www.stanford.edu", "aff_unique_abbr": "Stanford", "aff_campus_unique_index": "0;0;0;0", "aff_campus_unique": "Stanford", "aff_country_unique_index": "0;0;0;0", "aff_country_unique": "United States" }, { "id": "OyDjznG-x2e", "title": "Graph Permutation Selection for Decoding of Error Correction Codes using Self-Attention", "track": "main", "status": "Reject", "tldr": "", "abstract": "Error correction codes are an integral part of communication applications and boost the reliability of transmission. 
The optimal decoding of transmitted codewords is the maximum likelihood rule, which is NP-hard. For practical realizations, suboptimal decoding algorithms are employed; however, the lack of theoretical insights currently impedes the exploitation of the full potential of these algorithms. One key insight is the choice of permutation in permutation decoding. We present a data-driven framework for permutation selection combining domain knowledge with machine learning concepts such as node embedding and self-attention. Significant and consistent improvements in the bit error rate are shown for the simulated Bose Chaudhuri Hocquenghem (BCH) code as compared to the baseline decoders. To the best of our knowledge, this work is the first to leverage the benefits of self-attention networks in physical layer communication systems.\n", "keywords": "decoding;error correcting codes;belief propagation;deep learning", "primary_area": "", "supplementary_material": "", "author": "Nir Raviv;Avi Caciularu;Tomer Raviv;Jacob Goldberger;Yair Be'ery", "authorids": "nirraviv89@gmail.com;~Avi_Caciularu1;tomerraviv95@gmail.com;~Jacob_Goldberger1;ybeery@post.tau.ac.il", "gender": ";M;;M;", "homepage": ";http://aviclu.github.io/;;http://www.eng.biu.ac.il/goldbej/;", "dblp": ";https://dblp.uni-trier.de/pid/207/8509;;65/6574;", "google_scholar": ";https://scholar.google.co.il/citations?user=fPG_0aQAAAAJ;;https://scholar.google.co.il/citations?user=vgzrOK4AAAAJ;", "orcid": ";;;;", "linkedin": ";avicaciularu/;;;", "or_profile": "nirraviv89@gmail.com;~Avi_Caciularu1;tomerraviv95@gmail.com;~Jacob_Goldberger1;ybeery@post.tau.ac.il", "aff": ";;;Bar-Ilan University;", "aff_domain": ";;;biu.ac.il;", "position": ";;;Full Professor;", "bibtex": "@misc{\nraviv2021graph,\ntitle={Graph Permutation Selection for Decoding of Error Correction Codes using Self-Attention},\nauthor={Nir Raviv and Avi Caciularu and Tomer Raviv and Jacob Goldberger and Yair Be'ery},\nyear={2021},\nurl={https://openreview.net/forum?id=OyDjznG-x2e}\n}", "github": "", "project": "", "reviewers": "AnonReviewer3;AnonReviewer1;AnonReviewer4;AnonReviewer2;AnonReviewer5", "site": "https://openreview.net/forum?id=OyDjznG-x2e", "pdf_size": 0, "rating": "4;5;5;6;6", "confidence": "3;4;4;3;4", "wc_review": "315;194;392;1097;473", "wc_reply_reviewers": "0;0;0;0;0", "wc_reply_authors": "284;120;293;394;294", "reply_reviewers": "0;0;0;0;0", "reply_authors": "1;1;1;1;1", "rating_avg": [ 5.2, 0.7483314773547882 ], "confidence_avg": [ 3.6, 0.4898979485566356 ], "wc_review_avg": [ 494.2, 315.11737495733234 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 277.0, 88.24058023381306 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 12, 0 ], "authors#_avg": [ 5, 0 ], "corr_rating_confidence": 0.21821789023599233, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:MrGdB3PobBwJ:scholar.google.com/&scioq=Graph+Permutation+Selection+for+Decoding+of+Error+Correction+Codes+using+Self-Attention&hl=en&as_sdt=0,5", "gs_version_total": 0, "aff_unique_index": "0", "aff_unique_norm": "Bar-Ilan University", "aff_unique_dep": "", "aff_unique_url": "https://www.biu.ac.il", "aff_unique_abbr": "BIU", "aff_country_unique_index": "0", "aff_country_unique": "Israel" }, { "id": "Oz_4sa7hKhl", "title": "Cluster & Tune: Enhance BERT Performance in Low Resource Text Classification", "track": "main", "status": "Reject", "tldr": "", "abstract": "In data-constrained cases, the common practice of fine-tuning BERT for a 
target text classification task is prone to producing poor performance. In such low-resource scenarios, we suggest performing an unsupervised classification task prior to fine-tuning on the target task. \nSpecifically, as such an intermediate task, we perform unsupervised clustering, training BERT on predicting the cluster labels. We test this hypothesis on various data sets, and show that this additional classification step can reduce the demand for labeled examples.\nWe further discuss under which conditions this task is helpful and why. ", "keywords": "low resource;BERT;clustering", "primary_area": "", "supplementary_material": "", "author": "Eyal Shnarch;Ariel Gera;Alon Halfon;Lena Dankin;Leshem Choshen;Ranit Aharonov;Noam Slonim", "authorids": "~Eyal_Shnarch1;arielge@il.ibm.com;alonhal@il.ibm.com;lenad@il.ibm.com;~Leshem_Choshen1;~Ranit_Aharonov2;noams@il.ibm.com", "gender": "M;;;;Not Specified;F;", "homepage": "https://researcher.watson.ibm.com/researcher/view.php?person=il-EYALS;;;;https://ktilana.wixsite.com/leshem-choshen;;", "dblp": "67/2631;;;;218/5237;https://dblp.org/pers/a/Aharonov:Ranit.html;", "google_scholar": "https://scholar.google.co.il/citations?user=UHLsHeMAAAAJ;;;;https://scholar.google.com/citations?hl=en;https://scholar.google.co.il/citations?user=f0t-8dgAAAAJ;", "orcid": ";;;;0000-0002-0085-6496;;", "linkedin": ";;;;leshemchoshen/;ranit-aharonov-a1b4231/;", "or_profile": "~Eyal_Shnarch1;arielge@il.ibm.com;alonhal@il.ibm.com;lenad@il.ibm.com;~Leshem_Choshen1;~Ranit_Aharonov2;noams@il.ibm.com", "aff": "International Business Machines;;;;hebrew university jerusalem israel;;", "aff_domain": "ibm.com;;;;huji.ac.il;;", "position": "Principal Researcher;;;;PhD student;;", "bibtex": "@misc{\nshnarch2021cluster,\ntitle={Cluster {\\&} Tune: Enhance {\\{}BERT{\\}} Performance in Low Resource Text Classification},\nauthor={Eyal Shnarch and Ariel Gera and Alon Halfon and Lena Dankin and Leshem Choshen and Ranit Aharonov and Noam Slonim},\nyear={2021},\nurl={https://openreview.net/forum?id=Oz_4sa7hKhl}\n}", "github": "", "project": "", "reviewers": "AnonReviewer4;AnonReviewer1;AnonReviewer2;AnonReviewer3", "site": "https://openreview.net/forum?id=Oz_4sa7hKhl", "pdf_size": 0, "rating": "3;6;6;8", "confidence": "4;4;3;3", "wc_review": "187;477;248;260", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "404;544;424;92", "reply_reviewers": "0;0;0;0", "reply_authors": "1;1;1;1", "rating_avg": [ 5.75, 1.7853571071357126 ], "confidence_avg": [ 3.5, 0.5 ], "wc_review_avg": [ 293.0, 109.77932410067025 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 366.0, 167.00898179439332 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 9, 0 ], "authors#_avg": [ 7, 0 ], "corr_rating_confidence": -0.7001400420140049, "gs_citation": 2, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=11440723124985110105&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 0, "aff_unique_index": "0;1", "aff_unique_norm": "International Business Machines Corporation;Hebrew University of Jerusalem", "aff_unique_dep": ";", "aff_unique_url": "https://www.ibm.com;https://www.huji.ac.il", "aff_unique_abbr": "IBM;HUJI", "aff_campus_unique_index": "1", "aff_campus_unique": ";Jerusalem", "aff_country_unique_index": "0;1", "aff_country_unique": "United States;Israel" }, { "title": "CoDA: Contrast-enhanced and Diversity-promoting Data Augmentation for Natural Language Understanding", "status": "Poster", "track": "main", "site":
"https://iclr.cc/virtual/2021/poster/2549", "id": "Ozk9MrX1hvA", "poster": "", "openreview": "https://openreview.net/forum?id=Ozk9MrX1hvA", "slides": "https://iclr.cc/virtual/2021/poster/2549", "video": "https://iclr.cc/virtual/2021/poster/2549", "author_site": "Yanru Qu, Dinghan Shen, Yelong Shen, Sandra Sajeev, Weizhu Chen, Jiawei Han", "tldr": "", "abstract": "Data augmentation has been demonstrated as an effective strategy for improving model generalization and data efficiency. However, due to the discrete nature of natural language, designing label-preserving transformations for text data tends to be more challenging. In this paper, we propose a novel data augmentation frame-work dubbed CoDA, which synthesizes diverse and informative augmented examples by integrating multiple transformations organically. Moreover, a contrastive regularization is introduced to capture the global relationship among all the data samples. A momentum encoder along with a memory bank is further leveraged to better estimate the contrastive loss. To verify the effectiveness of the proposed framework, we apply CoDA to Transformer-based models on a wide range of natural language understanding tasks. On the GLUE benchmark, CoDA gives rise to an average improvement of 2.2%while applied to the Roberta-large model. More importantly, it consistently exhibits stronger results relative to several competitive data augmentation and adversarial training baselines (including the low-resource settings). Extensive experiments show that the proposed contrastive objective can be flexibly combined with various data augmentation approaches to further boost their performance, highlighting the wide applicability of the CoDA framework.", "keywords": "data augmentation;natural language understanding;consistency training;contrastive learning", "primary_area": "", "supplementary_material": "", "author": "Yanru Qu;Dinghan Shen;Yelong Shen;Sandra Sajeev;Weizhu Chen;Jiawei Han", "authorids": "~Yanru_Qu1;~Dinghan_Shen1;~Yelong_Shen2;ssajeev@microsoft.com;~Weizhu_Chen1;~Jiawei_Han1", "gender": "M;M;;;M;M", "homepage": "https://yanruqu.com/;https://sites.google.com/view/dinghanshen;;;https://www.microsoft.com/en-us/research/people/wzchen/;http://hanj.cs.illinois.edu/", "dblp": "180/3336;202/2287;37/9376;;79/2536;h/JiaweiHan.html", "google_scholar": "W-o1VXEAAAAJ;;;;LG_E-4EAAAAJ;https://scholar.google.com.tw/citations?user=Kv9AbjMAAAAJ", "orcid": ";;;;;0000-0002-3629-2696", "linkedin": ";;;;;", "or_profile": "~Yanru_Qu1;~Dinghan_Shen1;~Yelong_Shen2;ssajeev@microsoft.com;~Weizhu_Chen1;~Jiawei_Han1", "aff": "University of Illinois, Urbana Champaign;Microsoft;;;Microsoft GenAI;University of Illinois at Urbana-Champaign (UIUC)", "aff_domain": "illinois.edu;microsoft.com;;;microsoft.com;illinois.edu", "position": "PhD student;Researcher;;;Vice President;Full Professor", "bibtex": "@inproceedings{\nqu2021coda,\ntitle={Co{\\{}DA{\\}}: Contrast-enhanced and Diversity-promoting Data Augmentation for Natural Language Understanding},\nauthor={Yanru Qu and Dinghan Shen and Yelong Shen and Sandra Sajeev and Weizhu Chen and Jiawei Han},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=Ozk9MrX1hvA}\n}", "github": "", "project": "", "reviewers": "AnonReviewer3;AnonReviewer1;AnonReviewer4", "pdf_size": 0, "rating": "5;6;7", "confidence": "4;4;5", "wc_review": "292;358;381", "wc_reply_reviewers": "0;269;0", "wc_reply_authors": "1143;833;345", "reply_reviewers": "0;1;0", "reply_authors": 
"2;2;1", "rating_avg": [ 6.0, 0.816496580927726 ], "confidence_avg": [ 4.333333333333333, 0.4714045207910317 ], "wc_review_avg": [ 343.6666666666667, 37.72119946248911 ], "wc_reply_reviewers_avg": [ 89.66666666666667, 126.8078160927875 ], "wc_reply_authors_avg": [ 773.6666666666666, 328.47255931389793 ], "reply_reviewers_avg": [ 0.3333333333333333, 0.4714045207910317 ], "reply_authors_avg": [ 1.6666666666666667, 0.4714045207910317 ], "replies_avg": [ 14, 0 ], "authors#_avg": [ 6, 0 ], "corr_rating_confidence": 0.8660254037844385, "gs_citation": 98, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=17452045608867689681&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 7, "pdf": "https://openreview.net/pdf?id=Ozk9MrX1hvA", "email": "illinois.edu;microsoft.com;;;microsoft.com;illinois.edu", "author_num": 6, "aff_unique_index": "0;1;1;0", "aff_unique_norm": "University of Illinois Urbana-Champaign;Microsoft", "aff_unique_dep": ";Microsoft Corporation", "aff_unique_url": "https://illinois.edu;https://www.microsoft.com", "aff_unique_abbr": "UIUC;Microsoft", "aff_campus_unique_index": "0;0", "aff_campus_unique": "Urbana-Champaign;", "aff_country_unique_index": "0;0;0;0", "aff_country_unique": "United States" }, { "title": "Variational Intrinsic Control Revisited", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/3149", "id": "P0p33rgyoE", "poster": "", "openreview": "https://openreview.net/forum?id=P0p33rgyoE", "slides": "https://iclr.cc/virtual/2021/poster/3149", "video": "https://iclr.cc/virtual/2021/poster/3149", "tldr": "", "abstract": "In this paper, we revisit variational intrinsic control (VIC), an unsupervised reinforcement learning method for finding the largest set of intrinsic options available to an agent. In the original work by Gregor et al. (2016), two VIC algorithms were proposed: one that represents the options explicitly, and the other that does it implicitly. We show that the intrinsic reward used in the latter is subject to bias in stochastic environments, causing convergence to suboptimal solutions. To correct this behavior, we propose two methods respectively based on the transitional probability model and Gaussian Mixture Model. We substantiate our claims through rigorous mathematical derivations and experimental analyses. 
", "keywords": "Unsupervised reinforcement learning;Information theory", "primary_area": "", "supplementary_material": "", "author": "Taehwan Kwon", "authorids": "~Taehwan_Kwon1", "gender": "", "homepage": "https://github.com/TaehwanKwon", "dblp": "", "google_scholar": "", "orcid": "", "linkedin": "", "or_profile": "~Taehwan_Kwon1", "aff": "", "aff_domain": "", "position": "", "bibtex": "@inproceedings{\nkwon2021variational,\ntitle={Variational Intrinsic Control Revisited},\nauthor={Taehwan Kwon},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=P0p33rgyoE}\n}", "github": "", "project": "", "reviewers": "AnonReviewer4;AnonReviewer3;AnonReviewer1;AnonReviewer2", "pdf_size": 0, "rating": "5;6;6;6", "confidence": "4;4;4;4", "wc_review": "423;905;525;569", "wc_reply_reviewers": "162;328;0;81", "wc_reply_authors": "518;1698;446;543", "reply_reviewers": "1;3;0;1", "reply_authors": "2;5;2;3", "rating_avg": [ 5.75, 0.4330127018922193 ], "confidence_avg": [ 4.0, 0.0 ], "wc_review_avg": [ 605.5, 180.8445465033436 ], "wc_reply_reviewers_avg": [ 142.75, 121.32471924550248 ], "wc_reply_authors_avg": [ 801.25, 518.962125303957 ], "reply_reviewers_avg": [ 1.25, 1.0897247358851685 ], "reply_authors_avg": [ 3.0, 1.224744871391589 ], "replies_avg": [ 24, 0 ], "authors#_avg": [ 1, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 16, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=9452105900163714394&as_sdt=400005&sciodt=0,14&hl=en", "gs_version_total": 3, "pdf": "https://openreview.net/pdf?id=P0p33rgyoE", "email": "", "author_num": 1 }, { "id": "P3WG6p6Jnb", "title": "Offline Policy Optimization with Variance Regularization", "track": "main", "status": "Reject", "tldr": "", "abstract": "Learning policies from fixed offline datasets is a key challenge to scale up reinforcement learning (RL) algorithms towards practical applications. This is often because off-policy RL algorithms suffer from distributional shift, due to mismatch between dataset and the target policy, leading to high variance and over-estimation of value functions. In this work, we propose variance regularization for offline RL algorithms, using stationary distribution corrections. We show that by using Fenchel duality, we can avoid double sampling issues for computing the gradient of the variance regularizer. The proposed algorithm for offline variance regularization can be used to augment any existing offline policy optimization algorithms. We show that the regularizer leads to a lower bound to the offline policy optimization objective, which can help avoid over-estimation errors, and explains the benefits of our approach across a range of continuous control domains when compared to existing algorithms. 
", "keywords": "reinforcement learning;offline batch RL;off-policy;policy optimization;variance regularization", "primary_area": "", "supplementary_material": "", "author": "Riashat Islam;Samarth Sinha;Homanga Bharadhwaj;Samin Yeasar Arnob;Zhuoran Yang;Zhaoran Wang;Animesh Garg;Lihong Li;Doina Precup", "authorids": "~Riashat_Islam1;~Samarth_Sinha1;~Homanga_Bharadhwaj1;~Samin_Yeasar_Arnob1;~Zhuoran_Yang1;~Zhaoran_Wang1;~Animesh_Garg1;~Lihong_Li1;~Doina_Precup1", "gender": "M;M;M;M;M;Not Specified;M;;F", "homepage": "https://riashat.github.io/;https://samsinha.me;https://homangab.github.io/;https://www.linkedin.com/in/samin-yeasar-arnob/;https://zhuoranyang.github.io/;https://zhaoranwang.github.io/;http://animesh.garg.tech;https://lihongli.github.io;http://cs.mcgill.ca/~dprecup/", "dblp": "198/0459;;223/5842;;;117/2756;123/5728;l/LihongLi.html;p/DoinaPrecup", "google_scholar": "https://scholar.google.ca/citations?user=2_4Rs44AAAAJ;https://scholar.google.ca/citations?user=lnCKs0AAAAAJ;https://scholar.google.ca/citations?user=wwW4HRQAAAAJ;RMPv4RQAAAAJ;;https://scholar.google.com.tw/citations?user=HSx0BgQAAAAJ;zp8V7ZMAAAAJ;Rqy5KDEAAAAJ;https://scholar.google.com.tw/citations?user=j54VcVEAAAAJ", "orcid": ";;;;;;0000-0003-0482-4296;;", "linkedin": ";;;;;;animeshgarg/;lihong-li-9620164;", "or_profile": "~Riashat_Islam1;~Samarth_Sinha1;~Homanga_Bharadhwaj1;~Samin_Yeasar_Arnob1;~Zhuoran_Yang1;~Zhaoran_Wang1;~Animesh_Garg1;~Lihong_Li1;~Doina_Precup1", "aff": "Mila - Quebec AI Institute;University of Toronto, Toronto University;Google Brain;McGill University;University of California, Berkeley;;University of Toronto;Amazon;McGill University", "aff_domain": "mcgill.ca;ece.utoronto.ca;google.com;mcgill.ca;berkeley.edu;;toronto.edu;amazon.com;mcgill.ca", "position": "PhD student;Undergrad student;Research Intern;PhD student;Postdoc;;Assistant Professor;Senior Principal Scientist;Associate Professor", "bibtex": "@misc{\nislam2021offline,\ntitle={Offline Policy Optimization with Variance Regularization},\nauthor={Riashat Islam and Samarth Sinha and Homanga Bharadhwaj and Samin Yeasar Arnob and Zhuoran Yang and Zhaoran Wang and Animesh Garg and Lihong Li and Doina Precup},\nyear={2021},\nurl={https://openreview.net/forum?id=P3WG6p6Jnb}\n}", "github": "", "project": "", "reviewers": "AnonReviewer2;AnonReviewer1;AnonReviewer4", "site": "https://openreview.net/forum?id=P3WG6p6Jnb", "pdf_size": 0, "rating": "3;4;4", "confidence": "5;4;3", "wc_review": "706;1131;781", "wc_reply_reviewers": "0;0;0", "wc_reply_authors": "1258;1107;792", "reply_reviewers": "0;0;0", "reply_authors": "4;2;1", "rating_avg": [ 3.6666666666666665, 0.4714045207910317 ], "confidence_avg": [ 4.0, 0.816496580927726 ], "wc_review_avg": [ 872.6666666666666, 185.21758975744058 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 1052.3333333333333, 194.1311126246612 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 2.3333333333333335, 1.247219128924647 ], "replies_avg": [ 11, 0 ], "authors#_avg": [ 9, 0 ], "corr_rating_confidence": -0.8660254037844385, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:umNH80_jG40J:scholar.google.com/&scioq=Offline+Policy+Optimization+with+Variance+Regularization&hl=en&as_sdt=0,5", "gs_version_total": 3, "aff_unique_index": "0;1;2;3;4;1;5;3", "aff_unique_norm": "Quebec AI Institute;University of Toronto;Google;McGill University;University of California, Berkeley;Amazon", "aff_unique_dep": "AI Institute;;Google Brain;;;Amazon.com, Inc.", "aff_unique_url": 
"https://mila.quebec;https://www.utoronto.ca;https://brain.google.com;https://www.mcgill.ca;https://www.berkeley.edu;https://www.amazon.com", "aff_unique_abbr": "Mila;U of T;Google Brain;McGill;UC Berkeley;Amazon", "aff_campus_unique_index": "1;2;3", "aff_campus_unique": ";Toronto;Mountain View;Berkeley", "aff_country_unique_index": "0;0;1;0;1;0;1;0", "aff_country_unique": "Canada;United States" }, { "id": "P42rXLGZQ07", "title": "Direct Evolutionary Optimization of Variational Autoencoders with Binary Latents", "track": "main", "status": "Reject", "tldr": "", "abstract": "Discrete latent variables are considered important to model the generation process of real world data, which has motivated research on Variational Autoencoders (VAEs) with discrete latents. However, standard VAE training is not possible in this case, which has motivated different strategies to manipulate discrete distributions in order to train discrete VAEs similarly to conventional ones.\nHere we ask if it is also possible to keep the discrete nature of the latents fully intact by applying a direct discrete optimization for the encoding model. The studied approach is consequently strongly diverting from standard VAE training by altogether sidestepping absolute standard VAE mechanisms such as sampling approximation, reparameterization trick and amortization. \n\nDiscrete optimization is realized in a variational setting using truncated posteriors in conjunction with evolutionary algorithms (using a recently suggested approach). For VAEs with binary latents, we first show how such a discrete variational method (A)~ties into gradient ascent for network weights and (B)~uses the decoder network to select latent states for training. \n \nMore conventional amortized training is, as may be expected, more efficient than direct discrete optimization, and applicable to large neural networks.\nHowever, we here find direct optimization to be efficiently scalable to hundreds of latent variables using smaller networks.\nMore importantly, we find the effectiveness of direct optimization to be highly competitive in 'zero-shot' learning (where high effectiveness for small networks is required).\nIn contrast to large supervised neural networks, the here investigated VAEs can denoise a single image without previous training on clean data and/or training on large image datasets. \n\nMore generally, the studied approach shows that training of VAEs is indeed possible without sampling-based approximation and reparameterization, which may be interesting for the analysis of VAE training in general. 
In the regime of few data, direct optimization, furthermore, makes VAEs competitive for denoising where they have previously been outperformed by non-generative approaches.", "keywords": "variational optimization;variational autoencoders;denoising;evolutionary algorithms", "primary_area": "", "supplementary_material": "/attachment/0056cc4e5ea3ebb8a4af7439a6d52f37395d35bd.zip", "author": "Enrico Guiraud;Jakob Drefs;Jorg Lucke", "authorids": "~Enrico_Guiraud1;~Jakob_Drefs1;~Jorg_Lucke1", "gender": "M;M;M", "homepage": ";https://uol.de/en/machine-learning/;http://uol.de/ml", "dblp": ";;http://dblp.uni-trier.de/pers/hd/l/L=uuml=cke:J=ouml=rg", "google_scholar": ";;h-NXaIsAAAAJ", "orcid": ";;", "linkedin": ";;", "or_profile": "~Enrico_Guiraud1;~Jakob_Drefs1;~Jorg_Lucke1", "aff": ";Carl von Ossietzky Universit\u00e4t Oldenburg;University of Oldenburg", "aff_domain": ";uol.de;uni-oldenburg.de", "position": ";PhD student;Associate Professor", "bibtex": "@misc{\nguiraud2021direct,\ntitle={Direct Evolutionary Optimization of Variational Autoencoders with Binary Latents},\nauthor={Enrico Guiraud and Jakob Drefs and Jorg Lucke},\nyear={2021},\nurl={https://openreview.net/forum?id=P42rXLGZQ07}\n}", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer2;AnonReviewer4;AnonReviewer3", "site": "https://openreview.net/forum?id=P42rXLGZQ07", "pdf_size": 0, "rating": "5;6;6;6", "confidence": "3;3;3;4", "wc_review": "481;335;524;325", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "290;321;124;443", "reply_reviewers": "0;0;0;0", "reply_authors": "1;1;1;1", "rating_avg": [ 5.75, 0.4330127018922193 ], "confidence_avg": [ 3.25, 0.4330127018922193 ], "wc_review_avg": [ 416.25, 87.65094123852863 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 294.5, 113.84748569907023 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 11, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": 0.3333333333333333, "gs_citation": 10, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=5562816539052001738&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 9, "aff_unique_index": "0;1", "aff_unique_norm": "Carl von Ossietzky University of Oldenburg;University of Oldenburg", "aff_unique_dep": ";", "aff_unique_url": "https://www.uni-oldenburg.de/;https://www.uni-oldenburg.de/", "aff_unique_abbr": "UvO;UOL", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0", "aff_country_unique": "Germany" }, { "id": "P5RQfyAmrU", "title": "Model-centric data manifold: the data through the eyes of the model", "track": "main", "status": "Reject", "tldr": "", "abstract": "We discover that deep ReLU neural network classifiers\u00a0can see a low-dimensional Riemannian manifold structure on data. Such structure comes via the local data matrix, a variation of the Fisher information matrix, where the role of the model parameters is taken by the data variables. We obtain a foliation of the data domain and we show that the dataset on which the model is trained lies on a leaf, the data leaf, whose dimension is bounded by the number of\u00a0classification labels. 
We validate our results with experiments on the MNIST dataset: paths on the data leaf connect valid images, while other leaves cover noisy images.\n", "keywords": "Deep Learning;Information Geometry;Data Manifold;Fisher matrix", "primary_area": "", "supplementary_material": "/attachment/7fc18e003434697cc9d05740086e469486808820.zip", "author": "Luca Grementieri;Rita Fioresi", "authorids": "lgrementieri@nextbit.it;~Rita_Fioresi1", "gender": ";F", "homepage": ";https://www.unibo.it/sitoweb/rita.fioresi", "dblp": ";", "google_scholar": ";DwTxLXAAAAAJ", "orcid": ";", "linkedin": ";", "or_profile": "lgrementieri@nextbit.it;~Rita_Fioresi1", "aff": ";", "aff_domain": ";", "position": ";", "bibtex": "@misc{\ngrementieri2021,\ntitle={ Model-centric data manifold: the data through the eyes of the model},\nauthor={Luca Grementieri and Rita Fioresi},\nyear={2021},\nurl={https://openreview.net/forum?id=P5RQfyAmrU}\n}", "github": "", "project": "", "reviewers": "AnonReviewer3;AnonReviewer4;AnonReviewer5;AnonReviewer1", "site": "https://openreview.net/forum?id=P5RQfyAmrU", "pdf_size": 0, "rating": "4;5;5;6", "confidence": "4;3;3;5", "wc_review": "507;424;586;471", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "181;472;238;204", "reply_reviewers": "0;0;0;0", "reply_authors": "1;1;1;1", "rating_avg": [ 5.0, 0.7071067811865476 ], "confidence_avg": [ 3.75, 0.82915619758885 ], "wc_review_avg": [ 497.0, 59.21570737566174 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 273.75, 116.24193520412503 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 10, 0 ], "authors#_avg": [ 2, 0 ], "corr_rating_confidence": 0.4264014327112209, "gs_citation": 17, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=16836839823706951910&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 8 }, { "id": "P63SQE0fVa", "title": "ScheduleNet: Learn to Solve MinMax mTSP Using Reinforcement Learning with Delayed Reward", "track": "main", "status": "Reject", "tldr": "", "abstract": "Combinatorial Optimization (CO) problems are theoretically challenging yet crucial in practice. Numerous works have used Reinforcement Learning (RL) to tackle these CO problems. As current approaches mainly focus on single-worker CO problems such as the famous Travelling Salesman Problem (TSP), we focus on a more practical extension of TSP to the multi-worker (salesmen) setting, specifically MinMax mTSP. From the RL perspective, MinMax mTSP raises several significant challenges, such as the cooperation of multiple workers and the need for a well-engineered reward function. In this paper, we present an RL framework with (1) a worker-task heterograph and type-aware Graph Neural Network, and (2) an RL training method that is stable, has fast convergence speed, and directly optimizes the objective of MinMax mTSP in a delayed reward setting. We achieve comparable performance to a highly optimized meta-heuristic baseline, OR-Tools, and outperform it in 10% of the cases, both on in-training and out-of-training problem distributions. 
Moreover, our problem formulation enables us to solve problems with any number of salesmen (workers) and cities.", "keywords": "", "primary_area": "", "supplementary_material": "", "author": "Junyoung Park;Sanzhar Bakhtiyarov;Jinkyoo Park", "authorids": "~Junyoung_Park1;~Sanzhar_Bakhtiyarov1;~Jinkyoo_Park1", "gender": ";M;M", "homepage": ";https://github.com/bakhsanzh;http://silab.kaist.ac.kr/", "dblp": ";;156/7535", "google_scholar": ";;sH2a0nkAAAAJ", "orcid": ";;0000-0003-2620-1479", "linkedin": ";;", "or_profile": "~Junyoung_Park1;~Sanzhar_Bakhtiyarov1;~Jinkyoo_Park1", "aff": ";;Korea Advanced Institute of Science & Technology", "aff_domain": ";;kaist.ac.kr", "position": ";;Associate Professor", "bibtex": "@misc{\npark2021schedulenet,\ntitle={ScheduleNet: Learn to Solve MinMax m{\\{}TSP{\\}} Using Reinforcement Learning with Delayed Reward},\nauthor={Junyoung Park and Sanzhar Bakhtiyarov and Jinkyoo Park},\nyear={2021},\nurl={https://openreview.net/forum?id=P63SQE0fVa}\n}", "github": "", "project": "", "reviewers": "AnonReviewer2;AnonReviewer4;AnonReviewer3;AnonReviewer1", "site": "https://openreview.net/forum?id=P63SQE0fVa", "pdf_size": 0, "rating": "4;4;5;5", "confidence": "5;3;4;5", "wc_review": "910;191;937;366", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "1291;765;1329;659", "reply_reviewers": "0;0;0;0", "reply_authors": "2;2;2;1", "rating_avg": [ 4.5, 0.5 ], "confidence_avg": [ 4.25, 0.82915619758885 ], "wc_review_avg": [ 601.0, 328.5201668086755 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 1011.0, 301.63885691336253 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.75, 0.4330127018922193 ], "replies_avg": [ 12, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": 0.30151134457776363, "gs_citation": 1, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=6901099896546936926&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 0, "aff_unique_index": "0", "aff_unique_norm": "Korea Advanced Institute of Science and Technology", "aff_unique_dep": "", "aff_unique_url": "https://www.kaist.ac.kr", "aff_unique_abbr": "KAIST", "aff_country_unique_index": "0", "aff_country_unique": "South Korea" }, { "title": "Learning Safe Multi-agent Control with Decentralized Neural Barrier Certificates", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/2820", "id": "P6_q1BRxY8Q", "poster": "", "openreview": "https://openreview.net/forum?id=P6_q1BRxY8Q", "slides": "https://iclr.cc/virtual/2021/poster/2820", "video": "https://iclr.cc/virtual/2021/poster/2820", "author_site": "Zengyi Qin, Kaiqing Zhang, Yuxiao Chen, Jingkai Chen, Chuchu Fan", "tldr": "", "abstract": "We study the multi-agent safe control problem where agents should avoid collisions with static obstacles and with each other while reaching their goals. Our core idea is to learn the multi-agent control policy jointly with learning the control barrier functions as safety certificates. We propose a new joint-learning framework that can be implemented in a decentralized fashion, which can adapt to an arbitrarily large number of agents. Building upon this framework, we further improve the scalability by incorporating neural network architectures that are invariant to the quantity and permutation of neighboring agents. In addition, we propose a new spontaneous policy refinement method to further enforce the certificate condition during testing. 
We provide extensive experiments to demonstrate that our method significantly outperforms other leading multi-agent control approaches in terms of maintaining safety and completing original tasks. Our approach also shows substantial generalization capability in that the control policy can be trained with 8 agents in one scenario, while being used on other scenarios with up to 1024 agents in complex multi-agent environments and dynamics. Videos and source code can be found at https://realm.mit.edu/blog/learning-safe-multi-agent-control-decentralized-neural-barrier-certificates.", "keywords": "Multi-agent;safe;control barrier function;reinforcement learning", "primary_area": "", "supplementary_material": "/attachment/b4608dcefced74800b8aa4789255d038eeb7c569.zip", "author": "Zengyi Qin;Kaiqing Zhang;Yuxiao Chen;Jingkai Chen;Chuchu Fan", "authorids": "~Zengyi_Qin1;~Kaiqing_Zhang3;~Yuxiao_Chen1;~Jingkai_Chen2;~Chuchu_Fan2", "gender": "M;;M;F;M", "homepage": ";;http://jkchengh.github.io;https://chuchu.mit.edu;https://kzhang66.github.io/", "dblp": "230/7736;158/4934;;127/1756;", "google_scholar": ";;FK-l688AAAAJ;J-dq_8EAAAAJ;https://scholar.google.com/citations?hl=en", "orcid": ";;;;", "linkedin": ";;;chuchu-fan/;", "or_profile": "~Zengyi_Qin1;~Yuxiao_Chen1;~Jingkai_Chen2;~Chuchu_Fan2;~kaiqing_zhang1", "aff": "Massachusetts Institute of Technology;;Massachusetts Institute of Technology;Massachusetts Institute of Technology;University of Illinois, Urbana Champaign", "aff_domain": "mit.edu;;mit.edu;mit.edu;illinois.edu", "position": "Graduate student;;PhD student;Assistant Professor;PhD student", "bibtex": "@inproceedings{\nqin2021learning,\ntitle={Learning Safe Multi-agent Control with Decentralized Neural Barrier Certificates},\nauthor={Zengyi Qin and Kaiqing Zhang and Yuxiao Chen and Jingkai Chen and Chuchu Fan},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=P6_q1BRxY8Q}\n}", "github": "", "project": "", "reviewers": "AnonReviewer3;AnonReviewer1;AnonReviewer5;AnonReviewer4;AnonReviewer2", "pdf_size": 0, "rating": "4;6;7;8;8", "confidence": "4;3;3;4;2", "wc_review": "390;333;264;634;729", "wc_reply_reviewers": "327;13;22;0;172", "wc_reply_authors": "1785;984;457;764;807", "reply_reviewers": "2;1;1;0;1", "reply_authors": "4;4;2;2;2", "rating_avg": [ 6.6, 1.4966629547095764 ], "confidence_avg": [ 3.2, 0.7483314773547882 ], "wc_review_avg": [ 470.0, 179.76762778654003 ], "wc_reply_reviewers_avg": [ 106.8, 126.59763030957569 ], "wc_reply_authors_avg": [ 959.4, 446.3257106643085 ], "reply_reviewers_avg": [ 1.0, 0.6324555320336759 ], "reply_authors_avg": [ 2.8, 0.9797958971132712 ], "replies_avg": [ 29, 0 ], "authors#_avg": [ 5, 0 ], "corr_rating_confidence": -0.46428571428571425, "gs_citation": 175, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=14748027293867959252&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 4, "pdf": "https://openreview.net/pdf?id=P6_q1BRxY8Q", "email": "mit.edu;;mit.edu;mit.edu;illinois.edu", "author_num": 5, "aff_unique_index": "0;0;0;1", "aff_unique_norm": "Massachusetts Institute of Technology;University of Illinois Urbana-Champaign", "aff_unique_dep": ";", "aff_unique_url": "https://web.mit.edu;https://illinois.edu", "aff_unique_abbr": "MIT;UIUC", "aff_campus_unique_index": "1", "aff_campus_unique": ";Urbana-Champaign", "aff_country_unique_index": "0;0;0;0", "aff_country_unique": "United States" }, { "id": "P84ryxVG6tR", "title": "REPAINT: Knowledge Transfer in Deep Actor-Critic 
Reinforcement Learning", "track": "main", "status": "Reject", "tldr": "", "abstract": "Accelerating the learning processes for complex tasks by leveraging previously learned tasks has been one of the most challenging problems in reinforcement learning, especially when the similarity between source and target tasks is low or unknown. In this work, we propose a REPresentation-And-INstance Transfer algorithm (REPAINT) for deep actor-critic reinforcement learning paradigm. In representation transfer, we adopt a kickstarted training method using a pre-trained teacher policy by introducing an auxiliary cross-entropy loss. In instance transfer, we develop a sampling approach, i.e., advantage-based experience replay, on transitions collected following the teacher policy, where only the samples with high advantage estimates are retained for policy update. We consider both learning an unseen target task by transferring from previously learned teacher tasks and learning a partially unseen task composed of multiple sub-tasks by transferring from a pre-learned teacher sub-task. In several benchmark experiments, REPAINT significantly reduces the total training time and improves the asymptotic performance compared to training with no prior knowledge and other baselines.", "keywords": "reinforcement learning;transfer learning;actor-critic RL;representation transfer;instance transfer;task similarity;MuJoCo;DeepRacer", "primary_area": "", "supplementary_material": "/attachment/41d8b352c6e5214430c0a7d1cd78d67fc6e5843c.zip", "author": "Yunzhe Tao;Sahika Genc;TAO SUN;Sunil Mallya", "authorids": "~Yunzhe_Tao2;~Sahika_Genc1;~TAO_SUN4;~Sunil_Mallya1", "gender": "M;F;M;M", "homepage": ";;;", "dblp": ";05/5914.html;;", "google_scholar": "gldelC4AAAAJ;qULIvBsAAAAJ;;", "orcid": ";;;", "linkedin": "yunzhe-tao/;sahika/;;", "or_profile": "~Yunzhe_Tao2;~Sahika_Genc1;~TAO_SUN4;~Sunil_Mallya1", "aff": "Amazon;;Amazon;", "aff_domain": "amazon.com;;amazon.com;", "position": "Applied Scientist;;Applied Scientist;", "bibtex": "@misc{\ntao2021repaint,\ntitle={{\\{}REPAINT{\\}}: Knowledge Transfer in Deep Actor-Critic Reinforcement Learning},\nauthor={Yunzhe Tao and Sahika Genc and TAO SUN and Sunil Mallya},\nyear={2021},\nurl={https://openreview.net/forum?id=P84ryxVG6tR}\n}", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer4;AnonReviewer3;AnonReviewer2", "site": "https://openreview.net/forum?id=P84ryxVG6tR", "pdf_size": 0, "rating": "4;4;6;7", "confidence": "4;4;4;4", "wc_review": "365;202;440;944", "wc_reply_reviewers": "0;0;5;0", "wc_reply_authors": "815;686;743;1226", "reply_reviewers": "0;0;1;0", "reply_authors": "2;1;2;3", "rating_avg": [ 5.25, 1.299038105676658 ], "confidence_avg": [ 4.0, 0.0 ], "wc_review_avg": [ 487.75, 277.1122290697399 ], "wc_reply_reviewers_avg": [ 1.25, 2.165063509461097 ], "wc_reply_authors_avg": [ 867.5, 211.96756827401686 ], "reply_reviewers_avg": [ 0.25, 0.4330127018922193 ], "reply_authors_avg": [ 2.0, 0.7071067811865476 ], "replies_avg": [ 18, 0 ], "authors#_avg": [ 4, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 3, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=3129441518170328327&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 0, "aff_unique_index": "0;0", "aff_unique_norm": "Amazon", "aff_unique_dep": "Amazon.com, Inc.", "aff_unique_url": "https://www.amazon.com", "aff_unique_abbr": "Amazon", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0", "aff_country_unique": "United States" }, { "id": "PAsd7_vP4_", 
"title": "Adaptive Discretization for Continuous Control using Particle Filtering Policy Network", "track": "main", "status": "Reject", "tldr": "", "abstract": "Controlling the movements of highly articulated agents and robots has been a long-standing challenge to model-free deep reinforcement learning. In this paper, we propose a simple, yet general, framework for improving the performance of policy gradient algorithms by discretizing the continuous action space. Instead of using a fixed set of predetermined atomic actions, we exploit particle filtering to adaptively discretize actions during training and track the posterior policy represented as a mixture distribution. The resulting policy can replace the original continuous policy of any given policy gradient algorithm without changing its underlying model architecture. We demonstrate the applicability of our approach to state-of-the-art on-policy and off-policy baselines in challenging control tasks. Baselines using our particle-based policies achieve better final performance and speed of convergence as compared to corresponding continuous implementations and implementations that rely on fixed discretization schemes. ", "keywords": "Reinforcement Learning;Continuous Control;Action Space Discretization;Policy Gradient", "primary_area": "", "supplementary_material": "/attachment/a41109f093218a06db40381464280f84c808a545.zip", "author": "Pei Xu;Ioannis Karamouzas", "authorids": "~Pei_Xu1;~Ioannis_Karamouzas1", "gender": "M;M", "homepage": "https://pei-xu.github.io;https://people.cs.clemson.edu/~ioannis/", "dblp": ";99/7002", "google_scholar": "LNaO-EYAAAAJ;WDv3eyUAAAAJ", "orcid": "0000-0001-7851-3971;", "linkedin": ";", "or_profile": "~Pei_Xu1;~Ioannis_Karamouzas1", "aff": "Clemson University;Clemson University", "aff_domain": "clemson.edu;clemson.edu", "position": "PhD student;Assistant Professor", "bibtex": "@misc{\nxu2021adaptive,\ntitle={Adaptive Discretization for Continuous Control using Particle Filtering Policy Network},\nauthor={Pei Xu and Ioannis Karamouzas},\nyear={2021},\nurl={https://openreview.net/forum?id=PAsd7_vP4_}\n}", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer2;AnonReviewer4;AnonReviewer3", "site": "https://openreview.net/forum?id=PAsd7_vP4_", "pdf_size": 0, "rating": "4;5;5;7", "confidence": "3;3;4;3", "wc_review": "306;501;195;507", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "808;614;962;1173", "reply_reviewers": "0;0;0;0", "reply_authors": "2;1;2;2", "rating_avg": [ 5.25, 1.0897247358851685 ], "confidence_avg": [ 3.25, 0.4330127018922193 ], "wc_review_avg": [ 377.25, 132.7033816449302 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 889.25, 205.04313570563633 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.75, 0.4330127018922193 ], "replies_avg": [ 13, 0 ], "authors#_avg": [ 2, 0 ], "corr_rating_confidence": -0.13245323570650439, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:xDOteCaIpq8J:scholar.google.com/&scioq=Adaptive+Discretization+for+Continuous+Control+using+Particle+Filtering+Policy+Network&hl=en&as_sdt=0,33", "gs_version_total": 0, "aff_unique_index": "0;0", "aff_unique_norm": "Clemson University", "aff_unique_dep": "", "aff_unique_url": "https://www.clemson.edu", "aff_unique_abbr": "Clemson", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0", "aff_country_unique": "United States" }, { "id": "PBfaUXYZzU", "title": "Class-Weighted Evaluation Metrics for Imbalanced Data 
Classification", "track": "main", "status": "Reject", "tldr": "", "abstract": "Class distribution skews in imbalanced datasets may lead to models with prediction bias towards majority classes, making fair assessment of classifiers a challenging task. Balanced Accuracy is a popular metric used to evaluate a classifier\u2019s prediction performance under such scenarios. However, this metric falls short when classes vary in importance, especially when class importance is skewed differently from class cardinality distributions. In this paper, we propose a simple and general-purpose evaluation framework for imbalanced data classification that is sensitive to arbitrary skews in class cardinalities and importances. Experiments with several state-of-the-art classifiers tested on real-world datasets and benchmarks from two different domains show that our new framework is more effective than Balanced Accuracy \u2013- not only in evaluating and ranking model predictions, but also in training the models themselves.", "keywords": "Imbalanced data classification;Evaluation metrics;Log parsing;Sentiment analysis", "primary_area": "", "supplementary_material": "/attachment/1461772b7250658874a316da72d635224648e496.zip", "author": "Akhilesh Gupta;Nesime Tatbul;Ryan Marcus;Shengtian Zhou;Insup Lee;Justin Gottschlich", "authorids": "akhileshgupta@alumni.upenn.edu;~Nesime_Tatbul1;~Ryan_Marcus1;~Shengtian_Zhou1;~Insup_Lee1;~Justin_Gottschlich1", "gender": ";;M;M;;", "homepage": ";https://people.csail.mit.edu/tatbul/;https://rmarcus.info;;https://www.cis.upenn.edu/~lee/;", "dblp": ";t/NesimeTatbul;https://dblp.uni-trier.de/pid/175/1473.html;;l/InsupLee.html;", "google_scholar": ";YlsHgYQAAAAJ;vPOl-IwAAAAJ;2z2FiKAAAAAJ;qPlUgrgAAAAJ;", "orcid": ";0000-0002-0416-7022;0000-0002-1279-1124;;0000-0003-2672-1132;", "linkedin": ";nesime-tatbul-0724964;;shengtian-zhou/;;", "or_profile": "akhileshgupta@alumni.upenn.edu;~Nesime_Tatbul1;~Ryan_Marcus1;~Shengtian_Zhou1;~Insup_Lee1;~Justin_Gottschlich1", "aff": ";Massachusetts Institute of Technology;Computer Science and Artificial Intelligence Laboratory, Electrical Engineering & Computer Science;Intel;University of Pennsylvania;", "aff_domain": ";mit.edu;csail.mit.edu;intel.com;upenn.edu;", "position": ";Sr. 
Research Scientist;Postdoc;Researcher;Full Professor;", "bibtex": "@misc{\ngupta2021classweighted,\ntitle={Class-Weighted Evaluation Metrics for Imbalanced Data Classification},\nauthor={Akhilesh Gupta and Nesime Tatbul and Ryan Marcus and Shengtian Zhou and Insup Lee and Justin Gottschlich},\nyear={2021},\nurl={https://openreview.net/forum?id=PBfaUXYZzU}\n}", "github": "", "project": "", "reviewers": "AnonReviewer5;AnonReviewer6;AnonReviewer4;AnonReviewer2", "site": "https://openreview.net/forum?id=PBfaUXYZzU", "pdf_size": 0, "rating": "3;3;4;6", "confidence": "5;4;4;4", "wc_review": "492;151;211;536", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "0;0;0;0", "reply_reviewers": "0;0;0;0", "reply_authors": "0;0;0;0", "rating_avg": [ 4.0, 1.224744871391589 ], "confidence_avg": [ 4.25, 0.4330127018922193 ], "wc_review_avg": [ 347.5, 168.56526925793463 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 0, 0 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 0, 0 ], "replies_avg": [ 5, 0 ], "authors#_avg": [ 6, 0 ], "corr_rating_confidence": -0.4714045207910316, "gs_citation": 33, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=16914361965458291751&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 2, "aff_unique_index": "0;0;1;2", "aff_unique_norm": "Massachusetts Institute of Technology;Intel;University of Pennsylvania", "aff_unique_dep": ";Intel Corporation;", "aff_unique_url": "https://web.mit.edu;https://www.intel.com;https://www.upenn.edu", "aff_unique_abbr": "MIT;Intel;UPenn", "aff_campus_unique_index": "1", "aff_campus_unique": ";Cambridge", "aff_country_unique_index": "0;0;0;0", "aff_country_unique": "United States" }, { "id": "PDLPdWHdp-h", "title": "Understanding Adversarial Attacks on Autoencoders", "track": "main", "status": "Withdraw", "tldr": "", "abstract": "Adversarial vulnerability is a fundamental limitation of deep neural networks which remains poorly understood. Recent work suggests that adversarial attacks on deep neural network classifiers exploit the fact that non-robust models rely on superficial statistics to form predictions. While such features are semantically meaningless, they are strongly predictive of the input\u2019s label, allowing non-robust networks to achieve good generalization on unperturbed test inputs. However, this hypothesis fails to explain why autoencoders are also vulnerable to adversarial attacks, despite achieving low reconstruction error on clean inputs. We show that training an autoencoder on adversarial input-target pairs leads to low reconstruction error on the standard test set, suggesting that adversarial attacks on autoencoders are predictive. In this work, we study the predictive power of adversarial examples on autoencoders through the lens of compressive sensing. 
We characterize the relationship between adversarial perturbations and target inputs and reveal that training autoencoders on adversarial input-target pairs is a form of knowledge distillation, achieved by learning to attenuate structured noise.", "keywords": "", "primary_area": "", "supplementary_material": "", "author": "Elsa Riachi;Frank Rudzicz", "authorids": "~Elsa_Riachi1;~Frank_Rudzicz2", "gender": "F;M", "homepage": ";http://www.cs.toronto.edu/~frank", "dblp": ";36/6505", "google_scholar": ";https://scholar.google.ca/citations?user=elXOB1sAAAAJ", "orcid": ";0000-0002-1139-3423", "linkedin": "elsa-riachi/;", "or_profile": "~Elsa_Riachi1;~Frank_Rudzicz2", "aff": ";Vector Institute for Artificial Intelligence", "aff_domain": ";vectorinstitute.ai", "position": ";Faculty", "bibtex": "", "github": "", "project": "", "reviewers": "AnonReviewer2;AnonReviewer4;AnonReviewer1;AnonReviewer3", "site": "https://openreview.net/forum?id=PDLPdWHdp-h", "pdf_size": 0, "rating": "3;4;5;7", "confidence": "4;4;2;3", "wc_review": "263;336;200;262", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "0;0;0;0", "reply_reviewers": "0;0;0;0", "reply_authors": "0;0;0;0", "rating_avg": [ 4.75, 1.479019945774904 ], "confidence_avg": [ 3.25, 0.82915619758885 ], "wc_review_avg": [ 265.25, 48.16313424186595 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 0, 0 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 0, 0 ], "replies_avg": [ 6, 0 ], "authors#_avg": [ 2, 0 ], "corr_rating_confidence": -0.560611910581388, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:V0_N7Ccp7psJ:scholar.google.com/&scioq=Understanding+Adversarial+Attacks+on+Autoencoders&hl=en&as_sdt=0,5", "gs_version_total": 0, "aff_unique_index": "0", "aff_unique_norm": "Vector Institute for Artificial Intelligence", "aff_unique_dep": "", "aff_unique_url": "https://vectorinstitute.ai/", "aff_unique_abbr": "Vector Institute", "aff_country_unique_index": "0", "aff_country_unique": "Canada" }, { "id": "PEcNk5Bad7z", "title": "Learning Irreducible Representations of Noncommutative Lie Groups", "track": "main", "status": "Reject", "tldr": "", "abstract": "Recent work has constructed neural networks that are equivariant to continuous symmetry groups such as 2D and 3D rotations. This is accomplished using explicit group representations to derive the equivariant kernels and nonlinearities. We present two contributions motivated by frontier applications of equivariance beyond rotations and translations. First, we relax the requirement for explicit Lie group representations, presenting a novel algorithm that finds irreducible representations of noncommutative Lie groups given only the structure constants of the associated Lie algebra. 
Second, we demonstrate that Lorentz-equivariance is a useful prior for object-tracking tasks and construct the first object-tracking model equivariant to the Poincar\u00e9 group.", "keywords": "equivariance;object tracking;equivariant neural networks;deep learning;point cloud;lie group;lie algebra;lorentz group;poincar\u00e9 group", "primary_area": "", "supplementary_material": "/attachment/5cf8a715afafea6d1f2012d67c84a25da0c81d5c.zip", "author": "Noah Shutty;Casimir Wierzynski", "authorids": "~Noah_Shutty1;casimir.wierzynski@intel.com", "gender": ";", "homepage": "https://stanford.edu/~noaj;", "dblp": "227/2278.html;", "google_scholar": "Lm6IeigAAAAJ;", "orcid": "0000-0002-6035-2812;", "linkedin": "noah-shutty-b7602bbb/;", "or_profile": "~Noah_Shutty1;casimir.wierzynski@intel.com", "aff": ";", "aff_domain": ";", "position": ";", "bibtex": "@misc{\nshutty2021learning,\ntitle={Learning Irreducible Representations of Noncommutative Lie Groups},\nauthor={Noah Shutty and Casimir Wierzynski},\nyear={2021},\nurl={https://openreview.net/forum?id=PEcNk5Bad7z}\n}", "github": "", "project": "", "reviewers": "AnonReviewer4;AnonReviewer3;AnonReviewer1", "site": "https://openreview.net/forum?id=PEcNk5Bad7z", "pdf_size": 0, "rating": "4;5;5", "confidence": "3;2;4", "wc_review": "330;327;547", "wc_reply_reviewers": "0;0;0", "wc_reply_authors": "237;321;231", "reply_reviewers": "0;0;0", "reply_authors": "2;1;1", "rating_avg": [ 4.666666666666667, 0.4714045207910317 ], "confidence_avg": [ 3.0, 0.816496580927726 ], "wc_review_avg": [ 401.3333333333333, 103.00916895543274 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 263.0, 41.08527716834828 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.3333333333333333, 0.4714045207910317 ], "replies_avg": [ 8, 0 ], "authors#_avg": [ 2, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 4, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=12091634309795390196&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 2 }, { "id": "PGmqOzKEPZN", "title": "Non-Negative Bregman Divergence Minimization for Deep Direct Density Ratio Estimation", "track": "main", "status": "Reject", "tldr": "", "abstract": "The estimation of the ratio of two probability densities has garnered attention as the density ratio is useful in various machine learning tasks, such as anomaly detection and domain adaptation. To estimate the density ratio, methods collectively known as direct density ratio estimation (DRE) have been explored. These methods are based on the minimization of the Bregman (BR) divergence between a density ratio model and the true density ratio. However, existing direct DRE suffers from serious overfitting when using flexible models such as neural networks. In this paper, we introduce a non-negative correction for empirical risk using only the prior knowledge of the upper bound of the density ratio. This correction makes a DRE method more robust against overfitting and enables the use of flexible models. In the theoretical analysis, we discuss the consistency of the empirical risk. 
In our experiments, the proposed estimators show favorable performance in inlier-based outlier detection and covariate shift adaptation.", "keywords": "density ratio estimation;bregman divergence", "primary_area": "", "supplementary_material": "/attachment/d6c67b03bc3af7274ed1575de42ba28e87f7b5cb.zip", "author": "Masahiro Kato;Takeshi Teshima", "authorids": "~Masahiro_Kato1;~Takeshi_Teshima1", "gender": "M;M", "homepage": "https://masakat0.github.io/;https://takeshi-teshima.info", "dblp": ";227/2570", "google_scholar": "https://scholar.google.co.jp/schhp?hl=ja;https://scholar.google.co.jp/citations?user=JZoQoDsAAAAJ", "orcid": ";", "linkedin": ";", "or_profile": "~Masahiro_Kato1;~Takeshi_Teshima1", "aff": "Cyberagent;The University of Tokyo", "aff_domain": "cyberagent.co.jp;u-tokyo.ac.jp", "position": "Researcher;PhD student", "bibtex": "@misc{\nkato2021nonnegative,\ntitle={Non-Negative Bregman Divergence Minimization for Deep Direct Density Ratio Estimation},\nauthor={Masahiro Kato and Takeshi Teshima},\nyear={2021},\nurl={https://openreview.net/forum?id=PGmqOzKEPZN}\n}", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer4;AnonReviewer3;AnonReviewer2", "site": "https://openreview.net/forum?id=PGmqOzKEPZN", "pdf_size": 0, "rating": "5;6;6;6", "confidence": "3;4;3;2", "wc_review": "1123;335;642;274", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "770;743;822;714", "reply_reviewers": "0;0;0;0", "reply_authors": "1;1;1;1", "rating_avg": [ 5.75, 0.4330127018922193 ], "confidence_avg": [ 3.0, 0.7071067811865476 ], "wc_review_avg": [ 593.5, 336.0152526299959 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 762.25, 39.776720578750584 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 10, 0 ], "authors#_avg": [ 2, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 42, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=10575793668423594372&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 7, "aff_unique_index": "0;1", "aff_unique_norm": "CyberAgent Inc.;University of Tokyo", "aff_unique_dep": ";", "aff_unique_url": "https://www.cyberagent.co.jp;https://www.u-tokyo.ac.jp", "aff_unique_abbr": "CyberAgent;UTokyo", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0", "aff_country_unique": "Japan" }, { "title": "Generating Adversarial Computer Programs using Optimized Obfuscations", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/3346", "id": "PH5PH9ZO_4", "poster": "", "openreview": "https://openreview.net/forum?id=PH5PH9ZO_4", "slides": "https://iclr.cc/virtual/2021/poster/3346", "video": "https://iclr.cc/virtual/2021/poster/3346", "author_site": "Shashank Srikant, Sijia Liu, Tamara Mitrovska, Shiyu Chang, Quanfu Fan, Gaoyuan Zhang, Una-May O'Reilly", "tldr": "", "abstract": "Machine learning (ML) models that learn and predict properties of computer programs are increasingly being adopted and deployed. \nThese models have demonstrated success in applications such as auto-completing code, summarizing large programs, and detecting bugs and malware in programs. \nIn this work, we investigate principled ways to adversarially perturb a computer program to fool such learned models, and thus determine their adversarial robustness. We use program obfuscations, which have conventionally been used to avoid attempts at reverse engineering programs, as adversarial perturbations. 
These perturbations modify programs in ways that do not alter their functionality but can be crafted to deceive an ML model when making a decision. We provide a general formulation for an adversarial program that allows applying multiple obfuscation transformations to a program in any language. We develop first-order optimization algorithms to efficiently determine two key aspects -- which parts of the program to transform, and what transformations to use. We show that it is important to optimize both these aspects to generate the best adversarially perturbed program. Due to the discrete nature of this problem, we also propose using randomized smoothing to improve the attack loss landscape to ease optimization. \nWe evaluate our work on Python and Java programs on the problem of program summarization. \nWe show that our best attack proposal achieves a $52\\%$ improvement over a state-of-the-art attack generation approach for programs trained on a \\textsc{seq2seq} model.\nWe further show that our formulation is better at training models that are robust to adversarial attacks.", "keywords": "Machine Learning (ML) for Programming Languages (PL)/Software Engineering (SE);Adversarial computer programs;Program obfuscation;Combinatorial optimization;Differentiable program generator;Models for code", "primary_area": "", "supplementary_material": "", "author": "Shashank Srikant;Sijia Liu;Tamara Mitrovska;Shiyu Chang;Quanfu Fan;Gaoyuan Zhang;Una-May O'Reilly", "authorids": "~Shashank_Srikant1;~Sijia_Liu1;tamaram@mit.edu;~Shiyu_Chang2;~Quanfu_Fan1;~Gaoyuan_Zhang1;~Una-May_O'Reilly1", "gender": ";M;;Unspecified;M;M;F", "homepage": ";https://lsjxjtu.github.io/;;http://people.csail.mit.edu/chang87/;;;https://alfagroup.csail.mit.edu/unamay", "dblp": "52/8772;128/6972-1;;28/9988;66/3950;;o/UnaMayOReilly", "google_scholar": ";C7dO_UgAAAAJ;;r21asW4AAAAJ;kCxHiwUAAAAJ;;https://scholar.google.com/citations?hl=en", "orcid": ";;;;;;0000-0001-6923-8445", "linkedin": ";;;;;;", "or_profile": "~Shashank_Srikant1;~Sijia_Liu1;tamaram@mit.edu;~Shiyu_Chang2;~Quanfu_Fan1;~Gaoyuan_Zhang1;~Una-May_O'Reilly1", "aff": "Massachusetts Institute of Technology;Michigan State University;;International Business Machines;MIT-IBM Watson AI Lab;International Business Machines;Massachusetts Institute of Technology", "aff_domain": "mit.edu;msu.edu;;ibm.com;us.ibm.com;ibm.com;mit.edu", "position": "PhD student;Assistant Professor;;Researcher;Researcher;Research engineer;Principal Researcher", "bibtex": "@inproceedings{\nsrikant2021generating,\ntitle={Generating Adversarial Computer Programs using Optimized Obfuscations},\nauthor={Shashank Srikant and Sijia Liu and Tamara Mitrovska and Shiyu Chang and Quanfu Fan and Gaoyuan Zhang and Una-May O'Reilly},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=PH5PH9ZO_4}\n}", "github": "[![github](/images/github_icon.svg) ALFA-group/adversarial-code-generation](https://github.com/ALFA-group/adversarial-code-generation)", "project": "", "reviewers": "AnonReviewer3;AnonReviewer1;AnonReviewer4", "pdf_size": 0, "rating": "6;6;7", "confidence": "4;3;4", "wc_review": "844;325;432", "wc_reply_reviewers": "0;0;0", "wc_reply_authors": "1474;696;1455", "reply_reviewers": "0;0;0", "reply_authors": "2;1;2", "rating_avg": [ 6.333333333333333, 0.4714045207910317 ], "confidence_avg": [ 3.6666666666666665, 0.4714045207910317 ], "wc_review_avg": [ 533.6666666666666, 223.74439781937681 ], "wc_reply_reviewers_avg": [ 0, 0 ], 
"wc_reply_authors_avg": [ 1208.3333333333333, 362.35740490417595 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.6666666666666667, 0.4714045207910317 ], "replies_avg": [ 9, 0 ], "authors#_avg": [ 7, 0 ], "corr_rating_confidence": 0.4999999999999999, "gs_citation": 62, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=1001230882267147217&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 5, "pdf": "https://openreview.net/pdf?id=PH5PH9ZO_4", "email": "mit.edu;msu.edu;;ibm.com;us.ibm.com;ibm.com;mit.edu", "author_num": 7, "aff_unique_index": "0;1;2;0;2;0", "aff_unique_norm": "Massachusetts Institute of Technology;Michigan State University;International Business Machines Corporation", "aff_unique_dep": ";;", "aff_unique_url": "https://web.mit.edu;https://www.msu.edu;https://www.ibm.com", "aff_unique_abbr": "MIT;MSU;IBM", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0;0;0;0;0", "aff_country_unique": "United States" }, { "id": "PI_CwQparl_", "title": "Image Modeling with Deep Convolutional Gaussian Mixture Models", "track": "main", "status": "Reject", "tldr": "", "abstract": "In this conceptual work, we present DCGMM, a deep hierarchical Gaussian Mixture Model (GMM) that is particularly suited for describing and generating images.\nVanilla (i.e., \"flat\") GMMs require a very large number of components to well describe images, leading to long training times and memory issues. \nDCGMMs avoid this by a stacked architecture of multiple GMM layers, linked by convolution and pooling operations. \nThis allows to exploit the compositionality of images in a similar way as deep CNNs do.\nThis sets them apart from vanilla GMMs which are trained by EM, requiring a prior k-means initialization which is infeasible in a layered structure.\nFor generating sharp images with DCGMM, we introduce a new gradient-based technique for sampling through non-invertible operations like convolution and pooling.\nBased on the MNIST and FashionMNIST datasets, we validate the DCGMM model by demonstrating its superiority over \"flat\" GMMs for clustering, sampling and outlier detection.\nWe additionally demonstrate the applicability of DCGMM to variant generation, in-painting and class-conditional sampling. 
", "keywords": "Gaussian Mixture Model;Deep Learning;Unsupervised Representation Learning;Sampling", "primary_area": "", "supplementary_material": "", "author": "Alexander Gepperth;Benedikt Pf\u00fclb", "authorids": "~Alexander_Gepperth1;~Benedikt_Pf\u00fclb1", "gender": "M;M", "homepage": "http://www.gepperth.net;https://www.hs-fulda.de/", "dblp": "05/11166;", "google_scholar": "QR2zb3IAAAAJ;", "orcid": "0000-0003-2216-7808;", "linkedin": ";", "or_profile": "~Alexander_Gepperth1;~Benedikt_Pf\u00fclb1", "aff": "HAW Fulda;University of Applied Sciences Fulda", "aff_domain": "informatik.hs-fulda.de;cs.hs-fulda.de", "position": "Full Professor;PhD student", "bibtex": "@misc{\ngepperth2021image,\ntitle={Image Modeling with Deep Convolutional Gaussian Mixture Models},\nauthor={Alexander Gepperth and Benedikt Pf{\\\"u}lb},\nyear={2021},\nurl={https://openreview.net/forum?id=PI_CwQparl_}\n}", "github": "", "project": "", "reviewers": "AnonReviewer4;AnonReviewer2;AnonReviewer1;AnonReviewer3", "site": "https://openreview.net/forum?id=PI_CwQparl_", "pdf_size": 0, "rating": "2;3;3;4", "confidence": "5;4;4;4", "wc_review": "723;696;618;754", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "0;0;0;0", "reply_reviewers": "0;0;0;0", "reply_authors": "0;0;0;0", "rating_avg": [ 3.0, 0.7071067811865476 ], "confidence_avg": [ 4.25, 0.4330127018922193 ], "wc_review_avg": [ 697.75, 50.410192421771214 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 0, 0 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 0, 0 ], "replies_avg": [ 5, 0 ], "authors#_avg": [ 2, 0 ], "corr_rating_confidence": -0.816496580927726, "gs_citation": 8, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=1714903863840408491&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 11, "aff_unique_index": "0;1", "aff_unique_norm": "Fulda University of Applied Sciences;University of Applied Sciences Fulda", "aff_unique_dep": ";", "aff_unique_url": "https://www.haw-fulda.de;https://www.hs-fulda.de", "aff_unique_abbr": "HAW Fulda;", "aff_campus_unique_index": "0", "aff_campus_unique": "Fulda;", "aff_country_unique_index": "0;0", "aff_country_unique": "Germany" }, { "title": "Rethinking Architecture Selection in Differentiable NAS", "status": "Oral", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/2787", "id": "PKubaeJkw3", "poster": "", "openreview": "https://openreview.net/forum?id=PKubaeJkw3", "slides": "https://iclr.cc/virtual/2021/poster/2787", "video": "https://iclr.cc/virtual/2021/poster/2787", "author_site": "Ruochen Wang, Minhao Cheng, Xiangning Chen, Xiaocheng Tang, Cho-Jui Hsieh", "tldr": "", "abstract": "Differentiable Neural Architecture Search is one of the most popular Neural Architecture Search (NAS) methods for its search efficiency and simplicity, accomplished by jointly optimizing the model weight and architecture parameters in a weight-sharing supernet via gradient-based algorithms. At the end of the search phase, the operations with the largest architecture parameters will be selected to form the final architecture, with the implicit assumption that the values of architecture parameters reflect the operation strength. While much has been discussed about the supernet's optimization, the architecture selection process has received little attention. We provide empirical and theoretical analysis to show that the magnitude of architecture parameters does not necessarily indicate how much the operation contributes to the supernet's performance. 
We propose an alternative perturbation-based architecture selection that directly measures each operation's influence on the supernet. We re-evaluate several differentiable NAS methods with the proposed architecture selection and find that it is able to extract significantly improved architectures from the underlying supernets consistently. Furthermore, we find that several failure modes of DARTS can be greatly alleviated with the proposed selection method, indicating that much of the poor generalization observed in DARTS can be attributed to the failure of magnitude-based architecture selection rather than entirely the optimization of its supernet.", "keywords": "", "primary_area": "", "supplementary_material": "/attachment/b86c1c071a99c3b71293e5d8caac2c8d83761409.zip", "author": "Ruochen Wang;Minhao Cheng;Xiangning Chen;Xiaocheng Tang;Cho-Jui Hsieh", "authorids": "~Ruochen_Wang2;~Minhao_Cheng1;~Xiangning_Chen1;~Xiaocheng_Tang1;~Cho-Jui_Hsieh1", "gender": "M;M;M;;M", "homepage": "https://ruocwang.github.io/;https://cmhcbb.github.io/;;https://mktal.github.io/;http://web.cs.ucla.edu/~chohsieh/index.html", "dblp": "33/120;174/1717;56/7393;03/6299;14/2770", "google_scholar": "8fXrlRAAAAAJ;_LkC1yoAAAAJ;vNcBx1sAAAAJ;fSrzDjIAAAAJ;Wy89g4IAAAAJ", "orcid": ";0000-0003-3965-4215;;;", "linkedin": "ruochen-wang-1699b1113/;;;xiaochengt/;", "or_profile": "~Ruochen_Wang2;~Minhao_Cheng1;~Xiangning_Chen1;~Xiaocheng_Tang1;~Cho-Jui_Hsieh1", "aff": "University of California, Los Angeles;University of California, Los Angeles;University of California, Los Angeles;;University of California, Los Angeles", "aff_domain": "ucla.edu;ucla.edu;cs.ucla.edu;;ucla.edu", "position": "MS student;PhD student;PhD student;;Assistant Professor", "bibtex": "@inproceedings{\nwang2021rethinking,\ntitle={Rethinking Architecture Selection in Differentiable {NAS}},\nauthor={Ruochen Wang and Minhao Cheng and Xiangning Chen and Xiaocheng Tang and Cho-Jui Hsieh},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=PKubaeJkw3}\n}", "github": "[![github](/images/github_icon.svg) ruocwang/darts-pt](https://github.com/ruocwang/darts-pt)", "project": "", "reviewers": "AnonReviewer3;AnonReviewer2;AnonReviewer1;AnonReviewer4", "pdf_size": 0, "rating": "7;7;7;10", "confidence": "4;4;5;5", "wc_review": "180;569;610;619", "wc_reply_reviewers": "0;0;67;16", "wc_reply_authors": "415;1247;1182;475", "reply_reviewers": "0;0;1;1", "reply_authors": "1;3;3;1", "rating_avg": [ 7.75, 1.299038105676658 ], "confidence_avg": [ 4.5, 0.5 ], "wc_review_avg": [ 494.5, 182.55204737279723 ], "wc_reply_reviewers_avg": [ 20.75, 27.48977082479954 ], "wc_reply_authors_avg": [ 829.75, 386.01902479022976 ], "reply_reviewers_avg": [ 0.5, 0.5 ], "reply_authors_avg": [ 2.0, 1.0 ], "replies_avg": [ 21, 0 ], "authors#_avg": [ 5, 0 ], "corr_rating_confidence": 0.5773502691896257, "gs_citation": 233, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=803192450904020326&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 9, "pdf": "https://openreview.net/pdf?id=PKubaeJkw3", "email": "ucla.edu;ucla.edu;cs.ucla.edu;;ucla.edu", "author_num": 5, "aff_unique_index": "0;0;0;0", "aff_unique_norm": "University of California, Los Angeles", "aff_unique_dep": "", "aff_unique_url": "https://www.ucla.edu", "aff_unique_abbr": "UCLA", "aff_campus_unique_index": "0;0;0;0", "aff_campus_unique": "Los Angeles", "aff_country_unique_index": "0;0;0;0", "aff_country_unique": "United States" }, { "id": "PO0SuuafSX", "title": 
"3D Scene Compression through Entropy Penalized Neural Representation Functions", "track": "main", "status": "Reject", "tldr": "", "abstract": "Some forms of novel visual media enable the viewer to explore a 3D scene from essentially arbitrary viewpoints, by interpolating between a discrete set of original views. Compared to 2D imagery, these types of applications require much larger amounts of storage space, which we seek to reduce. Existing approaches for compressing 3D scenes are based on a separation of compression and rendering: each of the original views is compressed using traditional 2D image formats; the receiver decompresses the views and then performs the rendering. We unify these steps by directly compressing an implicit representation of the scene, a function that maps spatial coordinates to a radiance vector field, which can then be queried to render arbitrary viewpoints. The function is implemented as a neural network and jointly trained for reconstruction as well as compressibility, in an end-to-end manner, with the use of an entropy penalty on the parameters. Our method significantly outperforms a state-of-the-art conventional approach for scene compression, achieving simultaneously higher quality reconstructions and lower bitrates. Furthermore, we show that the performance at lower bitrates can be improved by jointly representing multiple scenes using a soft form of parameter sharing.", "keywords": "scene representation;compression;neural rendering;entropy coding", "primary_area": "", "supplementary_material": "", "author": "Thomas Bird;Johannes Ball\u00e9;Saurabh Singh;Philip Chou", "authorids": "~Thomas_Bird1;~Johannes_Ball\u00e91;~Saurabh_Singh1;~Philip_Chou1", "gender": ";Non-Binary;M;", "homepage": ";https://balle.io;http://www.saurabhsingh.info;https://packet.media", "dblp": "https://dblp.uni-trier.de/pers/b/Bird:Thomas.html;84/4973;75/5436-5;c/PhilipAChou.html", "google_scholar": "https://scholar.google.com/citations?view_op=list_works;uKDe38UAAAAJ;L7fTK1MAAAAJ;BI4MThAAAAAJ", "orcid": ";0000-0003-0769-8985;;0000-0002-7242-0210", "linkedin": ";;;phchou/", "or_profile": "~Thomas_Bird1;~Johannes_Ball\u00e91;~Saurabh_Singh1;~Philip_Chou1", "aff": "University College London;Google;Google;", "aff_domain": "ucl.ac.uk;google.com;google.com;", "position": "PhD student;Research Scientist;Research Scientist;", "bibtex": "@misc{\nbird2021d,\ntitle={3D Scene Compression through Entropy Penalized Neural Representation Functions},\nauthor={Thomas Bird and Johannes Ball{\\'e} and Saurabh Singh and Philip Chou},\nyear={2021},\nurl={https://openreview.net/forum?id=PO0SuuafSX}\n}", "github": "", "project": "", "reviewers": "AnonReviewer2;AnonReviewer1;AnonReviewer4;AnonReviewer3", "site": "https://openreview.net/forum?id=PO0SuuafSX", "pdf_size": 0, "rating": "4;4;5;5", "confidence": "5;3;4;2", "wc_review": "411;352;381;253", "wc_reply_reviewers": "107;0;0;0", "wc_reply_authors": "315;317;357;307", "reply_reviewers": "1;0;0;0", "reply_authors": "1;1;1;1", "rating_avg": [ 4.5, 0.5 ], "confidence_avg": [ 3.5, 1.118033988749895 ], "wc_review_avg": [ 349.25, 59.356444468987526 ], "wc_reply_reviewers_avg": [ 26.75, 46.332359102467464 ], "wc_reply_authors_avg": [ 324.0, 19.4164878389476 ], "reply_reviewers_avg": [ 0.25, 0.4330127018922193 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 11, 0 ], "authors#_avg": [ 4, 0 ], "corr_rating_confidence": -0.4472135954999579, "gs_citation": 38, "gs_cited_by_link": 
"https://scholar.google.com/scholar?cites=11189551586158365027&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 5, "aff_unique_index": "0;1;1", "aff_unique_norm": "University College London;Google", "aff_unique_dep": ";Google", "aff_unique_url": "https://www.ucl.ac.uk;https://www.google.com", "aff_unique_abbr": "UCL;Google", "aff_campus_unique_index": "1;1", "aff_campus_unique": ";Mountain View", "aff_country_unique_index": "0;1;1", "aff_country_unique": "United Kingdom;United States" }, { "title": "BRECQ: Pushing the Limit of Post-Training Quantization by Block Reconstruction", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/2596", "id": "POWv6hDd9XH", "poster": "", "openreview": "https://openreview.net/forum?id=POWv6hDd9XH", "slides": "https://iclr.cc/virtual/2021/poster/2596", "video": "https://iclr.cc/virtual/2021/poster/2596", "author_site": "Yuhang Li, Ruihao Gong, Xu Tan, Yang Yang, Peng Hu, Qi Zhang, fengwei yu, Wei Wang, Shi Gu", "tldr": "", "abstract": "We study the challenging task of neural network quantization without end-to-end retraining, called Post-training Quantization (PTQ). PTQ usually requires a small subset of training data but produces less powerful quantized models than Quantization-Aware Training (QAT). In this work, we propose a novel PTQ framework, dubbed BRECQ, which pushes the limits of bitwidth in PTQ down to INT2 for the first time. BRECQ leverages the basic building blocks in neural networks and reconstructs them one-by-one. In a comprehensive theoretical study of the second-order error, we show that BRECQ achieves a good balance between cross-layer dependency and generalization error. To further employ the power of quantization, the mixed precision technique is incorporated in our framework by approximating the inter-layer and intra-layer sensitivity. Extensive experiments on various handcrafted and searched neural architectures are conducted for both image classification and object detection tasks. And for the first time we prove that, without bells and whistles, PTQ can attain 4-bit ResNet and MobileNetV2 comparable with QAT and enjoy 240 times faster production of quantized models. 
Codes are available at https://github.com/yhhhli/BRECQ.", "keywords": "Post Training Quantization;Mixed Precision;Second-order analysis", "primary_area": "", "supplementary_material": "", "author": "Yuhang Li;Ruihao Gong;Xu Tan;Yang Yang;Peng Hu;Qi Zhang;Fengwei Yu;Wei Wang;Shi Gu", "authorids": "~Yuhang_Li1;~Ruihao_Gong1;~Xu_Tan3;~Yang_Yang22;~Peng_Hu3;~Qi_Zhang15;~Fengwei_Yu1;~Wei_Wang3;~Shi_Gu1", "gender": "M;M;M;;M;M;M;M;", "homepage": ";https://xhplus.github.io;;;https://github.com/MisakaCloud;;https://forwil.xyz;https://www.comp.nus.edu.sg/cs/bio/wangwei/;https://nangongwubu.github.io/", "dblp": ";247/1172;;;;;188/5764;;175/1269", "google_scholar": "3UzXL-AAAAAJ;8i7Z15kAAAAJ;;;;;qzWfLRIAAAAJ;;9_jlOXUAAAAJ", "orcid": ";0000-0002-6024-7086;;;;;;;0000-0003-2303-6770", "linkedin": ";;%E6%97%AD-%E8%B0%AD-a19042143/;yang-yang-ab085b166/;;%E7%90%A6-%E5%BC%A0-365687179/;;;", "or_profile": "~Yuhang_Li1;~Ruihao_Gong1;~Xu_Tan3;~Yang_Yang22;~Peng_Hu3;~Qi_Zhang15;~Fengwei_Yu1;~Wei_Wang3;~Shi_Gu1", "aff": "University of Electronic Science and Technology of China;Beihang University;;Sensetime;Beihang University, Tsinghua University;;;National University of Singapore;University of Electronic Science and Technology of China, Tsinghua University", "aff_domain": "uestc.edu.cn;buaa.edu;;sensetime.com;buaa.edu.cn;;;nus.edu.sg;uestc.edu.cn", "position": "Research Assistant;MS student;;Software Engineer;MS student;;;Assistant Professor;Full Professor", "bibtex": "@inproceedings{\nli2021brecq,\ntitle={{\\{}BRECQ{\\}}: Pushing the Limit of Post-Training Quantization by Block Reconstruction},\nauthor={Yuhang Li and Ruihao Gong and Xu Tan and Yang Yang and Peng Hu and Qi Zhang and Fengwei Yu and Wei Wang and Shi Gu},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=POWv6hDd9XH}\n}", "github": "[![github](/images/github_icon.svg) yhhhli/BRECQ](https://github.com/yhhhli/BRECQ) + [![Papers with Code](/images/pwc_icon.svg) 2 community implementations](https://paperswithcode.com/paper/?openreview=POWv6hDd9XH)", "project": "", "reviewers": "AnonReviewer2;AnonReviewer3;AnonReviewer4;AnonReviewer1", "pdf_size": 0, "rating": "6;7;7;8", "confidence": "4;4;1;4", "wc_review": "382;322;184;158", "wc_reply_reviewers": "339;0;0;0", "wc_reply_authors": "1909;573;345;182", "reply_reviewers": "1;0;0;0", "reply_authors": "3;1;1;1", "rating_avg": [ 7.0, 0.7071067811865476 ], "confidence_avg": [ 3.25, 1.299038105676658 ], "wc_review_avg": [ 261.5, 93.40637023244186 ], "wc_reply_reviewers_avg": [ 84.75, 146.79130594146235 ], "wc_reply_authors_avg": [ 752.25, 682.1361209465454 ], "reply_reviewers_avg": [ 0.25, 0.4330127018922193 ], "reply_authors_avg": [ 1.5, 0.8660254037844386 ], "replies_avg": [ 16, 0 ], "authors#_avg": [ 9, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 511, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=4375514065793876125&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 4, "pdf": "https://openreview.net/pdf?id=POWv6hDd9XH", "email": "uestc.edu.cn;buaa.edu;;sensetime.com;buaa.edu.cn;;;nus.edu.sg;uestc.edu.cn", "author_num": 9, "aff_unique_index": "0;1;2;1;3;0", "aff_unique_norm": "University of Electronic Science and Technology of China;Beihang University;SenseTime;National University of Singapore", "aff_unique_dep": ";;;", "aff_unique_url": "https://www.uestc.edu.cn;http://www.buaa.edu.cn/;https://www.sensetime.com;https://www.nus.edu.sg", "aff_unique_abbr": "UESTC;BUAA;SenseTime;NUS", "aff_campus_unique_index": "", 
"aff_campus_unique": "", "aff_country_unique_index": "0;0;0;0;1;0", "aff_country_unique": "China;Singapore" }, { "title": "Is Label Smoothing Truly Incompatible with Knowledge Distillation: An Empirical Study", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/2869", "id": "PObuuGVrGaZ", "poster": "", "openreview": "https://openreview.net/forum?id=PObuuGVrGaZ", "slides": "https://iclr.cc/virtual/2021/poster/2869", "video": "https://iclr.cc/virtual/2021/poster/2869", "author_site": "Zhiqiang Shen, Zhiqiang Shen, Dejia Xu, Zitian Chen, Kwang-Ting Cheng, Marios Savvides", "tldr": "", "abstract": "This work aims to empirically clarify a recently discovered perspective that label smoothing is incompatible with knowledge distillation. We begin by introducing the motivation behind on how this incompatibility is raised, i.e., label smoothing erases relative information between teacher logits. We provide a novel connection on how label smoothing affects distributions of semantically similar and dissimilar classes. Then we propose a metric to quantitatively measure the degree of erased information in sample's representation. After that, we study its one-sidedness and imperfection of the incompatibility view through massive analyses, visualizations and comprehensive experiments on Image Classification, Binary Networks, and Neural Machine Translation. Finally, we broadly discuss several circumstances wherein label smoothing will indeed lose its effectiveness.", "keywords": "label smoothing;knowledge distillation;image classification;neural machine translation;binary neural networks", "primary_area": "", "supplementary_material": "", "author": "Zhiqiang Shen;Zechun Liu;Dejia Xu;Zitian Chen;Kwang-Ting Cheng;Marios Savvides", "authorids": "~Zhiqiang_Shen1;~Zechun_Liu1;~Dejia_Xu1;~Zitian_Chen1;~Kwang-Ting_Cheng1;~Marios_Savvides1", "gender": ";;M;M;;", "homepage": ";;https://ir1d.github.io;http://chenzt.net/;;", "dblp": ";;264/5685;218/6728;;13/3793", "google_scholar": ";;ET0e93cAAAAJ;n6rhKWQAAAAJ;;", "orcid": ";;;;;", "linkedin": ";;;;;", "or_profile": "~Zhiqiang_Shen1;~Zechun_Liu1;~Dejia_Xu1;~Zitian_Chen1;~Kwang-Ting_Cheng1;~Marios_Savvides1", "aff": ";;Peking University, Tsinghua University;University of Massachusetts, Amherst;;Carnegie Mellon University", "aff_domain": ";;pku.edu.cn;umass.edu;;cmu.edu", "position": ";;Undergrad student;PhD student;;Full Professor", "bibtex": "@inproceedings{\nshen2021is,\ntitle={Is Label Smoothing Truly Incompatible with Knowledge Distillation: An Empirical Study},\nauthor={Zhiqiang Shen and Zechun Liu and Dejia Xu and Zitian Chen and Kwang-Ting Cheng and Marios Savvides},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=PObuuGVrGaZ}\n}", "github": "", "project": "", "reviewers": "AnonReviewer4;AnonReviewer3;AnonReviewer1;AnonReviewer2", "pdf_size": 0, "rating": "6;6;6;8", "confidence": "3;5;3;4", "wc_review": "258;416;415;355", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "586;1090;547;30", "reply_reviewers": "0;0;0;0", "reply_authors": "1;3;1;1", "rating_avg": [ 6.5, 0.8660254037844386 ], "confidence_avg": [ 3.75, 0.82915619758885 ], "wc_review_avg": [ 361.0, 64.39332263519255 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 563.25, 375.0342484360595 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.5, 0.8660254037844386 ], "replies_avg": [ 18, 0 ], "authors#_avg": [ 6, 0 ], "corr_rating_confidence": 0.17407765595569782, 
"gs_citation": 101, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=9652270117877911638&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 4, "pdf": "https://openreview.net/pdf?id=PObuuGVrGaZ", "email": ";;pku.edu.cn;umass.edu;;cmu.edu", "author_num": 6, "aff_unique_index": "0;1;2", "aff_unique_norm": "Peking University;University of Massachusetts Amherst;Carnegie Mellon University", "aff_unique_dep": ";;", "aff_unique_url": "http://www.pku.edu.cn;https://www.umass.edu;https://www.cmu.edu", "aff_unique_abbr": "Peking U;UMass Amherst;CMU", "aff_campus_unique_index": "1", "aff_campus_unique": ";Amherst", "aff_country_unique_index": "0;1;1", "aff_country_unique": "China;United States" }, { "id": "PP4KyAaBoBK", "title": "Human Perception-based Evaluation Criterion for Ultra-high Resolution Cell Membrane Segmentation", "track": "main", "status": "Reject", "tldr": "", "abstract": "Computer vision technology is widely used in biological and medical data analysis and understanding. However, there are still two major bottlenecks in the field of cell membrane segmentation, which seriously hinder further research: lack of sufficient high-quality data and lack of suitable evaluation criteria. In order to solve these two problems, this paper first introduces an Ultra-high Resolution Image Segmentation dataset for the Cell membrane, called U-RISC, the largest annotated EM dataset for the Cell membrane with multiple iterative annotations and uncompressed high-resolution raw data. During the analysis process of the U-RISC, we found that the current popular segmentation evaluation criteria are inconsistent with human perception. This interesting phenomenon is confirmed by a subjective experiment involving twenty people. Furthermore, to resolve this inconsistency, we propose a Perceptual Hausdorff Distance (PHD) evaluation criterion to measure the quality of cell membrane segmentation results. 
Detailed performance comparison and discussion of classic segmentation methods along with two iterative manual annotation results under existing criteria and PHD is given.", "keywords": "Neuroscience;Connectomics;Human perception;EM dataset;Membrane segmentation;Evaluation criterion", "primary_area": "", "supplementary_material": "/attachment/65364fa40637265a3f77d714b61d9434bc474733.zip", "author": "Ruohua Shi;Wenyao Wang;Zhixuan Li;Liuyuan He;Kaiwen Sheng;Lei Ma;Kai Du;Tingting Jiang;Tiejun Huang", "authorids": "~Ruohua_Shi1;~Wenyao_Wang1;~Zhixuan_Li1;~Liuyuan_He1;~Kaiwen_Sheng1;~Lei_Ma3;~Kai_Du1;~Tingting_Jiang2;~Tiejun_Huang1", "gender": "F;M;M;;;Not Specified;;F;M", "homepage": "http://www.vie.group/media/page/ruohua_x14v64O_uMPdclW.html?logo=media/image/pic.png;;https://zhixuanli.github.io/;https://pkuml.org/;https://github.com/holmosaint;https://nbic.pku.edu.cn/rcdw/kyry/02c5f5ce8e254b1e82a48bebd0a24c33.htm;;http://www.vie.group/ttj;https://idm.pku.edu.cn/~tjhuang/", "dblp": ";;;;;20/6534-8;;72/2833-1;h/TiejunHuang", "google_scholar": ";GTAWfXUAAAAJ;SoWWEB8AAAAJ;;2Lf-0vUAAAAJ;;;p6RJZj0AAAAJ;https://scholar.google.com.tw/citations?user=knvEK4AAAAAJ", "orcid": ";0000-0001-9897-2761;0000-0002-2558-508X;;0000-0003-3523-5267;0000-0001-6024-3854;;0000-0002-5372-0656;0000-0002-4234-6099", "linkedin": ";;zhixuan-li-03b742b3/;;;maleiwhat/;;;", "or_profile": "~Ruohua_Shi1;~Wenyao_Wang1;~Zhixuan_Li1;~Liuyuan_He1;~Kaiwen_Sheng1;~Lei_Ma3;~Kai_Du1;~Tingting_Jiang2;~Tiejun_Huang1", "aff": "Peking University;;Peking University;Peking University, Tsinghua University;University College London, University of London;Beijing Academy of Artifical Intelligence;;School of Computer Science, Peking University;Institute of Computing Technology, Chinese Academy of Sciences", "aff_domain": "pku.edu.cn;;pku.edu.cn;pku.edu.cn;ucl.ac.uk;baai.ac.cn;;pku.edu.cn;ict.ac.cn", "position": "PhD student;;PhD student;PhD student;MS student;Principal Researcher;;Associate Professor;Postdoc", "bibtex": "@misc{\nshi2021human,\ntitle={Human Perception-based Evaluation Criterion for Ultra-high Resolution Cell Membrane Segmentation},\nauthor={Ruohua Shi and Wenyao Wang and Zhixuan Li and Liuyuan He and Kaiwen Sheng and Lei Ma and Kai Du and Tingting Jiang and Tiejun Huang},\nyear={2021},\nurl={https://openreview.net/forum?id=PP4KyAaBoBK}\n}", "github": "", "project": "", "reviewers": "AnonReviewer2;AnonReviewer4;AnonReviewer3;AnonReviewer1", "site": "https://openreview.net/forum?id=PP4KyAaBoBK", "pdf_size": 0, "rating": "3;4;6;7", "confidence": "5;4;3;3", "wc_review": "435;322;3550;220", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "993;897;6579;249", "reply_reviewers": "0;0;0;0", "reply_authors": "2;2;11;1", "rating_avg": [ 5.0, 1.5811388300841898 ], "confidence_avg": [ 3.75, 0.82915619758885 ], "wc_review_avg": [ 1131.75, 1398.246826386529 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 2179.5, 2556.121035866651 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 4.0, 4.06201920231798 ], "replies_avg": [ 24, 0 ], "authors#_avg": [ 9, 0 ], "corr_rating_confidence": -0.9534625892455922, "gs_citation": 2, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=3910218338781750026&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 4, "aff_unique_index": "0;0;0;1;2;0;3", "aff_unique_norm": "Peking University;University College London;Beijing Academy of Artificial Intelligence;Chinese Academy of Sciences", "aff_unique_dep": ";;;Institute of Computing Technology", "aff_unique_url": 
"http://www.pku.edu.cn;https://www.ucl.ac.uk;https://www.baaic.cn;http://www.ict.ac.cn", "aff_unique_abbr": "Peking U;UCL;BAAI;CAS", "aff_campus_unique_index": "1", "aff_campus_unique": ";Beijing", "aff_country_unique_index": "0;0;0;1;0;0;0", "aff_country_unique": "China;United Kingdom" }, { "id": "PQ2Cel-1rJh", "title": "Pea-KD: Parameter-efficient and accurate Knowledge Distillation", "track": "main", "status": "Reject", "tldr": "", "abstract": "How can we efficiently compress a model while maintaining its performance? Knowledge Distillation (KD) is one of the widely known methods for model compression. In essence, KD trains a smaller student model based on a larger teacher model and tries to retain the teacher model's level of performance as much as possible. However, the existing KD methods suffer from the following limitations. First, since the student model is small in absolute size, it inherently lacks model complexity. Second, the absence of an initial guide for the student model makes it difficult for the student to imitate the teacher model to its fullest. Conventional KD methods yield low performance due to these limitations.\n\nIn this paper, we propose Pea-KD (Parameter-efficient and accurate Knowledge Distillation), a novel approach to KD. Pea-KD consists of two main parts: Shuffled Parameter Sharing (SPS) and Pretraining with Teacher's Predictions (PTP). Using this combination, we are capable of alleviating the KD's limitations. SPS is a new parameter sharing method that allows greater model complexity for the student model. PTP is a KD-specialized initialization method, which can act as a good initial guide for the student. When combined, this method yields a significant increase in student model's performance. Experiments conducted on different datasets and tasks show that the proposed approach improves the student model's performance by 4.4% on average in four GLUE tasks, outperforming existing KD baselines by significant margins.\n", "keywords": "BERT;Deep Learning;Natural Language Processing;Transformer;Knowledge Distillation;Parameter Sharing", "primary_area": "", "supplementary_material": "/attachment/6ba2955df8bb8f79fe91bfbfc55d8343c3e74e7a.zip", "author": "IKHYUN CHO;U Kang", "authorids": "~IKHYUN_CHO1;~U_Kang1", "gender": "M;M", "homepage": "http://datalab.snu.ac.kr/~ukang;https://ihcho2.github.io/", "dblp": "13/7122;", "google_scholar": "https://scholar.google.com/citations?hl=en;", "orcid": "0000-0002-8774-6950;", "linkedin": ";%EC%9D%B5%ED%98%84-%EC%A1%B0-1705571b8/", "or_profile": "~U_Kang1;~IKHYUN_CHO2", "aff": "Seoul National University;Department of Computer Science", "aff_domain": "snu.ac.kr;cs.illinois.edu", "position": "Full Professor;PhD student", "bibtex": "@misc{\ncho2021peakd,\ntitle={Pea-{\\{}KD{\\}}: Parameter-efficient and accurate Knowledge Distillation},\nauthor={IKHYUN CHO and U Kang},\nyear={2021},\nurl={https://openreview.net/forum?id=PQ2Cel-1rJh}\n}", "github": "", "project": "", "reviewers": "AnonReviewer2;AnonReviewer3;AnonReviewer4;AnonReviewer1", "site": "https://openreview.net/forum?id=PQ2Cel-1rJh", "pdf_size": 0, "rating": "5;5;6;7", "confidence": "4;3;4;3", "wc_review": "443;213;1350;162", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "938;884;1880;93", "reply_reviewers": "0;0;0;0", "reply_authors": "2;2;3;1", "rating_avg": [ 5.75, 0.82915619758885 ], "confidence_avg": [ 3.5, 0.5 ], "wc_review_avg": [ 542.0, 478.3581294386038 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 948.75, 633.2145667149485 ], 
"reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 2.0, 0.7071067811865476 ], "replies_avg": [ 13, 0 ], "authors#_avg": [ 2, 0 ], "corr_rating_confidence": -0.30151134457776363, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:RCvLT2eDjpsJ:scholar.google.com/&scioq=Pea-KD:+Parameter-efficient+and+accurate+Knowledge+Distillation&hl=en&as_sdt=0,33", "gs_version_total": 0, "aff_unique_index": "0;1", "aff_unique_norm": "Seoul National University;Unknown Institution", "aff_unique_dep": ";Department of Computer Science", "aff_unique_url": "https://www.snu.ac.kr;", "aff_unique_abbr": "SNU;", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0", "aff_country_unique": "South Korea;" }, { "id": "PQlC91XxqK5", "title": "Segmenting Natural Language Sentences via Lexical Unit Analysis", "track": "main", "status": "Reject", "tldr": "", "abstract": "In this work, we present Lexical Unit Analysis (LUA), a framework for general sequence segmentation tasks. Given a natural language sentence, LUA scores all the valid segmentation candidates and utilizes dynamic programming (DP) to extract the maximum scoring one. LUA enjoys a number of appealing properties such as inherently guaranteeing the predicted segmentation to be valid and facilitating globally optimal training and inference. Besides, the practical time complexity of LUA can be reduced to linear time, which is very efficient. We have conducted extensive experiments on 5 tasks, including syntactic chunking, named entity recognition (NER), slot filling, Chinese word segmentation, and Chinese part-of-speech (POS) tagging, across 15 datasets. Our models have achieved the state-of-the-art performances on 13 of them. The results also show that the F1 score of identifying long-length segments is notably improved.", "keywords": "Neural Sequence Labeling;Neural Sequence Segmentation;Dynamic Programming", "primary_area": "", "supplementary_material": "", "author": "Yangming Li;lemao liu;Shuming Shi", "authorids": "~Yangming_Li1;~lemao_liu1;~Shuming_Shi1", "gender": ";M;M", "homepage": ";https://lemaoliu.github.io/homepage/;", "dblp": ";41/10887.html;s/ShumingShi", "google_scholar": ";;Lg31AKMAAAAJ", "orcid": ";;", "linkedin": ";;", "or_profile": "~Yangming_Li1;~lemao_liu1;~Shuming_Shi1", "aff": ";Tencent;Tencent AI Lab", "aff_domain": ";tencent.com;tencent.com", "position": ";Researcher;Principal Researcher", "bibtex": "@misc{\nli2021segmenting,\ntitle={Segmenting Natural Language Sentences via Lexical Unit Analysis},\nauthor={Yangming Li and lemao liu and Shuming Shi},\nyear={2021},\nurl={https://openreview.net/forum?id=PQlC91XxqK5}\n}", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer4;AnonReviewer3", "site": "https://openreview.net/forum?id=PQlC91XxqK5", "pdf_size": 0, "rating": "5;6;7", "confidence": "3;5;5", "wc_review": "147;226;249", "wc_reply_reviewers": "0;0;0", "wc_reply_authors": "234;298;316", "reply_reviewers": "0;0;0", "reply_authors": "1;1;1", "rating_avg": [ 6.0, 0.816496580927726 ], "confidence_avg": [ 4.333333333333333, 0.9428090415820634 ], "wc_review_avg": [ 207.33333333333334, 43.68320297576887 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 282.6666666666667, 35.188381921057726 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 7, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": 0.8660254037844385, "gs_citation": 5, "gs_cited_by_link": 
"https://scholar.google.com/scholar?cites=1430713671291477088&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 5, "aff_unique_index": "0;0", "aff_unique_norm": "Tencent", "aff_unique_dep": "Tencent Holdings Limited", "aff_unique_url": "https://www.tencent.com", "aff_unique_abbr": "Tencent", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0", "aff_country_unique": "China" }, { "id": "PRr_3HPakQ", "title": "Learning to Generate Questions by Recovering Answer-containing Sentences", "track": "main", "status": "Reject", "tldr": "", "abstract": "To train a question answering model based on machine reading comprehension (MRC), significant effort is required to prepare annotated training data composed of questions and their answers from contexts. To mitigate this issue, recent research has focused on synthetically generating a question from a given context and an annotated (or generated) answer by training an additional generative model, which can be utilized to augment the training data. In light of this research direction, we propose a novel pre-training approach that learns to generate contextually rich questions, by recovering answer-containing sentences. Our approach is composed of two novel components, (1) dynamically determining K answers from a given document and (2) pre-training the question generator on the task of generating the answer-containing sentence. We evaluate our method against existing ones in terms of the quality of generated questions as well as the fine-tuned MRC model accuracy after training on the data synthetically generated by our method. Experimental results demonstrate that our approach consistently improves the question generation capability of existing models such as T5 and UniLM, and shows state-of-the-art results on MS MARCO and NewsQA, and comparable results to the state-of-the-art on SQuAD. Additionally, we demonstrate that the data synthetically generated by our approach is beneficial for boosting up the downstream MRC accuracy across a wide range of datasets, such as SQuAD-v1.1, v2.0, and KorQuAD, without any modification to the existing MRC models. 
Furthermore, our experiments highlight that our method shines especially when a limited amount of training data is given, in terms of both pre-training and downstream MRC data.", "keywords": "Question Generation;Question Answering;Data Augmentation;Machine Reading Comprehension", "primary_area": "", "supplementary_material": "/attachment/873dd8c2b9a7ba2adb6f0c0f84e1184c3a5554db.zip", "author": "Seohyun Back;Akhil Kedia;Sai Chetan Chinthakindi;Haejun Lee;Jaegul Choo", "authorids": "~Seohyun_Back1;~Akhil_Kedia1;~Sai_Chetan_Chinthakindi1;haejun82.lee@samsung.com;~Jaegul_Choo1", "gender": "M;;M;;M", "homepage": "https://becxer.github.io/;;;;https://sites.google.com/site/jaegulchoo/", "dblp": "223/2549;264/2699;;;07/2074", "google_scholar": "A1mZQ5cAAAAJ;VvLIqCcAAAAJ;;;GHJYsLEAAAAJ", "orcid": ";;;;", "linkedin": ";;sai-chetan-c/;;", "or_profile": "~Seohyun_Back1;~Akhil_Kedia1;~Sai_Chetan_Chinthakindi1;haejun82.lee@samsung.com;~Jaegul_Choo1", "aff": "Korea Advanced Institute of Science & Technology;Samsung;Samsung;;Korea Advanced Institute of Science & Technology", "aff_domain": "kaist.ac.kr;samsung.com;samsung.com;;kaist.ac.kr", "position": "PhD student;Researcher;Researcher;;Associate Professor", "bibtex": "@misc{\nback2021learning,\ntitle={Learning to Generate Questions by Recovering Answer-containing Sentences},\nauthor={Seohyun Back and Akhil Kedia and Sai Chetan Chinthakindi and Haejun Lee and Jaegul Choo},\nyear={2021},\nurl={https://openreview.net/forum?id=PRr_3HPakQ}\n}", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer4;AnonReviewer3;AnonReviewer2", "site": "https://openreview.net/forum?id=PRr_3HPakQ", "pdf_size": 0, "rating": "5;6;7;7", "confidence": "4;4;4;4", "wc_review": "444;476;357;602", "wc_reply_reviewers": "0;97;0;101", "wc_reply_authors": "736;1098;383;1031", "reply_reviewers": "0;1;0;1", "reply_authors": "2;4;1;3", "rating_avg": [ 6.25, 0.82915619758885 ], "confidence_avg": [ 4.0, 0.0 ], "wc_review_avg": [ 469.75, 87.8987343481122 ], "wc_reply_reviewers_avg": [ 49.5, 49.52019789944301 ], "wc_reply_authors_avg": [ 812.0, 282.65438259471586 ], "reply_reviewers_avg": [ 0.5, 0.5 ], "reply_authors_avg": [ 2.5, 1.118033988749895 ], "replies_avg": [ 19, 0 ], "authors#_avg": [ 5, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:Bdp08YUFjs0J:scholar.google.com/&scioq=Learning+to+Generate+Questions+by+Recovering+Answer-containing+Sentences&hl=en&as_sdt=0,5", "gs_version_total": 0, "aff_unique_index": "0;1;1;0", "aff_unique_norm": "Korea Advanced Institute of Science and Technology;Samsung", "aff_unique_dep": ";Samsung", "aff_unique_url": "https://www.kaist.ac.kr;https://www.samsung.com", "aff_unique_abbr": "KAIST;Samsung", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0;0;0", "aff_country_unique": "South Korea" }, { "title": "Learning to Recombine and Resample Data For Compositional Generalization", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/2687", "id": "PS3IMnScugk", "poster": "", "openreview": "https://openreview.net/forum?id=PS3IMnScugk", "slides": "https://iclr.cc/virtual/2021/poster/2687", "video": "https://iclr.cc/virtual/2021/poster/2687", "author_site": "Ekin Aky\u00fcrek, Afra Feyza Aky\u00fcrek, Jacob Andreas", "tldr": "", "abstract": "Flexible neural sequence models outperform grammar- and automaton-based counterparts on a variety of tasks. 
However, neural models perform poorly in settings requiring compositional generalization beyond the training data\u2014particularly to rare or unseen subsequences. Past work has found symbolic scaffolding (e.g. grammars or automata) essential in these settings. We describe R&R, a learned data augmentation scheme that enables a large category of compositional generalizations without appeal to latent symbolic structure. R&R has two components: recombination of original training examples via a prototype-based generative model and resampling of generated examples to encourage extrapolation. Training an ordinary neural sequence model on a dataset augmented with recombined and resampled examples significantly improves generalization in two language processing problems\u2014instruction following (SCAN) and morphological analysis (SIGMORPHON 2018)\u2014where R&R enables learning of new constructions and tenses from as few as eight initial examples.", "keywords": "compositional generalization;data augmentation;language processing;sequence models;generative modeling", "primary_area": "", "supplementary_material": "/attachment/1c9bdb20af7c90ee5f3e58962941eb2c3a1ebed9.zip", "author": "Ekin Aky\u00fcrek;Afra Feyza Aky\u00fcrek;Jacob Andreas", "authorids": "~Ekin_Aky\u00fcrek1;~Afra_Feyza_Aky\u00fcrek1;~Jacob_Andreas1", "gender": "M;F;M", "homepage": "http://web.mit.edu/jda/www;https://feyzaakyurek.github.io;https://www.ekinakyurek.me/", "dblp": "97/8154;268/0913.html;216/3446", "google_scholar": "dnZ8udEAAAAJ;https://scholar.google.com/citations?hl=en;FQHeASwAAAAJ", "orcid": ";;0000-0002-5166-4689", "linkedin": ";afrafeyzaakyurek/;", "or_profile": "~Jacob_Andreas1;~Afra_Feyza_Akyurek1;~EKIN_AKYUREK1", "aff": "Microsoft;Boston University;Massachusetts Institute of Technology", "aff_domain": "microsoft.com;bu.edu;mit.edu", "position": "Researcher;PhD student;PhD student", "bibtex": "@inproceedings{\naky{\\\"u}rek2021learning,\ntitle={Learning to Recombine and Resample Data For Compositional Generalization},\nauthor={Ekin Aky{\\\"u}rek and Afra Feyza Aky{\\\"u}rek and Jacob Andreas},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=PS3IMnScugk}\n}", "github": "[![github](/images/github_icon.svg) ekinakyurek/compgen](https://github.com/ekinakyurek/compgen)", "project": "", "reviewers": "AnonReviewer4;AnonReviewer1;AnonReviewer2;AnonReviewer3", "pdf_size": 0, "rating": "6;7;7;8", "confidence": "3;3;4;4", "wc_review": "524;386;467;260", "wc_reply_reviewers": "72;0;0;0", "wc_reply_authors": "348;165;299;243", "reply_reviewers": "1;0;0;0", "reply_authors": "1;1;1;1", "rating_avg": [ 7.0, 0.7071067811865476 ], "confidence_avg": [ 3.5, 0.5 ], "wc_review_avg": [ 409.25, 99.14478049801714 ], "wc_reply_reviewers_avg": [ 18.0, 31.176914536239792 ], "wc_reply_authors_avg": [ 263.75, 68.04915502781795 ], "reply_reviewers_avg": [ 0.25, 0.4330127018922193 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 11, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": 0.7071067811865476, "gs_citation": 106, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=16034423626440720931&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 4, "pdf": "https://openreview.net/pdf?id=PS3IMnScugk", "email": "microsoft.com;bu.edu;mit.edu", "author_num": 3, "aff_unique_index": "0;1;2", "aff_unique_norm": "Microsoft;Boston University;Massachusetts Institute of Technology", "aff_unique_dep": "Microsoft Corporation;;", "aff_unique_url": 
"https://www.microsoft.com;https://www.bu.edu;https://web.mit.edu", "aff_unique_abbr": "Microsoft;BU;MIT", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0;0", "aff_country_unique": "United States" }, { "id": "PTG9NdIn3wt", "title": "Neural Text Classification by Jointly Learning to Cluster and Align", "track": "main", "status": "Withdraw", "tldr": "", "abstract": "Distributional text clustering delivers semantically informative representations and captures the relevance between each word and semantic clustering centroids. We extend the neural text clustering approach to text classification tasks by inducing cluster centers via a latent variable model and interacting with distributional word embeddings, to enrich the representation of tokens and measure the relatedness between tokens and each learnable cluster centroid. The proposed method jointly learns word clustering centroids and clustering-token alignments, achieving the state of the art results on multiple benchmark datasets and proving that the proposed cluster-token alignment mechanism is indeed favorable to text classification. Notably, our qualitative analysis has conspicuously illustrated that text representations learned by the proposed model are in accord well with our intuition.", "keywords": "text clustering;text classification;latent variable model", "primary_area": "", "supplementary_material": "", "author": "Yekun Chai;Haidong Zhang;Shuo Jin", "authorids": "~Yekun_Chai1;haidong_zhang14@yahoo.com;shj42@pitt.edu", "gender": "M;;", "homepage": "https://cyk1337.github.io/;;", "dblp": "252/0188;;", "google_scholar": "P0NRuRYAAAAJ;;", "orcid": ";;", "linkedin": ";;", "or_profile": "~Yekun_Chai1;haidong_zhang14@yahoo.com;shj42@pitt.edu", "aff": "Institute of automation, Chinese academy of sciences;;", "aff_domain": "ia.ac.cn;;", "position": "Researcher;;", "bibtex": "", "github": "", "project": "", "reviewers": "AnonReviewer3;AnonReviewer1;AnonReviewer2;AnonReviewer4", "site": "https://openreview.net/forum?id=PTG9NdIn3wt", "pdf_size": 0, "rating": "3;4;5;5", "confidence": "4;3;2;3", "wc_review": "1321;173;309;211", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "0;0;0;0", "reply_reviewers": "0;0;0;0", "reply_authors": "0;0;0;0", "rating_avg": [ 4.25, 0.82915619758885 ], "confidence_avg": [ 3.0, 0.7071067811865476 ], "wc_review_avg": [ 503.5, 474.5848185519634 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 0, 0 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 0, 0 ], "replies_avg": [ 5, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": -0.8528028654224417, "gs_citation": 6, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=3615309784558147386&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 5, "aff_unique_index": "0", "aff_unique_norm": "Chinese Academy of Sciences", "aff_unique_dep": "Institute of Automation", "aff_unique_url": "http://www.ia.cas.cn", "aff_unique_abbr": "CAS", "aff_country_unique_index": "0", "aff_country_unique": "China" }, { "id": "PU35uLgRZkk", "title": "The Skill-Action Architecture: Learning Abstract Action Embeddings for Reinforcement Learning", "track": "main", "status": "Reject", "tldr": "", "abstract": " The option framework, one of the most promising Hierarchical Reinforcement Learning (HRL) frameworks, is developed based on the Semi-Markov Decision Problem (SMDP) and employs a triple formulation of the option (i.e., an action policy, a termination probability, and an initiation set). 
These design choices, however, mean that the option framework: 1) has low sample efficiency, 2) cannot use more stable Markov Decision Problem (MDP) based learning algorithms, 3) represents abstract actions implicitly, and 4) is expensive to scale up. To overcome these problems, here we propose a simple yet effective MDP implementation of the option framework: the Skill-Action (SA) architecture. Derived from a novel discovery that the SMDP option framework has an MDP equivalence, SA hierarchically extracts skills (abstract actions) from primary actions and explicitly encodes these knowledge into skill context vectors (embedding vectors). Although SA is MDP formulated, skills can still be temporally extended by applying the attention mechanism to skill context vectors. Unlike the option framework, which requires $M$ action policies for $M$ skills, SA's action policy only needs one decoder to decode skill context vectors into primary actions. Under this formulation, SA can be optimized with any MDP based policy gradient algorithm. Moreover, it is sample efficient, cheap to scale up, and theoretically proven to have lower variance. Our empirical studies on challenging infinite horizon robot simulation environments demonstrate that SA not only outperforms all baselines by a large margin, but also exhibits smaller variance, faster convergence, and good interpretability. On transfer learning tasks, SA also outperforms the other models and shows its advantage on reusing knowledge across tasks. A potential impact of SA is to pave the way for a large scale pre-training architecture in the reinforcement learning area.", "keywords": "Hierarchical Reinforcement Learning;Reinforcement Learning", "primary_area": "", "supplementary_material": "/attachment/e662fafee2e9053da57a53b2b074304c8dd31d84.zip", "author": "Chang Li;Dongjin Song;Dacheng Tao", "authorids": "~Chang_Li5;~Dongjin_Song2;~Dacheng_Tao1", "gender": "M;M;", "homepage": "https://github.com/spacegoing;https://songdj.github.io/;", "dblp": ";41/3281;", "google_scholar": ";BJdHw6AAAAAJ;", "orcid": "0000-0002-9295-1254;;", "linkedin": ";;", "or_profile": "~Chang_Li5;~Dongjin_Song2;~Dacheng_Tao1", "aff": "University of Sydney;University of Connecticut;", "aff_domain": "usyd.edu.au;uconn.edu;", "position": "PhD student;Assistant Professor;", "bibtex": "", "github": "", "project": "", "reviewers": "AnonReviewer3;AnonReviewer1;AnonReviewer2", "site": "https://openreview.net/forum?id=PU35uLgRZkk", "pdf_size": 0, "rating": "4;5;5", "confidence": "4;4;3", "wc_review": "457;792;355", "wc_reply_reviewers": "0;0;0", "wc_reply_authors": "760;517;378", "reply_reviewers": "0;0;0", "reply_authors": "1;1;1", "rating_avg": [ 4.666666666666667, 0.4714045207910317 ], "confidence_avg": [ 3.6666666666666665, 0.4714045207910317 ], "wc_review_avg": [ 534.6666666666666, 186.6660714276224 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 551.6666666666666, 157.86562499656333 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 9, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": -0.4999999999999999, "gs_citation": 5, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=12646448755557079909&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 0, "aff_unique_index": "0;1", "aff_unique_norm": "University of Sydney;University of Connecticut", "aff_unique_dep": ";", "aff_unique_url": "https://www.sydney.edu.au;https://www.uconn.edu", "aff_unique_abbr": "USYD;UConn", "aff_campus_unique_index": "", 
"aff_campus_unique": "", "aff_country_unique_index": "0;1", "aff_country_unique": "Australia;United States" }, { "title": "Optimal Rates for Averaged Stochastic Gradient Descent under Neural Tangent Kernel Regime", "status": "Oral", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/2592", "id": "PULSD5qI2N1", "poster": "", "openreview": "https://openreview.net/forum?id=PULSD5qI2N1", "slides": "https://iclr.cc/virtual/2021/poster/2592", "video": "https://iclr.cc/virtual/2021/poster/2592", "author_site": "Atsushi Nitanda, Taiji Suzuki", "tldr": "", "abstract": "We analyze the convergence of the averaged stochastic gradient descent for overparameterized two-layer neural networks for regression problems. It was recently found that a neural tangent kernel (NTK) plays an important role in showing the global convergence of gradient-based methods under the NTK regime, where the learning dynamics for overparameterized neural networks can be almost characterized by that for the associated reproducing kernel Hilbert space (RKHS). However, there is still room for a convergence rate analysis in the NTK regime. In this study, we show that the averaged stochastic gradient descent can achieve the minimax optimal convergence rate, with the global convergence guarantee, by exploiting the complexities of the target function and the RKHS associated with the NTK. Moreover, we show that the target function specified by the NTK of a ReLU network can be learned at the optimal convergence rate through a smooth approximation of a ReLU network under certain conditions.", "keywords": "stochastic gradient descent;two-layer neural network;over-parameterization;neural tangent kernel", "primary_area": "", "supplementary_material": "", "author": "Atsushi Nitanda;Taiji Suzuki", "authorids": "~Atsushi_Nitanda1;~Taiji_Suzuki1", "gender": "M;M", "homepage": "https://sites.google.com/site/atsushinitanda;http://ibis.t.u-tokyo.ac.jp/suzuki/", "dblp": "155/1884;08/312", "google_scholar": "https://scholar.google.co.jp/citations?user=LyVvaf8AAAAJ;x8osrBsAAAAJ", "orcid": ";", "linkedin": ";", "or_profile": "~Atsushi_Nitanda1;~Taiji_Suzuki1", "aff": "The University of Tokyo;The University of Tokyo", "aff_domain": "u-tokyo.ac.jp;tokyo.ac.jp", "position": "Assistant Professor;Associate Professor", "bibtex": "@inproceedings{\nnitanda2021optimal,\ntitle={Optimal Rates for Averaged Stochastic Gradient Descent under Neural Tangent Kernel Regime},\nauthor={Atsushi Nitanda and Taiji Suzuki},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=PULSD5qI2N1}\n}", "github": "", "project": "", "reviewers": "AnonReviewer4;AnonReviewer3;AnonReviewer2;AnonReviewer1;AnonReviewer5", "pdf_size": 0, "rating": "7;7;8;8;8", "confidence": "3;2;2;4;5", "wc_review": "204;276;492;450;733", "wc_reply_reviewers": "0;0;14;0;0", "wc_reply_authors": "263;458;602;316;678", "reply_reviewers": "0;0;1;0;0", "reply_authors": "1;1;1;1;1", "rating_avg": [ 7.6, 0.48989794855663565 ], "confidence_avg": [ 3.2, 1.16619037896906 ], "wc_review_avg": [ 431.0, 184.84588175017586 ], "wc_reply_reviewers_avg": [ 2.8, 5.6 ], "wc_reply_authors_avg": [ 463.4, 159.48617494942937 ], "reply_reviewers_avg": [ 0.2, 0.4000000000000001 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 13, 0 ], "authors#_avg": [ 2, 0 ], "corr_rating_confidence": 0.4900980294098033, "gs_citation": 59, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=2381348531588236955&as_sdt=5,33&sciodt=0,33&hl=en", 
"gs_version_total": 8, "pdf": "https://openreview.net/pdf?id=PULSD5qI2N1", "email": "u-tokyo.ac.jp;tokyo.ac.jp", "author_num": 2, "aff_unique_index": "0;0", "aff_unique_norm": "University of Tokyo", "aff_unique_dep": "", "aff_unique_url": "https://www.u-tokyo.ac.jp", "aff_unique_abbr": "UTokyo", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0", "aff_country_unique": "Japan" }, { "title": "Discovering a set of policies for the worst case reward", "status": "Spotlight", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/2571", "id": "PUkhWz65dy5", "poster": "", "openreview": "https://openreview.net/forum?id=PUkhWz65dy5", "slides": "https://iclr.cc/virtual/2021/poster/2571", "video": "https://iclr.cc/virtual/2021/poster/2571", "author_site": "Tom Zahavy, Andre Barreto, Daniel J Mankowitz, Shaobo Hou, Brendan ODonoghue, Iurii Kemaev, Satinder Singh", "tldr": "", "abstract": "We study the problem of how to construct a set of policies that can be composed together to solve a collection of reinforcement learning tasks. Each task is a different reward function defined as a linear combination of known features. We consider a specific class of policy compositions which we call set improving policies (SIPs): given a set of policies and a set of tasks, a SIP is any composition of the former whose performance is at least as good as that of its constituents across all the tasks. We focus on the most conservative instantiation of SIPs, set-max policies (SMPs), so our analysis extends to any SIP. This includes known policy-composition operators like generalized policy improvement. Our main contribution is an algorithm that builds a set of policies in order to maximize the worst-case performance of the resulting SMP on the set of tasks. The algorithm works by successively adding new policies to the set. We show that the worst-case performance of the resulting SMP strictly improves at each iteration, and the algorithm only stops when there does not exist a policy that leads to improved performance. We empirically evaluate our algorithm on a grid world and also on a set of domains from the DeepMind control suite. We confirm our theoretical results regarding the monotonically improving performance of our algorithm. 
Interestingly, we also show empirically that the sets of policies computed by the algorithm are diverse, leading to different trajectories in the grid world and very distinct locomotion skills in the control suite.", "keywords": "", "primary_area": "", "supplementary_material": "/attachment/10c59ebfc70b17c874b0ad6e25e337116fb6c1ea.zip", "author": "Tom Zahavy;Andre Barreto;Daniel J Mankowitz;Shaobo Hou;Brendan O'Donoghue;Iurii Kemaev;Satinder Singh", "authorids": "~Tom_Zahavy2;~Andre_Barreto1;~Daniel_J_Mankowitz2;~Shaobo_Hou1;~Brendan_O'Donoghue1;iukemaev@google.com;~Satinder_Singh2", "gender": "M;M;;M;;;", "homepage": "http://tomzahavy.wixsite.com/zahavy;https://sites.google.com/corp/view/andrebarreto/about;;;;;", "dblp": "149/0142;72/953;;78/6677;;;", "google_scholar": "https://scholar.google.co.il/citations?user=9dXN6cMAAAAJ;https://scholar.google.co.uk/citations?user=H-xtdV4AAAAJ;;;;;", "orcid": ";;;;;;", "linkedin": "tomzahavy/;;;;;;", "or_profile": "~Tom_Zahavy2;~Andre_Barreto1;~Daniel_J_Mankowitz2;~Shaobo_Hou1;~Brendan_O'Donoghue1;iukemaev@google.com;~Satinder_Singh2", "aff": "Google DeepMind;Google DeepMind;;Google DeepMind;;;", "aff_domain": "deepmind.com;google.com;;deepmind.com;;;", "position": "Research Scientist;Research Scientist;;Research Engineer;;;", "bibtex": "@inproceedings{\nzahavy2021discovering,\ntitle={Discovering a set of policies for the worst case reward},\nauthor={Tom Zahavy and Andre Barreto and Daniel J Mankowitz and Shaobo Hou and Brendan O'Donoghue and Iurii Kemaev and Satinder Singh},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=PUkhWz65dy5}\n}", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer2;AnonReviewer4;AnonReviewer3", "pdf_size": 0, "rating": "6;7;7;8", "confidence": "4;4;4;3", "wc_review": "392;528;707;500", "wc_reply_reviewers": "365;298;70;29", "wc_reply_authors": "1205;1780;797;637", "reply_reviewers": "2;2;1;1", "reply_authors": "3;3;1;1", "rating_avg": [ 7.0, 0.7071067811865476 ], "confidence_avg": [ 3.75, 0.4330127018922193 ], "wc_review_avg": [ 531.75, 113.20860170499414 ], "wc_reply_reviewers_avg": [ 190.5, 143.70890717001504 ], "wc_reply_authors_avg": [ 1104.75, 441.45009627363316 ], "reply_reviewers_avg": [ 1.5, 0.5 ], "reply_authors_avg": [ 2.0, 1.0 ], "replies_avg": [ 21, 0 ], "authors#_avg": [ 7, 0 ], "corr_rating_confidence": -0.816496580927726, "gs_citation": 29, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=13234238852855033409&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 3, "pdf": "https://openreview.net/pdf?id=PUkhWz65dy5", "email": "deepmind.com;google.com;;deepmind.com;;;", "author_num": 7, "aff_unique_index": "0;0;0", "aff_unique_norm": "Google", "aff_unique_dep": "Google DeepMind", "aff_unique_url": "https://deepmind.com", "aff_unique_abbr": "DeepMind", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0;0", "aff_country_unique": "United Kingdom" }, { "id": "PXDdWQDBsCG", "title": "Shape Defense", "track": "main", "status": "Reject", "tldr": "", "abstract": "Humans rely heavily on shape information to recognize objects. Conversely, convolutional\nneural networks (CNNs) are biased more towards texture. This fact\nis perhaps the main reason why CNNs are susceptible to adversarial examples.\nHere, we explore how shape bias can be incorporated into CNNs to improve their\nrobustness. 
Two algorithms are proposed, based on the observation that edges are\ninvariant to moderate imperceptible perturbations. In the first one, a classifier is\nadversarially trained on images with the edge map as an additional channel. At\ninference time, the edge map is recomputed and concatenated to the image. In the\nsecond algorithm, a conditional GAN is trained to translate the edge maps, from\nclean and/or perturbed images, into clean images. The inference is done over the\ngenerated image corresponding to the input\u2019s edge map. A large number of experiments\nwith more than 10 data sets have proved the effectiveness of the proposed\nalgorithms against FGSM and $\ell_\infty$ PGD-40 attacks. \nFurther, we show that edge information can a) benefit other adversarial training methods, b) be even more effective\nin conjunction with background subtraction, c) be used to defend against poisoning\nattacks, and d) make CNNs more robust against natural image corruptions\nsuch as motion blur, impulse noise, and JPEG compression, than CNNs trained\nsolely on RGB images. From a broader perspective, our study suggests that CNNs\ndo not adequately account for image structures and operations that are crucial for\nrobustness. The code is available at: https://github.com/[masked].", "keywords": "adversarial robustness;adversarial defense;adversarial attack;shape;background subtraction", "primary_area": "", "supplementary_material": "", "author": "ali borji", "authorids": "~ali_borji1", "gender": "M", "homepage": "https://scholar.google.com.tw/citations?user=7jTNT1IAAAAJ", "dblp": "49/6311", "google_scholar": "7jTNT1IAAAAJ", "orcid": "", "linkedin": "ali-borji-5736433a/", "or_profile": "~ali_borji1", "aff": "PrimerAI", "aff_domain": "primer.ai", "position": "ML Engineer", "bibtex": "@misc{\nborji2021shape,\ntitle={Shape Defense},\nauthor={ali borji},\nyear={2021},\nurl={https://openreview.net/forum?id=PXDdWQDBsCG}\n}", "github": "", "project": "", "reviewers": "AnonReviewer3;AnonReviewer2;AnonReviewer4;AnonReviewer1", "site": "https://openreview.net/forum?id=PXDdWQDBsCG", "pdf_size": 0, "rating": "3;4;5;6", "confidence": "4;4;4;4", "wc_review": "413;112;474;637", "wc_reply_reviewers": "367;33;192;70", "wc_reply_authors": "1525;664;1457;1040", "reply_reviewers": "1;1;1;1", "reply_authors": "3;4;3;3", "rating_avg": [ 4.5, 1.118033988749895 ], "confidence_avg": [ 4.0, 0.0 ], "wc_review_avg": [ 409.0, 190.02236710450694 ], "wc_reply_reviewers_avg": [ 165.5, 130.3658314129895 ], "wc_reply_authors_avg": [ 1171.5, 346.88650881808593 ], "reply_reviewers_avg": [ 1.0, 0.0 ], "reply_authors_avg": [ 3.25, 0.4330127018922193 ], "replies_avg": [ 24, 0 ], "authors#_avg": [ 1, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 10, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=6306312237537040324&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 6, "aff_unique_index": "0", "aff_unique_norm": "PrimerAI", "aff_unique_dep": "", "aff_unique_url": "https://www.primer.ai", "aff_unique_abbr": "PrimerAI", "aff_country_unique_index": "0", "aff_country_unique": "United States" }, { "id": "PXedDe28hWH", "title": "Learning to Learn with Smooth Regularization", "track": "main", "status": "Withdraw", "tldr": "", "abstract": "Recent decades have witnessed great prosperity of deep learning in tackling various problems such as classification and decision making. 
The rapid development stimulates a novel framework, Learning-to-Learn (L2L), in which an automatic optimization algorithm (optimizer) modeled by neural networks is expected to learn rules for updating the target objective function (optimizee). Despite its advantages for specific problems, L2L still cannot replace classic methods due to its instability. Unlike hand-engineered algorithms, neural optimizers may suffer from the instability issue---when provided with similar states (a combination of some metrics to describe the optimizee), the same neural optimizer can produce quite different updates. Motivated by the stability property that should be satisfied by an ideal optimizer, we propose a regularization term that can enforce the smoothness and stability of the learned neural optimizers. Comprehensive experiments on the neural network training tasks demonstrate that the proposed regularization consistently improve the learned neural optimizers even when transferring to tasks with different architectures and data. Furthermore, we show that our regularizer can improve the performance of neural optimizers on few-shot learning tasks. ", "keywords": "learning to learn;neural optimizer", "primary_area": "", "supplementary_material": "", "author": "Yuanhao Xiong;Cho-Jui Hsieh", "authorids": "~Yuanhao_Xiong1;~Cho-Jui_Hsieh1", "gender": "M;M", "homepage": "https://xyh97.github.io/;http://web.cs.ucla.edu/~chohsieh/index.html", "dblp": "232/1248;14/2770", "google_scholar": "DVKxiMkAAAAJ;Wy89g4IAAAAJ", "orcid": ";", "linkedin": ";", "or_profile": "~Yuanhao_Xiong1;~Cho-Jui_Hsieh1", "aff": "University of California, Los Angeles;University of California, Los Angeles", "aff_domain": "cs.ucla.edu;ucla.edu", "position": "PhD student;Assistant Professor", "bibtex": "", "github": "", "project": "", "reviewers": "AnonReviewer2;AnonReviewer3;AnonReviewer4;AnonReviewer1", "site": "https://openreview.net/forum?id=PXedDe28hWH", "pdf_size": 0, "rating": "4;5;5;6", "confidence": "4;3;4;1", "wc_review": "834;235;227;83", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "0;0;0;0", "reply_reviewers": "0;0;0;0", "reply_authors": "0;0;0;0", "rating_avg": [ 5.0, 0.7071067811865476 ], "confidence_avg": [ 3.0, 1.224744871391589 ], "wc_review_avg": [ 344.75, 288.87226848557134 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 0, 0 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 0, 0 ], "replies_avg": [ 5, 0 ], "authors#_avg": [ 2, 0 ], "corr_rating_confidence": -0.8660254037844385, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:Cj5OYZIbr2IJ:scholar.google.com/&scioq=Learning+to+Learn+with+Smooth+Regularization&hl=en&as_sdt=0,33", "gs_version_total": 4, "aff_unique_index": "0;0", "aff_unique_norm": "University of California, Los Angeles", "aff_unique_dep": "", "aff_unique_url": "https://www.ucla.edu", "aff_unique_abbr": "UCLA", "aff_campus_unique_index": "0;0", "aff_campus_unique": "Los Angeles", "aff_country_unique_index": "0;0", "aff_country_unique": "United States" }, { "id": "PYAFKBc8GL4", "title": "Client Selection in Federated Learning: Convergence Analysis and Power-of-Choice Selection Strategies", "track": "main", "status": "Reject", "tldr": "", "abstract": "Federated learning is a distributed optimization paradigm that enables a large number of resource-limited client nodes to cooperatively train a model without data sharing. 
Several works have analyzed the convergence of federated learning by accounting for data heterogeneity, communication and computation limitations, and partial client participation. However, they assume unbiased client participation, where clients are selected at random or in proportion to their data sizes. In this paper, we present the first convergence analysis of federated optimization for biased client selection strategies, and quantify how the selection bias affects convergence speed. We reveal that biasing client selection towards clients with higher local loss achieves faster error convergence. Using this insight, we propose Power-of-Choice, a communication- and computation-efficient client selection framework that can flexibly span the trade-off between convergence speed and solution bias. We also propose an extension of Power-of-Choice that is able to maintain convergence speed improvement while diminishing the selection skew. Our experiments demonstrate that Power-of-Choice strategies converge up to 3 $\\times$ faster and give $10$% higher test accuracy than the baseline random selection. ", "keywords": "distributed optimization;federated learning;client selection", "primary_area": "", "supplementary_material": "/attachment/7c2231f55109f4aa2f122b294c879154da845932.zip", "author": "Yae Jee Cho;Jianyu Wang;Gauri Joshi", "authorids": "~Yae_Jee_Cho1;~Jianyu_Wang2;~Gauri_Joshi1", "gender": "F;M;", "homepage": "https://yaejeec.github.io/;;", "dblp": "179/2081;;", "google_scholar": "https://scholar.google.co.kr/citations?user=MR333jsAAAAJ;5nrx1YwAAAAJ;", "orcid": ";;", "linkedin": ";;", "or_profile": "~Yae_Jee_Cho1;~Jianyu_Wang2;~Gauri_Joshi1", "aff": "Carnegie Mellon University;Carnegie Mellon University;", "aff_domain": "cmu.edu;andrew.cmu.edu;", "position": "PhD student;PhD student;", "bibtex": "@misc{\ncho2021client,\ntitle={Client Selection in Federated Learning: Convergence Analysis and Power-of-Choice Selection Strategies},\nauthor={Yae Jee Cho and Jianyu Wang and Gauri Joshi},\nyear={2021},\nurl={https://openreview.net/forum?id=PYAFKBc8GL4}\n}", "github": "", "project": "", "reviewers": "AnonReviewer4;AnonReviewer3;AnonReviewer1;AnonReviewer2", "site": "https://openreview.net/forum?id=PYAFKBc8GL4", "pdf_size": 0, "rating": "4;6;6;6", "confidence": "4;3;4;4", "wc_review": "878;600;292;208", "wc_reply_reviewers": "528;0;244;0", "wc_reply_authors": "2482;1261;1236;425", "reply_reviewers": "2;0;2;0", "reply_authors": "5;2;3;1", "rating_avg": [ 5.5, 0.8660254037844386 ], "confidence_avg": [ 3.75, 0.4330127018922193 ], "wc_review_avg": [ 494.5, 265.184369825976 ], "wc_reply_reviewers_avg": [ 193.0, 217.55688911179072 ], "wc_reply_authors_avg": [ 1351.0, 734.5001701837788 ], "reply_reviewers_avg": [ 1.0, 1.0 ], "reply_authors_avg": [ 2.75, 1.479019945774904 ], "replies_avg": [ 21, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": -0.3333333333333333, "gs_citation": 554, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=9844365595954115912&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 7, "aff_unique_index": "0;0", "aff_unique_norm": "Carnegie Mellon University", "aff_unique_dep": "", "aff_unique_url": "https://www.cmu.edu", "aff_unique_abbr": "CMU", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0", "aff_country_unique": "United States" }, { "id": "P__qBPffIlK", "title": "Adversarial representation learning for synthetic replacement of private attributes", "track": "main", "status": "Reject", "tldr": "", "abstract": "Data 
privacy is an increasingly important aspect of many real-world big data analytics tasks. Data sources that contain sensitive information may have immense potential which could be unlocked using privacy enhancing transformations, but current methods often fail to produce convincing output. Furthermore, finding the right balance between privacy and utility is often a tricky trade-off. In this work, we propose a novel approach for data privatization, which involves two steps: in the first step, it removes the sensitive information, and in the second step, it replaces this information with an independent random sample. Our method builds on adversarial representation learning which ensures strong privacy by training the model to fool an increasingly strong adversary. While previous methods only aim at obfuscating the sensitive information, we find that adding new random information in its place strengthens the provided privacy and provides better utility at any given level of privacy. The result is an approach that can provide stronger privatization on image data, and yet be preserving both the domain and the utility of the inputs, entirely independent of the downstream task.", "keywords": "Deep learning;privacy;generative adversarial networks", "primary_area": "", "supplementary_material": "/attachment/271f5450bc878529d51fc4fea15edb059f5b8155.zip", "author": "John Martinsson;Edvin Listo Zec;Daniel Gillblad;Olof Mogren", "authorids": "~John_Martinsson1;~Edvin_Listo_Zec1;~Daniel_Gillblad1;~Olof_Mogren1", "gender": "M;M;;M", "homepage": "https://johnmartinsson.org;https://edvinli.github.io/;;http://mogren.one/", "dblp": "224/2647;;48/5973;", "google_scholar": "https://scholar.google.se/citations?user=sAMIwlMAAAAJ;https://scholar.google.se/citations?user=Ft52aSsAAAAJ;;https://scholar.google.com/citations?hl=en", "orcid": "0000-0002-5032-4367;;;", "linkedin": "john-martinsson-2541b772/;edvin-listo-zec/;;", "or_profile": "~John_Martinsson1;~Edvin_Listo_Zec1;~Daniel_Gillblad1;~Olof_Mogren1", "aff": "RISE Research Institutes of Sweden;KTH Royal Institute of Technology;AI Sweden;RISE Research Institutes of Sweden", "aff_domain": "ri.se;kth.se;ai.se;ri.se", "position": "Researcher;PhD student;co-Director;Researcher", "bibtex": "@misc{\nmartinsson2021adversarial,\ntitle={Adversarial representation learning for synthetic replacement of private attributes},\nauthor={John Martinsson and Edvin Listo Zec and Daniel Gillblad and Olof Mogren},\nyear={2021},\nurl={https://openreview.net/forum?id=P__qBPffIlK}\n}", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer3;AnonReviewer4", "site": "https://openreview.net/forum?id=P__qBPffIlK", "pdf_size": 0, "rating": "4;5;5", "confidence": "4;3;2", "wc_review": "205;268;261", "wc_reply_reviewers": "0;0;0", "wc_reply_authors": "372;526;392", "reply_reviewers": "0;0;0", "reply_authors": "1;1;1", "rating_avg": [ 4.666666666666667, 0.4714045207910317 ], "confidence_avg": [ 3.0, 0.816496580927726 ], "wc_review_avg": [ 244.66666666666666, 28.193773938387338 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 430.0, 68.37153403768755 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 7, 0 ], "authors#_avg": [ 4, 0 ], "corr_rating_confidence": -0.8660254037844385, "gs_citation": 14, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=17419831177786009182&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 6, "aff_unique_index": "0;1;2;0", "aff_unique_norm": "RISE Research Institutes of Sweden;KTH Royal 
Institute of Technology;AI Sweden", "aff_unique_dep": ";;", "aff_unique_url": "https://www.rise.se;https://www.kth.se;https://www.aisweden.org", "aff_unique_abbr": "RISE;KTH;AI Sweden", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0;0;0", "aff_country_unique": "Sweden" }, { "title": "Byzantine-Resilient Non-Convex Stochastic Gradient Descent", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/3312", "id": "PbEHqvFtcS", "poster": "", "openreview": "https://openreview.net/forum?id=PbEHqvFtcS", "slides": "https://iclr.cc/virtual/2021/poster/3312", "video": "https://iclr.cc/virtual/2021/poster/3312", "author_site": "Zeyuan Allen-Zhu, Faeze Ebrahimianghazani, Jerry Li, Dan Alistarh", "tldr": "", "abstract": "We study adversary-resilient stochastic distributed optimization, in which $m$ machines can independently compute stochastic gradients, and cooperate to jointly optimize over their local objective functions. However, an $\\alpha$-fraction of the machines are Byzantine, in that they may behave in arbitrary, adversarial ways. We consider a variant of this procedure in the challenging non-convex case. Our main result is a new algorithm SafeguardSGD, which can provably escape saddle points and find approximate local minima of the non-convex objective. The algorithm is based on a new concentration filtering technique, and its sample and time complexity bounds match the best known theoretical bounds in the stochastic, distributed setting when no Byzantine machines are present. \n\nOur algorithm is very practical: it improves upon the performance of all prior methods when training deep neural networks, it is relatively lightweight, and it is the first method to withstand two recently-proposed Byzantine attacks. 
", "keywords": "distributed machine learning;distributed deep learning;robust deep learning;non-convex optimization;Byzantine resilience", "primary_area": "", "supplementary_material": "", "author": "Zeyuan Allen-Zhu;Faeze Ebrahimianghazani;Jerry Li;Dan Alistarh", "authorids": "~Zeyuan_Allen-Zhu1;faezeeb75@gmail.com;~Jerry_Li1;~Dan_Alistarh7", "gender": ";;M;", "homepage": ";;https://jerryzli.github.io/;", "dblp": ";;;", "google_scholar": ";;4zybTq4AAAAJ;", "orcid": ";;;", "linkedin": ";;;", "or_profile": "~Zeyuan_Allen-Zhu1;faezeeb75@gmail.com;~Jerry_Li1;~Dan_Alistarh7", "aff": ";;Microsoft;", "aff_domain": ";;microsoft.com;", "position": ";;Senior Researcher;", "bibtex": "@inproceedings{\nallen-zhu2021byzantineresilient,\ntitle={Byzantine-Resilient Non-Convex Stochastic Gradient Descent},\nauthor={Zeyuan Allen-Zhu and Faeze Ebrahimianghazani and Jerry Li and Dan Alistarh},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=PbEHqvFtcS}\n}", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer3;AnonReviewer4;AnonReviewer2", "pdf_size": 0, "rating": "5;6;7;8", "confidence": "4;2;3;4", "wc_review": "429;411;683;601", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "781;357;821;838", "reply_reviewers": "0;0;0;0", "reply_authors": "1;1;1;1", "rating_avg": [ 6.5, 1.118033988749895 ], "confidence_avg": [ 3.25, 0.82915619758885 ], "wc_review_avg": [ 531.0, 114.8999564838908 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 699.25, 198.6786035284122 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 10, 0 ], "authors#_avg": [ 4, 0 ], "corr_rating_confidence": 0.1348399724926484, "gs_citation": 89, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=4030056767885226760&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 3, "pdf": "https://openreview.net/pdf?id=PbEHqvFtcS", "email": ";;microsoft.com;", "author_num": 4, "aff_unique_index": "0", "aff_unique_norm": "Microsoft", "aff_unique_dep": "Microsoft Corporation", "aff_unique_url": "https://www.microsoft.com", "aff_unique_abbr": "Microsoft", "aff_country_unique_index": "0", "aff_country_unique": "United States" }, { "title": "Orthogonalizing Convolutional Layers with the Cayley Transform", "status": "Spotlight", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/3293", "id": "Pbj8H_jEHYv", "poster": "", "openreview": "https://openreview.net/forum?id=Pbj8H_jEHYv", "slides": "https://iclr.cc/virtual/2021/poster/3293", "video": "https://iclr.cc/virtual/2021/poster/3293", "author_site": "Asher Trockman, Zico Kolter", "tldr": "", "abstract": "Recent work has highlighted several advantages of enforcing orthogonality in the weight layers of deep networks, such as maintaining the stability of activations, preserving gradient norms, and enhancing adversarial robustness by enforcing low Lipschitz constants. Although numerous methods exist for enforcing the orthogonality of fully-connected layers, those for convolutional layers are more heuristic in nature, often focusing on penalty methods or limited classes of convolutions. In this work, we propose and evaluate an alternative approach to directly parameterize convolutional layers that are constrained to be orthogonal. Specifically, we propose to apply the Cayley transform to a skew-symmetric convolution in the Fourier domain, so that the inverse convolution needed by the Cayley transform can be computed efficiently. 
We compare our method to previous Lipschitz-constrained and orthogonal convolutional layers and show that it indeed preserves orthogonality to a high degree even for large convolutions. Applied to the problem of certified adversarial robustness, we show that networks incorporating the layer outperform existing deterministic methods for certified defense against $\\ell_2$-norm-bounded adversaries, while scaling to larger architectures than previously investigated. Code is available at https://github.com/locuslab/orthogonal-convolutions.", "keywords": "orthogonal layers;Lipschitz constrained networks;adversarial robustness", "primary_area": "", "supplementary_material": "", "author": "Asher Trockman;J Zico Kolter", "authorids": "~Asher_Trockman1;~J_Zico_Kolter1", "gender": ";", "homepage": ";", "dblp": ";", "google_scholar": ";", "orcid": ";", "linkedin": ";", "or_profile": "~Asher_Trockman1;~J_Zico_Kolter1", "aff": ";", "aff_domain": ";", "position": ";", "bibtex": "@inproceedings{\ntrockman2021orthogonalizing,\ntitle={Orthogonalizing Convolutional Layers with the Cayley Transform},\nauthor={Asher Trockman and J Zico Kolter},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=Pbj8H_jEHYv}\n}", "github": "[![github](/images/github_icon.svg) locuslab/orthogonal-convolutions](https://github.com/locuslab/orthogonal-convolutions)", "project": "", "reviewers": "AnonReviewer3;AnonReviewer4;AnonReviewer1;AnonReviewer2", "pdf_size": 0, "rating": "7;7;7;8", "confidence": "3;4;4;4", "wc_review": "609;303;1367;659", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "0;0;0;0", "reply_reviewers": "0;0;0;0", "reply_authors": "0;0;0;0", "rating_avg": [ 7.25, 0.4330127018922193 ], "confidence_avg": [ 3.75, 0.4330127018922193 ], "wc_review_avg": [ 734.5, 389.7752557564423 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 0, 0 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 0, 0 ], "replies_avg": [ 5, 0 ], "authors#_avg": [ 2, 0 ], "corr_rating_confidence": 0.3333333333333333, "gs_citation": 137, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=7972253340344904687&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 3, "pdf": "https://openreview.net/pdf?id=Pbj8H_jEHYv", "email": ";", "author_num": 2 }, { "id": "PcBVjfeLODY", "title": "Constraining Latent Space to Improve Deep Self-Supervised e-Commerce Products Embeddings for Downstream Tasks", "track": "main", "status": "Reject", "tldr": "", "abstract": "The representation of products in a e-commerce marketplace is a key aspect to be exploited when trying to improve the user experience on the site. A well known example of the importance of a good product representation are tasks such as product search or product recommendation. There is however a multitude of lesser known tasks relevant to the business, examples are the detection of counterfeit items, the estimation of package sizes or the categorization of products, among others. It is in this setting that good vector representations of products that can be reused on different tasks are very valuable. Past years have seen a major increase in research in the area of latent representations for products in e-Commerce. Examples of this are models like Prod2Vec or Meta-Prod2Vec which leverage from the information of a user session in order to generate vectors of the products that can be used in product recommendations. 
This work proposes a novel deep encoder model for learning product embeddings to be applied in several downstream tasks. The model uses pairs of products that appear together in a browsing session of the users and adds a proximity constraint to the final latent space in order to project the embeddings of similar products close to each other. This has a regularization effect which gives better feature representations to use across multiple downstream tasks; we explore this effect in our experiments by assessing its impact on the performance of the tasks. Our experiments show effectiveness in transfer learning scenarios comparable to several industrial baselines.", "keywords": "representation learning;deep learning;self-supervised learning", "primary_area": "", "supplementary_material": "", "author": "Cristian Cardellino;Rafael Carrascosa", "authorids": "~Cristian_Cardellino1;rafael.carrascosa@mercadolibre.com", "gender": "M;", "homepage": "https://crscardellino.github.io;", "dblp": "152/9316.html;", "google_scholar": "YuIeGYEAAAAJ;", "orcid": ";", "linkedin": "crscardellino/;", "or_profile": "~Cristian_Cardellino1;rafael.carrascosa@mercadolibre.com", "aff": "Universidad Nacional de C\u00f3rdoba;", "aff_domain": "unc.edu.ar;", "position": "Assistant Professor;", "bibtex": "@misc{\ncardellino2021constraining,\ntitle={Constraining Latent Space to Improve Deep Self-Supervised e-Commerce Products Embeddings for Downstream Tasks},\nauthor={Cristian Cardellino and Rafael Carrascosa},\nyear={2021},\nurl={https://openreview.net/forum?id=PcBVjfeLODY}\n}", "github": "", "project": "", "reviewers": "AnonReviewer2;AnonReviewer1;AnonReviewer3;AnonReviewer4", "site": "https://openreview.net/forum?id=PcBVjfeLODY", "pdf_size": 0, "rating": "3;3;4;5", "confidence": "4;4;4;3", "wc_review": "191;313;694;429", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "0;0;0;0", "reply_reviewers": "0;0;0;0", "reply_authors": "0;0;0;0", "rating_avg": [ 3.75, 0.82915619758885 ], "confidence_avg": [ 3.75, 0.4330127018922193 ], "wc_review_avg": [ 406.75, 185.97362044117978 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 0, 0 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 0, 0 ], "replies_avg": [ 5, 0 ], "authors#_avg": [ 2, 0 ], "corr_rating_confidence": -0.8703882797784891, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:Dp_6AygLrzIJ:scholar.google.com/&scioq=Constraining+Latent+Space+to+Improve+Deep+Self-Supervised+e-Commerce+Products+Embeddings+for+Downstream+Tasks&hl=en&as_sdt=0,14", "gs_version_total": 0, "aff_unique_index": "0", "aff_unique_norm": "Universidad Nacional de C\u00f3rdoba", "aff_unique_dep": "", "aff_unique_url": "https://www.unc.edu.ar", "aff_unique_abbr": "UNC", "aff_country_unique_index": "0", "aff_country_unique": "Argentina" }, { "id": "PcUprce4TM2", "title": "CAFE: Catastrophic Data Leakage in Federated Learning", "track": "main", "status": "Reject", "tldr": "", "abstract": "Private training data can be leaked through the gradient sharing mechanism deployed in machine learning systems, such as federated learning (FL).\nIncreasing batch size is often viewed as a promising defense strategy against data leakage. In this paper, we revisit this defense premise and propose an advanced data leakage attack to efficiently recover batch data from the shared aggregated gradients. 
\nWe name our proposed method as \textit{\underline{c}atastrophic d\underline{a}ta leakage in \underline{f}ederated l\underline{e}arning (CAFE)}.\nCompared to existing data leakage attacks, CAFE demonstrates the ability to perform large-batch data leakage attacks with high data recovery quality. \nExperimental results on vertical and horizontal FL settings have validated the effectiveness of CAFE in recovering private data from the shared aggregated gradients. \nOur results suggest that data involved in FL, especially in the vertical case, are at high risk of being leaked from the training gradients. Our analysis implies unprecedented and practical data leakage risks in those learning settings.", "keywords": "", "primary_area": "", "supplementary_material": "/attachment/03808f06faecc1bcdedfb7ec27af91a9c253d1aa.zip", "author": "Xiao Jin;Ruijie Du;Pin-Yu Chen;Tianyi Chen", "authorids": "jinxiao96@gmail.com;du461007169@gmail.com;~Pin-Yu_Chen1;~Tianyi_Chen1", "gender": ";;M;", "homepage": ";;http://www.pinyuchen.com;", "dblp": ";;39/8969;", "google_scholar": ";;jxwlCUUAAAAJ;", "orcid": ";;0000-0003-1039-8369;", "linkedin": ";;pin-yu-chen-940062a2;", "or_profile": "jinxiao96@gmail.com;du461007169@gmail.com;~Pin-Yu_Chen1;~Tianyi_Chen1", "aff": ";;International Business Machines;", "aff_domain": ";;ibm.com;", "position": ";;Research Staff Member;", "bibtex": "@misc{\njin2021cafe,\ntitle={{\\{}CAFE{\\}}: Catastrophic Data Leakage in Federated Learning},\nauthor={Xiao Jin and Ruijie Du and Pin-Yu Chen and Tianyi Chen},\nyear={2021},\nurl={https://openreview.net/forum?id=PcUprce4TM2}\n}", "github": "", "project": "", "reviewers": "AnonReviewer4;AnonReviewer1;AnonReviewer3;AnonReviewer2", "site": "https://openreview.net/forum?id=PcUprce4TM2", "pdf_size": 0, "rating": "3;4;4;4", "confidence": "5;4;3;2", "wc_review": "320;459;382;394", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "307;186;255;195", "reply_reviewers": "0;0;0;0", "reply_authors": "1;1;1;1", "rating_avg": [ 3.75, 0.4330127018922193 ], "confidence_avg": [ 3.5, 1.118033988749895 ], "wc_review_avg": [ 388.75, 49.33241834737073 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 235.75, 48.945760797029195 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 10, 0 ], "authors#_avg": [ 4, 0 ], "corr_rating_confidence": -0.7745966692414834, "gs_citation": 3, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=2405059354691182324&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 0, "aff_unique_index": "0", "aff_unique_norm": "International Business Machines Corporation", "aff_unique_dep": "", "aff_unique_url": "https://www.ibm.com", "aff_unique_abbr": "IBM", "aff_country_unique_index": "0", "aff_country_unique": "United States" }, { "title": "Iterated learning for emergent systematicity in VQA", "status": "Oral", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/3302", "id": "Pd_oMxH8IlF", "poster": "", "openreview": "https://openreview.net/forum?id=Pd_oMxH8IlF", "slides": "https://iclr.cc/virtual/2021/poster/3302", "video": "https://iclr.cc/virtual/2021/poster/3302", "author_site": "Ankit Vani, Max Schwarzer, Yuchen Lu, Eeshan Dhekane, Aaron Courville", "tldr": "", "abstract": "Although neural module networks have an architectural bias towards compositionality, they require gold standard layouts to generalize systematically in practice. 
When instead learning layouts and modules jointly, compositionality does not arise automatically and an explicit pressure is necessary for the emergence of layouts exhibiting the right structure. We propose to address this problem using iterated learning, a cognitive science theory of the emergence of compositional languages in nature that has primarily been applied to simple referential games in machine learning. Considering the layouts of module networks as samples from an emergent language, we use iterated learning to encourage the development of structure within this language. We show that the resulting layouts support systematic generalization in neural agents solving the more complex task of visual question-answering. Our regularized iterated learning method can outperform baselines without iterated learning on SHAPES-SyGeT (SHAPES Systematic Generalization Test), a new split of the SHAPES dataset we introduce to evaluate systematic generalization, and on CLOSURE, an extension of CLEVR also designed to test systematic generalization. We demonstrate superior performance in recovering ground-truth compositional program structure with limited supervision on both SHAPES-SyGeT and CLEVR.", "keywords": "iterated learning;cultural transmission;neural module network;clevr;shapes;vqa;visual question answering;systematic generalization;compositionality", "primary_area": "", "supplementary_material": "", "author": "Ankit Vani;Max Schwarzer;Yuchen Lu;Eeshan Dhekane;Aaron Courville", "authorids": "~Ankit_Vani1;~Max_Schwarzer1;~Yuchen_Lu1;eeshandhekane@gmail.com;~Aaron_Courville3", "gender": "M;;M;;", "homepage": "https://ankitvani.com/;;http://jackhaha363.github.io/;;", "dblp": "178/2855;;223/4762;;56/1688", "google_scholar": "KtnTuq8AAAAJ;YmWRSvgAAAAJ;https://scholar.google.ca/citations?hl=en;;https://scholar.google.ca/citations?user=km6CP8cAAAAJ", "orcid": ";;;;", "linkedin": "ankitvani/;maxaschwarzer/;;;", "or_profile": "~Ankit_Vani1;~Max_Schwarzer1;~Yuchen_Lu1;eeshandhekane@gmail.com;~Aaron_Courville3", "aff": "Mila;University of Montreal;University of Montreal;;Universit\u00e9 de Montr\u00e9al", "aff_domain": "mila.quebec;umontreal.ca;umontreal.ca;; ", "position": "PhD student;PhD student;PhD student;;Assistant Professor", "bibtex": "@inproceedings{\nvani2021iterated,\ntitle={Iterated learning for emergent systematicity in {\\{}VQA{\\}}},\nauthor={Ankit Vani and Max Schwarzer and Yuchen Lu and Eeshan Dhekane and Aaron Courville},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=Pd_oMxH8IlF}\n}", "github": "", "project": "", "reviewers": "AnonReviewer5;AnonReviewer1;AnonReviewer4", "pdf_size": 0, "rating": "6;7;8", "confidence": "4;3;3", "wc_review": "564;440;381", "wc_reply_reviewers": "171;0;183", "wc_reply_authors": "2173;532;267", "reply_reviewers": "1;0;1", "reply_authors": "4;1;1", "rating_avg": [ 7.0, 0.816496580927726 ], "confidence_avg": [ 3.3333333333333335, 0.4714045207910317 ], "wc_review_avg": [ 461.6666666666667, 76.2641607979936 ], "wc_reply_reviewers_avg": [ 118.0, 83.58229477586745 ], "wc_reply_authors_avg": [ 990.6666666666666, 843.0066560960372 ], "reply_reviewers_avg": [ 0.6666666666666666, 0.4714045207910317 ], "reply_authors_avg": [ 2.0, 1.4142135623730951 ], "replies_avg": [ 13, 0 ], "authors#_avg": [ 5, 0 ], "corr_rating_confidence": -0.8660254037844385, "gs_citation": 37, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=10778314436717876801&as_sdt=5,33&sciodt=0,33&hl=en", 
"gs_version_total": 3, "pdf": "https://openreview.net/pdf?id=Pd_oMxH8IlF", "email": "mila.quebec;umontreal.ca;umontreal.ca;; ", "author_num": 5, "aff_unique_index": "0;1;1;2", "aff_unique_norm": "Mila;University of Montreal;Universit\u00e9 de Montr\u00e9al", "aff_unique_dep": "Quebec Artificial Intelligence Institute;;", "aff_unique_url": "https://mila.quebec;https://wwwumontreal.ca;https://www.umontreal.ca", "aff_unique_abbr": "Mila;UM;UdeM", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0;0;0", "aff_country_unique": "Canada" }, { "id": "PdauS7wZBfC", "title": "Predictive Coding Approximates Backprop along Arbitrary Computation Graphs", "track": "main", "status": "Reject", "tldr": "", "abstract": "The backpropagation of error (backprop) is a powerful algorithm for training machine learning architectures through end-to-end differentiation. Recently it has been shown that backprop in multilayer-perceptrons (MLPs) can be approximated using predictive coding, a biologically-plausible process theory of cortical computation which relies solely on local and Hebbian updates. The power of backprop, however, lies not in its instantiation in MLPs, but rather in the concept of automatic differentiation which allows for the optimisation of any differentiable program expressed as a computation graph. Here, we demonstrate that predictive coding converges asymptotically (and in practice rapidly) to exact backprop gradients on arbitrary computation graphs using only local learning rules. We apply this result to develop a straightforward strategy to translate core machine learning architectures into their predictive coding equivalents. We construct predictive coding CNNs, RNNs, and the more complex LSTMs, which include a non-layer-like branching internal graph structure and multiplicative interactions. Our models perform equivalently to backprop on challenging machine learning benchmarks, while utilising only local and (mostly) Hebbian plasticity. 
Our method raises the potential that standard machine learning algorithms could in principle be directly implemented in neural circuitry, and may also contribute to the development of completely distributed neuromorphic architectures.", "keywords": "Predictive Coding;Backprop;Biological plausibility;neural networks", "primary_area": "", "supplementary_material": "/attachment/8410b48c6f47724320fa3f0abd01731cc027314a.zip", "author": "Beren Millidge;Alexander Tschantz;Christopher Buckley", "authorids": "~Beren_Millidge1;~Alexander_Tschantz1;~Christopher_Buckley1", "gender": "M;M;M", "homepage": "http://beren.io/;;https://christopherlbuckley.com/", "dblp": "244/9967;254/2125;37/3540.html", "google_scholar": "3GGkFTkAAAAJ;5NbVgO0AAAAJ;https://scholar.google.co.uk/citations?user=nWuZ0XcAAAAJ", "orcid": ";;0000-0002-8551-9121", "linkedin": "beren-millidge-377065142/;;", "or_profile": "~Beren_Millidge1;~Alexander_Tschantz1;~Christopher_Buckley1", "aff": "University of Oxford;University of Sussex;", "aff_domain": "ox.ac.uk;sussex.ac.uk;", "position": "Postdoc;PhD student;", "bibtex": "@misc{\nmillidge2021predictive,\ntitle={Predictive Coding Approximates Backprop along Arbitrary Computation Graphs},\nauthor={Beren Millidge and Alexander Tschantz and Christopher Buckley},\nyear={2021},\nurl={https://openreview.net/forum?id=PdauS7wZBfC}\n}", "github": "", "project": "", "reviewers": "AnonReviewer4;AnonReviewer3;AnonReviewer2;AnonReviewer1", "site": "https://openreview.net/forum?id=PdauS7wZBfC", "pdf_size": 0, "rating": "4;6;6;7", "confidence": "4;4;4;3", "wc_review": "957;258;383;350", "wc_reply_reviewers": "67;0;0;0", "wc_reply_authors": "1209;85;302;541", "reply_reviewers": "1;0;0;0", "reply_authors": "3;1;1;1", "rating_avg": [ 5.75, 1.0897247358851685 ], "confidence_avg": [ 3.75, 0.4330127018922193 ], "wc_review_avg": [ 487.0, 275.1935682387944 ], "wc_reply_reviewers_avg": [ 16.75, 29.011851026778693 ], "wc_reply_authors_avg": [ 534.25, 421.63335672121576 ], "reply_reviewers_avg": [ 0.25, 0.4330127018922193 ], "reply_authors_avg": [ 1.5, 0.8660254037844386 ], "replies_avg": [ 13, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": -0.6622661785325219, "gs_citation": 155, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=386614314922457199&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 11, "aff_unique_index": "0;1", "aff_unique_norm": "University of Oxford;University of Sussex", "aff_unique_dep": ";", "aff_unique_url": "https://www.ox.ac.uk;https://www.sussex.ac.uk", "aff_unique_abbr": "Oxford;Sussex", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0", "aff_country_unique": "United Kingdom" }, { "id": "PeT5p3ocagr", "title": "PGPS : Coupling Policy Gradient with Population-based Search", "track": "main", "status": "Reject", "tldr": "", "abstract": "Gradient-based policy search algorithms (such as PPO, SAC or TD3) in deep reinforcement learning (DRL) have shown successful results on a range of challenging control tasks. However, they often suffer from flat or deceptive gradient problems. As an alternative to policy gradient methods, population-based evolutionary approaches have been applied to DRL. While population-based search algorithms show more robust learning in a broader range of tasks, they are usually inefficient in the use of samples. Recently, reported are a few attempts (such as CEMRL) to combine gradient with a population in searching optimal policy. This kind of hybrid algorithm takes advantage of both camps. 
In this paper, we propose yet another hybrid algorithm, which more tightly couples policy gradient with the population-based search. More specifically, we use the Cross-Entropy Method (CEM) for population-based search and Twin Delayed Deep Deterministic Policy Gradient (TD3) for policy gradient. In the proposed algorithm called Coupling Policy Gradient with Population-based Search (PGPS), a single TD3 agent, which learns by a gradient from all experiences generated by population, leads a population by providing its critic function Q as a surrogate to select better performing next-generation population from candidates. On the other hand, if the TD3 agent falls behind the CEM population, then the TD3 agent is updated toward the elite member of the CEM population using loss function augmented with the distance between the TD3 and the CEM elite. Experiments in a MuJoCo environment show that PGPS is robust to deceptive gradient and also outperforms the state-of-the-art algorithms.\n", "keywords": "Reinforcement Learning;Population-based Search;Policy Gradient;Combining PG with PS", "primary_area": "", "supplementary_material": "/attachment/695452267c829bb7f6baf829fce06a931b5a86b8.zip", "author": "Namyong Kim;Hyunsuk Baek;Hayong Shin", "authorids": "~Namyong_Kim1;hisuk31@kaist.ac.kr;~Hayong_Shin1", "gender": "M;;M", "homepage": "https://www.kaist.ac.kr;;", "dblp": ";;71/5345", "google_scholar": ";;", "orcid": ";;", "linkedin": ";;", "or_profile": "~Namyong_Kim1;hisuk31@kaist.ac.kr;~Hayong_Shin1", "aff": ";;Korea Advanced Institute of Science & Technology", "aff_domain": ";;kaist.ac.kr", "position": ";;Full Professor", "bibtex": "@misc{\nkim2021pgps,\ntitle={{\\{}PGPS{\\}} : Coupling Policy Gradient with Population-based Search},\nauthor={Namyong Kim and Hyunsuk Baek and Hayong Shin},\nyear={2021},\nurl={https://openreview.net/forum?id=PeT5p3ocagr}\n}", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer3;AnonReviewer2;AnonReviewer5", "site": "https://openreview.net/forum?id=PeT5p3ocagr", "pdf_size": 0, "rating": "3;5;5;5", "confidence": "4;4;3;4", "wc_review": "427;91;469;219", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "0;0;0;0", "reply_reviewers": "0;0;0;0", "reply_authors": "0;0;0;0", "rating_avg": [ 4.5, 0.8660254037844386 ], "confidence_avg": [ 3.75, 0.4330127018922193 ], "wc_review_avg": [ 301.5, 154.04788216655237 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 0, 0 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 0, 0 ], "replies_avg": [ 8, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": -0.3333333333333333, "gs_citation": 6, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=11128405389090037246&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 0, "aff_unique_index": "0", "aff_unique_norm": "Korea Advanced Institute of Science and Technology", "aff_unique_dep": "", "aff_unique_url": "https://www.kaist.ac.kr", "aff_unique_abbr": "KAIST", "aff_country_unique_index": "0", "aff_country_unique": "South Korea" }, { "id": "Peg7mkjzvyP", "title": "iPTR: Learning a representation for interactive program translation retrieval", "track": "main", "status": "Reject", "tldr": "", "abstract": "Program translation contributes to many real world scenarios, such as porting codebases written in an obsolete or deprecated language to a modern one or re-implementing existing projects in one's preferred programming language. 
Existing data-driven approaches either require large amounts of training data or neglect significant characteristics of programs. In this paper, we present iPTR for interactive code translation retrieval from Big Code. iPTR uses a novel code representation technique that encodes structural characteristics of a program and a predictive transformation technique to transform the representation into the target programming language. The transformed representation is used for code retrieval from Big Code. With our succinct representation, the user can easily update and correct the returned results to improve the retrieval process. Our experiments show that iPTR outperforms supervised baselines in terms of program accuracy.", "keywords": "", "primary_area": "", "supplementary_material": "", "author": "Binger Chen;Ziawasch Abedjan", "authorids": "~Binger_Chen1;abedjan@dbs.uni-hannover.de", "gender": ";", "homepage": "https://www.bigdama.tu-berlin.de/menue/team/binger_chen/;", "dblp": ";", "google_scholar": ";", "orcid": ";", "linkedin": ";", "or_profile": "~Binger_Chen1;abedjan@dbs.uni-hannover.de", "aff": "TU Berlin;", "aff_domain": "tu-berlin.de;", "position": "PhD student;", "bibtex": "@misc{\nchen2021iptr,\ntitle={i{\\{}PTR{\\}}: Learning a representation for interactive program translation retrieval},\nauthor={Binger Chen and Ziawasch Abedjan},\nyear={2021},\nurl={https://openreview.net/forum?id=Peg7mkjzvyP}\n}", "github": "", "project": "", "reviewers": "AnonReviewer4;AnonReviewer1;AnonReviewer2", "site": "https://openreview.net/forum?id=Peg7mkjzvyP", "pdf_size": 0, "rating": "4;5;6", "confidence": "4;4;3", "wc_review": "335;680;152", "wc_reply_reviewers": "0;0;124", "wc_reply_authors": "742;1373;585", "reply_reviewers": "0;0;1", "reply_authors": "1;2;2", "rating_avg": [ 5.0, 0.816496580927726 ], "confidence_avg": [ 3.6666666666666665, 0.4714045207910317 ], "wc_review_avg": [ 389.0, 218.91094079556646 ], "wc_reply_reviewers_avg": [ 41.333333333333336, 58.45416057808793 ], "wc_reply_authors_avg": [ 900.0, 340.54759823946296 ], "reply_reviewers_avg": [ 0.3333333333333333, 0.4714045207910317 ], "reply_authors_avg": [ 1.6666666666666667, 0.4714045207910317 ], "replies_avg": [ 10, 0 ], "authors#_avg": [ 2, 0 ], "corr_rating_confidence": -0.8660254037844385, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:KAJObVF-9UwJ:scholar.google.com/&scioq=iPTR:+Learning+a+representation+for+interactive+program+translation+retrieval&hl=en&as_sdt=0,33", "gs_version_total": 0, "aff_unique_index": "0", "aff_unique_norm": "Technische Universit\u00e4t Berlin", "aff_unique_dep": "", "aff_unique_url": "https://www.tu-berlin.de", "aff_unique_abbr": "TU Berlin", "aff_campus_unique_index": "0", "aff_campus_unique": "Berlin", "aff_country_unique_index": "0", "aff_country_unique": "Germany" }, { "id": "PghuCwnjF6y", "title": "TaskSet: A Dataset of Optimization Tasks", "track": "main", "status": "Reject", "tldr": "", "abstract": "We present TaskSet, a dataset of tasks for use in training and evaluating optimizers. TaskSet is unique in its size and diversity, containing over a thousand tasks ranging from image classification with fully connected or convolutional neural networks, to variational autoencoders, to non-volume preserving flows on a variety of datasets. As an example application of such a dataset we explore meta-learning an ordered list of hyperparameters to try sequentially. 
By learning this hyperparameter list from data generated using TaskSet we achieve large speedups in sample efficiency over random search. Next we use the diversity of the TaskSet and our method for learning hyperparameter lists to empirically explore the generalization of these lists to new optimization tasks in a variety of settings including ImageNet classification with Resnet50 and LM1B language modeling with transformers. As part of this work we have opensourced code for all tasks, as well as ~29 million training curves for these problems and the corresponding hyperparameters.", "keywords": "optimizers;meta-learning", "primary_area": "", "supplementary_material": "/attachment/85fd3fca8140d1af4c385b53177f842361be9891.zip", "author": "Luke Metz;Niru Maheswaranathan;Ruoxi Sun;C. Daniel Freeman;Ben Poole;Jascha Sohl-Dickstein", "authorids": "~Luke_Metz1;~Niru_Maheswaranathan1;~Ruoxi_Sun2;~C._Daniel_Freeman1;~Ben_Poole1;~Jascha_Sohl-Dickstein2", "gender": "M;F;M;M;M;M", "homepage": "http://lukemetz.com;;https://github.com/danielfreeman11/;https://cs.stanford.edu/~poole;http://sohldickstein.com;http://niru.dev/", "dblp": ";72/7683;190/7046;16/10397;51/7117;155/7407", "google_scholar": "jCOmCb4AAAAJ;ut1-7LAAAAAJ;t5Xsx0IAAAAJ;i5FMLA4AAAAJ;-3zYIjQAAAAJ;bEOT7ScAAAAJ", "orcid": ";;;;;", "linkedin": ";;daniel-freeman-6952136?trk=hp-identity-name;;;", "or_profile": "~Luke_Metz1;~Ruoxi_Sun2;~C._Daniel_Freeman1;~Ben_Poole1;~Jascha_Sohl-Dickstein1;~Niru_Maheswaranathan2", "aff": "Google;Google;Google Research;Google;Google;Google", "aff_domain": "google.com;google.com;google.com;google.com;google.com;google.com", "position": "Research Scientist;Google;Software Engineer;Research Scientist;Research Scientist;Research Engineer", "bibtex": "@misc{\nmetz2021taskset,\ntitle={TaskSet: A Dataset of Optimization Tasks},\nauthor={Luke Metz and Niru Maheswaranathan and Ruoxi Sun and C. 
Daniel Freeman and Ben Poole and Jascha Sohl-Dickstein},\nyear={2021},\nurl={https://openreview.net/forum?id=PghuCwnjF6y}\n}", "github": "", "project": "", "reviewers": "AnonReviewer4;AnonReviewer1;AnonReviewer3;AnonReviewer2", "site": "https://openreview.net/forum?id=PghuCwnjF6y", "pdf_size": 0, "rating": "3;5;5;7", "confidence": "4;2;4;4", "wc_review": "214;356;487;482", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "412;501;379;611", "reply_reviewers": "0;0;0;0", "reply_authors": "1;1;1;1", "rating_avg": [ 5.0, 1.4142135623730951 ], "confidence_avg": [ 3.5, 0.8660254037844386 ], "wc_review_avg": [ 384.75, 111.68566380695421 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 475.75, 89.93713081925618 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 9, 0 ], "authors#_avg": [ 6, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:_HjKLlwdt7sJ:scholar.google.com/&scioq=TaskSet:+A+Dataset+of+Optimization+Tasks&hl=en&as_sdt=0,5", "gs_version_total": 0, "aff_unique_index": "0;0;0;0;0;0", "aff_unique_norm": "Google", "aff_unique_dep": "Google", "aff_unique_url": "https://www.google.com", "aff_unique_abbr": "Google", "aff_campus_unique_index": "0;0;0;0;0;0", "aff_campus_unique": "Mountain View", "aff_country_unique_index": "0;0;0;0;0;0", "aff_country_unique": "United States" }, { "id": "Pgq5GE_-ph", "title": "Video Prediction with Variational Temporal Hierarchies", "track": "main", "status": "Reject", "tldr": "", "abstract": "Deep learning has shown promise for accurately predicting high-dimensional video sequences. Existing video prediction models succeeded in generating sharp but often short video sequences. Toward improving long-term video prediction, we study hierarchical latent variable models with levels that process at different time scales. To gain insights into the representations of such models, we study the information stored at each level of the hierarchy via the KL divergence, predictive entropy, datasets of varying speed, and generative distributions. Our analysis confirms that faster changing details are generally captured by lower levels, while slower changing facts are remembered by higher levels. 
On synthetic datasets where common methods fail after 25 frames, we show that temporally abstract latent variable models can make accurate predictions for up to 200 frames.", "keywords": "latent dynamics;temporal abstraction;video prediction;probabilistic modeling;variational inference;deep learning", "primary_area": "", "supplementary_material": "", "author": "Vaibhav Saxena;Jimmy Ba;Danijar Hafner", "authorids": "~Vaibhav_Saxena1;~Jimmy_Ba1;~Danijar_Hafner1", "gender": "M;M;", "homepage": "https://sites.google.com/view/vaibhavsaxena;http://jimmylba.github.io;https://danijar.com", "dblp": "90/5273;https://dblp.org/pers/b/Ba:Jimmy.html;184/8088", "google_scholar": "J9xMyxMAAAAJ;https://scholar.google.ca/citations?user=ymzxRhAAAAAJ;VINmGpYAAAAJ", "orcid": ";;0000-0002-9534-7271", "linkedin": "vaibhavsaxena11/;;", "or_profile": "~Vaibhav_Saxena1;~Jimmy_Ba1;~Danijar_Hafner1", "aff": "Georgia Institute of Technology;Department of Computer Science, University of Toronto;University of Toronto", "aff_domain": "gatech.edu;cs.toronto.edu;cs.toronto", "position": "PhD student;Assistant Professor;PhD student", "bibtex": "@misc{\nsaxena2021video,\ntitle={Video Prediction with Variational Temporal Hierarchies},\nauthor={Vaibhav Saxena and Jimmy Ba and Danijar Hafner},\nyear={2021},\nurl={https://openreview.net/forum?id=Pgq5GE_-ph}\n}", "github": "", "project": "", "reviewers": "AnonReviewer2;AnonReviewer4;AnonReviewer3;AnonReviewer1", "site": "https://openreview.net/forum?id=Pgq5GE_-ph", "pdf_size": 0, "rating": "4;5;5;6", "confidence": "5;4;4;4", "wc_review": "254;303;457;203", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "562;432;577;296", "reply_reviewers": "0;0;0;0", "reply_authors": "1;1;1;1", "rating_avg": [ 5.0, 0.7071067811865476 ], "confidence_avg": [ 4.25, 0.4330127018922193 ], "wc_review_avg": [ 304.25, 95.01414368398002 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 466.75, 113.56798624612483 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 10, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": -0.816496580927726, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:p8vJYAicC0oJ:scholar.google.com/&scioq=Video+Prediction+with+Variational+Temporal+Hierarchies&hl=en&as_sdt=0,33", "gs_version_total": 0, "aff_unique_index": "0;1;1", "aff_unique_norm": "Georgia Institute of Technology;University of Toronto", "aff_unique_dep": ";Department of Computer Science", "aff_unique_url": "https://www.gatech.edu;https://www.utoronto.ca", "aff_unique_abbr": "Georgia Tech;U of T", "aff_campus_unique_index": "1", "aff_campus_unique": ";Toronto", "aff_country_unique_index": "0;1;1", "aff_country_unique": "United States;Canada" }, { "id": "PhV-qfEi3Mr", "title": "Improving the accuracy of neural networks in analog computing-in-memory systems by a generalized quantization method", "track": "main", "status": "Reject", "tldr": "", "abstract": "Crossbar-enabled analog computing-in-memory (CACIM) systems can significantly improve the computation speed and energy efficiency of deep neural networks (DNNs). However, the transition of DNN from the digital systems to CACIM systems usually reduces its accuracy. The major issue is that the weights of DNN are stored and calculated directly on analog quantities in CACIM systems. 
The variation and programming overhead of the analog weight limit the precision.\nTherefore, a suitable quantization algorithm is important when deploying a DNN into CACIM systems to obtain less accuracy loss. The analog weight has its unique advantages when doing quantization. Because there is no encoding and decoding process, the set of quanta will not affect the computing process. Therefore, a generalized quantization method that does not constrain the range of quanta and can obtain less quantization error will be effective in CACIM systems. For the first time, we introduced a generalized quantization method into CACIM systems and showed superior performance on a series of computer vision tasks, such as image classification, object detection, and semantic segmentation. Using the generalized quantization method, the DNN with 8-level analog weights can outperform the 32-bit networks. With fewer levels, the generalized quantization method can obtain less accuracy loss than other uniform quantization methods.", "keywords": "analog computing-in-memory;quantization algorithm;deep neural networks", "primary_area": "", "supplementary_material": "", "author": "Lingjun Dai;Qingtian Zhang;Huaqiang Wu", "authorids": "~Lingjun_Dai1;~Qingtian_Zhang1;~Huaqiang_Wu1", "gender": "M;M;M", "homepage": ";;http://www.ime.tsinghua.edu.cn/publish/ime/5910/2018/20180905113610190771930/20180905113610190771930_.html", "dblp": ";;", "google_scholar": ";;", "orcid": ";0000-0003-2732-3419;0000-0001-8359-7997", "linkedin": "%E5%87%8C%E5%90%9B-%E4%BB%A3-3b42b1b6/;;", "or_profile": "~Lingjun_Dai1;~Qingtian_Zhang1;~Huaqiang_Wu1", "aff": ";;Tsinghua University", "aff_domain": ";;tsinghua.edu.cn", "position": ";;Full Professor", "bibtex": "@misc{\ndai2021improving,\ntitle={Improving the accuracy of neural networks in analog computing-in-memory systems by a generalized quantization method},\nauthor={Lingjun Dai and Qingtian Zhang and Huaqiang Wu},\nyear={2021},\nurl={https://openreview.net/forum?id=PhV-qfEi3Mr}\n}", "github": "", "project": "", "reviewers": "AnonReviewer3;AnonReviewer2;AnonReviewer4;AnonReviewer1", "site": "https://openreview.net/forum?id=PhV-qfEi3Mr", "pdf_size": 0, "rating": "3;4;5;5", "confidence": "5;4;4;4", "wc_review": "259;227;213;314", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "426;215;270;393", "reply_reviewers": "0;0;0;0", "reply_authors": "1;1;1;1", "rating_avg": [ 4.25, 0.82915619758885 ], "confidence_avg": [ 4.25, 0.4330127018922193 ], "wc_review_avg": [ 253.25, 38.8353897881816 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 326.0, 86.52456298647223 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 9, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": -0.8703882797784891, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:3_x46PHG-JUJ:scholar.google.com/&scioq=Improving+the+accuracy+of+neural+networks+in+analog+computing-in-memory+systems+by+a+generalized+quantization+method&hl=en&as_sdt=0,5", "gs_version_total": 0, "aff_unique_index": "0", "aff_unique_norm": "Tsinghua University", "aff_unique_dep": "", "aff_unique_url": "https://www.tsinghua.edu.cn", "aff_unique_abbr": "THU", "aff_country_unique_index": "0", "aff_country_unique": "China" }, { "id": "PiKUvDj5jyN", "title": "Relational Learning with Variational Bayes", "track": "main", "status": "Reject", "tldr": "", "abstract": "In psychology, relational learning refers to the ability to recognize and respond to\nrelationship among objects 
irrespective of the nature of those objects. Relational\nlearning has long been recognized as a hallmark of human cognition and a key\nquestion in artificial intelligence research. In this work, we propose an unsupervised\nlearning method for addressing the relational learning problem where we\nlearn the underlying relationship between a pair of data irrespective of the nature\nof those data. The central idea of the proposed method is to encapsulate the relational\nlearning problem with a probabilistic graphical model in which we perform\ninference to learn about data relationships and other relational processing tasks.", "keywords": "Relational learning;unsupervised learning;variational inference;probabilistic graphical model", "primary_area": "", "supplementary_material": "", "author": "Kuang-Hung Liu", "authorids": "~Kuang-Hung_Liu1", "gender": "", "homepage": "https://scholar.google.com/citations?user=eaxkzLcAAAAJ&hl=en", "dblp": "", "google_scholar": "eaxkzLcAAAAJ", "orcid": "", "linkedin": "", "or_profile": "~Kuang-Hung_Liu1", "aff": "ExxonMobil", "aff_domain": "exxonmobil.com", "position": "Research Scientist", "bibtex": "@misc{\nliu2021relational,\ntitle={Relational Learning with Variational Bayes},\nauthor={Kuang-Hung Liu},\nyear={2021},\nurl={https://openreview.net/forum?id=PiKUvDj5jyN}\n}", "github": "", "project": "", "reviewers": "AnonReviewer4;AnonReviewer2;AnonReviewer3;AnonReviewer1", "site": "https://openreview.net/forum?id=PiKUvDj5jyN", "pdf_size": 0, "rating": "5;6;6;6", "confidence": "3;3;3;3", "wc_review": "142;797;402;383", "wc_reply_reviewers": "123;1703;170;0", "wc_reply_authors": "1794;4466;1120;2383", "reply_reviewers": "2;4;1;0", "reply_authors": "5;9;3;5", "rating_avg": [ 5.75, 0.4330127018922193 ], "confidence_avg": [ 3.0, 0.0 ], "wc_review_avg": [ 431.0, 234.85208110638493 ], "wc_reply_reviewers_avg": [ 499.0, 697.8957658561915 ], "wc_reply_authors_avg": [ 2440.75, 1251.762632251019 ], "reply_reviewers_avg": [ 1.75, 1.479019945774904 ], "reply_authors_avg": [ 5.5, 2.179449471770337 ], "replies_avg": [ 39, 0 ], "authors#_avg": [ 1, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 1, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=5420984774213966581&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 2, "aff_unique_index": "0", "aff_unique_norm": "ExxonMobil Corporation", "aff_unique_dep": "", "aff_unique_url": "https://www.exxonmobil.com", "aff_unique_abbr": "ExxonMobil", "aff_country_unique_index": "0", "aff_country_unique": "United States" }, { "id": "PkqwRo2wjuW", "title": "Learning Axioms to Compute Verifiable Symbolic Expression Equivalence Proofs Using Graph-to-Sequence Networks", "track": "main", "status": "Reject", "tldr": "", "abstract": "We target the problem of proving the semantic equivalence between two complex expressions represented as typed trees, and demonstrate our system on expressions from a rich multi-type symbolic language for linear algebra. We propose the first graph-to-sequence deep learning system to generate axiomatic proofs of equivalence between program pairs. We generate expressions which include scalars, vectors and matrices and 16 distinct operators combining them, with 147 distinct axioms of equivalence. We study the robustness of the system to generate proofs of increasing length, demonstrating how incremental graph-to-sequence networks can learn to represent complex and verifiable symbolic reasoning. 
It achieves 93% average true positive coverage on 10,000 test cases while ensuring zero false positives by design.", "keywords": "Graph Neural Network;Symbolic Proofs;Graph-to-Sequence", "primary_area": "", "supplementary_material": "", "author": "Steven James Kommrusch;Louis-Noel Pouchet;Theo Barolett", "authorids": "~Steven_James_Kommrusch1;~Louis-Noel_Pouchet2;theo.barolett@inria.fr", "gender": "M;;", "homepage": "https://www.cs.colostate.edu/~steveko/;;", "dblp": "https://dblp.uni-trier.de/pid/230/4452.html;;", "google_scholar": "lfv3-LcAAAAJ;;", "orcid": ";;", "linkedin": "stevenkommrusch/;;", "or_profile": "~Steven_James_Kommrusch1;~Louis-Noel_Pouchet2;theo.barolett@inria.fr", "aff": "Colorado State University;University of California-Los Angeles;", "aff_domain": "colostate.edu;;", "position": "PhD student;;", "bibtex": "@misc{\nkommrusch2021learning,\ntitle={Learning Axioms to Compute Verifiable Symbolic Expression Equivalence Proofs Using Graph-to-Sequence Networks},\nauthor={Steven James Kommrusch and Louis-Noel Pouchet and Theo Barolett},\nyear={2021},\nurl={https://openreview.net/forum?id=PkqwRo2wjuW}\n}", "github": "", "project": "", "reviewers": "AnonReviewer2;AnonReviewer4;AnonReviewer3;AnonReviewer1", "site": "https://openreview.net/forum?id=PkqwRo2wjuW", "pdf_size": 0, "rating": "4;4;5;6", "confidence": "3;4;5;3", "wc_review": "399;622;567;474", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "0;0;0;0", "reply_reviewers": "0;0;0;0", "reply_authors": "0;0;0;0", "rating_avg": [ 4.75, 0.82915619758885 ], "confidence_avg": [ 3.75, 0.82915619758885 ], "wc_review_avg": [ 515.5, 85.57014666342462 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 0, 0 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 0, 0 ], "replies_avg": [ 5, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": -0.0909090909090909, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:nKA7Vp-9Mq0J:scholar.google.com/&scioq=Learning+Axioms+to+Compute+Verifiable+Symbolic+Expression+Equivalence+Proofs+Using+Graph-to-Sequence+Networks&hl=en&as_sdt=0,14", "gs_version_total": 0, "aff_unique_index": "0;1", "aff_unique_norm": "Colorado State University;University of California, Los Angeles", "aff_unique_dep": ";", "aff_unique_url": "https://www.colostate.edu;https://www.ucla.edu", "aff_unique_abbr": "CSU;UCLA", "aff_campus_unique_index": "1", "aff_campus_unique": ";Los Angeles", "aff_country_unique_index": "0;0", "aff_country_unique": "United States" }, { "id": "PmUGXmOY1wK", "title": "GL-Disen: Global-Local disentanglement for unsupervised learning of graph-level representations", "track": "main", "status": "Reject", "tldr": "", "abstract": "Graph-level representation learning plays a crucial role in a variety of tasks such as molecular property prediction and community analysis. Currently, several models based on mutual information maximization have shown strong performance on the task of unsupervised graph representation learning. In this paper, instead, we consider a disentanglement approach to learn graph-level representations in the unsupervised setting. Our work is the first to study disentanglement learning for graph-level representations. Our key observation is that the formation of many real-world graphs is a complex process with global and local generative factors. We hypothesize that disentangled representations which capture these global and local generative factors into independent latent units can be highly beneficial. 
Specifically, for graph-level representation learning, our disentanglement approach can alleviate distraction due to local variations of individual nodes or individual local neighbourhoods. We propose a VAE based learning algorithm to disentangle the global graph-level information, which is common across the entire graph, and local patch-level information, which varies across individual patches (the local subgraphs centered around the nodes). Through extensive experiments and analysis, we show that our method achieves the state-of-the-art performance on the task of unsupervised graph representation learning.\n", "keywords": "Unsupervised Graph Representations;Disentanglement Learning;GNN;Unsupervised Learning", "primary_area": "", "supplementary_material": "/attachment/4df6dc0a9bf9cc365f348b178c855e10fdb62233.zip", "author": "Thilini Cooray;Ngai-man Cheung;Wei Lu", "authorids": "~Thilini_Cooray1;~Ngai-man_Cheung1;~Wei_Lu4", "gender": "F;M;M", "homepage": ";https://sites.google.com/site/mancheung0407/;https://istd.sutd.edu.sg/people/faculty/lu-wei", "dblp": ";82/3605;98/6613-11.html", "google_scholar": ";https://scholar.google.com.sg/citations?hl=en;n41KN9AAAAAJ", "orcid": ";0000-0003-0135-3791;0000-0003-0827-0382", "linkedin": "thilinicooray;;wei-lu-59aa9615/", "or_profile": "~Thilini_Cooray1;~Ngai-man_Cheung1;~Wei_Lu9", "aff": "Singapore University of Technology and Design;Singapore University of Technology and Design;Singapore University of Technology and Design", "aff_domain": "sutd.edu.sg;sutd.edu.sg;sutd.edu.sg", "position": "PhD student;Associate Professor;Associate Professor", "bibtex": "@misc{\ncooray2021gldisen,\ntitle={{\\{}GL{\\}}-Disen: Global-Local disentanglement for unsupervised learning of graph-level representations},\nauthor={Thilini Cooray and Ngai-man Cheung and Wei Lu},\nyear={2021},\nurl={https://openreview.net/forum?id=PmUGXmOY1wK}\n}", "github": "", "project": "", "reviewers": "AnonReviewer2;AnonReviewer4;AnonReviewer3;AnonReviewer5;AnonReviewer1", "site": "https://openreview.net/forum?id=PmUGXmOY1wK", "pdf_size": 0, "rating": "3;4;5;5;6", "confidence": "4;4;3;4;4", "wc_review": "373;232;312;578;271", "wc_reply_reviewers": "0;0;0;94;0", "wc_reply_authors": "1782;1725;739;1444;1183", "reply_reviewers": "0;0;0;1;0", "reply_authors": "3;3;2;3;2", "rating_avg": [ 4.6, 1.0198039027185568 ], "confidence_avg": [ 3.8, 0.39999999999999997 ], "wc_review_avg": [ 353.2, 121.7134339339746 ], "wc_reply_reviewers_avg": [ 18.8, 37.6 ], "wc_reply_authors_avg": [ 1374.6, 383.21513540046925 ], "reply_reviewers_avg": [ 0.2, 0.4 ], "reply_authors_avg": [ 2.6, 0.4898979485566356 ], "replies_avg": [ 21, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": -0.19611613513818407, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:S1jf5sXn2O0J:scholar.google.com/&scioq=GL-Disen:+Global-Local+disentanglement+for+unsupervised+learning+of+graph-level+representations&hl=en&as_sdt=0,5", "gs_version_total": 0, "aff_unique_index": "0;0;0", "aff_unique_norm": "Singapore University of Technology and Design", "aff_unique_dep": "", "aff_unique_url": "https://www.sutd.edu.sg", "aff_unique_abbr": "SUTD", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0;0", "aff_country_unique": "Singapore" }, { "id": "PmVfnB0nkqr", "title": "Autonomous Learning of Object-Centric Abstractions for High-Level Planning", "track": "main", "status": "Reject", "tldr": "", "abstract": "We propose a method for autonomously learning an object-centric 
representation of a continuous and high-dimensional environment that is suitable for planning. Such representations can immediately be transferred between tasks that share the same types of objects, resulting in agents that require fewer samples to learn a model of a new task. We first demonstrate our approach on a simple domain where the agent learns a compact, lifted representation that generalises across objects. We then apply it to a series of Minecraft tasks to learn object-centric representations, including object types\u2014directly from pixel data\u2014that can be leveraged to solve new tasks quickly. The resulting learned representations enable the use of a task-level planner, resulting in an agent capable of forming complex, long-term plans with considerably fewer environment interactions.", "keywords": "reinforcement learning;planning;PDDL;multitask;transfer;objects", "primary_area": "", "supplementary_material": "/attachment/254b7e25671e9dd36d56f4570d603265f7888640.zip", "author": "Steven James;Benjamin Rosman;George Konidaris", "authorids": "~Steven_James1;~Benjamin_Rosman1;~George_Konidaris1", "gender": "M;M;M", "homepage": ";http://www.raillab.org;http://cs.brown.edu/people/gdk/", "dblp": "195/8202;45/4591;56/6762", "google_scholar": ";https://scholar.google.co.za/citations?user=pWJ0SocAAAAJ;9UERvVEAAAAJ", "orcid": ";;", "linkedin": ";;", "or_profile": "~Steven_James1;~Benjamin_Rosman1;~George_Konidaris1", "aff": "University of the Witwatersrand;University of the Witwatersrand;Brown University", "aff_domain": "wits.ac.za;wits.ac.za;brown.edu", "position": "PhD student;Full Professor;Assistant Professor", "bibtex": "@misc{\njames2021autonomous,\ntitle={Autonomous Learning of Object-Centric Abstractions for High-Level Planning},\nauthor={Steven James and Benjamin Rosman and George Konidaris},\nyear={2021},\nurl={https://openreview.net/forum?id=PmVfnB0nkqr}\n}", "github": "", "project": "", "reviewers": "AnonReviewer2;AnonReviewer4;AnonReviewer1;AnonReviewer3", "site": "https://openreview.net/forum?id=PmVfnB0nkqr", "pdf_size": 0, "rating": "3;4;4;5", "confidence": "4;3;4;1", "wc_review": "326;688;426;82", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "588;1417;683;269", "reply_reviewers": "0;0;0;0", "reply_authors": "1;2;1;1", "rating_avg": [ 4.0, 0.7071067811865476 ], "confidence_avg": [ 3.0, 1.224744871391589 ], "wc_review_avg": [ 380.5, 217.19749077740286 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 739.25, 420.2739433988265 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.25, 0.4330127018922193 ], "replies_avg": [ 10, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": -0.8660254037844385, "gs_citation": 28, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=11456250171836680945&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 11, "aff_unique_index": "0;0;1", "aff_unique_norm": "University of the Witwatersrand;Brown University", "aff_unique_dep": ";", "aff_unique_url": "https://www.wits.ac.za;https://www.brown.edu", "aff_unique_abbr": "Wits;Brown", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0;1", "aff_country_unique": "South Africa;United States" }, { "id": "PoP96DrBHnl", "title": "Gradient descent temporal difference-difference learning", "track": "main", "status": "Reject", "tldr": "", "abstract": "Off-policy algorithms, in which a behavior policy differs from the target policy and is used to gain experience for learning, have proven to be of great practical value in 
reinforcement learning. However, even for simple convex problems such as linear value function approximation, these algorithms are not guaranteed to be stable. To address this, alternative algorithms that are provably convergent in such cases have been introduced, the most well known being gradient descent temporal difference (GTD) learning. This algorithm and others like it, however, tend to converge much more slowly than conventional temporal difference learning.\nIn this paper we propose gradient descent temporal difference-difference (Gradient-DD) learning in order to accelerate GTD learning by introducing second-order differences in successive parameter updates.\nWe investigate this algorithm in the framework of linear value function approximation and analytically showing its improvement over GTD learning. Studying the model empirically on the random walk and Boyan-chain prediction tasks, we find substantial improvement over GTD learning and, in several cases, better performance even than conventional TD learning.\n", "keywords": "temporal difference learning;gradient-descent based temporal difference;Off-policy;regularization", "primary_area": "", "supplementary_material": "", "author": "Rong Zhu;James Murray", "authorids": "~Rong_Zhu4;jmurray9@uoregon.edu", "gender": ";", "homepage": ";", "dblp": ";", "google_scholar": ";", "orcid": ";", "linkedin": ";", "or_profile": ";", "aff": ";", "aff_domain": ";", "position": ";", "bibtex": "@misc{\nzhu2021gradient,\ntitle={Gradient descent temporal difference-difference learning},\nauthor={Rong Zhu and James Murray},\nyear={2021},\nurl={https://openreview.net/forum?id=PoP96DrBHnl}\n}", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer4;AnonReviewer2;AnonReviewer3", "site": "https://openreview.net/forum?id=PoP96DrBHnl", "pdf_size": 0, "rating": "3;5;5;5", "confidence": "4;4;4;4", "wc_review": "2017;328;640;1045", "wc_reply_reviewers": "524;0;0;150", "wc_reply_authors": "969;348;332;804", "reply_reviewers": "1;0;0;1", "reply_authors": "2;1;1;1", "rating_avg": [ 4.5, 0.8660254037844386 ], "confidence_avg": [ 4.0, 0.0 ], "wc_review_avg": [ 1007.5, 635.8602440788385 ], "wc_reply_reviewers_avg": [ 168.5, 214.1885851300204 ], "wc_reply_authors_avg": [ 613.25, 279.46500228114434 ], "reply_reviewers_avg": [ 0.5, 0.5 ], "reply_authors_avg": [ 1.25, 0.4330127018922193 ], "replies_avg": [ 12, 0 ], "authors#_avg": [ 2, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 1, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=874159245264011837&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 3 }, { "id": "PpOtGYNVT6A", "title": "A Probabilistic Model for Discriminative and Neuro-Symbolic Semi-Supervised Learning", "track": "main", "status": "Reject", "tldr": "", "abstract": "Strong progress has been achieved in semi-supervised learning (SSL) by combining several methods, some of which relate to properties of the data distribution p(x), others to the model outputs p(y|x), e.g. minimising the entropy of unlabelled predictions. Focusing on the latter, we fill a gap in the standard text by introducing a probabilistic model for discriminative semi-supervised learning, mirroring the classical generative model. Several SSL methods are theoretically explained by our model as inducing (approximate) strong priors over parameters of p(y|x). 
Applying this same probabilistic model to tasks in which labels represent binary attributes, we theoretically justify a family of neuro-symbolic SSL approaches, taking a step towards bridging the divide between statistical learning and logical reasoning.", "keywords": "semi-supervised learning;probabilistic model;neuro-symbolic learning", "primary_area": "", "supplementary_material": "", "author": "Carl Allen;Ivana Balazevic;Timothy Hospedales", "authorids": "~Carl_Allen1;~Ivana_Balazevic1;~Timothy_Hospedales1", "gender": "M;F;M", "homepage": "https://carl-allen.github.io/;https://ibalazevic.github.io/;http://homepages.inf.ed.ac.uk/thospeda/", "dblp": "220/5654;185/0837;32/3545", "google_scholar": "https://scholar.google.co.uk/citations?user=wRcURR8AAAAJ;CnxZPkkAAAAJ;https://scholar.google.fr/citations?user=nHhtvqkAAAAJ", "orcid": "0000-0002-1536-657X;;0000-0003-4867-7486", "linkedin": ";;timothyhospedales/", "or_profile": "~Carl_Allen1;~Ivana_Balazevic1;~Timothy_Hospedales1", "aff": "University of Edinburgh;University of Edinburgh;Samsung AI Research Centre", "aff_domain": "ed.ac.uk;ed.ac.uk;samsung.com", "position": "PhD student;PhD student;Principal Researcher", "bibtex": "@misc{\nallen2021a,\ntitle={A Probabilistic Model for Discriminative and Neuro-Symbolic Semi-Supervised Learning},\nauthor={Carl Allen and Ivana Balazevic and Timothy Hospedales},\nyear={2021},\nurl={https://openreview.net/forum?id=PpOtGYNVT6A}\n}", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer4;AnonReviewer2;AnonReviewer3", "site": "https://openreview.net/forum?id=PpOtGYNVT6A", "pdf_size": 0, "rating": "3;4;5;7", "confidence": "4;4;3;2", "wc_review": "192;740;758;161", "wc_reply_reviewers": "0;49;22;0", "wc_reply_authors": "351;1383;647;56", "reply_reviewers": "0;1;1;0", "reply_authors": "1;2;2;1", "rating_avg": [ 4.75, 1.479019945774904 ], "confidence_avg": [ 3.25, 0.82915619758885 ], "wc_review_avg": [ 462.75, 286.5304303211092 ], "wc_reply_reviewers_avg": [ 17.75, 20.154093876927337 ], "wc_reply_authors_avg": [ 609.25, 493.1766291096933 ], "reply_reviewers_avg": [ 0.5, 0.5 ], "reply_authors_avg": [ 1.5, 0.5 ], "replies_avg": [ 13, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": -0.9683296637314885, "gs_citation": 4, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=12607804652693042348&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 3, "aff_unique_index": "0;0;1", "aff_unique_norm": "University of Edinburgh;Samsung", "aff_unique_dep": ";AI Research", "aff_unique_url": "https://www.ed.ac.uk;https://www.samsung.com/global/researchers/samsung-ai-research-centre/", "aff_unique_abbr": "Edinburgh;SARC", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0;1", "aff_country_unique": "United Kingdom;South Korea" }, { "title": "Generative Time-series Modeling with Fourier Flows", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/2750", "id": "PpshD0AXfA", "poster": "", "openreview": "https://openreview.net/forum?id=PpshD0AXfA", "slides": "https://iclr.cc/virtual/2021/poster/2750", "video": "https://iclr.cc/virtual/2021/poster/2750", "author_site": "Ahmed Alaa, Alex Chan, Mihaela van der Schaar", "tldr": "", "abstract": "Generating synthetic time-series data is crucial in various application domains, such as medical prognosis, wherein research is hamstrung by the lack of access to data due to concerns over privacy. 
Most of the recently proposed methods for generating synthetic time-series rely on implicit likelihood modeling using generative adversarial networks (GANs)\u2014but such models can be difficult to train, and may jeopardize privacy by \u201cmemorizing\u201d temporal patterns in training data. In this paper, we propose an explicit likelihood model based on a novel class of normalizing flows that view time-series data in the frequency-domain rather than the time-domain. The proposed flow, dubbed a Fourier flow, uses a discrete Fourier transform (DFT) to convert variable-length time-series with arbitrary sampling periods into fixed-length spectral representations, then applies a (data-dependent) spectral filter to the frequency-transformed time-series. We show that, by virtue of the DFT analytic properties, the Jacobian determinants and inverse mapping for the Fourier flow can be computed efficiently in linearithmic time, without imposing explicit structural constraints as in existing flows such as NICE (Dinh et al. (2014)), RealNVP (Dinh et al. (2016)) and GLOW (Kingma & Dhariwal (2018)). Experiments show that Fourier flows perform competitively compared to state-of-the-art baselines.", "keywords": "", "primary_area": "", "supplementary_material": "", "author": "Ahmed Alaa;Alex James Chan;Mihaela van der Schaar", "authorids": "~Ahmed_Alaa1;~Alex_James_Chan1;~Mihaela_van_der_Schaar2", "gender": "M;M;F", "homepage": "https://alaalab.berkeley.edu/;https://alexjchan.com;https://www.vanderschaar-lab.com", "dblp": "140/7324;268/6948;", "google_scholar": "https://scholar.google.com.eg/citations?user=_pv1sEcAAAAJ;yfy_BGIAAAAJ;DZ3S--MAAAAJ", "orcid": ";;", "linkedin": ";alex-chan-040081131/;", "or_profile": "~Ahmed_Alaa1;~Alex_James_Chan1;~Mihaela_van_der_Schaar2", "aff": ";University of Cambridge;University of California, Los Angeles", "aff_domain": ";cam.ac.uk;ucla.edu", "position": ";PhD student;Full Professor", "bibtex": "@inproceedings{\nalaa2021generative,\ntitle={Generative Time-series Modeling with Fourier Flows},\nauthor={Ahmed Alaa and Alex James Chan and Mihaela van der Schaar},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=PpshD0AXfA}\n}", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer3;AnonReviewer2;AnonReviewer4", "pdf_size": 0, "rating": "5;6;7;7", "confidence": "4;3;4;4", "wc_review": "442;202;294;374", "wc_reply_reviewers": "164;0;0;70", "wc_reply_authors": "1681;798;310;961", "reply_reviewers": "1;0;0;1", "reply_authors": "3;2;1;2", "rating_avg": [ 6.25, 0.82915619758885 ], "confidence_avg": [ 3.75, 0.4330127018922193 ], "wc_review_avg": [ 328.0, 89.64373932405988 ], "wc_reply_reviewers_avg": [ 58.5, 67.28112662552553 ], "wc_reply_authors_avg": [ 937.5, 491.56917112447155 ], "reply_reviewers_avg": [ 0.5, 0.5 ], "reply_authors_avg": [ 2.0, 0.7071067811865476 ], "replies_avg": [ 17, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": 0.17407765595569782, "gs_citation": 54, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=12942747538259864726&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 3, "pdf": "https://openreview.net/pdf?id=PpshD0AXfA", "email": ";cam.ac.uk;ucla.edu", "author_num": 3, "aff_unique_index": "0;1", "aff_unique_norm": "University of Cambridge;University of California, Los Angeles", "aff_unique_dep": ";", "aff_unique_url": "https://www.cam.ac.uk;https://www.ucla.edu", "aff_unique_abbr": "Cambridge;UCLA", "aff_campus_unique_index": "0;1", 
"aff_campus_unique": "Cambridge;Los Angeles", "aff_country_unique_index": "0;1", "aff_country_unique": "United Kingdom;United States" }, { "id": "PrvaKdJcKhX", "title": "Differentiable Approximations for Multi-resource Spatial Coverage Problems", "track": "main", "status": "Reject", "tldr": "", "abstract": "Resource allocation for coverage of physical spaces is a challenging problem in robotic surveillance, mobile sensor networks and security domains. Recent gradient-based optimization approaches to this problem estimate utilities of actions by using neural networks to learn a differentiable approximation to spatial coverage objectives. In this work, we empirically show that spatial coverage objectives with multiple-resources are combinatorially hard to approximate for neural networks and lead to sub-optimal policies. As our major contribution, we propose a tractable framework to approximate a general class of spatial coverage objectives and their gradients using a combination of Newton-Leibniz theorem, spatial discretization and implicit boundary differentiation. We empirically demonstrate the efficacy of our proposed framework on single and multi-agent spatial coverage problems.", "keywords": "Multi-agent coverage;Multi-resource coverage;Areal coverage;Differentiable approximations", "primary_area": "", "supplementary_material": "/attachment/01fa35f00bf85dc65b61a8a5202224afdd593286.zip", "author": "Nitin Kamra;Yan Liu", "authorids": "~Nitin_Kamra1;~Yan_Liu1", "gender": "M;F", "homepage": "https://nitinkamra1992.github.io/;http://www-bcf.usc.edu/~liu32/", "dblp": "169/2428;150/4295", "google_scholar": "Qn_rbIgAAAAJ;UUKLPMYAAAAJ", "orcid": ";0000-0002-7055-9518", "linkedin": ";", "or_profile": "~Nitin_Kamra1;~Yan_Liu1", "aff": "University of Southern California;University of Southern California", "aff_domain": "usc.edu;usc.edu", "position": "PhD student;Professor", "bibtex": "@misc{\nkamra2021differentiable,\ntitle={Differentiable Approximations for Multi-resource Spatial Coverage Problems},\nauthor={Nitin Kamra and Yan Liu},\nyear={2021},\nurl={https://openreview.net/forum?id=PrvaKdJcKhX}\n}", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer4;AnonReviewer2;AnonReviewer3", "site": "https://openreview.net/forum?id=PrvaKdJcKhX", "pdf_size": 0, "rating": "4;4;5;6", "confidence": "4;4;2;1", "wc_review": "507;275;301;68", "wc_reply_reviewers": "0;19;0;0", "wc_reply_authors": "735;388;140;43", "reply_reviewers": "0;1;0;0", "reply_authors": "1;1;1;1", "rating_avg": [ 4.75, 0.82915619758885 ], "confidence_avg": [ 2.75, 1.299038105676658 ], "wc_review_avg": [ 287.75, 155.48211311916236 ], "wc_reply_reviewers_avg": [ 4.75, 8.227241335952167 ], "wc_reply_authors_avg": [ 326.5, 267.3055367926373 ], "reply_reviewers_avg": [ 0.25, 0.4330127018922193 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 12, 0 ], "authors#_avg": [ 2, 0 ], "corr_rating_confidence": -0.986440050415621, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:cYYMTYvYM5AJ:scholar.google.com/&scioq=Differentiable+Approximations+for+Multi-resource+Spatial+Coverage+Problems&hl=en&as_sdt=0,5", "gs_version_total": 0, "aff_unique_index": "0;0", "aff_unique_norm": "University of Southern California", "aff_unique_dep": "", "aff_unique_url": "https://www.usc.edu", "aff_unique_abbr": "USC", "aff_campus_unique_index": "0;0", "aff_campus_unique": "Los Angeles", "aff_country_unique_index": "0;0", "aff_country_unique": "United States" }, { "title": "CcGAN: Continuous Conditional Generative 
Adversarial Networks for Image Generation", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/3022", "id": "PrzjugOsDeE", "poster": "", "openreview": "https://openreview.net/forum?id=PrzjugOsDeE", "slides": "https://iclr.cc/virtual/2021/poster/3022", "video": "https://iclr.cc/virtual/2021/poster/3022", "author_site": "Xin Ding, Yongwei Wang, Zuheng Xu, William J Welch, Z. J Wang", "tldr": "", "abstract": "This work proposes the continuous conditional generative adversarial network (CcGAN), the first generative model for image generation conditional on continuous, scalar conditions (termed regression labels). Existing conditional GANs (cGANs) are mainly designed for categorical conditions (e.g., class labels); conditioning on a continuous label is mathematically distinct and raises two fundamental problems: (P1) Since there may be very few (even zero) real images for some regression labels, minimizing existing empirical versions of cGAN losses (a.k.a. empirical cGAN losses) often fails in practice; (P2) Since regression labels are scalar and infinitely many, conventional label input methods (e.g., combining a hidden map of the generator/discriminator with a one-hot encoded label) are not applicable. The proposed CcGAN solves the above problems, respectively, by (S1) reformulating existing empirical cGAN losses to be appropriate for the continuous scenario; and (S2) proposing a novel method to incorporate regression labels into the generator and the discriminator. The reformulation in (S1) leads to two novel empirical discriminator losses, termed the hard vicinal discriminator loss (HVDL) and the soft vicinal discriminator loss (SVDL) respectively, and a novel empirical generator loss. The error bounds of a discriminator trained with HVDL and SVDL are derived under mild assumptions in this work. A new benchmark dataset, RC-49, is also proposed for generative image modeling conditional on regression labels. Our experiments on the Circular 2-D Gaussians, RC-49, and UTKFace datasets show that CcGAN is able to generate diverse, high-quality samples from the image distribution conditional on a given regression label. Moreover, in these experiments, CcGAN substantially outperforms cGAN both visually and quantitatively.", "keywords": "Conditional generative adversarial networks;image generation;continuous and scalar conditions", "primary_area": "", "supplementary_material": "/attachment/f4c0a4582ecec95982196843462261c480b07b2e.zip", "author": "Xin Ding;Yongwei Wang;Zuheng Xu;William J Welch;Z. 
Jane Wang", "authorids": "~Xin_Ding2;~Yongwei_Wang1;~Zuheng_Xu1;~William_J_Welch1;~Z._Jane_Wang1", "gender": "M;M;M;;F", "homepage": ";https://enkiwang.github.io/index.html;https://zuhengxu.github.io/;https://www.stat.ubc.ca/users/william-j-welch;https://www.ece.ubc.ca/~zjanew", "dblp": ";;278/8104;;13/3672-1", "google_scholar": ";https://scholar.google.ca/citations?hl=en;lkMkblkAAAAJ;;https://scholar.google.ca/citations?user=W75uTm8AAAAJ", "orcid": "0000-0003-2183-607X;0000-0001-9712-8964;;;0000-0002-3791-0249", "linkedin": ";;zuheng-david-xu-29825624b/;;", "or_profile": "~Xin_Ding2;~Yongwei_Wang1;~Zuheng_Xu1;~William_J_Welch1;~Z._Jane_Wang1", "aff": "University of British Columbia;University of British Columbia;University of British Columbia;University of British Columbia;University of British Columbia", "aff_domain": "ubc.ca;ubc.ca;ubc.ca;ubc.ca;ubc.ca", "position": "PhD student;PhD student;PhD student;Full Professor;Full Professor", "bibtex": "@inproceedings{\nding2021ccgan,\ntitle={Cc{\\{}GAN{\\}}: Continuous Conditional Generative Adversarial Networks for Image Generation},\nauthor={Xin Ding and Yongwei Wang and Zuheng Xu and William J Welch and Z. Jane Wang},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=PrzjugOsDeE}\n}", "github": "[![github](/images/github_icon.svg) UBCDingXin/improved_CcGAN](https://github.com/UBCDingXin/improved_CcGAN)", "project": "", "reviewers": "AnonReviewer2;AnonReviewer1;AnonReviewer5;AnonReviewer3", "pdf_size": 0, "rating": "5;6;6;7", "confidence": "3;3;3;4", "wc_review": "152;507;441;544", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "951;1024;1415;570", "reply_reviewers": "0;0;0;0", "reply_authors": "2;3;3;1", "rating_avg": [ 6.0, 0.7071067811865476 ], "confidence_avg": [ 3.25, 0.4330127018922193 ], "wc_review_avg": [ 411.0, 154.01785610766046 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 990.0, 299.87580762709086 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 2.25, 0.82915619758885 ], "replies_avg": [ 16, 0 ], "authors#_avg": [ 5, 0 ], "corr_rating_confidence": 0.816496580927726, "gs_citation": 102, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=7304940947545542123&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 3, "pdf": "https://openreview.net/pdf?id=PrzjugOsDeE", "email": "ubc.ca;ubc.ca;ubc.ca;ubc.ca;ubc.ca", "author_num": 5, "aff_unique_index": "0;0;0;0;0", "aff_unique_norm": "University of British Columbia", "aff_unique_dep": "", "aff_unique_url": "https://www.ubc.ca", "aff_unique_abbr": "UBC", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0;0;0;0", "aff_country_unique": "Canada" }, { "id": "PsdsEbzxZWr", "title": "Analyzing and Improving Generative Adversarial Training for Generative Modeling and Out-of-Distribution Detection", "track": "main", "status": "Reject", "tldr": "", "abstract": "Generative adversarial training (GAT) is a recently introduced adversarial defense method. Previous works have focused on empirical evaluations of its application to training robust predictive models. In this paper we focus on theoretical understanding of the GAT method and extending its application to generative modeling and out-of-distribution detection. We analyze the optimal solutions of the maximin formulation employed by the GAT objective, and make a comparative analysis of the minimax formulation employed by GANs. 
We use theoretical analysis and 2D simulations to understand the convergence property of the training algorithm. Based on these results, we develop an unconstrained GAT algorithm, and conduct comprehensive evaluations of the algorithm's application to image generation and adversarial out-of-distribution detection. Our results suggest that generative adversarial training is a promising new direction for the above applications.", "keywords": "Adversarial Training;Generative Modeling;Out-of-Distribution Detection;GANs;Generative adversarial networks", "primary_area": "", "supplementary_material": "/attachment/7fc7331eb1d17022753419a67c9826c7319a6fce.zip", "author": "Xuwang Yin;Shiying li;Gustavo Rohde", "authorids": "~Xuwang_Yin2;sl8jx@virginia.edu;~Gustavo_Rohde1", "gender": "M;;M", "homepage": "https://xuwangyin.github.io/;;https://www.imagedatascience.com/", "dblp": "125/2311;;", "google_scholar": "c425B6UAAAAJ;;", "orcid": ";;", "linkedin": ";;", "or_profile": "~Xuwang_Yin2;sl8jx@virginia.edu;~Gustavo_Rohde1", "aff": "University of Virginia;;University of Virginia, Charlottesville", "aff_domain": "virginia.edu;;virginia.edu", "position": "PhD student;;Full Professor", "bibtex": "@misc{\nyin2021analyzing,\ntitle={Analyzing and Improving Generative Adversarial Training for Generative Modeling and Out-of-Distribution Detection},\nauthor={Xuwang Yin and Shiying li and Gustavo Rohde},\nyear={2021},\nurl={https://openreview.net/forum?id=PsdsEbzxZWr}\n}", "github": "", "project": "", "reviewers": "AnonReviewer4;AnonReviewer3;AnonReviewer1", "site": "https://openreview.net/forum?id=PsdsEbzxZWr", "pdf_size": 0, "rating": "4;5;7", "confidence": "4;5;3", "wc_review": "751;392;315", "wc_reply_reviewers": "506;0;0", "wc_reply_authors": "1630;896;384", "reply_reviewers": "1;0;0", "reply_authors": "2;3;1", "rating_avg": [ 5.333333333333333, 1.247219128924647 ], "confidence_avg": [ 4.0, 0.816496580927726 ], "wc_review_avg": [ 486.0, 190.00175437786532 ], "wc_reply_reviewers_avg": [ 168.66666666666666, 238.53068752026206 ], "wc_reply_authors_avg": [ 970.0, 511.3615811406511 ], "reply_reviewers_avg": [ 0.3333333333333333, 0.4714045207910317 ], "reply_authors_avg": [ 2.0, 0.816496580927726 ], "replies_avg": [ 14, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": -0.6546536707079772, "gs_citation": 2, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=14753755992025591727&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 0, "aff_unique_index": "0;0", "aff_unique_norm": "University of Virginia", "aff_unique_dep": "", "aff_unique_url": "https://www.virginia.edu", "aff_unique_abbr": "UVA", "aff_campus_unique_index": "1", "aff_campus_unique": ";Charlottesville", "aff_country_unique_index": "0;0", "aff_country_unique": "United States" }, { "title": "Prediction and generalisation over directed actions by grid cells", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/3259", "id": "Ptaz_zIFbX", "poster": "", "openreview": "https://openreview.net/forum?id=Ptaz_zIFbX", "slides": "https://iclr.cc/virtual/2021/poster/3259", "video": "https://iclr.cc/virtual/2021/poster/3259", "author_site": "Changmin Yu, Timothy Behrens, Neil Burgess", "tldr": "", "abstract": "Knowing how the effects of directed actions generalise to new situations (e.g. moving North, South, East and West, or turning left, right, etc.) is key to rapid generalisation across new situations. 
Markovian tasks can be characterised by a state space and a transition matrix and recent work has proposed that neural grid codes provide an efficient representation of the state space, as eigenvectors of a transition matrix reflecting diffusion across states, that allows efficient prediction of future state distributions. Here we extend the eigenbasis prediction model, utilising tools from Fourier analysis, to prediction over arbitrary translation-invariant directed transition structures (i.e. displacement and diffusion), showing that a single set of eigenvectors can support predictions over arbitrary directed actions via action-specific eigenvalues. We show how to define a \"sense of direction\" to combine actions to reach a target state (ignoring task-specific deviations from translation-invariance), and demonstrate that adding the Fourier representations to a deep Q network aids policy learning in continuous control tasks. We show the equivalence between the generalised prediction framework and traditional models of grid cell firing driven by self-motion to perform path integration, either using oscillatory interference (via Fourier components as velocity-controlled oscillators) or continuous attractor networks (via analysis of the update dynamics). We thus provide a unifying framework for the role of the grid system in predictive planning, sense of direction and path integration: supporting generalisable inference over directed actions across different tasks.", "keywords": "Computational neuroscience;grid cells;normative models", "primary_area": "", "supplementary_material": "/attachment/f0250064b46c0a1a293cb7de8d547b43a99b8770.zip", "author": "Changmin Yu;Timothy Behrens;Neil Burgess", "authorids": "~Changmin_Yu1;behrens@fmrib.ox.ac.uk;n.burgess@ucl.ac.uk", "gender": "M;;", "homepage": "https://changmin-yu.github.io;;", "dblp": "266/9733;;", "google_scholar": ";;", "orcid": ";;", "linkedin": ";;", "or_profile": "~Changmin_Yu1;behrens@fmrib.ox.ac.uk;n.burgess@ucl.ac.uk", "aff": "University College London;;", "aff_domain": "ucl.ac.uk;;", "position": "PhD student;;", "bibtex": "@inproceedings{\nyu2021prediction,\ntitle={Prediction and generalisation over directed actions by grid cells},\nauthor={Changmin Yu and Timothy Behrens and Neil Burgess},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=Ptaz_zIFbX}\n}", "github": "", "project": "", "reviewers": "AnonReviewer5;AnonReviewer2;AnonReviewer4;AnonReviewer1;AnonReviewer3", "pdf_size": 0, "rating": "4;5;5;7;7", "confidence": "4;1;4;4;4", "wc_review": "666;136;695;363;193", "wc_reply_reviewers": "0;0;0;0;0", "wc_reply_authors": "0;0;0;0;0", "reply_reviewers": "0;0;0;0;0", "reply_authors": "0;0;0;0;0", "rating_avg": [ 5.6, 1.2 ], "confidence_avg": [ 3.4, 1.2000000000000002 ], "wc_review_avg": [ 410.6, 232.86614180683287 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 0, 0 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 0, 0 ], "replies_avg": [ 6, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": 0.24999999999999997, "gs_citation": 6, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=12624097588185835640&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 4, "pdf": "https://openreview.net/pdf?id=Ptaz_zIFbX", "email": "ucl.ac.uk;;", "author_num": 3, "aff_unique_index": "0", "aff_unique_norm": "University College London", "aff_unique_dep": "", "aff_unique_url": "https://www.ucl.ac.uk", "aff_unique_abbr": "UCL", 
"aff_country_unique_index": "0", "aff_country_unique": "United Kingdom" }, { "id": "PuG6vCSbrV9", "title": "Density estimation on low-dimensional manifolds: an inflation-deflation approach", "track": "main", "status": "Reject", "tldr": "", "abstract": "Normalizing Flows (NFs) are universal density estimators based on Neuronal Networks. However, this universality is limited: the density's support needs to be diffeomorphic to a Euclidean space. In this paper, we propose a novel method to overcome this limitation without sacrificing the universality. The proposed method inflates the data manifold by adding noise in the normal space, trains an NF on this inflated manifold and, finally, deflates the learned density. Our main result provides sufficient conditions on the manifold and the specific choice of noise under which the corresponding estimator is exact. Our method has the same computational complexity as NFs, and does not require to compute an inverse flow. We also show that, if the embedding dimension is much larger than the manifold dimension, noise in the normal space can be well approximated by some Gaussian noise. This allows using our method for approximating arbitrary densities on non-flat manifolds provided that the manifold dimension is known. ", "keywords": "Normalizing Flow;Density Estimation;low-dimensional manifolds;noise;normal space", "primary_area": "", "supplementary_material": "", "author": "Christian Horvat", "authorids": "~Christian_Horvat1", "gender": "M", "homepage": "https://physio.unibe.ch/~pfister/group/", "dblp": "293/8018", "google_scholar": "LpRirZAAAAAJ", "orcid": "", "linkedin": "", "or_profile": "~Christian_Horvat1", "aff": "Universit\u00e4t Bern", "aff_domain": "unibe.ch", "position": "PhD student", "bibtex": "@misc{\nhorvat2021density,\ntitle={Density estimation on low-dimensional manifolds: an inflation-deflation approach},\nauthor={Christian Horvat},\nyear={2021},\nurl={https://openreview.net/forum?id=PuG6vCSbrV9}\n}", "github": "", "project": "", "reviewers": "AnonReviewer4;AnonReviewer3;AnonReviewer1;AnonReviewer2", "site": "https://openreview.net/forum?id=PuG6vCSbrV9", "pdf_size": 0, "rating": "5;6;6;7", "confidence": "4;3;3;2", "wc_review": "246;517;439;312", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "656;645;507;329", "reply_reviewers": "0;0;0;0", "reply_authors": "1;1;1;1", "rating_avg": [ 6.0, 0.7071067811865476 ], "confidence_avg": [ 3.0, 0.7071067811865476 ], "wc_review_avg": [ 378.5, 105.85485345509672 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 534.25, 132.24858222302421 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 9, 0 ], "authors#_avg": [ 1, 0 ], "corr_rating_confidence": -1.0, "gs_citation": 19, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=5735081804534779666&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 6, "aff_unique_index": "0", "aff_unique_norm": "University of Bern", "aff_unique_dep": "", "aff_unique_url": "https://www.unibe.ch", "aff_unique_abbr": "UniBE", "aff_country_unique_index": "0", "aff_country_unique": "Switzerland" }, { "id": "PvVbsAmxdlZ", "title": "Causal Inference Q-Network: Toward Resilient Reinforcement Learning", "track": "main", "status": "Reject", "tldr": "", "abstract": "Deep reinforcement learning (DRL) has demonstrated impressive performance in various gaming simulators and real-world applications. 
In practice, however, a DRL agent may receive faulty observation by abrupt interferences such as black-out, frozen-screen, and adversarial perturbation. How to design a resilient DRL algorithm against these rare but mission-critical and safety-crucial scenarios is an important yet challenging task. In this paper, we consider a resilient DRL framework with observational interferences. Under this framework, we discuss the importance of the causal relation and propose a causal inference based DRL algorithm called causal inference Q-network (CIQ). We evaluate the performance of CIQ in several benchmark DRL environments with different types of interferences. Our experimental results show that the proposed CIQ method could achieve higher performance and more resilience against observational interferences.", "keywords": "Deep Reinforcement Learning;Causal Inference;Robust Reinforcement Learning;Adversarial Robustness", "primary_area": "", "supplementary_material": "/attachment/d600966faa231b9c60864d090e29079b13a461f0.zip", "author": "Chao-Han Huck Yang;Danny I-Te Hung;Yi Ouyang;Pin-Yu Chen", "authorids": "~Chao-Han_Huck_Yang1;ih2320@columbia.edu;~Yi_Ouyang1;~Pin-Yu_Chen1", "gender": "M;;;M", "homepage": "https://huckiyang.github.io/;;;http://www.pinyuchen.com", "dblp": "230/4012;;;39/8969", "google_scholar": "TT3XJW8AAAAJ;;dw_Sj_YAAAAJ;jxwlCUUAAAAJ", "orcid": "0000-0003-2879-8811;;;0000-0003-1039-8369", "linkedin": ";;;pin-yu-chen-940062a2", "or_profile": "~Chao-Han_Huck_Yang1;ih2320@columbia.edu;~Yi_Ouyang1;~Pin-Yu_Chen1", "aff": "Amazon Alexa AI;;Preferred Networks, Inc.;International Business Machines", "aff_domain": "amazon.com;;preferred.jp;ibm.com", "position": "Research intern;;Researcher;Research Staff Member", "bibtex": "@misc{\nyang2021causal,\ntitle={Causal Inference Q-Network: Toward Resilient Reinforcement Learning},\nauthor={Chao-Han Huck Yang and Danny I-Te Hung and Yi Ouyang and Pin-Yu Chen},\nyear={2021},\nurl={https://openreview.net/forum?id=PvVbsAmxdlZ}\n}", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer4;AnonReviewer3;AnonReviewer2", "site": "https://openreview.net/forum?id=PvVbsAmxdlZ", "pdf_size": 0, "rating": "4;4;7;7", "confidence": "4;3;4;3", "wc_review": "844;838;342;347", "wc_reply_reviewers": "152;0;81;0", "wc_reply_authors": "1729;360;1276;550", "reply_reviewers": "2;0;1;0", "reply_authors": "5;1;3;1", "rating_avg": [ 5.5, 1.5 ], "confidence_avg": [ 3.5, 0.5 ], "wc_review_avg": [ 592.75, 248.26535702751602 ], "wc_reply_reviewers_avg": [ 58.25, 63.42860159265692 ], "wc_reply_authors_avg": [ 978.75, 551.794968715736 ], "reply_reviewers_avg": [ 0.75, 0.82915619758885 ], "reply_authors_avg": [ 2.5, 1.6583123951777 ], "replies_avg": [ 21, 0 ], "authors#_avg": [ 4, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 25, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=2761705995825822271&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 4, "aff_unique_index": "0;1;2", "aff_unique_norm": "Amazon;Preferred Networks, Inc.;International Business Machines Corporation", "aff_unique_dep": "Amazon Alexa AI;;", "aff_unique_url": "https://www.amazon.com;https://www.preferred-networks.com;https://www.ibm.com", "aff_unique_abbr": "Amazon;PFN;IBM", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;1;0", "aff_country_unique": "United States;Japan" }, { "id": "PvZqCDCen_E", "title": "FsNet: Feature Selection Network on High-dimensional Biological Data", "track": "main", "status": "Withdraw", "tldr": "", "abstract": 
"Biological data including gene expression data are generally high-dimensional and require efficient, generalizable, and scalable machine-learning methods to discover their complex nonlinear patterns. The recent advances in machine learning can be attributed to deep neural networks (DNNs), which excel in various tasks in terms of computer vision and natural language processing. However, standard DNNs are not appropriate for high-dimensional datasets generated in biology because they have many parameters, which in turn require many samples. In this paper, we propose a DNN-based, nonlinear feature selection method, called the feature selection network (FsNet), for high-dimensional and small number of sample data. Specifically, FsNet comprises a selection layer that selects features and a reconstruction layer that stabilizes the training. Because a large number of parameters in the selection and reconstruction layers can easily result in overfitting under a limited number of samples, we use two tiny networks to predict the large, virtual weight matrices of the selection and reconstruction layers. Experimental results on several real-world, high-dimensional biological datasets demonstrate the efficacy of the proposed method.", "keywords": "Feature selection;neural networks;high-dimension;small number of samples;biology", "primary_area": "", "supplementary_material": "/attachment/ad07c3f2e73f32d1bb7adf543ae63a0d0b787f79.zip", "author": "Dinesh Singh;Hector Clemente;Mathis Petrovich;Eiryo Kawakami;Makoto Yamada", "authorids": "~Dinesh_Singh1;hector.climente@riken.jp;mathis.petrovich@gmail.com;eiryo.kawakami@riken.jp;~Makoto_Yamada3", "gender": "M;;;;M", "homepage": "https://faculty.iitmandi.ac.in/~dineshsingh/;;;;https://groups.oist.jp/mlds", "dblp": "127/1101-1;;;;56/4937", "google_scholar": "TsHUNA0AAAAJ;;;;1cKNu1gAAAAJ", "orcid": "0000-0001-8889-9847;;;;", "linkedin": ";;;;", "or_profile": "~Dinesh_Singh1;hector.climente@riken.jp;mathis.petrovich@gmail.com;eiryo.kawakami@riken.jp;~Makoto_Yamada3", "aff": "RIKEN;;;;Kyoto University", "aff_domain": "riken.jp;;;;kyoto-u.ac.jp", "position": "Postdoc;;;;Associate Professor", "bibtex": "", "github": "", "project": "", "reviewers": "", "site": "https://openreview.net/forum?id=PvZqCDCen_E", "pdf_size": 0, "rating": "", "confidence": "", "wc_review": "", "wc_reply_reviewers": "", "wc_reply_authors": "", "reply_reviewers": "", "reply_authors": "", "rating_avg": [ 0, 0 ], "confidence_avg": [ 0, 0 ], "wc_review_avg": [ 0, 0 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 0, 0 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 0, 0 ], "replies_avg": [ 1, 0 ], "authors#_avg": [ 5, 0 ], "corr_rating_confidence": 0, "gs_citation": 60, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=17524462289169134803&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 7, "aff_unique_index": "0;1", "aff_unique_norm": "RIKEN;Kyoto University", "aff_unique_dep": ";", "aff_unique_url": "https://www.riken.jp;https://www.kyoto-u.ac.jp", "aff_unique_abbr": "RIKEN;Kyoto U", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0", "aff_country_unique": "Japan" }, { "id": "Px7xIKHjmMS", "title": "Beyond GNNs: A Sample Efficient Architecture for Graph Problems", "track": "main", "status": "Reject", "tldr": "", "abstract": "Despite their popularity in learning problems over graph structured data, existing Graph Neural Networks (GNNs) have inherent limitations for fundamental graph problems such as shortest paths, 
$k$-connectivity, minimum spanning tree and minimum cuts. In all these instances, it is known that one needs GNNs of high depth, scaling at a polynomial rate with the number of nodes $n$, to provably encode the solution space. This in turn affects their statistical efficiency thus requiring a significant amount of training data in order to obtain networks with good performance. In this work we propose a new hybrid architecture to overcome this limitation. Our proposed architecture that we call as GNNplus networks involve a combination of multiple parallel low depth GNNs along with simple pooling layers involving low depth fully connected networks. We provably demonstrate that for many graph problems, the solution space can be encoded by GNNplus networks using depth that scales only poly-logarithmically in the number of nodes. This significantly improves the amount of training data needed that we establish via improved generalization bounds. Finally, we empirically demonstrate the effectiveness of our proposed architecture for a variety of graph problems.\n", "keywords": "Graph Neural Networks;Deep Learning Theory;Graph Connectivity;Minimum Spanning Trees", "primary_area": "", "supplementary_material": "/attachment/5f9aa26cd137da9ae689227fbb7a4cf8e0789068.zip", "author": "Pranjal Awasthi;Abhimanyu Das;Sreenivas Gollapudi", "authorids": "~Pranjal_Awasthi3;abhidas@google.com;~Sreenivas_Gollapudi2", "gender": ";;M", "homepage": "https://www.cs.rutgers.edu/~pa336/;;https://www.sreenivasgollapudi.com", "dblp": "57/679;;https://dblp.uni-trier.de/pers/g/Gollapudi:Sreenivas.html", "google_scholar": ";;Ysd-WJgAAAAJ", "orcid": ";;", "linkedin": ";;", "or_profile": "~Pranjal_Awasthi3;abhidas@google.com;~Sreenivas_Gollapudi2", "aff": "Rutgers University;;Google", "aff_domain": "rutgers.edu;;google.com", "position": "Assistant Professor;;Researcher", "bibtex": "@misc{\nawasthi2021beyond,\ntitle={Beyond {\\{}GNN{\\}}s: A Sample Efficient Architecture for Graph Problems},\nauthor={Pranjal Awasthi and Abhimanyu Das and Sreenivas Gollapudi},\nyear={2021},\nurl={https://openreview.net/forum?id=Px7xIKHjmMS}\n}", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer3;AnonReviewer4;AnonReviewer2", "site": "https://openreview.net/forum?id=Px7xIKHjmMS", "pdf_size": 0, "rating": "4;5;5;8", "confidence": "3;3;3;3", "wc_review": "908;293;319;162", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "0;0;0;0", "reply_reviewers": "0;0;0;0", "reply_authors": "0;0;0;0", "rating_avg": [ 5.5, 1.5 ], "confidence_avg": [ 3.0, 0.0 ], "wc_review_avg": [ 420.5, 287.6790746648077 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 0, 0 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 0, 0 ], "replies_avg": [ 5, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 5, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=15069969710016177970&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 5, "aff_unique_index": "0;1", "aff_unique_norm": "Rutgers University;Google", "aff_unique_dep": ";Google", "aff_unique_url": "https://www.rutgers.edu;https://www.google.com", "aff_unique_abbr": "Rutgers;Google", "aff_campus_unique_index": "1", "aff_campus_unique": ";Mountain View", "aff_country_unique_index": "0;0", "aff_country_unique": "United States" }, { "title": "Score-Based Generative Modeling through Stochastic Differential Equations", "status": "Oral", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/3177", "id": "PxTIG12RRHS", "poster": "", 
"openreview": "https://openreview.net/forum?id=PxTIG12RRHS", "slides": "https://iclr.cc/virtual/2021/poster/3177", "video": "https://iclr.cc/virtual/2021/poster/3177", "author_site": "Yang Song, Jascha Sohl-Dickstein, Durk Kingma, Abhishek Kumar, Stefano Ermon, Ben Poole", "tldr": "", "abstract": "Creating noise from data is easy; creating data from noise is generative modeling. We present a stochastic differential equation (SDE) that smoothly transforms a complex data distribution to a known prior distribution by slowly injecting noise, and a corresponding reverse-time SDE that transforms the prior distribution back into the data distribution by slowly removing the noise. \nCrucially, the reverse-time SDE depends only on the time-dependent gradient field (a.k.a., score) of the perturbed data distribution. By leveraging advances in score-based generative modeling, we can accurately estimate these scores with neural networks, and use numerical SDE solvers to generate samples. We show that this framework encapsulates previous approaches in score-based generative modeling and diffusion probabilistic modeling, allowing for new sampling procedures and new modeling capabilities. In particular, we introduce a predictor-corrector framework to correct errors in the evolution of the discretized reverse-time SDE. We also derive an equivalent neural ODE that samples from the same distribution as the SDE, but additionally enables exact likelihood computation, and improved sampling efficiency. In addition, we provide a new way to solve inverse problems with score-based models, as demonstrated with experiments on class-conditional generation, image inpainting, and colorization. Combined with multiple architectural improvements, we achieve record-breaking performance for unconditional image generation on CIFAR-10 with an Inception score of 9.89 and FID of 2.20, a competitive likelihood of 2.99 bits/dim, and demonstrate high fidelity generation of $1024\\times 1024$ images for the first time from a score-based generative model.", "keywords": "generative models;score-based generative models;stochastic differential equations;score matching;diffusion", "primary_area": "", "supplementary_material": "", "author": "Yang Song;Jascha Sohl-Dickstein;Diederik P Kingma;Abhishek Kumar;Stefano Ermon;Ben Poole", "authorids": "~Yang_Song1;~Jascha_Sohl-Dickstein2;~Diederik_P_Kingma1;~Abhishek_Kumar1;~Stefano_Ermon1;~Ben_Poole1", "gender": "M;M;;M;M;M", "homepage": "https://yang-song.net;http://www.dpkingma.com;http://inductivebias.ml;http://cs.stanford.edu/~ermon/;https://cs.stanford.edu/~poole;http://sohldickstein.com", "dblp": ";http://dblp.uni-trier.de/pers/hd/k/Kingma:Diederik_P=;67/6188-1;47/8135;16/10397;51/7117", "google_scholar": "o_J2CroAAAAJ;https://scholar.google.nl/citations?user=yyIoQu4AAAAJ;6vghMS0AAAAJ;;i5FMLA4AAAAJ;-3zYIjQAAAAJ", "orcid": ";;;;;", "linkedin": ";;;;;", "or_profile": "~Yang_Song1;~Diederik_P_Kingma1;~Abhishek_Kumar1;~Stefano_Ermon1;~Ben_Poole1;~Jascha_Sohl-Dickstein1", "aff": "Stanford University;Google;Google DeepMind;Stanford University;Google;Google", "aff_domain": "stanford.edu;google.com;google.com;stanford.edu;google.com;google.com", "position": "PhD student;Research Scientist;Research Scientist;Assistant Professor;Research Scientist;Research Scientist", "bibtex": "@inproceedings{\nsong2021scorebased,\ntitle={Score-Based Generative Modeling through Stochastic Differential Equations},\nauthor={Yang Song and Jascha Sohl-Dickstein and Diederik P Kingma and Abhishek Kumar and Stefano Ermon 
and Ben Poole},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=PxTIG12RRHS}\n}", "github": "[![github](/images/github_icon.svg) yang-song/score_sde](https://github.com/yang-song/score_sde) + [![Papers with Code](/images/pwc_icon.svg) 9 community implementations](https://paperswithcode.com/paper/?openreview=PxTIG12RRHS)", "project": "", "reviewers": "AnonReviewer4;AnonReviewer3;AnonReviewer1;AnonReviewer2", "pdf_size": 0, "rating": "7;8;8;9", "confidence": "3;4;3;4", "wc_review": "534;442;464;348", "wc_reply_reviewers": "0;0;0;26", "wc_reply_authors": "977;294;652;242", "reply_reviewers": "0;0;0;1", "reply_authors": "2;1;1;1", "rating_avg": [ 8.0, 0.7071067811865476 ], "confidence_avg": [ 3.5, 0.5 ], "wc_review_avg": [ 447.0, 66.49060083951716 ], "wc_reply_reviewers_avg": [ 6.5, 11.258330249197702 ], "wc_reply_authors_avg": [ 541.25, 296.99610687684105 ], "reply_reviewers_avg": [ 0.25, 0.4330127018922193 ], "reply_authors_avg": [ 1.25, 0.4330127018922193 ], "replies_avg": [ 16, 0 ], "authors#_avg": [ 6, 0 ], "corr_rating_confidence": 0.7071067811865476, "gs_citation": 7224, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=14592788616550656262&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 6, "pdf": "https://openreview.net/pdf?id=PxTIG12RRHS", "email": "stanford.edu;google.com;google.com;stanford.edu;google.com;google.com", "author_num": 6, "aff_unique_index": "0;1;1;0;1;1", "aff_unique_norm": "Stanford University;Google", "aff_unique_dep": ";Google", "aff_unique_url": "https://www.stanford.edu;https://www.google.com", "aff_unique_abbr": "Stanford;Google", "aff_campus_unique_index": "0;1;0;1;1", "aff_campus_unique": "Stanford;Mountain View;", "aff_country_unique_index": "0;0;1;0;0;0", "aff_country_unique": "United States;United Kingdom" }, { "id": "Py4VjN6V2JX", "title": "Contrastive Self-Supervised Learning of Global-Local Audio-Visual Representations", "track": "main", "status": "Reject", "tldr": "", "abstract": "Contrastive self-supervised learning has delivered impressive results in many audio-visual recognition tasks. However, existing approaches optimize for learning either global representations useful for high-level understanding tasks such as classification, or local representations useful for tasks such as audio-visual source localization and separation. While they produce satisfactory results in their intended downstream scenarios, they often fail to generalize to tasks that they were not originally designed for. In this work, we propose a versatile self-supervised approach to learn audio-visual representations that can generalize to both the tasks which require global semantic information (e.g., classification) and the tasks that require fine-grained spatio-temporal information (e.g. localization). We achieve this by optimizing two cross-modal contrastive objectives that together encourage our model to learn discriminative global-local visual information given audio signals. To show that our approach learns generalizable video representations, we evaluate it on various downstream scenarios including action/sound classification, lip reading, deepfake detection, and sound source localization. 
", "keywords": "Contrastive learning;self-supervised learning;video representation learning;audio-visual representation learning;multimodal representation learning", "primary_area": "", "supplementary_material": "", "author": "Shuang Ma;Zhaoyang Zeng;Daniel McDuff;Yale Song", "authorids": "~Shuang_Ma3;~Zhaoyang_Zeng1;~Daniel_McDuff1;~Yale_Song1", "gender": "M;M;M;F", "homepage": ";http://alumni.media.mit.edu/~djmcduff/;https://people.csail.mit.edu/yalesong;https://www.shuangma.me/", "dblp": ";63/9606;31/9606.html;98/3906", "google_scholar": ";m7Jr-b4AAAAJ;dNHNpxoAAAAJ;IHPRZuMAAAAJ", "orcid": ";;;", "linkedin": "%E5%85%86%E9%98%B3-%E6%9B%BE-1a505291/;;;", "or_profile": "~Zhaoyang_Zeng1;~Daniel_McDuff1;~Yale_Song1;~shuang_ma1", "aff": "SUN YAT-SEN UNIVERSITY;Microsoft;Microsoft Research;Microsoft", "aff_domain": "sysu.edu.cn;microsoft.com;microsoft.com;microsoft.com", "position": "PhD student;Principal Researcer;Researcher;Senior Research Scientist", "bibtex": "@misc{\nma2021contrastive,\ntitle={Contrastive Self-Supervised Learning of Global-Local Audio-Visual Representations},\nauthor={Shuang Ma and Zhaoyang Zeng and Daniel McDuff and Yale Song},\nyear={2021},\nurl={https://openreview.net/forum?id=Py4VjN6V2JX}\n}", "github": "", "project": "", "reviewers": "AnonReviewer3;AnonReviewer4;AnonReviewer1;AnonReviewer2", "site": "https://openreview.net/forum?id=Py4VjN6V2JX", "pdf_size": 0, "rating": "5;5;6;7", "confidence": "4;4;4;4", "wc_review": "460;523;499;424", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "315;457;348;235", "reply_reviewers": "0;0;0;0", "reply_authors": "1;1;1;1", "rating_avg": [ 5.75, 0.82915619758885 ], "confidence_avg": [ 4.0, 0.0 ], "wc_review_avg": [ 476.5, 37.73923687622737 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 338.75, 79.68178901104066 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 9, 0 ], "authors#_avg": [ 4, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 2, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=17992265951738354364&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 0, "aff_unique_index": "0;1;1;1", "aff_unique_norm": "Sun Yat-sen University;Microsoft", "aff_unique_dep": ";Microsoft Corporation", "aff_unique_url": "http://www.sysu.edu.cn;https://www.microsoft.com", "aff_unique_abbr": "SYSU;Microsoft", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;1;1;1", "aff_country_unique": "China;United States" }, { "title": "Dual-mode ASR: Unify and Improve Streaming ASR with Full-context Modeling", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/3075", "id": "Pz_dcqfcKW8", "poster": "", "openreview": "https://openreview.net/forum?id=Pz_dcqfcKW8", "slides": "https://iclr.cc/virtual/2021/poster/3075", "video": "https://iclr.cc/virtual/2021/poster/3075", "author_site": "Jiahui Yu, Wei Han, Anmol Gulati, Chung-Cheng Chiu, Bo Li, Tara Sainath, Yonghui Wu, Ruoming Pang", "tldr": "", "abstract": "Streaming automatic speech recognition (ASR) aims to emit each hypothesized word as quickly and accurately as possible, while full-context ASR waits for the completion of a full speech utterance before emitting completed hypotheses. In this work, we propose a unified framework, Dual-mode ASR, to train a single end-to-end ASR model with shared weights for both streaming and full-context speech recognition. 
We show that the latency and accuracy of streaming ASR significantly benefit from weight sharing and joint training of full-context ASR, especially with inplace knowledge distillation during the training. The Dual-mode ASR framework can be applied to recent state-of-the-art convolution-based and transformer-based ASR networks. We present extensive experiments with two state-of-the-art ASR networks, ContextNet and Conformer, on two datasets, a widely used public dataset LibriSpeech and a large-scale dataset MultiDomain. Experiments and ablation studies demonstrate that Dual-mode ASR not only simplifies the workflow of training and deploying streaming and full-context ASR models, but also significantly improves both emission latency and recognition accuracy of streaming ASR. With Dual-mode ASR, we achieve new state-of-the-art streaming ASR results on both LibriSpeech and MultiDomain in terms of accuracy and latency.", "keywords": "Speech Recognition;Streaming ASR;Low-latency ASR;Dual-mode ASR", "primary_area": "", "supplementary_material": "", "author": "Jiahui Yu;Wei Han;Anmol Gulati;Chung-Cheng Chiu;Bo Li;Tara N Sainath;Yonghui Wu;Ruoming Pang", "authorids": "~Jiahui_Yu1;~Wei_Han3;~Anmol_Gulati1;~Chung-Cheng_Chiu1;~Bo_Li1;~Tara_N_Sainath1;~Yonghui_Wu1;~Ruoming_Pang1", "gender": "M;;M;M;;;M;", "homepage": "http://jiahuiyu.com/;;;;;https://sites.google.com/site/tsainath/;;", "dblp": "185/1060;82/1911-2;205/9256;99/2064;50/3402-28;28/7825;26/2189;32/2940", "google_scholar": "-CLCMk4AAAAJ;https://scholar.google.com/citations?hl=en;S2Pk9ooAAAAJ;;iRhp1PAAAAAJ;RtQA6Z8AAAAJ;55FnA9wAAAAJ;", "orcid": ";;;;0000-0002-6711-3603;;;", "linkedin": "jiahuiyuu/;;anmol01gulati/;;;;;", "or_profile": "~Jiahui_Yu1;~Wei_Han3;~Anmol_Gulati1;~Chung-Cheng_Chiu1;~Bo_Li1;~Tara_N_Sainath1;~Yonghui_Wu1;~Ruoming_Pang1", "aff": "Google Brain;Google;Google;Google;Google;Google;;", "aff_domain": "google.com;google.com;google.com;google.com;google.com;google.com;;", "position": "Research Scientist;Software Engineer;Researcher;Software Engineer;Research Scientist;Research Scientist;;", "bibtex": "@inproceedings{\nyu2021dualmode,\ntitle={Dual-mode {\\{}ASR{\\}}: Unify and Improve Streaming {\\{}ASR{\\}} with Full-context Modeling},\nauthor={Jiahui Yu and Wei Han and Anmol Gulati and Chung-Cheng Chiu and Bo Li and Tara N Sainath and Yonghui Wu and Ruoming Pang},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=Pz_dcqfcKW8}\n}", "github": "", "project": "", "reviewers": "AnonReviewer3;AnonReviewer4;AnonReviewer2;AnonReviewer1", "pdf_size": 0, "rating": "6;7;7;7", "confidence": "5;4;5;5", "wc_review": "302;280;872;632", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "193;426;445;783", "reply_reviewers": "0;0;0;0", "reply_authors": "1;1;1;1", "rating_avg": [ 6.75, 0.4330127018922193 ], "confidence_avg": [ 4.75, 0.4330127018922193 ], "wc_review_avg": [ 521.5, 245.7452949702191 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 461.75, 210.34896600649122 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 10, 0 ], "authors#_avg": [ 8, 0 ], "corr_rating_confidence": -0.3333333333333333, "gs_citation": 91, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=2453990540749768735&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 8, "pdf": "https://openreview.net/pdf?id=Pz_dcqfcKW8", "email": "google.com;google.com;google.com;google.com;google.com;google.com;;", "author_num": 8, "aff_unique_index": 
"0;0;0;0;0;0", "aff_unique_norm": "Google", "aff_unique_dep": "Google Brain", "aff_unique_url": "https://brain.google.com", "aff_unique_abbr": "Google Brain", "aff_campus_unique_index": "0;0;0;0;0;0", "aff_campus_unique": "Mountain View", "aff_country_unique_index": "0;0;0;0;0;0", "aff_country_unique": "United States" }, { "title": "IsarStep: a Benchmark for High-level Mathematical Reasoning", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/3049", "id": "Pzj6fzU6wkj", "poster": "", "openreview": "https://openreview.net/forum?id=Pzj6fzU6wkj", "slides": "https://iclr.cc/virtual/2021/poster/3049", "video": "https://iclr.cc/virtual/2021/poster/3049", "author_site": "Wenda Li, Lei Yu, Yuhuai Wu, Lawrence Paulson", "tldr": "", "abstract": "A well-defined benchmark is essential for measuring and accelerating research progress of machine learning models. In this paper, we present a benchmark for high-level mathematical reasoning and study the reasoning capabilities of neural sequence-to-sequence models. We build a non-synthetic dataset from the largest repository of proofs written by human experts in a theorem prover. The dataset has a broad coverage of undergraduate and research-level mathematical and computer science theorems. In our defined task, a model is required to fill in a missing intermediate proposition given surrounding proofs. This task provides a starting point for the long-term goal of having machines generate human-readable proofs automatically. Our experiments and analysis reveal that while the task is challenging, neural models can capture non-trivial mathematical reasoning. We further design a hierarchical transformer that outperforms the transformer baseline. ", "keywords": "mathematical reasoning;dataset;benchmark;reasoning;transformer", "primary_area": "", "supplementary_material": "/attachment/c4ffbcc06fe39373cc0175e85faf7976f01086c8.zip", "author": "Wenda Li;Lei Yu;Yuhuai Wu;Lawrence C. Paulson", "authorids": "~Wenda_Li1;~Lei_Yu4;~Yuhuai_Wu1;lp15@cam.ac.uk", "gender": "M;F;M;", "homepage": "https://wenda302.github.io;;http://www.cs.toronto.edu/~ywu/;", "dblp": "132/9868.html;https://dblp.uni-trier.de/pid/01/2775-0008;;", "google_scholar": "ufYxQkEAAAAJ;https://scholar.google.co.uk/citations?user=gX5JBc4AAAAJ;https://scholar.google.ca/citations?user=bOQGfFIAAAAJ;", "orcid": ";;;", "linkedin": ";;;", "or_profile": "~Wenda_Li1;~Lei_Yu4;~Yuhuai_Wu1;lp15@cam.ac.uk", "aff": "University of Cambridge;Google DeepMind;Department of Computer Science, University of Toronto;", "aff_domain": "cam.ac.uk;deepmind.com;cs.toronto.edu;", "position": "Postdoc;Research Scientist;PhD student;", "bibtex": "@inproceedings{\nli2021isarstep,\ntitle={IsarStep: a Benchmark for High-level Mathematical Reasoning},\nauthor={Wenda Li and Lei Yu and Yuhuai Wu and Lawrence C. 
Paulson},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=Pzj6fzU6wkj}\n}", "github": "[![Papers with Code](/images/pwc_icon.svg) 2 community implementations](https://paperswithcode.com/paper/?openreview=Pzj6fzU6wkj)", "project": "", "reviewers": "AnonReviewer2;AnonReviewer4;AnonReviewer1;AnonReviewer3", "pdf_size": 0, "rating": "6;6;7;9", "confidence": "4;4;5;4", "wc_review": "627;752;701;415", "wc_reply_reviewers": "58;0;13;0", "wc_reply_authors": "893;428;463;277", "reply_reviewers": "1;0;1;0", "reply_authors": "3;2;3;1", "rating_avg": [ 7.0, 1.224744871391589 ], "confidence_avg": [ 4.25, 0.4330127018922193 ], "wc_review_avg": [ 623.75, 128.4550018488965 ], "wc_reply_reviewers_avg": [ 17.75, 23.836683913665507 ], "wc_reply_authors_avg": [ 515.25, 229.02005916513076 ], "reply_reviewers_avg": [ 0.5, 0.5 ], "reply_authors_avg": [ 2.25, 0.82915619758885 ], "replies_avg": [ 16, 0 ], "authors#_avg": [ 4, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 82, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=13495048360221403300&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 9, "pdf": "https://openreview.net/pdf?id=Pzj6fzU6wkj", "email": "cam.ac.uk;deepmind.com;cs.toronto.edu;", "author_num": 4, "aff_unique_index": "0;1;2", "aff_unique_norm": "University of Cambridge;Google;University of Toronto", "aff_unique_dep": ";Google DeepMind;Department of Computer Science", "aff_unique_url": "https://www.cam.ac.uk;https://deepmind.com;https://www.utoronto.ca", "aff_unique_abbr": "Cambridge;DeepMind;U of T", "aff_campus_unique_index": "0;2", "aff_campus_unique": "Cambridge;;Toronto", "aff_country_unique_index": "0;0;1", "aff_country_unique": "United Kingdom;Canada" }, { "id": "Q1aiM7sCi1", "title": "Fuzzy c-Means Clustering for Persistence Diagrams", "track": "main", "status": "Reject", "tldr": "", "abstract": "Persistence diagrams concisely represent the topology of a point cloud whilst having strong theoretical guarantees. Most current approaches to integrating topological information into machine learning implicitly map persistence diagrams to a Hilbert space, resulting in deformation of the underlying metric structure whilst also generally requiring prior knowledge about the true topology of the space. In this paper we give an algorithm for Fuzzy c-Means (FCM) clustering directly on the space of persistence diagrams, enabling unsupervised learning that automatically captures the topological structure of data, with no prior knowledge or additional processing of persistence diagrams. We prove the same convergence guarantees as traditional FCM clustering: every convergent subsequence of iterates tends to a local minimum or saddle point. 
We end by presenting experiments where our fuzzy topological clustering algorithm allows for unsupervised top-$k$ candidate selection in settings where (i) the properties of persistence diagrams make them the natural choice over geometric equivalents, and (ii) the probabilistic membership values let us rank candidates in settings where verifying candidate suitability is expensive: lattice structure classification in materials science and pre-trained model selection in machine learning.", "keywords": "Topological data analysis;fuzzy clustering", "primary_area": "", "supplementary_material": "/attachment/e2ff32cd5858b16425e7099e1bfd3c2f2e26e251.zip", "author": "Thomas Davies;Jack Aspinall;Bryan Wilder;Long Tran-Thanh", "authorids": "~Thomas_Davies1;jack.aspinall@materials.ox.ac.uk;~Bryan_Wilder1;long.tran-thanh@warwick.ac.uk", "gender": ";;;", "homepage": ";;;", "dblp": ";;;", "google_scholar": ";;;", "orcid": ";;;", "linkedin": ";;;", "or_profile": "~Thomas_Davies1;jack.aspinall@materials.ox.ac.uk;~Bryan_Wilder1;long.tran-thanh@warwick.ac.uk", "aff": ";;;", "aff_domain": ";;;", "position": ";;;", "bibtex": "@misc{\ndavies2021fuzzy,\ntitle={Fuzzy c-Means Clustering for Persistence Diagrams},\nauthor={Thomas Davies and Jack Aspinall and Bryan Wilder and Long Tran-Thanh},\nyear={2021},\nurl={https://openreview.net/forum?id=Q1aiM7sCi1}\n}", "github": "", "project": "", "reviewers": "AnonReviewer4;AnonReviewer1;AnonReviewer3;AnonReviewer2", "site": "https://openreview.net/forum?id=Q1aiM7sCi1", "pdf_size": 0, "rating": "3;4;6;6", "confidence": "5;4;4;5", "wc_review": "190;748;729;1216", "wc_reply_reviewers": "0;0;0;51", "wc_reply_authors": "475;470;845;1050", "reply_reviewers": "0;0;0;1", "reply_authors": "2;2;2;2", "rating_avg": [ 4.75, 1.299038105676658 ], "confidence_avg": [ 4.5, 0.5 ], "wc_review_avg": [ 720.75, 363.2419131928473 ], "wc_reply_reviewers_avg": [ 12.75, 22.083647796503186 ], "wc_reply_authors_avg": [ 710.0, 248.31935083678033 ], "reply_reviewers_avg": [ 0.25, 0.4330127018922193 ], "reply_authors_avg": [ 2.0, 0.0 ], "replies_avg": [ 14, 0 ], "authors#_avg": [ 4, 0 ], "corr_rating_confidence": -0.19245008972987526, "gs_citation": 6, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=673631740351141025&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 6 }, { "title": "Neural Delay Differential Equations", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/3120", "id": "Q1jmmQz72M2", "poster": "", "openreview": "https://openreview.net/forum?id=Q1jmmQz72M2", "slides": "https://iclr.cc/virtual/2021/poster/3120", "video": "https://iclr.cc/virtual/2021/poster/3120", "author_site": "Qunxi Zhu, Yao Guo, Wei Lin", "tldr": "", "abstract": " Neural Ordinary Differential Equations (NODEs), a framework of continuous-depth neural networks, have been widely applied, showing exceptional efficacy in coping with some representative datasets. Recently, an augmented framework has been successfully developed for conquering some limitations emergent in application of the original framework. Here we propose a new class of continuous-depth neural networks with delay, named as Neural Delay Differential Equations (NDDEs), and, for computing the corresponding gradients, we use the adjoint sensitivity method to obtain the delayed dynamics of the adjoint. 
Since the differential equations with delays are usually seen as dynamical systems of infinite dimension possessing more fruitful dynamics, the NDDEs, compared to the NODEs, own a stronger capacity of nonlinear representations. Indeed, we analytically validate that the NDDEs are universal approximators, and further articulate an extension of the NDDEs, where the initial function of the NDDEs is supposed to satisfy ODEs. More importantly, we use several illustrative examples to demonstrate the outstanding capacities of the NDDEs and the NDDEs with ODEs' initial value. More precisely, (1) we successfully model the delayed dynamics where the trajectories in the lower-dimensional phase space could be mutually intersected, while the traditional NODEs without any augmentation are not directly applicable for such modeling, and (2) we achieve lower loss and higher accuracy not only for the data produced synthetically by complex models but also for the real-world image datasets, i.e., CIFAR10, MNIST and SVHN. Our results on the NDDEs reveal that appropriately articulating the elements of dynamical systems into the network design is truly beneficial to promoting the network performance.", "keywords": "Delay differential equations;neural networks", "primary_area": "", "supplementary_material": "", "author": "Qunxi Zhu;Yao Guo;Wei Lin", "authorids": "~Qunxi_Zhu1;~Yao_Guo3;~Wei_Lin1", "gender": "M;M;M", "homepage": "https://www.researchgate.net/profile/Qunxi_Zhu;https://istbi.fudan.edu.cn/info/1245/2230.htm;https://faculty.fudan.edu.cn/wlin/zh_CN/", "dblp": "219/7742;;99/2649", "google_scholar": "https://scholar.google.co.jp/citations?user=45oFQD4AAAAJ;;https://scholar.google.com/citations?hl=zh-CN", "orcid": "0000-0001-7281-5274;;0000-0002-1863-4306", "linkedin": ";;", "or_profile": "~Qunxi_Zhu1;~Yao_Guo3;~Wei_Lin1", "aff": "Fudan University;Fudan University;Fudan University", "aff_domain": "fudan.edu.cn;fudan.edu.cn;fudan.edu.cn", "position": "Postdoc;Associate Professor;Full Professor", "bibtex": "@inproceedings{\nzhu2021neural,\ntitle={Neural Delay Differential Equations},\nauthor={Qunxi Zhu and Yao Guo and Wei Lin},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=Q1jmmQz72M2}\n}", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer2;AnonReviewer4;AnonReviewer3", "pdf_size": 0, "rating": "5;6;6;7", "confidence": "4;4;3;4", "wc_review": "755;337;325;298", "wc_reply_reviewers": "0;24;0;0", "wc_reply_authors": "1034;543;1216;646", "reply_reviewers": "0;1;0;0", "reply_authors": "2;3;2;1", "rating_avg": [ 6.0, 0.7071067811865476 ], "confidence_avg": [ 3.75, 0.4330127018922193 ], "wc_review_avg": [ 428.75, 188.8893525321107 ], "wc_reply_reviewers_avg": [ 6.0, 10.392304845413264 ], "wc_reply_authors_avg": [ 859.75, 275.36192093316026 ], "reply_reviewers_avg": [ 0.25, 0.4330127018922193 ], "reply_authors_avg": [ 2.0, 0.7071067811865476 ], "replies_avg": [ 15, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 46, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=2192458333216803575&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 8, "pdf": "https://openreview.net/pdf?id=Q1jmmQz72M2", "email": "fudan.edu.cn;fudan.edu.cn;fudan.edu.cn", "author_num": 3, "aff_unique_index": "0;0;0", "aff_unique_norm": "Fudan University", "aff_unique_dep": "", "aff_unique_url": "https://www.fudan.edu.cn", "aff_unique_abbr": "Fudan", "aff_campus_unique_index": "", "aff_campus_unique": "", 
"aff_country_unique_index": "0;0;0", "aff_country_unique": "China" }, { "id": "Q2iaAc-4I1v", "title": "Causal Curiosity: RL Agents Discovering Self-supervised Experiments for Causal Representation Learning", "track": "main", "status": "Reject", "tldr": "", "abstract": "Humans show an innate ability to learn the regularities of the world through interaction. By performing experiments in our environment, we are able to discern the causal factors of variation and infer how they affect the dynamics of our world. Analogously, here we attempt to equip reinforcement learning agents with the ability to perform experiments that facilitate a categorization of the rolled-out trajectories, and to subsequently infer the causal factors of the environment in a hierarchical manner. We introduce a novel intrinsic reward, called causal curiosity, and show that it allows our agents to learn optimal sequences of actions, and to discover causal factors in the dynamics. The learned behavior allows the agent to infer a binary quantized representation for the ground-truth causal factors in every environment. Additionally, we find that these experimental behaviors are semantically meaningful (e.g., to differentiate between heavy and light blocks, our agents learn to lift them), and are learnt in a self-supervised manner with approximately 2.5 times less data than conventional supervised planners. We show that these behaviors can be re-purposed and fine-tuned (e.g., from lifting to pushing or other downstream tasks). Finally, we show that the knowledge of causal factor representations aids zero-shot learning for more complex tasks.", "keywords": "Causal Representation Learning;Unsupervised/Self-Supervised Reinforcement Learning", "primary_area": "", "supplementary_material": "", "author": "Sumedh Anand Sontakke;Arash Mehrjou;Theofanis Karaletsos;Laurent Itti;Bernhard Sch\u00f6lkopf", "authorids": "~Sumedh_Anand_Sontakke1;~Arash_Mehrjou1;~Theofanis_Karaletsos1;~Laurent_Itti2;~Bernhard_Sch\u00f6lkopf1", "gender": "M;M;M;;", "homepage": "https://sumedh7.github.io/;https://distantvantagepoint.com;http://karaletsos.com/;;", "dblp": "276/0127;174/1295;31/11191;;", "google_scholar": "https://scholar.google.com/citations?hl=en;pnypNygAAAAJ;zrxafGsAAAAJ;;", "orcid": ";0000-0002-3832-7784;;;", "linkedin": "sumedh-sontakke-0ab24210a/;arash-mehrjou/;;;", "or_profile": "~Sumedh_Anand_Sontakke1;~Arash_Mehrjou1;~Theofanis_Karaletsos1;~Laurent_Itti2;~Bernhard_Sch\u00f6lkopf1", "aff": "University of Southern California;Max Planck Institute for Intelligent Systems, Max-Planck Institute;Meta Facebook;;", "aff_domain": "usc.edu;tuebingen.mpg.de;facebook.com;;", "position": "PhD student;PhD student;Staff Scientist;;", "bibtex": "@misc{\nsontakke2021causal,\ntitle={Causal Curiosity: {\\{}RL{\\}} Agents Discovering Self-supervised Experiments for Causal Representation Learning},\nauthor={Sumedh Anand Sontakke and Arash Mehrjou and Theofanis Karaletsos and Laurent Itti and Bernhard Sch{\\\"o}lkopf},\nyear={2021},\nurl={https://openreview.net/forum?id=Q2iaAc-4I1v}\n}", "github": "", "project": "", "reviewers": "AnonReviewer4;AnonReviewer1;AnonReviewer3;AnonReviewer2", "site": "https://openreview.net/forum?id=Q2iaAc-4I1v", "pdf_size": 0, "rating": "3;5;5;6", "confidence": "4;4;4;3", "wc_review": "1579;1819;400;478", "wc_reply_reviewers": "0;0;587;0", "wc_reply_authors": "458;1555;2548;940", "reply_reviewers": "0;0;3;0", "reply_authors": "1;2;6;2", "rating_avg": [ 4.75, 1.0897247358851685 ], "confidence_avg": [ 3.75, 0.4330127018922193 ], 
"wc_review_avg": [ 1069.0, 636.2864920772718 ], "wc_reply_reviewers_avg": [ 146.75, 254.17845601073273 ], "wc_reply_authors_avg": [ 1375.25, 780.7756960228719 ], "reply_reviewers_avg": [ 0.75, 1.299038105676658 ], "reply_authors_avg": [ 2.75, 1.920286436967152 ], "replies_avg": [ 20, 0 ], "authors#_avg": [ 5, 0 ], "corr_rating_confidence": -0.6622661785325219, "gs_citation": 81, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=10455299841806261684&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 7, "aff_unique_index": "0;1;2", "aff_unique_norm": "University of Southern California;Max Planck Institute for Intelligent Systems;Meta", "aff_unique_dep": ";Intelligent Systems;Meta Platforms, Inc.", "aff_unique_url": "https://www.usc.edu;https://www.mpi-is.mpg.de;https://meta.com", "aff_unique_abbr": "USC;MPI-IS;Meta", "aff_campus_unique_index": "0", "aff_campus_unique": "Los Angeles;", "aff_country_unique_index": "0;1;0", "aff_country_unique": "United States;Germany" }, { "title": "Contemplating Real-World Object Classification", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/2746", "id": "Q4EUywJIkqr", "poster": "", "openreview": "https://openreview.net/forum?id=Q4EUywJIkqr", "slides": "https://iclr.cc/virtual/2021/poster/2746", "video": "https://iclr.cc/virtual/2021/poster/2746", "author_site": "Ali Borji", "tldr": "", "abstract": "Deep object recognition models have been very successful over benchmark\ndatasets such as ImageNet. How accurate and robust are they to distribution\nshifts arising from natural and synthetic variations in datasets? Prior research on\nthis problem has primarily focused on ImageNet variations (e.g., ImageNetV2,\nImageNet-A). To avoid potential inherited biases in these studies, we take a\ndifferent approach. Specifically, we reanalyze the ObjectNet dataset recently\nproposed by Barbu et al. containing objects in daily life situations. They showed\na dramatic performance drop of the state of the art object recognition models on\nthis dataset. Due to the importance and implications of their results regarding\nthe generalization ability of deep models, we take a second look at their analysis.\nWe find that applying deep models to the isolated objects, rather than the entire\nscene as is done in the original paper, results in around 20-30% performance\nimprovement. Relative to the numbers reported in Barbu et al., around 10-15%\nof the performance loss is recovered, without any test time data augmentation.\nDespite this gain, however, we conclude that deep models still suffer drastically\non the ObjectNet dataset. We also investigate the robustness of models against\nsynthetic image perturbations such as geometric transformations (e.g., scale,\nrotation, translation), natural image distortions (e.g., impulse noise, blur) as well\nas adversarial attacks (e.g., FGSM and PGD-5). Our results indicate that limiting\nthe object area as much as possible (i.e., from the entire image to the bounding\nbox to the segmentation mask) leads to consistent improvement in accuracy and\nrobustness. Finally, through a qualitative analysis of ObjectNet data, we find that\ni) a large number of images in this dataset are hard to recognize even for humans,\nand ii) easy (hard) samples for models match with easy (hard) samples for humans.\nOverall, our analysis shows that ObjecNet is still a challenging test platform that\ncan be used to measure the generalization ability of models. 
The code and data\nare available in [masked due to blind review].", "keywords": "object recognition;deep learning;ObjectNet;Robustness", "primary_area": "", "supplementary_material": "", "author": "ali borji", "authorids": "~ali_borji1", "gender": "M", "homepage": "https://scholar.google.com.tw/citations?user=7jTNT1IAAAAJ", "dblp": "49/6311", "google_scholar": "7jTNT1IAAAAJ", "orcid": "", "linkedin": "ali-borji-5736433a/", "or_profile": "~ali_borji1", "aff": "PrimerAI", "aff_domain": "primer.ai", "position": "ML Engineer", "bibtex": "@inproceedings{\nborji2021contemplating,\ntitle={Contemplating Real-World Object Classification},\nauthor={ali borji},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=Q4EUywJIkqr}\n}", "github": "", "project": "", "reviewers": "AnonReviewer3;AnonReviewer2;AnonReviewer4;AnonReviewer1", "pdf_size": 0, "rating": "5;6;6;6", "confidence": "4;4;4;4", "wc_review": "751;851;233;318", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "419;871;159;224", "reply_reviewers": "0;0;0;0", "reply_authors": "1;2;1;1", "rating_avg": [ 5.75, 0.4330127018922193 ], "confidence_avg": [ 4.0, 0.0 ], "wc_review_avg": [ 538.25, 266.815830677267 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 418.25, 278.35532597742764 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.25, 0.4330127018922193 ], "replies_avg": [ 12, 0 ], "authors#_avg": [ 1, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 8, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=6078463704696071400&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 8, "pdf": "https://openreview.net/pdf?id=Q4EUywJIkqr", "email": "primer.ai", "author_num": 1, "aff_unique_index": "0", "aff_unique_norm": "PrimerAI", "aff_unique_dep": "", "aff_unique_url": "https://www.primer.ai", "aff_unique_abbr": "PrimerAI", "aff_country_unique_index": "0", "aff_country_unique": "United States" }, { "id": "Q5ZxoD2LqcI", "title": "On the use of linguistic similarities to improve Neural Machine Translation for African Languages", "track": "main", "status": "Reject", "tldr": "", "abstract": "In recent years, there has been a resurgence in research on empirical methods for machine translation. Most of this research has been focused on high-resource, European languages. Despite the fact that around 30% of all languages spoken worldwide are African, the latter have been heavily under investigated and this, partly due to the lack of public parallel corpora online. Furthermore, despite their large number (more than 2,000) and the similarities between them, there is currently no publicly available study on how to use this multilingualism (and associated similarities) to improve machine translation systems performance on African languages. So as to address these issues: \nWe propose a new dataset for African languages that provides parallel data for vernaculars not present in commonly used dataset like JW300 [1]. To exploit multilingualism, we first use a historical approach based on historical origins of these languages, their morphologies, their geographical and cultural distributions as well as migrations of population to identify similar vernaculars.\nWe also propose a new metric to automatically evaluate similarities between languages. 
This new metric does not require word level parallelism like traditional methods but only paragraph level parallelism.\nWe then show that performing Masked Language Modelling and Translation Language Modeling in addition to multi-task learning on a cluster of similar languages leads to a strong boost of performance in translating individual pairs inside this cluster.\nIn particular, we record an improvement of 29 BLEU on the pair Bafia-Ewondo using our approaches compared to methods from previous work that did not exploit multilingualism in any way.\n\n[1] http://opus.nlpl.eu/JW300.php", "keywords": "Machine Translation;Multilingualism;Linguistic similarity;Dataset;African languages;Multi-task learning", "primary_area": "", "supplementary_material": "/attachment/80eba3ba419c1ce28a1b836ee1b91beb9a6fa19c.zip", "author": "Tikeng Notsawo Pascal;NANDA ASSOBJIO Brice Yvan;James Assiene", "authorids": "~Tikeng_Notsawo_Pascal1;~NANDA_ASSOBJIO_Brice_Yvan1;~James_Assiene1", "gender": "M;M;M", "homepage": "https://tikquuss.github.io/;;", "dblp": ";;", "google_scholar": "vUerGI8AAAAJ;;", "orcid": ";;", "linkedin": "pascal-junior-tikeng-notsawo-3b1b14183/;http://linkedin.com/in/brice-nanda-594b33183;james-assiene-a66a3b14a/", "or_profile": "~Tikeng_Notsawo_Pascal1;~NANDA_ASSOBJIO_Brice_Yvan1;~James_Assiene1", "aff": "National Advanced School of Engineering Yaounde;Ecole Nationale Sup\u00e9rieure Polytechnique de Yaound\u00e9;Montreal Institute for Learning Algorithms, University of Montreal, University of Montreal", "aff_domain": "polytechnique.cm;polytechnique.cm;mila.umontreal.ca", "position": "MS student;Engineering student;Research Mathematician", "bibtex": "@misc{\npascal2021on,\ntitle={On the use of linguistic similarities to improve Neural Machine Translation for African Languages},\nauthor={Tikeng Notsawo Pascal and NANDA ASSOBJIO Brice Yvan and James Assiene},\nyear={2021},\nurl={https://openreview.net/forum?id=Q5ZxoD2LqcI}\n}", "github": "", "project": "", "reviewers": "AnonReviewer2;AnonReviewer3;AnonReviewer1;AnonReviewer4", "site": "https://openreview.net/forum?id=Q5ZxoD2LqcI", "pdf_size": 0, "rating": "3;4;4;5", "confidence": "4;5;4;4", "wc_review": "539;220;614;1182", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "0;95;567;478", "reply_reviewers": "0;0;0;0", "reply_authors": "0;1;1;1", "rating_avg": [ 4.0, 0.7071067811865476 ], "confidence_avg": [ 4.25, 0.4330127018922193 ], "wc_review_avg": [ 638.75, 346.7833437464954 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 285.0, 241.91837466385226 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 0.75, 0.4330127018922193 ], "replies_avg": [ 8, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 1, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=3270338908156228675&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 2, "aff_unique_index": "0;1;2", "aff_unique_norm": "National Advanced School of Engineering;Ecole Nationale Sup\u00e9rieure Polytechnique de Yaound\u00e9;University of Montreal", "aff_unique_dep": ";;Montreal Institute for Learning Algorithms", "aff_unique_url": ";https://www.enspy.ump.edu.cm;https://www.umontreal.ca", "aff_unique_abbr": ";ENSPY;UM", "aff_campus_unique_index": "0;1;2", "aff_campus_unique": "Yaounde;Yaound\u00e9;Montreal", "aff_country_unique_index": "0;0;1", "aff_country_unique": "Cameroon;Canada" }, { "id": "Q8ZdJahesWe", "title": "Graph Adversarial Networks: Protecting Information against Adversarial Attacks", "track": "main", "status": "Withdraw", 
"tldr": "", "abstract": "We study the problem of protecting information when learning with graph-structured data. While the advent of Graph Neural Networks (GNNs) has greatly improved node and graph representational learning in many applications, the neighborhood aggregation paradigm exposes additional vulnerabilities to attackers seeking to extract node-level information about sensitive attributes. To counter this, we propose a minimax game between the desired GNN encoder and the worst-case attacker. The resulting adversarial training creates a strong defense against inference attacks, while only suffering small loss in task performance. We analyze the effectiveness of our framework against a worst-case adversary, and characterize the trade-off between predictive accuracy and adversarial defense. Experiments across multiple datasets from recommender systems, knowledge graphs and quantum chemistry demonstrate that the proposed approach provides a robust defense across various graph structures and tasks, while producing competitive GNN encoders.", "keywords": "graph neural networks;deep learning;adversarial learning;theory", "primary_area": "", "supplementary_material": "/attachment/ffb54f23d18942f9a6556a50e03e8b99a2308751.zip", "author": "Peiyuan Liao;Han Zhao;Keyulu Xu;Tommi S. Jaakkola;Geoff Gordon;Stefanie Jegelka;Ruslan Salakhutdinov", "authorids": "~Peiyuan_Liao1;~Han_Zhao1;~Keyulu_Xu1;~Tommi_S._Jaakkola1;~Geoff_Gordon2;~Stefanie_Jegelka3;~Ruslan_Salakhutdinov1", "gender": ";M;;;;F;", "homepage": "https://www.liaopeiyuan.com;https://hanzhaoml.github.io/;https://people.csail.mit.edu/keyulux/;;;http://people.csail.mit.edu/stefje/;", "dblp": ";03/3520-2;177/6079;;;38/7003;", "google_scholar": "aP5VahUAAAAJ;x942ipYAAAAJ;https://scholar.google.co.jp/citations?user=eV2tuR8AAAAJ;;;gTWUZlsAAAAJ;", "orcid": ";0000-0002-8579-1600;;;;;", "linkedin": ";;;;;;", "or_profile": "~Peiyuan_Liao1;~Han_Zhao1;~Keyulu_Xu1;~Tommi_S._Jaakkola1;~Geoff_Gordon2;~Stefanie_Jegelka3;~Ruslan_Salakhutdinov1", "aff": "OctoML;University of Illinois, Urbana Champaign;Massachusetts Institute of Technology;;;Massachusetts Institute of Technology;", "aff_domain": "octoml.ai;illinois.edu;mit.edu;;;mit.edu;", "position": "Intern;Assistant Professor;PhD student;;;Associate Professor;", "bibtex": "", "github": "", "project": "", "reviewers": "AnonReviewer4;AnonReviewer2;AnonReviewer5;AnonReviewer1", "site": "https://openreview.net/forum?id=Q8ZdJahesWe", "pdf_size": 0, "rating": "4;5;5;5", "confidence": "4;4;4;3", "wc_review": "323;667;800;183", "wc_reply_reviewers": "112;0;0;0", "wc_reply_authors": "210;634;470;311", "reply_reviewers": "1;0;0;0", "reply_authors": "2;2;2;2", "rating_avg": [ 4.75, 0.4330127018922193 ], "confidence_avg": [ 3.75, 0.4330127018922193 ], "wc_review_avg": [ 493.25, 249.76226196124986 ], "wc_reply_reviewers_avg": [ 28.0, 48.49742261192856 ], "wc_reply_authors_avg": [ 406.25, 160.87320317566875 ], "reply_reviewers_avg": [ 0.25, 0.4330127018922193 ], "reply_authors_avg": [ 2.0, 0.0 ], "replies_avg": [ 15, 0 ], "authors#_avg": [ 7, 0 ], "corr_rating_confidence": -0.3333333333333333, "gs_citation": 14, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=8390598066653157263&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 0, "aff_unique_index": "0;1;2;2", "aff_unique_norm": "OctoML;University of Illinois Urbana-Champaign;Massachusetts Institute of Technology", "aff_unique_dep": ";;", "aff_unique_url": "https://www.octoml.ai;https://illinois.edu;https://web.mit.edu", "aff_unique_abbr": 
"OctoML;UIUC;MIT", "aff_campus_unique_index": "1", "aff_campus_unique": ";Urbana-Champaign", "aff_country_unique_index": "0;0;0;0", "aff_country_unique": "United States" }, { "id": "Q9U_H8lQ4yV", "title": "Good for Misconceived Reasons: Revisiting Neural Multimodal Machine Translation", "track": "main", "status": "Withdraw", "tldr": "", "abstract": "A neural multimodal machine translation (MMT) system is one that aims to perform better translation by extending conventional text-only translation models with multimodal information. Many recent studies report improvements when equipping their models with the multimodal module, despite the controversy whether such improvements indeed come from the multimodal part. We revisit the recent development of neural multimodal machine translation by proposing two \\textit{interpretable} MMT models that achieve new state-of-the-art results on the standard \\dataset\\ dataset. To our surprise, however, while we observe similar gains as in the recent developed multimodal-integrated models, our models learn to \\textit{ignore} the multimodal information. Upon further investigation, we discover that the improvements bought about by the multimodal models over text-only counterpart are in fact results of the regularization effect. We report our empirical findings which express the importance of MMT models' interpretability and set new paradigms for future MMT research.", "keywords": "multimodal machine translation;interpretability", "primary_area": "", "supplementary_material": "", "author": "Zhiyong Wu;Lingpeng Kong;Ben Kao", "authorids": "~Zhiyong_Wu3;~Lingpeng_Kong1;~Ben_Kao1", "gender": ";M;M", "homepage": ";https://ikekonglp.github.io/;https://www.cs.hku.hk/index.php/people/academic-staff/kao", "dblp": ";144/7656;k/BenKao", "google_scholar": ";f1hBi5wAAAAJ;TwSParMAAAAJ", "orcid": ";;0000-0002-0501-9435", "linkedin": ";;", "or_profile": "~Zhiyong_Wu3;~Lingpeng_Kong1;~Ben_Kao1", "aff": ";Department of Computer Science, The University of Hong Kong;the University of Hong Kong, University of Hong Kong", "aff_domain": ";cs.hku.hk;cs.hku.hk", "position": ";Assistant Professor;Full Professor", "bibtex": "", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer2;AnonReviewer4;AnonReviewer3", "site": "https://openreview.net/forum?id=Q9U_H8lQ4yV", "pdf_size": 0, "rating": "4;5;5;6", "confidence": "5;5;4;3", "wc_review": "710;530;737;464", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "865;983;726;559", "reply_reviewers": "0;0;0;0", "reply_authors": "2;2;1;1", "rating_avg": [ 5.0, 0.7071067811865476 ], "confidence_avg": [ 4.25, 0.82915619758885 ], "wc_review_avg": [ 610.25, 116.02235775918363 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 783.25, 158.23143650994263 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.5, 0.5 ], "replies_avg": [ 12, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": -0.8528028654224418, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:zM8_KWqKa_wJ:scholar.google.com/&scioq=Good+for+Misconceived+Reasons:+Revisiting+Neural+Multimodal+Machine+Translation&hl=en&as_sdt=0,5", "gs_version_total": 0, "aff_unique_index": "0;0", "aff_unique_norm": "University of Hong Kong", "aff_unique_dep": "Department of Computer Science", "aff_unique_url": "https://www.hku.hk", "aff_unique_abbr": "HKU", "aff_campus_unique_index": "0;0", "aff_campus_unique": "Hong Kong SAR", "aff_country_unique_index": "0;0", "aff_country_unique": "China" }, { "id": "QB7FkNVAfxa", 
"title": "On the Explicit Role of Initialization on the Convergence and Generalization Properties of Overparametrized Linear Networks", "track": "main", "status": "Reject", "tldr": "", "abstract": "Neural networks trained via gradient descent with random initialization and without any regularization enjoy good generalization performance in practice despite being highly overparametrized. A promising direction to explain this phenomenon is the \\emph{Neural Tangent Kernel} (NTK), which characterizes the implicit regularization effect of gradient flow/descent on infinitely wide neural networks with random initialization. However, a non-asymptotic analysis that connects generalization performance, initialization, and optimization for finite width networks remains elusive. In this paper, we present a novel analysis of overparametrized single-hidden layer linear networks, which formally connects initialization, optimization, and overparametrization with generalization performance. We exploit the fact that gradient flow preserves a certain matrix that characterizes the \\emph{imbalance} of the network weights, to show that the squared loss converges exponentially at a rate that depends on the level of imbalance of the initialization. Such guarantees on the convergence rate allow us to show that large hidden layer width, together with (properly scaled) random initialization, implicitly constrains the dynamics of the network parameters to be close to a low-dimensional manifold. In turn, minimizing the loss over this manifold leads to solutions with good generalization, which correspond to the min-norm solution in the linear case. Finally, we derive a novel $\\mathcal{O}( h^{-1/2})$ upper-bound on the operator norm distance between the trained network and the min-norm solution, where $h$ is the hidden layer width. 
", "keywords": "", "primary_area": "", "supplementary_material": "/attachment/eaba17ffe2c1c428f7160fcc00dc7847449aef4e.zip", "author": "Hancheng Min;Salma Tarmoun;Rene Vidal;Enrique Mallada", "authorids": "~Hancheng_Min1;~Salma_Tarmoun1;~Rene_Vidal1;~Enrique_Mallada1", "gender": "M;F;;M", "homepage": "https://hanchmin.github.io/;;http://www.vision.jhu.edu;http://mallada.ece.jhu.edu", "dblp": "226/6324;;v/ReneVidal;", "google_scholar": "XgQjPZIAAAAJ;;https://scholar.google.com/citations?hl=en;ZvRFA04AAAAJ", "orcid": ";;;0000-0003-1568-1833", "linkedin": ";salma-tarmoun-94aa5158/;rene-vidal-74844928/;emallada/", "or_profile": "~Hancheng_Min1;~Salma_Tarmoun1;~Rene_Vidal1;~Enrique_Mallada1", "aff": "Johns Hopkins University;University of Pennsylvania;Johns Hopkins University;Johns Hopkins University", "aff_domain": "jhu.edu;upenn.edu;jhu.edu;jhu.edu", "position": "PhD student;PhD student;Professor;Assistant Professor", "bibtex": "@misc{\nmin2021on,\ntitle={On the Explicit Role of Initialization on the Convergence and Generalization Properties of Overparametrized Linear Networks},\nauthor={Hancheng Min and Salma Tarmoun and Rene Vidal and Enrique Mallada},\nyear={2021},\nurl={https://openreview.net/forum?id=QB7FkNVAfxa}\n}", "github": "", "project": "", "reviewers": "AnonReviewer3;AnonReviewer4;AnonReviewer2;AnonReviewer1", "site": "https://openreview.net/forum?id=QB7FkNVAfxa", "pdf_size": 0, "rating": "3;5;6;9", "confidence": "5;4;4;4", "wc_review": "1289;455;259;287", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "1001;451;514;317", "reply_reviewers": "0;0;0;0", "reply_authors": "2;1;1;1", "rating_avg": [ 5.75, 2.165063509461097 ], "confidence_avg": [ 4.25, 0.4330127018922193 ], "wc_review_avg": [ 572.5, 420.4078377004882 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 570.75, 258.3915391416677 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.25, 0.4330127018922193 ], "replies_avg": [ 10, 0 ], "authors#_avg": [ 4, 0 ], "corr_rating_confidence": -0.7333333333333333, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:do4ZCt7-K4oJ:scholar.google.com/&scioq=On+the+Explicit+Role+of+Initialization+on+the+Convergence+and+Generalization+Properties+of+Overparametrized+Linear+Networks&hl=en&as_sdt=0,5", "gs_version_total": 0, "aff_unique_index": "0;1;0;0", "aff_unique_norm": "Johns Hopkins University;University of Pennsylvania", "aff_unique_dep": ";", "aff_unique_url": "https://www.jhu.edu;https://www.upenn.edu", "aff_unique_abbr": "JHU;UPenn", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0;0;0", "aff_country_unique": "United States" }, { "title": "Reinforcement Learning with Random Delays", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/3078", "id": "QFYnKlBJYR", "poster": "", "openreview": "https://openreview.net/forum?id=QFYnKlBJYR", "slides": "https://iclr.cc/virtual/2021/poster/3078", "video": "https://iclr.cc/virtual/2021/poster/3078", "author_site": "Yann Bouteiller, Simon Ramstedt, Giovanni Beltrame, Chris J Pal, Jonathan Binas", "tldr": "", "abstract": "Action and observation delays commonly occur in many Reinforcement Learning applications, such as remote control scenarios. We study the anatomy of randomly delayed environments, and show that partially resampling trajectory fragments in hindsight allows for off-policy multi-step value estimation. 
We apply this principle to derive Delay-Correcting Actor-Critic (DCAC), an algorithm based on Soft Actor-Critic with significantly better performance in environments with delays. This is shown theoretically and also demonstrated practically on a delay-augmented version of the MuJoCo continuous control benchmark.", "keywords": "Reinforcement Learning;Deep Reinforcement Learning", "primary_area": "", "supplementary_material": "", "author": "Yann Bouteiller;Simon Ramstedt;Giovanni Beltrame;Christopher Pal;Jonathan Binas", "authorids": "~Yann_Bouteiller1;~Simon_Ramstedt1;giovanni.beltrame@polymtl.ca;~Christopher_Pal1;~Jonathan_Binas1", "gender": "M;M;;;", "homepage": ";https://simonramstedt.com;;https://scholar.google.ca/citations?user=1ScWJOoAAAAJ&hl=en&oi=ao;", "dblp": ";;;45/1217;116/4760", "google_scholar": "YKbSnwEAAAAJ;;;https://scholar.google.ca/citations?user=1ScWJOoAAAAJ;https://scholar.google.ca/citations?user=oD1W8a4AAAAJ", "orcid": ";;;;", "linkedin": "https://ca.linkedin.com/in/yann-bouteiller-46a18212b/en;;;;", "or_profile": "~Yann_Bouteiller1;~Simon_Ramstedt1;giovanni.beltrame@polymtl.ca;~Christopher_Pal1;~Jonathan_Binas1", "aff": "Polytechnique Montreal;;;Polytechnique Montreal;Montreal Institute for Learning Algorithms, University of Montreal", "aff_domain": "polymtl.ca;;;polymtl.ca;mila.umontreal.ca", "position": "MS student;;;Full Professor;Postdoc", "bibtex": "@inproceedings{\nbouteiller2021reinforcement,\ntitle={Reinforcement Learning with Random Delays},\nauthor={Yann Bouteiller and Simon Ramstedt and Giovanni Beltrame and Christopher Pal and Jonathan Binas},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=QFYnKlBJYR}\n}", "github": "[![github](/images/github_icon.svg) rmst/rlrd](https://github.com/rmst/rlrd) + [![Papers with Code](/images/pwc_icon.svg) 2 community implementations](https://paperswithcode.com/paper/?openreview=QFYnKlBJYR)", "project": "", "reviewers": "AnonReviewer2;AnonReviewer1;AnonReviewer4;AnonReviewer3", "pdf_size": 0, "rating": "3;6;6;8", "confidence": "3;4;3;4", "wc_review": "445;836;655;220", "wc_reply_reviewers": "0;320;159;0", "wc_reply_authors": "553;1641;1435;319", "reply_reviewers": "0;3;2;0", "reply_authors": "1;4;3;1", "rating_avg": [ 5.75, 1.7853571071357126 ], "confidence_avg": [ 3.5, 0.5 ], "wc_review_avg": [ 539.0, 230.35950164905287 ], "wc_reply_reviewers_avg": [ 119.75, 132.5902994189243 ], "wc_reply_authors_avg": [ 987.0, 561.9163638834519 ], "reply_reviewers_avg": [ 1.25, 1.299038105676658 ], "reply_authors_avg": [ 2.25, 1.299038105676658 ], "replies_avg": [ 22, 0 ], "authors#_avg": [ 5, 0 ], "corr_rating_confidence": 0.7001400420140049, "gs_citation": 76, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=13694087164023050153&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 3, "pdf": "https://openreview.net/pdf?id=QFYnKlBJYR", "email": "polymtl.ca;;;polymtl.ca;mila.umontreal.ca", "author_num": 5, "aff_unique_index": "0;0;1", "aff_unique_norm": "Polytechnique Montreal;University of Montreal", "aff_unique_dep": ";Montreal Institute for Learning Algorithms", "aff_unique_url": "https://www.polymtl.ca;https://www.mila.quebec", "aff_unique_abbr": "PolyMTL;MILA", "aff_campus_unique_index": "0;0;0", "aff_campus_unique": "Montreal", "aff_country_unique_index": "0;0;0", "aff_country_unique": "Canada" }, { "id": "QHUUrieaqai", "title": "LIME: Learning Inductive Bias for Primitives of Mathematical Reasoning", "track": "main", "status": "Reject", "tldr": "", 
"abstract": "While designing inductive bias in neural architectures has been widely studied, we hypothesize that transformer networks are flexible enough to learn inductive bias from suitable generic tasks. Here, we replace architecture engineering by encoding inductive bias in the form of datasets. Inspired by Peirce's view that deduction, induction, and abduction form an irreducible set of reasoning primitives, we design three synthetic tasks that are intended to require the model to have these three abilities. We specifically design these synthetic tasks in a way that they are devoid of mathematical knowledge to ensure that only the fundamental reasoning biases can be learned from these tasks. This defines a new pre-training methodology called ``\"LIME\" (Learning Inductive bias for Mathematical rEasoning). Models trained with LIME significantly outperform vanilla transformers on three very different large mathematical reasoning benchmarks. Unlike dominating the computation cost as traditional pre-training approaches, LIME requires only a small fraction of the computation cost of the typical downstream task.", "keywords": "Theorem proving;Pre-training;Inductive bias;Reasoning.", "primary_area": "", "supplementary_material": "", "author": "Yuhuai Wu;Markus Norman Rabe;Wenda Li;Jimmy Ba;Roger Baker Grosse;Christian Szegedy", "authorids": "~Yuhuai_Wu1;~Markus_Norman_Rabe1;~Wenda_Li1;~Jimmy_Ba1;~Roger_Baker_Grosse1;~Christian_Szegedy1", "gender": "M;M;M;M;M;", "homepage": "http://www.cs.toronto.edu/~ywu/;https://people.eecs.berkeley.edu/~rabe/;https://wenda302.github.io;http://jimmylba.github.io;http://www.cs.toronto.edu/~rgrosse/;", "dblp": ";88/1112-2;132/9868.html;https://dblp.org/pers/b/Ba:Jimmy.html;26/7058;78/1537", "google_scholar": "https://scholar.google.ca/citations?user=bOQGfFIAAAAJ;https://scholar.google.com/citations?hl=en;ufYxQkEAAAAJ;https://scholar.google.ca/citations?user=ymzxRhAAAAAJ;xgQd1qgAAAAJ;3QeF7mAAAAAJ", "orcid": ";;;;;", "linkedin": ";;;;;", "or_profile": "~Yuhuai_Wu1;~Markus_Norman_Rabe1;~Wenda_Li1;~Jimmy_Ba1;~Roger_Baker_Grosse1;~Christian_Szegedy1", "aff": "Department of Computer Science, University of Toronto;Google;University of Cambridge;Department of Computer Science, University of Toronto;Department of Computer Science, University of Toronto;Google", "aff_domain": "cs.toronto.edu;google.com;cam.ac.uk;cs.toronto.edu;cs.toronto.edu;google.com", "position": "PhD student;Researcher/Software Engineer;Postdoc;Assistant Professor;Assistant Professor;Research Scientist", "bibtex": "@misc{\nwu2021lime,\ntitle={{\\{}LIME{\\}}: Learning Inductive Bias for Primitives of Mathematical Reasoning},\nauthor={Yuhuai Wu and Markus Norman Rabe and Wenda Li and Jimmy Ba and Roger Baker Grosse and Christian Szegedy},\nyear={2021},\nurl={https://openreview.net/forum?id=QHUUrieaqai}\n}", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer2;AnonReviewer4;AnonReviewer3", "site": "https://openreview.net/forum?id=QHUUrieaqai", "pdf_size": 0, "rating": "6;6;7;8", "confidence": "2;4;4;4", "wc_review": "399;758;539;1465", "wc_reply_reviewers": "28;0;30;69", "wc_reply_authors": "407;1127;541;837", "reply_reviewers": "1;0;1;1", "reply_authors": "3;3;1;2", "rating_avg": [ 6.75, 0.82915619758885 ], "confidence_avg": [ 3.5, 0.8660254037844386 ], "wc_review_avg": [ 790.25, 410.03986086720886 ], "wc_reply_reviewers_avg": [ 31.75, 24.55987581401828 ], "wc_reply_authors_avg": [ 728.0, 277.98021512330695 ], "reply_reviewers_avg": [ 0.75, 0.4330127018922193 ], "reply_authors_avg": 
[ 2.25, 0.82915619758885 ], "replies_avg": [ 20, 0 ], "authors#_avg": [ 6, 0 ], "corr_rating_confidence": 0.5222329678670935, "gs_citation": 65, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=6631886312737976055&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 8, "aff_unique_index": "0;1;2;0;0;1", "aff_unique_norm": "University of Toronto;Google;University of Cambridge", "aff_unique_dep": "Department of Computer Science;Google;", "aff_unique_url": "https://www.utoronto.ca;https://www.google.com;https://www.cam.ac.uk", "aff_unique_abbr": "U of T;Google;Cambridge", "aff_campus_unique_index": "0;1;2;0;0;1", "aff_campus_unique": "Toronto;Mountain View;Cambridge", "aff_country_unique_index": "0;1;2;0;0;1", "aff_country_unique": "Canada;United States;United Kingdom" }, { "title": "Learning Cross-Domain Correspondence for Control with Dynamics Cycle-Consistency", "status": "Oral", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/2817", "id": "QIRlze3I6hX", "poster": "", "openreview": "https://openreview.net/forum?id=QIRlze3I6hX", "slides": "https://iclr.cc/virtual/2021/poster/2817", "video": "https://iclr.cc/virtual/2021/poster/2817", "author_site": "Qiang Zhang, Tete Xiao, Alexei Efros, Lerrel Pinto, Xiaolong Wang", "tldr": "", "abstract": "At the heart of many robotics problems is the challenge of learning correspondences across domains. For instance, imitation learning requires obtaining correspondence between humans and robots; sim-to-real requires correspondence between physics simulators and real hardware; transfer learning requires correspondences between different robot environments. In this paper, we propose to learn correspondence across such domains emphasizing on differing modalities (vision and internal state), physics parameters (mass and friction), and morphologies (number of limbs). Importantly, correspondences are learned using unpaired and randomly collected data from the two domains. We propose dynamics cycles that align dynamic robotic behavior across two domains using a cycle consistency constraint. Once this correspondence is found, we can directly transfer the policy trained on one domain to the other, without needing any additional fine-tuning on the second domain. We perform experiments across a variety of problem domains, both in simulation and on real robots. Our framework is able to align uncalibrated monocular video of a real robot arm to dynamic state-action trajectories of a simulated arm without paired data. 
Video demonstrations of our results are available at: https://sites.google.com/view/cycledynamics .", "keywords": "self-supervised learning;robotics", "primary_area": "", "supplementary_material": "/attachment/0e5a950bcfa8ac3f77046d8f609539c187ee686f.zip", "author": "Qiang Zhang;Tete Xiao;Alexei A Efros;Lerrel Pinto;Xiaolong Wang", "authorids": "~Qiang_Zhang5;~Tete_Xiao1;~Alexei_A_Efros1;~Lerrel_Pinto1;~Xiaolong_Wang3", "gender": "M;M;M;M;M", "homepage": ";http://tetexiao.com;https://www.lerrelpinto.com/;https://xiaolonw.github.io/;http://www.eecs.berkeley.edu/~efros/", "dblp": ";200/8130;168/8304;91/952-4;40/6158", "google_scholar": "mapNJjcAAAAJ;U4RqBdAAAAAJ;pmVPj94AAAAJ;Y8O9N_0AAAAJ;https://scholar.google.com.tw/citations?user=d97bGd8AAAAJ", "orcid": ";;;;0000-0001-5720-8070", "linkedin": "qiang-zhang-6b48791a7/;;;;alexei-efros-890736a3/", "or_profile": "~Qiang_Zhang5;~Tete_Xiao1;~Lerrel_Pinto1;~Xiaolong_Wang3;~Alyosha_Efros1", "aff": "Shanghai Jiaotong University;Facebook AI Research;New York University;University of California, San Diego;University of California, Berkeley", "aff_domain": "sjtu.edu.cn;facebook.com;cs.nyu.edu;ucsd.edu;berkeley.edu", "position": "Undergrad student;Researcher;Assistant Professor;Assistant Professor;Professor", "bibtex": "@inproceedings{\nzhang2021learning,\ntitle={Learning Cross-Domain Correspondence for Control with Dynamics Cycle-Consistency},\nauthor={Qiang Zhang and Tete Xiao and Alexei A Efros and Lerrel Pinto and Xiaolong Wang},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=QIRlze3I6hX}\n}", "github": "", "project": "", "reviewers": "AnonReviewer3;AnonReviewer2;AnonReviewer1;AnonReviewer4", "pdf_size": 0, "rating": "6;7;8;10", "confidence": "2;3;3;4", "wc_review": "198;613;429;520", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "322;234;799;653", "reply_reviewers": "0;0;0;0", "reply_authors": "1;1;1;1", "rating_avg": [ 7.75, 1.479019945774904 ], "confidence_avg": [ 3.0, 0.7071067811865476 ], "wc_review_avg": [ 440.0, 154.1217051553739 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 502.0, 231.96659242227102 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 10, 0 ], "authors#_avg": [ 5, 0 ], "corr_rating_confidence": 0.9561828874675149, "gs_citation": 73, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=2051313103367025946&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 5, "pdf": "https://openreview.net/pdf?id=QIRlze3I6hX", "email": "sjtu.edu.cn;facebook.com;cs.nyu.edu;ucsd.edu;berkeley.edu", "author_num": 5, "aff_unique_index": "0;1;2;3;4", "aff_unique_norm": "Shanghai Jiao Tong University;Meta;New York University;University of California, San Diego;University of California, Berkeley", "aff_unique_dep": ";Facebook AI Research;;;", "aff_unique_url": "https://www.sjtu.edu.cn;https://research.facebook.com;https://www.nyu.edu;https://www.ucsd.edu;https://www.berkeley.edu", "aff_unique_abbr": "SJTU;FAIR;NYU;UCSD;UC Berkeley", "aff_campus_unique_index": "1;2", "aff_campus_unique": ";San Diego;Berkeley", "aff_country_unique_index": "0;1;1;1;1", "aff_country_unique": "China;United States" }, { "id": "QJc4HWzF7FW", "title": "Meta-Continual Learning Via Dynamic Programming", "track": "main", "status": "Withdraw", "tldr": "", "abstract": "Meta continual learning algorithms seek to train a model when faced with similar tasks observed in a sequential manner. 
Despite promising methodological advancements, there is a lack of theoretical frameworks that enable analysis of learning challenges such as generalization and catastrophic forgetting. To that end, we develop a new theoretical approach for meta continual learning~(MCL) where we mathematically model the learning dynamics using dynamic programming, and we establish conditions of optimality for the MCL problem. Moreover, using the theoretical framework, we derive a new dynamic-programming-based MCL method that adopts stochastic-gradient-driven alternating optimization to balance generalization and catastrophic forgetting. We show that, on MCL benchmark data sets, our theoretically grounded method achieves accuracy better than or comparable to that of existing state-of-the-art methods.", "keywords": "Meta Continual Learning;Supervised Learning;Dynamic Programming;Catastrophic Forgetting;Generalization", "primary_area": "", "supplementary_material": "/attachment/69c0ec276d9ac244973753f79a7bdd6a0d1464bb.zip", "author": "Krishnan Raghavan;Prasanna Balaprakash", "authorids": "~Krishnan_Raghavan1;~Prasanna_Balaprakash1", "gender": "M;M", "homepage": ";http://pbalapra.github.io/", "dblp": "253/0069;", "google_scholar": "https://scholar.google.com/citations?hl=en;ZycQHdgAAAAJ", "orcid": ";0000-0002-0292-5715", "linkedin": ";prasannaprakash/", "or_profile": "~Krishnan_Raghavan1;~Prasanna_Balaprakash1", "aff": "Argonne National Laboratory;Argonne National Laboratory", "aff_domain": "anl.gov;anl.gov", "position": "Postdoc;Computer Scientist", "bibtex": "", "github": "", "project": "", "reviewers": "AnonReviewer2;AnonReviewer4;AnonReviewer3;AnonReviewer1", "site": "https://openreview.net/forum?id=QJc4HWzF7FW", "pdf_size": 0, "rating": "4;4;4;6", "confidence": "3;3;3;4", "wc_review": "529;1607;529;771", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "895;763;954;540", "reply_reviewers": "0;0;0;0", "reply_authors": "2;1;2;1", "rating_avg": [ 4.5, 0.8660254037844386 ], "confidence_avg": [ 3.25, 0.4330127018922193 ], "wc_review_avg": [ 859.0, 443.0146724432499 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 788.0, 159.00786144087343 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.5, 0.5 ], "replies_avg": [ 11, 0 ], "authors#_avg": [ 2, 0 ], "corr_rating_confidence": 1.0, "gs_citation": 10, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=6351309519381890317&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 2, "aff_unique_index": "0;0", "aff_unique_norm": "Argonne National Laboratory", "aff_unique_dep": "", "aff_unique_url": "https://www.anl.gov", "aff_unique_abbr": "ANL", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0", "aff_country_unique": "United States" }, { "id": "QKbS9KXkE_y", "title": "Data-efficient Hindsight Off-policy Option Learning", "track": "main", "status": "Reject", "tldr": "", "abstract": "Hierarchical approaches for reinforcement learning aim to improve data efficiency and accelerate learning by incorporating different abstractions. We introduce Hindsight Off-policy Options (HO2), an efficient off-policy option learning algorithm, and isolate the impact of action and temporal abstraction in the option framework by comparing flat policies, mixture policies without temporal abstraction, and finally option policies; all with comparable policy optimization. 
When aiming for data efficiency, we demonstrate the importance of off-policy optimization, as even flat policies trained off-policy can outperform on-policy option methods. In addition, off-policy training and backpropagation through a dynamic programming inference procedure -- through time and through the policy components for every time-step -- enable us to train all components' parameters independently of the data-generating behavior policy. We continue to illustrate challenges in off-policy option learning and the related importance of trust-region constraints. Experimentally, we demonstrate that HO2 outperforms existing option learning methods and that both action and temporal abstraction provide strong benefits in particular in more demanding simulated robot manipulation tasks from raw pixel inputs. Finally, we develop an intuitive extension to encourage temporal abstraction and investigate differences in its impact between learning from scratch and using pre-trained options. ", "keywords": "Hierarchical Reinforcement Learning;Off-Policy;Abstractions;Data-Efficiency", "primary_area": "", "supplementary_material": "", "author": "Markus Wulfmeier;Dushyant Rao;Roland Hafner;Thomas Lampe;Abbas Abdolmaleki;Tim Hertweck;Michael Neunert;Dhruva Tirumala;Noah Yamamoto Siegel;Nicolas Heess;Martin Riedmiller", "authorids": "~Markus_Wulfmeier1;~Dushyant_Rao1;~Roland_Hafner1;~Thomas_Lampe1;~Abbas_Abdolmaleki3;thertweck@google.com;~Michael_Neunert1;~Dhruva_Tirumala1;~Noah_Yamamoto_Siegel1;~Nicolas_Heess1;~Martin_Riedmiller1", "gender": "M;M;Not Specified;;;;M;;;;M", "homepage": ";;;;;;;;;;https://www.riedmiller.me/", "dblp": "166/1552;;19/765;139/5934;;;153/7715;;259/1484;76/9181;", "google_scholar": ";;;;;;;;l2E0LR4AAAAJ;79k7bGEAAAAJ;1gVfqpcAAAAJ", "orcid": ";;;;;;;;0000-0002-5746-117X;;", "linkedin": ";;;;;;;;noah-y-siegel-8751925b;;", "or_profile": "~Markus_Wulfmeier1;~Dushyant_Rao1;~Roland_Hafner1;~Thomas_Lampe1;~Abbas_Abdolmaleki3;thertweck@google.com;~Michael_Neunert1;~Dhruva_Tirumala1;~Noah_Yamamoto_Siegel1;~Nicolas_Heess1;~Martin_Riedmiller1", "aff": "Google DeepMind;Google DeepMind;Google DeepMind;Google DeepMind;Google;;;;Google DeepMind;Google DeepMind;", "aff_domain": "deepmind.com;google.com;deepmind.com;deepmind.com;google.com;;;;deepmind.com;google.com;", "position": "Research Scientist;Research Scientist;Researcher;Researcher;research scientist;;;;Researcher;Research Scientist;", "bibtex": "@misc{\nwulfmeier2021dataefficient,\ntitle={Data-efficient Hindsight Off-policy Option Learning},\nauthor={Markus Wulfmeier and Dushyant Rao and Roland Hafner and Thomas Lampe and Abbas Abdolmaleki and Tim Hertweck and Michael Neunert and Dhruva Tirumala and Noah Yamamoto Siegel and Nicolas Heess and Martin Riedmiller},\nyear={2021},\nurl={https://openreview.net/forum?id=QKbS9KXkE_y}\n}", "github": "", "project": "", "reviewers": "AnonReviewer2;AnonReviewer1;AnonReviewer4;AnonReviewer3", "site": "https://openreview.net/forum?id=QKbS9KXkE_y", "pdf_size": 0, "rating": "3;5;5;6", "confidence": "3;4;3;3", "wc_review": "897;672;718;251", "wc_reply_reviewers": "2654;175;0;0", "wc_reply_authors": "1824;1019;1066;285", "reply_reviewers": "4;1;0;0", "reply_authors": "4;3;2;1", "rating_avg": [ 4.75, 1.0897247358851685 ], "confidence_avg": [ 3.25, 0.4330127018922193 ], "wc_review_avg": [ 634.5, 236.83169129151614 ], "wc_reply_reviewers_avg": [ 707.25, 1126.2249719749602 ], "wc_reply_authors_avg": [ 1048.5, 544.4054095983985 ], "reply_reviewers_avg": [ 1.25, 1.6393596310755 ], "reply_authors_avg": [ 2.5, 
1.118033988749895 ], "replies_avg": [ 21, 0 ], "authors#_avg": [ 11, 0 ], "corr_rating_confidence": 0.13245323570650439, "gs_citation": 51, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=1296097676637165629&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 6, "aff_unique_index": "0;0;0;0;0;0;0", "aff_unique_norm": "Google", "aff_unique_dep": "Google DeepMind", "aff_unique_url": "https://deepmind.com", "aff_unique_abbr": "DeepMind", "aff_campus_unique_index": "1", "aff_campus_unique": ";Mountain View", "aff_country_unique_index": "0;0;0;0;1;0;0", "aff_country_unique": "United Kingdom;United States" }, { "id": "QM4_h99pjCE", "title": "Decentralized Deterministic Multi-Agent Reinforcement Learning", "track": "main", "status": "Reject", "tldr": "", "abstract": "Recent work in multi-agent reinforcement learning (MARL) by [Zhang, ICML12018] provided the first decentralized actor-critic algorithm to offer convergence guarantees. In that work, policies are stochastic and are defined on finite action spaces. We extend those results to develop a provably-convergent decentralized actor-critic algorithm for learning deterministic policies on continuous action spaces. Deterministic policies are important in many real-world settings. To handle the lack of exploration inherent in deterministic policies we provide results for the off-policy setting as well as the on-policy setting. We provide the main ingredients needed for this problem: the expression of a local deterministic policy gradient, a decentralized deterministic actor-critic algorithm, and convergence guarantees when the value functions are approximated linearly. This work enables decentralized MARL in high-dimensional action spaces and paves the way for more widespread application of MARL.", "keywords": "multiagent reinforcement learning;MARL;decentralized actor-critic algorithm", "primary_area": "", "supplementary_material": "/attachment/a385acc35917e709f44b29b2c66d17bf00685688.zip", "author": "Antoine Grosnit;Desmond Cai;Laura Wynter", "authorids": "antoine.grosnit@polytechnique.edu;desmond.cai@gmail.com;~Laura_Wynter1", "gender": ";;", "homepage": ";;", "dblp": ";;91/5132", "google_scholar": ";;", "orcid": ";;", "linkedin": ";;", "or_profile": "antoine.grosnit@polytechnique.edu;desmond.cai@gmail.com;~Laura_Wynter1", "aff": ";;IBM Research", "aff_domain": ";;ibm.com", "position": ";;Research scientist", "bibtex": "@misc{\ngrosnit2021decentralized,\ntitle={Decentralized Deterministic Multi-Agent Reinforcement Learning},\nauthor={Antoine Grosnit and Desmond Cai and Laura Wynter},\nyear={2021},\nurl={https://openreview.net/forum?id=QM4_h99pjCE}\n}", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer3;AnonReviewer4;AnonReviewer5;AnonReviewer2", "site": "https://openreview.net/forum?id=QM4_h99pjCE", "pdf_size": 0, "rating": "4;5;5;5;6", "confidence": "3;5;4;4;3", "wc_review": "405;362;218;285;207", "wc_reply_reviewers": "0;10;0;0;0", "wc_reply_authors": "445;418;161;250;174", "reply_reviewers": "0;1;0;0;0", "reply_authors": "1;1;1;1;1", "rating_avg": [ 5.0, 0.6324555320336759 ], "confidence_avg": [ 3.8, 0.7483314773547882 ], "wc_review_avg": [ 295.4, 77.92457892090275 ], "wc_reply_reviewers_avg": [ 2.0, 4.0 ], "wc_reply_authors_avg": [ 289.6, 120.0876346673545 ], "reply_reviewers_avg": [ 0.2, 0.4000000000000001 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 12, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 9, "gs_cited_by_link": 
"https://scholar.google.com/scholar?cites=5156990061369840557&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 5, "aff_unique_index": "0", "aff_unique_norm": "IBM", "aff_unique_dep": "IBM Research", "aff_unique_url": "https://www.ibm.com/research", "aff_unique_abbr": "IBM", "aff_country_unique_index": "0", "aff_country_unique": "United States" }, { "title": "Exemplary Natural Images Explain CNN Activations Better than State-of-the-Art Feature Visualization", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/3153", "id": "QO9-y8also-", "poster": "", "openreview": "https://openreview.net/forum?id=QO9-y8also-", "slides": "https://iclr.cc/virtual/2021/poster/3153", "video": "https://iclr.cc/virtual/2021/poster/3153", "author_site": "Judith Borowski, Roland Zimmermann, Judith Schepers, Robert Geirhos, Thomas S Wallis, Matthias Bethge, Wieland Brendel", "tldr": "", "abstract": "Feature visualizations such as synthetic maximally activating images are a widely used explanation method to better understand the information processing of convolutional neural networks (CNNs). At the same time, there are concerns that these visualizations might not accurately represent CNNs' inner workings. Here, we measure how much extremely activating images help humans to predict CNN activations.\nUsing a well-controlled psychophysical paradigm, we compare the informativeness of synthetic images by Olah et al. (2017) with a simple baseline visualization, namely exemplary natural images that also strongly activate a specific feature map. Given either synthetic or natural reference images, human participants choose which of two query images leads to strong positive activation. The experiment is designed to maximize participants' performance, and is the first to probe intermediate instead of final layer representations. We find that synthetic images indeed provide helpful information about feature map activations ($82\\pm4\\%$ accuracy; chance would be $50\\%$). However, natural images --- originally intended to be a baseline --- outperform these synthetic images by a wide margin ($92\\pm2\\%$). Additionally, participants are faster and more confident for natural images, whereas subjective impressions about the interpretability of the feature visualizations by Olah et al. (2017) are mixed. The higher informativeness of natural images holds across most layers, for both expert and lay participants as well as for hand- and randomly-picked feature visualizations. Even if only a single reference image is given, synthetic images provide less information than natural images ($65\\pm5\\%$ vs. $73\\pm4\\%$). In summary, synthetic images from a popular feature visualization method are significantly less informative for assessing CNN activations than natural images. We argue that visualization methods should improve over this simple baseline.", "keywords": "evaluation of interpretability;feature visualization;activation maximization;human psychophysics;understanding CNNs;explanation method", "primary_area": "", "supplementary_material": "/attachment/a0ec4a854f1998d17a35193514eb2e1ed056ded1.zip", "author": "Judy Borowski;Roland Simon Zimmermann;Judith Schepers;Robert Geirhos;Thomas S. A. 
Wallis;Matthias Bethge;Wieland Brendel", "authorids": "~Judy_Borowski1;~Roland_Simon_Zimmermann1;judith-schepers@web.de;~Robert_Geirhos1;~Thomas_S._A._Wallis1;~Matthias_Bethge1;~Wieland_Brendel1", "gender": ";M;;M;M;M;M", "homepage": ";https://rzimmermann.com;;https://robertgeirhos.com/;https://www.psychologie.tu-darmstadt.de/perception/;https://bethgelab.org;", "dblp": ";227/2603;;176/0076;151/6665;77/3005;37/11107", "google_scholar": "https://scholar.google.com/citations?hl=en;https://scholar.google.de/citations?user=4jdISHwAAAAJ;;w3kGtMIAAAAJ;Xs2TXzAAAAAJ;https://scholar.google.com/citations?hl=en;v-JL-hsAAAAJ", "orcid": ";;;0000-0001-7698-3187;0000-0001-7431-4852;;", "linkedin": "https://de.linkedin.com/in/judy-borowski;;;rgeirhos/;;;", "or_profile": "~Judy_Borowski1;~Roland_Simon_Zimmermann1;judith-schepers@web.de;~Robert_Geirhos1;~Thomas_S._A._Wallis1;~Matthias_Bethge1;~Wieland_Brendel1", "aff": "University of Tuebingen;Google;;University of T\u00fcbingen & International Max Planck Research School for Intelligent Systems;Amazon;University of Tuebingen;University of Tuebingen", "aff_domain": "uni-tuebingen.de;google.com;;uni-tuebingen.de;amazon.com;uni-tuebingen.de;uni-tuebingen.de", "position": "PhD student;Intern;;PhD student;Scientist;Full Professor;Principal Researcher", "bibtex": "@inproceedings{\nborowski2021exemplary,\ntitle={Exemplary Natural Images Explain {\\{}CNN{\\}} Activations Better than State-of-the-Art Feature Visualization},\nauthor={Judy Borowski and Roland Simon Zimmermann and Judith Schepers and Robert Geirhos and Thomas S. A. Wallis and Matthias Bethge and Wieland Brendel},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=QO9-y8also-}\n}", "github": "", "project": "", "reviewers": "AnonReviewer4;AnonReviewer3;AnonReviewer1;AnonReviewer2", "pdf_size": 0, "rating": "5;6;7;8", "confidence": "4;4;3;5", "wc_review": "513;376;244;620", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "687;841;514;749", "reply_reviewers": "0;0;0;0", "reply_authors": "1;2;1;1", "rating_avg": [ 6.5, 1.118033988749895 ], "confidence_avg": [ 4.0, 0.7071067811865476 ], "wc_review_avg": [ 438.25, 141.6234002557487 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 697.75, 119.40137143265986 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.25, 0.4330127018922193 ], "replies_avg": [ 10, 0 ], "authors#_avg": [ 7, 0 ], "corr_rating_confidence": 0.3162277660168379, "gs_citation": 47, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=4262811630228932097&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 4, "pdf": "https://openreview.net/pdf?id=QO9-y8also-", "email": "uni-tuebingen.de;google.com;;uni-tuebingen.de;amazon.com;uni-tuebingen.de;uni-tuebingen.de", "author_num": 7, "aff_unique_index": "0;1;2;3;0;0", "aff_unique_norm": "University of Tuebingen;Google;University of T\u00fcbingen;Amazon", "aff_unique_dep": ";Google;;Amazon.com, Inc.", "aff_unique_url": "https://www.uni-tuebingen.de/;https://www.google.com;https://www.uni-tuebingen.de/;https://www.amazon.com", "aff_unique_abbr": "Uni T\u00fcbingen;Google;Uni T\u00fcbingen;Amazon", "aff_campus_unique_index": "1", "aff_campus_unique": ";Mountain View", "aff_country_unique_index": "0;1;0;1;0;0", "aff_country_unique": "Germany;United States" }, { "id": "QQzomPbSV7q", "title": "Reducing Class Collapse in Metric Learning with Easy Positive Sampling", "track": "main", "status": "Reject", "tldr": "", "abstract": "Metric learning seeks 
perceptual embeddings where visually similar instances are close and dissimilar instances are apart, but learned representation can be sub-optimal when the distribution of intra-class samples is diverse and distinct sub-clusters are present. We theoretically prove and empirically show that under reasonable noise assumptions, prevalent embedding losses in metric learning, e.g., triplet loss, tend to project all samples of a class with various modes onto a single point in the embedding space, resulting in a class collapse that usually renders the space ill-sorted for classification or retrieval. To address this problem, we propose a simple modification to the embedding losses such that each sample selects its nearest same-class counterpart in a batch as the positive element in the tuple/triplet. This allows for the presence of multiple sub-clusters within each class. The adaptation can be integrated into a wide range of metric learning losses. Our method demonstrates clear benefits on various fine-grained image retrieval datasets over a variety of existing losses; qualitative retrieval results show that samples with similar visual patterns are indeed closer in the embedding space.\n", "keywords": "", "primary_area": "", "supplementary_material": "", "author": "Elad Levi;Tete Xiao;Xiaolong Wang;trevor darrell", "authorids": "~Elad_Levi1;~Tete_Xiao1;~Xiaolong_Wang3;~trevor_darrell1", "gender": "M;M;M;M", "homepage": ";http://tetexiao.com;https://xiaolonw.github.io/;https://people.eecs.berkeley.edu/~trevor/", "dblp": "232/2420;200/8130;91/952-4;d/TrevorDarrell", "google_scholar": "https://scholar.google.com/citations?hl=en;U4RqBdAAAAAJ;Y8O9N_0AAAAJ;https://scholar.google.com.tw/citations?user=bh-uRFMAAAAJ", "orcid": ";;;", "linkedin": ";;;", "or_profile": "~Elad_Levi1;~Tete_Xiao1;~Xiaolong_Wang3;~trevor_darrell1", "aff": ";Facebook AI Research;University of California, San Diego;Electrical Engineering & Computer Science Department", "aff_domain": ";facebook.com;ucsd.edu;eecs.berkeley.edu", "position": ";Researcher;Assistant Professor;Professor", "bibtex": "@misc{\nlevi2021reducing,\ntitle={Reducing Class Collapse in Metric Learning with Easy Positive Sampling},\nauthor={Elad Levi and Tete Xiao and Xiaolong Wang and trevor darrell},\nyear={2021},\nurl={https://openreview.net/forum?id=QQzomPbSV7q}\n}", "github": "", "project": "", "reviewers": "AnonReviewer3;AnonReviewer1;AnonReviewer2;AnonReviewer4", "site": "https://openreview.net/forum?id=QQzomPbSV7q", "pdf_size": 0, "rating": "4;5;6;6", "confidence": "5;5;4;5", "wc_review": "378;326;202;216", "wc_reply_reviewers": "0;0;99;0", "wc_reply_authors": "711;603;286;275", "reply_reviewers": "0;0;1;0", "reply_authors": "1;1;1;1", "rating_avg": [ 5.25, 0.82915619758885 ], "confidence_avg": [ 4.75, 0.4330127018922193 ], "wc_review_avg": [ 280.5, 73.99155357201253 ], "wc_reply_reviewers_avg": [ 24.75, 42.868257487329714 ], "wc_reply_authors_avg": [ 468.75, 192.12284481549818 ], "reply_reviewers_avg": [ 0.25, 0.4330127018922193 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 10, 0 ], "authors#_avg": [ 4, 0 ], "corr_rating_confidence": -0.5222329678670935, "gs_citation": 8, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=13556340907051022687&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 0, "aff_unique_index": "0;1;2", "aff_unique_norm": "Meta;University of California, San Diego;Electrical Engineering & Computer Science Department", "aff_unique_dep": "Facebook AI Research;;Electrical Engineering & Computer Science", 
"aff_unique_url": "https://research.facebook.com;https://www.ucsd.edu;", "aff_unique_abbr": "FAIR;UCSD;", "aff_campus_unique_index": "1", "aff_campus_unique": ";San Diego", "aff_country_unique_index": "0;0", "aff_country_unique": "United States;" }, { "id": "QSMvGB5j5-", "title": "Higher-order Structure Prediction in Evolving Graph Simplicial Complexes", "track": "main", "status": "Reject", "tldr": "", "abstract": "Dynamic graphs are rife with higher-order interactions, such as co-authorship relationships and protein-protein interactions in biological networks, that naturally arise between more than two nodes at once. In spite of the ubiquitous presence of such higher-order interactions, limited attention has been paid to the higher-order counterpart of the popular pairwise link prediction problem. Existing higher-order structure prediction methods are mostly based on heuristic feature extraction procedures, which work well in practice but lack theoretical guarantees. Such heuristics are primarily focused on predicting links in a static snapshot of the graph. Moreover, these heuristic-based methods fail to effectively utilize and benefit from the knowledge of latent substructures already present within the higher-order structures. In this paper, we overcome these obstacles by capturing higher-order interactions succinctly as simplices, model their neighborhood by face-vectors, and develop a nonparametric kernel estimator for simplices that views the evolving graph from the perspective of a time process (i.e., a sequence of graph snapshots). Our method substantially outperforms several baseline higher-order prediction methods. As a theoretical achievement, we prove the consistency and asymptotic normality in terms of Wasserstein distance of our estimator using Stein's method.", "keywords": "Higher-order;graph simplicial complex;link prediction", "primary_area": "", "supplementary_material": "", "author": "Manohar Kaul;Masaaki Imaizumi", "authorids": "~Manohar_Kaul1;~Masaaki_Imaizumi1", "gender": "M;M", "homepage": "https://manukaul.github.io/;https://sites.google.com/view/mimaizumi/home", "dblp": "29/10735;", "google_scholar": "https://scholar.google.com.tw/citations?user=jNroyK4AAAAJ;https://scholar.google.co.jp/citations?user=6c0Ljd4AAAAJ", "orcid": ";", "linkedin": "manu-k-72b936287/;masaaki-imaizumi-38600b157/", "or_profile": "~Manohar_Kaul1;~Masaaki_Imaizumi1", "aff": "Indian Institute of Technology, Hyderabad, Dhirubhai Ambani Institute Of Information and Communication Technology;The University of Tokyo", "aff_domain": "iith.ac.in;u-tokyo.ac.jp", "position": "Associate Professor;Associate Professor", "bibtex": "@misc{\nkaul2021higherorder,\ntitle={Higher-order Structure Prediction in Evolving Graph Simplicial Complexes},\nauthor={Manohar Kaul and Masaaki Imaizumi},\nyear={2021},\nurl={https://openreview.net/forum?id=QSMvGB5j5-}\n}", "github": "", "project": "", "reviewers": "AnonReviewer2;AnonReviewer3;AnonReviewer4", "site": "https://openreview.net/forum?id=QSMvGB5j5-", "pdf_size": 0, "rating": "4;6;6", "confidence": "4;2;2", "wc_review": "443;805;210", "wc_reply_reviewers": "207;232;0", "wc_reply_authors": "1597;1522;649", "reply_reviewers": "1;1;0", "reply_authors": "3;3;1", "rating_avg": [ 5.333333333333333, 0.9428090415820634 ], "confidence_avg": [ 2.6666666666666665, 0.9428090415820634 ], "wc_review_avg": [ 486.0, 244.80332241754127 ], "wc_reply_reviewers_avg": [ 146.33333333333334, 103.975424446784 ], "wc_reply_authors_avg": [ 1256.0, 430.3045433178692 ], "reply_reviewers_avg": 
[ 0.6666666666666666, 0.4714045207910317 ], "reply_authors_avg": [ 2.3333333333333335, 0.9428090415820634 ], "replies_avg": [ 14, 0 ], "authors#_avg": [ 2, 0 ], "corr_rating_confidence": -1.0, "gs_citation": 1, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=4931239736906670790&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 0, "aff_unique_index": "0;1", "aff_unique_norm": "Indian Institute of Technology, Hyderabad;University of Tokyo", "aff_unique_dep": ";", "aff_unique_url": "https://www.iith.ac.in;https://www.u-tokyo.ac.jp", "aff_unique_abbr": "IIT Hyderabad;UTokyo", "aff_campus_unique_index": "0", "aff_campus_unique": "Hyderabad;", "aff_country_unique_index": "0;1", "aff_country_unique": "India;Japan" }, { "id": "QTgP9nKmMPM", "title": "Decoupled Greedy Learning of Graph Neural Networks", "track": "main", "status": "Reject", "tldr": "", "abstract": "Graph Neural Networks (GNNs) become very popular for graph-related applications due to their superior performance. However, they have been shown to be computationally expensive in large scale settings, because their produced node embeddings have to be computed recursively, which scales exponentially with the number of layers. To address this issue, several sampling-based methods have recently been proposed to perform training on a subset of nodes while maintaining the fidelity of the trained model. In this work, we introduce a decoupled greedy learning method for GNNs (DGL-GNN) that, instead of sampling the input graph, decouples the GNN into smaller modules and associates each module with greedy auxiliary objectives. Our approach allows GNN layers to be updated during the training process without waiting for feedback from successor layers, thus making parallel GNN training possible. Our method achieves improved efficiency without significantly compromising model performances, which would be important for time or memory limited applications. Further, we propose a lazy-update scheme during training to further improve its efficiency. We empirically analyse our proposed DGL-GNN model, and demonstrate its effectiveness and superior efficiency through a range of experiments. Compared to the sampling-based acceleration, our model is more stable, and we do not have to trade-off between efficiency and accuracy. 
Finally, we note that while here we focus on comparing the decoupled approach as an alternative to other methods, it can also be regarded as complementary, for example, to sampling and other scalability-enhancing improvements of GNN training.", "keywords": "", "primary_area": "", "supplementary_material": "", "author": "YEWEN WANG;Jian Tang;Yizhou Sun;Guy Wolf", "authorids": "~YEWEN_WANG1;~Jian_Tang1;~Yizhou_Sun1;~Guy_Wolf1", "gender": "F;;F;M", "homepage": ";http://www.jian-tang.com;http://web.cs.ucla.edu/~yzsun/;http://guywolf.org", "dblp": "219/8145.html;181/2667-5;37/3868;120/1308", "google_scholar": ";https://scholar.google.ca/citations?user=1ir6WUEAAAAJ;https://scholar.google.com.tw/citations?user=TQgOjK0AAAAJ;g0k3SjcAAAAJ", "orcid": ";;;0000-0002-6740-059X", "linkedin": ";;;", "or_profile": "~YEWEN_WANG1;~Jian_Tang1;~Yizhou_Sun1;~Guy_Wolf1", "aff": "University of California, Los Angeles;Mila, HEC Montreal;University of California, Los Angeles;University of Montreal", "aff_domain": "ucla.edu;hec.ca;ucla.edu;umontreal.ca", "position": "PhD student;Assistant Professor;Associate Professor;Assistant Professor", "bibtex": "@misc{\nwang2021decoupled,\ntitle={Decoupled Greedy Learning of Graph Neural Networks},\nauthor={YEWEN WANG and Jian Tang and Yizhou Sun and Guy Wolf},\nyear={2021},\nurl={https://openreview.net/forum?id=QTgP9nKmMPM}\n}", "github": "", "project": "", "reviewers": "AnonReviewer4;AnonReviewer1;AnonReviewer3", "site": "https://openreview.net/forum?id=QTgP9nKmMPM", "pdf_size": 0, "rating": "4;4;6", "confidence": "4;4;5", "wc_review": "421;553;602", "wc_reply_reviewers": "0;0;0", "wc_reply_authors": "719;760;246", "reply_reviewers": "0;0;0", "reply_authors": "1;1;1", "rating_avg": [ 4.666666666666667, 0.9428090415820634 ], "confidence_avg": [ 4.333333333333333, 0.4714045207910317 ], "wc_review_avg": [ 525.3333333333334, 76.43879178067174 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 575.0, 233.2395049443097 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 7, 0 ], "authors#_avg": [ 4, 0 ], "corr_rating_confidence": 0.9999999999999997, "gs_citation": 1, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=8097943978159130757&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 5, "aff_unique_index": "0;1;0;2", "aff_unique_norm": "University of California, Los Angeles;HEC Montreal;University of Montreal", "aff_unique_dep": ";HEC Business School;", "aff_unique_url": "https://www.ucla.edu;https://www.hec.ca;https://wwwumontreal.ca", "aff_unique_abbr": "UCLA;HEC;UM", "aff_campus_unique_index": "0;1;0", "aff_campus_unique": "Los Angeles;Montreal;", "aff_country_unique_index": "0;1;0;1", "aff_country_unique": "United States;Canada" }, { "title": "Distributional Sliced-Wasserstein and Applications to Generative Modeling", "status": "Spotlight", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/3072", "id": "QYjO70ACDK", "poster": "", "openreview": "https://openreview.net/forum?id=QYjO70ACDK", "slides": "https://iclr.cc/virtual/2021/poster/3072", "video": "https://iclr.cc/virtual/2021/poster/3072", "author_site": "Khai Nguyen, Nhat Ho, Tung Pham, Hung Bui", "tldr": "", "abstract": "Sliced-Wasserstein distance (SW) and its variant, Max Sliced-Wasserstein distance (Max-SW), have been used widely in the recent years due to their fast computation and scalability even when the probability measures lie in a very high dimensional space. 
However, SW requires many unnecessary projection samples to approximate its value while Max-SW only uses the most important projection, which ignores the information of other useful directions. In order to account for these weaknesses, we propose a novel distance, named Distributional Sliced-Wasserstein distance (DSW), that finds an optimal distribution over projections that can balance between exploring distinctive projecting directions and the informativeness of projections themselves. We show that the DSW is a generalization of Max-SW, and it can be computed efficiently by searching for the optimal push-forward measure over a set of probability measures over the unit sphere satisfying certain regularizing constraints that favor distinct directions. Finally, we conduct extensive experiments with large-scale datasets to demonstrate the favorable performances of the proposed distances over the previous sliced-based distances in generative modeling applications.", "keywords": "Deep generative models;Sliced Wasserstein;Optimal Transport", "primary_area": "", "supplementary_material": "/attachment/461a5b60a709d5f81b079c43792796bae1d9aa8e.zip", "author": "Khai Nguyen;Nhat Ho;Tung Pham;Hung Bui", "authorids": "~Khai_Nguyen1;~Nhat_Ho1;v.tungph4@vinai.io;~Hung_Bui1", "gender": "M;M;;M", "homepage": "https://khainb.com;https://nhatptnk8912.github.io/;;https://sites.google.com/site/buihhung/home", "dblp": "120/4308;203/4479;;", "google_scholar": "im5fNaQAAAAJ;https://scholar.google.ca/citations?user=Xs7cKMwAAAAJ;;mDLwSZAAAAAJ", "orcid": ";;;", "linkedin": ";nhat-pham-minh-ho-267b8164/;;", "or_profile": "~Khai_Nguyen1;~Nhat_Ho1;v.tungph4@vinai.io;~Hung_Bui1", "aff": "VinAI Research, Vietnam;University of Texas, Austin;;VinAI Research", "aff_domain": "vinai.io;utexas.edu;;vinai.io", "position": "AI Research Resident;Assistant Professor;;Principal Researcher", "bibtex": "@inproceedings{\nnguyen2021distributional,\ntitle={Distributional Sliced-Wasserstein and Applications to Generative Modeling},\nauthor={Khai Nguyen and Nhat Ho and Tung Pham and Hung Bui},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=QYjO70ACDK}\n}", "github": "", "project": "", "reviewers": "AnonReviewer4;AnonReviewer1;AnonReviewer2", "pdf_size": 0, "rating": "7;7;9", "confidence": "5;4;5", "wc_review": "812;625;92", "wc_reply_reviewers": "0;0;0", "wc_reply_authors": "1517;542;208", "reply_reviewers": "0;0;0", "reply_authors": "3;1;1", "rating_avg": [ 7.666666666666667, 0.9428090415820634 ], "confidence_avg": [ 4.666666666666667, 0.4714045207910317 ], "wc_review_avg": [ 509.6666666666667, 305.04243784904565 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 755.6666666666666, 555.3439374737865 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.6666666666666667, 0.9428090415820634 ], "replies_avg": [ 17, 0 ], "authors#_avg": [ 4, 0 ], "corr_rating_confidence": 0.4999999999999999, "gs_citation": 113, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=14136000504171332345&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 11, "pdf": "https://openreview.net/pdf?id=QYjO70ACDK", "email": "vinai.io;utexas.edu;;vinai.io", "author_num": 4, "aff_unique_index": "0;1;0", "aff_unique_norm": "VinAI Research;University of Texas at Austin", "aff_unique_dep": ";", "aff_unique_url": "https://www.vin.ai;https://www.utexas.edu", "aff_unique_abbr": "VinAI;UT Austin", "aff_campus_unique_index": "1", "aff_campus_unique": ";Austin", 
"aff_country_unique_index": "0;1;0", "aff_country_unique": "Vietnam;United States" }, { "id": "QZaeLBDU03", "title": "Learning Movement Strategies for Moving Target Defense", "track": "main", "status": "Reject", "tldr": "", "abstract": "The field of cybersecurity has mostly been a cat-and-mouse game with the discovery of new attacks leading the way. To take away an attacker's advantage of reconnaissance, researchers have proposed proactive defense methods such as Moving Target Defense (MTD). To find good movement strategies, researchers have modeled MTD as leader-follower games between the defender and a cyber-adversary. We argue that existing models are inadequate in sequential settings when there is incomplete information about rational adversary and yield sub-optimal movement strategies. Further, while there exists an array of work on learning defense policies in sequential settings for cyber-security, they are either unpopular due to scalability issues arising out of incomplete information or tend to ignore the strategic nature of the adversary simplifying the scenario to use single-agent reinforcement learning techniques. To address these concerns, we propose (1) a unifying game-theoretic model, called the Bayesian Stackelberg Markov Games (BSMGs), that can model uncertainty over attacker types and the nuances of an MTD system and (2) a Bayesian Strong Stackelberg Q-learning (BSS-Q) approach that can, via interaction, learn the optimal movement policy for BSMGs within a reasonable time. We situate BSMGs in the landscape of incomplete-information Markov games and characterize the notion of Strong Stackelberg Equilibrium (SSE) in them. We show that our learning approach converges to an SSE of a BSMG and then highlight that the learned movement policy (1) improves the state-of-the-art in MTD for web-application security and (2) converges to an optimal policy in MTD domains with incomplete information about adversaries even when prior information about rewards and transitions is absent.", "keywords": "Multi-agent Reinforcement Learning;Moving Target Defense;Stackelberg Security", "primary_area": "", "supplementary_material": "/attachment/6df89989b308cc924124abc9cbf8ecffda6c7c6d.zip", "author": "Sailik Sengupta;Subbarao Kambhampati", "authorids": "~Sailik_Sengupta1;~Subbarao_Kambhampati1", "gender": "M;M", "homepage": "https://sailik1991.github.io/;http://rakaposhi.eas.asu.edu", "dblp": "139/7992;k/SKambhampati", "google_scholar": "Hlm-ti8AAAAJ;yl3L07sAAAAJ", "orcid": ";", "linkedin": "sailiks/;", "or_profile": "~Sailik_Sengupta1;~Subbarao_Kambhampati1", "aff": "Amazon;Arizona State University", "aff_domain": "amazon.com;asu.edu", "position": "Researcher;Full Professor", "bibtex": "@misc{\nsengupta2021learning,\ntitle={Learning Movement Strategies for Moving Target Defense},\nauthor={Sailik Sengupta and Subbarao Kambhampati},\nyear={2021},\nurl={https://openreview.net/forum?id=QZaeLBDU03}\n}", "github": "", "project": "", "reviewers": "AnonReviewer2;AnonReviewer1;AnonReviewer4;AnonReviewer3", "site": "https://openreview.net/forum?id=QZaeLBDU03", "pdf_size": 0, "rating": "4;4;5;5", "confidence": "2;4;3;4", "wc_review": "245;953;390;302", "wc_reply_reviewers": "0;22;0;0", "wc_reply_authors": "422;896;659;266", "reply_reviewers": "0;1;0;0", "reply_authors": "1;2;1;1", "rating_avg": [ 4.5, 0.5 ], "confidence_avg": [ 3.25, 0.82915619758885 ], "wc_review_avg": [ 472.5, 282.18477988722213 ], "wc_reply_reviewers_avg": [ 5.5, 9.526279441628825 ], "wc_reply_authors_avg": [ 560.75, 238.83820360235504 
], "reply_reviewers_avg": [ 0.25, 0.4330127018922193 ], "reply_authors_avg": [ 1.25, 0.4330127018922193 ], "replies_avg": [ 12, 0 ], "authors#_avg": [ 2, 0 ], "corr_rating_confidence": 0.30151134457776363, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:7UgD9fjwIP4J:scholar.google.com/&scioq=Learning+Movement+Strategies+for+Moving+Target+Defense&hl=en&as_sdt=0,33", "gs_version_total": 0, "aff_unique_index": "0;1", "aff_unique_norm": "Amazon;Arizona State University", "aff_unique_dep": "Amazon.com, Inc.;", "aff_unique_url": "https://www.amazon.com;https://www.asu.edu", "aff_unique_abbr": "Amazon;ASU", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0", "aff_country_unique": "United States" }, { "id": "QcjTNc_afvH", "title": "A StyleMap-Based Generator for Real-Time Image Projection and Local Editing", "track": "main", "status": "Withdraw", "tldr": "", "abstract": "Generative adversarial networks (GANs) have been successful in synthesizing and manipulating synthetic but realistic images from latent vectors. However, it is still challenging for GANs to manipulate real images, especially in real-time. State-of-the-art GAN-based methods for editing real images suffer from time-consuming operations in projecting real images to latent vectors. Alternatively, an encoder can be trained to embed real images to the latent space instantly, but it loses details drastically. We propose StyleMapGAN, which adopts a novel representation of latent space, called stylemap, incorporating spatial dimensions into embedding. Because each spatial location in the stylemap contributes to its corresponding region of the generated images, the real-time projection through the encoder becomes accurate as well as editing real images becomes spatially controllable. Experimental results demonstrate that our method significantly outperforms state-of-the-art models in various image manipulation tasks such as local editing and image interpolation. Especially, detailed comparisons show that our local editing method successfully reflects not only the color and texture but also the shape of a reference image while preserving untargeted regions. 
", "keywords": "Generative Adversarial Network;Real-time Image Projection;Image Manipulation;Local Editing;Deep Learning", "primary_area": "", "supplementary_material": "/attachment/311ede9b3d444d8814c90cc17bd44e8db72f2edd.zip", "author": "Hyunsu Kim;Yunjey Choi;Junho Kim;Sungjoo Yoo;Youngjung Uh", "authorids": "~Hyunsu_Kim1;~Yunjey_Choi3;~Junho_Kim3;~Sungjoo_Yoo1;~Youngjung_Uh2", "gender": "M;M;M;;M", "homepage": "https://github.com/blandocs;http://bit.ly/jhkim_resume;http://cmalab.snu.ac.kr;https://vilab.yonsei.ac.kr/member/professor;https://yunjey.github.io/", "dblp": "239/8447;;82/6218;57/10511;210/0980", "google_scholar": "VY5PodkAAAAJ;WtjDugkAAAAJ;__waCuYAAAAJ;BWBGrEEAAAAJ;v_4lOaAAAAAJ", "orcid": ";0000-0003-3712-8510;;;", "linkedin": "blandocs/;taki0112/;;youngjung-uh-78b459b5/;", "or_profile": "~Hyunsu_Kim1;~Junho_Kim3;~Sungjoo_Yoo1;~Youngjung_Uh2;~yunjey_choi1", "aff": "Seoul National University;NAVER;Seoul National University;Yonsei University;NAVER", "aff_domain": "snu.ac.kr;navercorp.com;snu.ac.kr;yonsei.ac.kr;navercorp.com", "position": "MS student;Research Scientist;Full Professor;Associate Professor;Research Scientist", "bibtex": "", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer3;AnonReviewer2;AnonReviewer4", "site": "https://openreview.net/forum?id=QcjTNc_afvH", "pdf_size": 0, "rating": "3;5;5;6", "confidence": "5;4;3;5", "wc_review": "801;894;639;235", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "0;0;0;0", "reply_reviewers": "0;0;0;0", "reply_authors": "0;0;0;0", "rating_avg": [ 4.75, 1.0897247358851685 ], "confidence_avg": [ 4.25, 0.82915619758885 ], "wc_review_avg": [ 642.25, 252.21159271532306 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 0, 0 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 0, 0 ], "replies_avg": [ 5, 0 ], "authors#_avg": [ 5, 0 ], "corr_rating_confidence": -0.20751433915982243, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:DOsSuY7cCZIJ:scholar.google.com/&scioq=A+StyleMap-Based+Generator+for+Real-Time+Image+Projection+and+Local+Editing&hl=en&as_sdt=0,5", "gs_version_total": 0, "aff_unique_index": "0;1;0;2;1", "aff_unique_norm": "Seoul National University;NAVER Corporation;Yonsei University", "aff_unique_dep": ";;", "aff_unique_url": "https://www.snu.ac.kr;https://www.naver.com;https://www.yonsei.ac.kr", "aff_unique_abbr": "SNU;NAVER;Yonsei", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0;0;0;0", "aff_country_unique": "South Korea" }, { "id": "QcqsxI6rKDs", "title": "Meta Gradient Boosting Neural Networks", "track": "main", "status": "Reject", "tldr": "", "abstract": "Meta-optimization is an effective approach that learns a shared set of parameters across tasks for parameter initialization in meta-learning.\nA key challenge for meta-optimization based approaches is to determine whether an initialization condition can be generalized to tasks with diverse distributions to accelerate learning. \nTo address this issue, we design a meta-gradient boosting framework that uses a base learner to learn shared information across tasks and a series of gradient-boosted modules to capture task-specific information to fit diverse distributions.\nWe evaluate the proposed model on both regression and classification tasks with multi-mode distributions. 
\nThe results demonstrate both the effectiveness of our model in modulating task-specific meta-learned priors and its advantages on multi-mode distributions.", "keywords": "meta learning;deep learning", "primary_area": "", "supplementary_material": "", "author": "Manqing Dong;Lina Yao;Xianzhi Wang;Xiwei Xu;Liming Zhu", "authorids": "~Manqing_Dong1;~Lina_Yao2;xianzhi.wang@uts.edu.au;xiwei.xu@data61.csiro.au;liming.zhu@data61.csiro.au", "gender": "F;F;;;", "homepage": ";https://www.linayao.com/;;;", "dblp": "220/3088;56/6651-1;;;", "google_scholar": "nfjtRPYAAAAJ;https://scholar.google.com.au/citations?user=EU3snBgAAAAJ;;;", "orcid": ";;;;", "linkedin": ";linayao/;;;", "or_profile": "~Manqing_Dong1;~Lina_Yao2;xianzhi.wang@uts.edu.au;xiwei.xu@data61.csiro.au;liming.zhu@data61.csiro.au", "aff": "eBay Inc.;University of New South Wales;;;", "aff_domain": "ebay.com;unsw.edu.au;;;", "position": "Researcher;Associate Professor;;;", "bibtex": "@misc{\ndong2021meta,\ntitle={Meta Gradient Boosting Neural Networks},\nauthor={Manqing Dong and Lina Yao and Xianzhi Wang and Xiwei Xu and Liming Zhu},\nyear={2021},\nurl={https://openreview.net/forum?id=QcqsxI6rKDs}\n}", "github": "", "project": "", "reviewers": "AnonReviewer4;AnonReviewer2;AnonReviewer1;AnonReviewer3", "site": "https://openreview.net/forum?id=QcqsxI6rKDs", "pdf_size": 0, "rating": "4;4;5;6", "confidence": "4;5;4;3", "wc_review": "371;1185;474;129", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "483;565;262;156", "reply_reviewers": "0;0;0;0", "reply_authors": "1;1;1;1", "rating_avg": [ 4.75, 0.82915619758885 ], "confidence_avg": [ 4.0, 0.7071067811865476 ], "wc_review_avg": [ 539.75, 393.02123034258597 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 366.5, 164.47264210196175 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 9, 0 ], "authors#_avg": [ 5, 0 ], "corr_rating_confidence": -0.8528028654224417, "gs_citation": 1, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=4183131694779177273&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 0, "aff_unique_index": "0;1", "aff_unique_norm": "eBay Inc.;University of New South Wales", "aff_unique_dep": ";", "aff_unique_url": "https://www.ebayinc.com;https://www.unsw.edu.au", "aff_unique_abbr": "eBay;UNSW", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;1", "aff_country_unique": "United States;Australia" }, { "id": "Qe_de8HpWK", "title": "GenQu: A Hybrid System for Learning Classical Data in Quantum States", "track": "main", "status": "Reject", "tldr": "", "abstract": "Deep neural network-powered artificial intelligence has rapidly changed our daily life with various applications. However, as one of the essential steps of deep neural networks, training a heavily-weighted network requires a tremendous amount of computing resources. Especially in the post Moore's Law era, the limit of semiconductor fabrication technology has restricted the development of learning algorithms to cope with the increasing high-intensity training data. Meanwhile, quantum computing has exhibited its significant potential in terms of speeding up the traditionally compute-intensive workloads. For example, Google illustrates quantum supremacy by completing a sampling calculation task in 200 seconds, which is otherwise impracticable on the world's largest supercomputers. To this end, quantum-based learning becomes an area of interest, with the promising of a quantum speedup. 
In this paper, we propose GenQu, a hybrid and general-purpose quantum framework for learning classical data through quantum states. We evaluate GenQu with real datasets and conduct experiments on both simulations and real quantum computer IBM-Q. Our evaluation demonstrates that, comparing with classical solutions, the proposed models running on GenQu framework achieve similar accuracy with a much smaller number of qubits, while significantly reducing the parameter size by up to 95.8\\% and converging speedup by 66.67% faster. ", "keywords": "Quantum Machine Learning;Qubits;Kernel Methods;Deep Neural Network", "primary_area": "", "supplementary_material": "/attachment/871793eda50016a533517637016bcb751fa46d5a.zip", "author": "Samuel A. Stein;Ray Marie Tischio;Betis Baheri;Yiwen Chen;Ying Mao;Qiang Guan;Ang Li;Bo Fang", "authorids": "sstein17@fordham.edu;rtischio@fordham.edu;bbaheri@kent.edu;ychen638@fordham.edu;~Ying_Mao1;qguan@kent.edu;ang.li@pnnl.gov;bo.fang@pnnl.gov", "gender": ";;;;M;;;", "homepage": ";;;;https://www.linkedin.com/in/ying-mao-03a63824/;;;", "dblp": ";;;;;;;", "google_scholar": ";;;;s_oeuQUAAAAJ;;;", "orcid": ";;;;;;;", "linkedin": ";;;;;;;", "or_profile": "sstein17@fordham.edu;rtischio@fordham.edu;bbaheri@kent.edu;ychen638@fordham.edu;~Ying_Mao1;qguan@kent.edu;ang.li@pnnl.gov;bo.fang@pnnl.gov", "aff": ";;;;Fordham University;;;", "aff_domain": ";;;;fordham.edu;;;", "position": ";;;;Assistant Professor;;;", "bibtex": "@misc{\nstein2021genqu,\ntitle={GenQu: A Hybrid System for Learning Classical Data in Quantum States},\nauthor={Samuel A. Stein and Ray Marie Tischio and Betis Baheri and Yiwen Chen and Ying Mao and Qiang Guan and Ang Li and Bo Fang},\nyear={2021},\nurl={https://openreview.net/forum?id=Qe_de8HpWK}\n}", "github": "", "project": "", "reviewers": "AnonReviewer3;AnonReviewer4;AnonReviewer2;AnonReviewer1", "site": "https://openreview.net/forum?id=Qe_de8HpWK", "pdf_size": 0, "rating": "2;3;3;4", "confidence": "5;5;4;4", "wc_review": "234;229;401;524", "wc_reply_reviewers": "0;0;0;107", "wc_reply_authors": "231;129;232;159", "reply_reviewers": "0;0;0;2", "reply_authors": "1;1;1;1", "rating_avg": [ 3.0, 0.7071067811865476 ], "confidence_avg": [ 4.5, 0.5 ], "wc_review_avg": [ 347.0, 123.42811673196671 ], "wc_reply_reviewers_avg": [ 26.75, 46.332359102467464 ], "wc_reply_authors_avg": [ 187.75, 45.018746095376756 ], "reply_reviewers_avg": [ 0.5, 0.8660254037844386 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 11, 0 ], "authors#_avg": [ 8, 0 ], "corr_rating_confidence": -0.7071067811865476, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:fL7wTickiH0J:scholar.google.com/&scioq=GenQu:+A+Hybrid+System+for+Learning+Classical+Data+in+Quantum+States&hl=en&as_sdt=0,5", "gs_version_total": 0, "aff_unique_index": "0", "aff_unique_norm": "Fordham University", "aff_unique_dep": "", "aff_unique_url": "https://www.fordham.edu", "aff_unique_abbr": "Fordham", "aff_country_unique_index": "0", "aff_country_unique": "United States" }, { "id": "QfEssgaXpm", "title": "Reinforcement Learning for Control with Probabilistic Stability Guarantee", "track": "main", "status": "Reject", "tldr": "", "abstract": "Reinforcement learning is promising to control dynamical systems for which the traditional control methods are hardly applicable. However, in control theory, the stability of a closed-loop system can be hardly guaranteed using the policy/controller learned solely from samples. 
In this paper, we will combine Lyapunov's method in control theory and stochastic analysis to analyze the mean square stability of MDP in a model-free manner. Furthermore, the finite sample bounds on the probability of stability are derived as a function of the number M and length T of the sampled trajectories. And we show that there is a lower bound on T and the probability is much more demanding for M than T. Based on the theoretical results, a REINFORCE like algorithm is proposed to learn the controller and the Lyapunov function simultaneously. ", "keywords": "control;Lyapunov stability;REINFORCE;finite-sample bounds", "primary_area": "", "supplementary_material": "/attachment/a655ad05fafff704acfae2e915dfb096186d968d.zip", "author": "Minghao Han;Zhipeng Zhou;Lixian Zhang;Jun Wang;Wei Pan", "authorids": "~Minghao_Han2;~Zhipeng_Zhou3;lixianzhang@hit.edu.cn;~Jun_Wang2;~Wei_Pan2", "gender": "M;M;;M;M", "homepage": "https://hithmh.github.io/MinghaoHan/;;;http://www0.cs.ucl.ac.uk/staff/jun.wang/;http://panweihit.github.io", "dblp": ";;;w/JunWang12;", "google_scholar": "vSFTX1AAAAAJ;Ot0PPAcAAAAJ;;https://scholar.google.co.uk/citations?user=wIE1tY4AAAAJ;GqryWPsAAAAJ", "orcid": ";;;;0000-0003-1121-9879", "linkedin": ";;;;wei-pan-6b558b17/", "or_profile": "~Minghao_Han2;~Zhipeng_Zhou3;lixianzhang@hit.edu.cn;~Jun_Wang2;~Wei_Pan2", "aff": "Harbin Institue of Technology;Alibaba Group;;University College London;Delft University of Technology", "aff_domain": "hit.edu.cn;alibaba-inc.com;;ucl.ac.uk;tudelft.nl", "position": "PhD student;Researcher;;Professor;Assistant Professor", "bibtex": "@misc{\nhan2021reinforcement,\ntitle={Reinforcement Learning for Control with Probabilistic Stability Guarantee},\nauthor={Minghao Han and Zhipeng Zhou and Lixian Zhang and Jun Wang and Wei Pan},\nyear={2021},\nurl={https://openreview.net/forum?id=QfEssgaXpm}\n}", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer2;AnonReviewer3;AnonReviewer4", "site": "https://openreview.net/forum?id=QfEssgaXpm", "pdf_size": 0, "rating": "5;5;6;6", "confidence": "3;4;3;4", "wc_review": "292;670;258;616", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "0;0;0;0", "reply_reviewers": "0;0;0;0", "reply_authors": "0;0;0;0", "rating_avg": [ 5.5, 0.5 ], "confidence_avg": [ 3.5, 0.5 ], "wc_review_avg": [ 459.0, 185.37799222129902 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 0, 0 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 0, 0 ], "replies_avg": [ 6, 0 ], "authors#_avg": [ 5, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:wS5HrDcYQMQJ:scholar.google.com/&scioq=Reinforcement+Learning+for+Control+with+Probabilistic+Stability+Guarantee&hl=en&as_sdt=0,14", "gs_version_total": 0, "aff_unique_index": "0;1;2;3", "aff_unique_norm": "Harbin Institute of Technology;Alibaba Group;University College London;Delft University of Technology", "aff_unique_dep": ";;;", "aff_unique_url": "http://www.hit.edu.cn/;https://www.alibaba.com;https://www.ucl.ac.uk;https://www.tudelft.nl", "aff_unique_abbr": "HIT;Alibaba;UCL;TU Delft", "aff_campus_unique_index": "0", "aff_campus_unique": "Harbin;", "aff_country_unique_index": "0;0;1;2", "aff_country_unique": "China;United Kingdom;Netherlands" }, { "title": "Stabilized Medical Image Attacks", "status": "Spotlight", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/3344", "id": "QfTXQiGYudJ", "poster": "", "openreview": "https://openreview.net/forum?id=QfTXQiGYudJ", "slides": 
"https://iclr.cc/virtual/2021/poster/3344", "video": "https://iclr.cc/virtual/2021/poster/3344", "author_site": "Gege Qi, Lijun GONG, Yibing Song, Kai Ma, Yefeng Zheng", "tldr": "", "abstract": "Convolutional Neural Networks (CNNs) have advanced existing medical systems for automatic disease diagnosis. However, a threat to these systems arises that adversarial attacks make CNNs vulnerable. Inaccurate diagnosis results make a negative influence on human healthcare. There is a need to investigate potential adversarial attacks to robustify deep medical diagnosis systems. On the other side, there are several modalities of medical images (e.g., CT, fundus, and endoscopic image) of which each type is significantly different from others. It is more challenging to generate adversarial perturbations for different types of medical images. In this paper, we propose an image-based medical adversarial attack method to consistently produce adversarial perturbations on medical images. The objective function of our method consists of a loss deviation term and a loss stabilization term. The loss deviation term increases the divergence between the CNN prediction of an adversarial example and its ground truth label. Meanwhile, the loss stabilization term ensures similar CNN predictions of this example and its smoothed input. From the perspective of the whole iterations for perturbation generation, the proposed loss stabilization term exhaustively searches the perturbation space to smooth the single spot for local optimum escape. We further analyze the KL-divergence of the proposed loss function and find that the loss stabilization term makes the perturbations updated towards a fixed objective spot while deviating from the ground truth. This stabilization ensures the proposed medical attack effective for different types of medical images while producing perturbations in small variance. 
Experiments on several medical image analysis benchmarks including the recent COVID-19 dataset show the stability of the proposed method.", "keywords": "Healthcare;Biometrics", "primary_area": "", "supplementary_material": "", "author": "Gege Qi;Lijun GONG;Yibing Song;Kai Ma;Yefeng Zheng", "authorids": "~Gege_Qi1;~Lijun_GONG2;~Yibing_Song1;~Kai_Ma2;~Yefeng_Zheng2", "gender": "F;F;;M;M", "homepage": ";;https://ybsong00.github.io/;;https://en.westlake.edu.cn/faculty/yefeng-zheng.html", "dblp": "258/7021;;77/2117;86/7113-2;44/6510", "google_scholar": ";CvmpmS0AAAAJ;oRhJHmIAAAAJ;https://scholar.google.com/citations?hl=en;vAIECxgAAAAJ", "orcid": ";;;;0000-0003-2195-2847", "linkedin": ";;;;yefeng-zheng-bb45641/?originalSubdomain=cn", "or_profile": "~Gege_Qi1;~Lijun_GONG2;~Yibing_Song1;~Kai_Ma2;~Yefeng_Zheng2", "aff": "Peking University;Tencent Jarvis Lab;Tencent AI Lab;Tencent;Tencent Jarvis Lab", "aff_domain": "pku.edu.cn;tencent.com;tencent.com;tencent.com;tencent.com", "position": "MS student;Researcher;Senior Researcher;Principal Scientist;Director", "bibtex": "@inproceedings{\nqi2021stabilized,\ntitle={Stabilized Medical Image Attacks},\nauthor={Gege Qi and Lijun GONG and Yibing Song and Kai Ma and Yefeng Zheng},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=QfTXQiGYudJ}\n}", "github": "", "project": "", "reviewers": "AnonReviewer4;AnonReviewer3;AnonReviewer1", "pdf_size": 0, "rating": "7;7;8", "confidence": "4;3;5", "wc_review": "268;517;348", "wc_reply_reviewers": "0;0;0", "wc_reply_authors": "0;0;0", "reply_reviewers": "0;0;0", "reply_authors": "0;0;0", "rating_avg": [ 7.333333333333333, 0.4714045207910317 ], "confidence_avg": [ 4.0, 0.816496580927726 ], "wc_review_avg": [ 377.6666666666667, 103.79573958287284 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 0, 0 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 0, 0 ], "replies_avg": [ 4, 0 ], "authors#_avg": [ 5, 0 ], "corr_rating_confidence": 0.8660254037844385, "gs_citation": 44, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=5943786222126044204&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 5, "pdf": "https://openreview.net/pdf?id=QfTXQiGYudJ", "email": "pku.edu.cn;tencent.com;tencent.com;tencent.com;tencent.com", "author_num": 5, "aff_unique_index": "0;1;1;1;1", "aff_unique_norm": "Peking University;Tencent", "aff_unique_dep": ";Jarvis Lab", "aff_unique_url": "http://www.pku.edu.cn;https://www.tencent.com", "aff_unique_abbr": "Peking U;Tencent", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0;0;0;0", "aff_country_unique": "China" }, { "id": "QjINdYOfq0b", "title": "ABS: Automatic Bit Sharing for Model Compression", "track": "main", "status": "Reject", "tldr": "", "abstract": "We present Automatic Bit Sharing (ABS) to automatically search for optimal model compression configurations (e.g., pruning ratio and bitwidth). Unlike previous works that consider model pruning and quantization separately, we seek to optimize them jointly. To deal with the resultant large designing space, we propose a novel super-bit model, a single-path method, to encode all candidate compression configurations, rather than maintaining separate paths for each configuration. Specifically, we first propose a novel decomposition of quantization that encapsulates all the candidate bitwidths in the search space. 
Starting from a low bitwidth, we sequentially consider higher bitwidths by recursively adding re-assignment offsets. We then introduce learnable binary gates to encode the choice of bitwidth, including filter-wise 0-bit for pruning. By jointly training the binary gates in conjunction with network parameters, the compression configurations of each layer can be automatically determined. Our ABS brings two benefits for model compression: 1) It avoids the combinatorially large design space, with a reduced number of trainable parameters and search costs. 2) It also averts directly fitting an extremely low bit quantizer to the data, hence greatly reducing the optimization difficulty due to the non-differentiable quantization. Experiments on CIFAR-100 and ImageNet show that our methods achieve significant computational cost reduction while preserving promising performance. ", "keywords": "Quantization;Pruning;Model Compression;AutoML", "primary_area": "", "supplementary_material": "", "author": "Jing Liu;Bohan Zhuang;Peng Chen;Yong Guo;Chunhua Shen;Jianfei Cai;Mingkui Tan", "authorids": "~Jing_Liu8;~Bohan_Zhuang1;~Peng_Chen2;~Yong_Guo1;~Chunhua_Shen1;~Jianfei_Cai1;~Mingkui_Tan2", "gender": "M;M;M;M;;M;", "homepage": "https://www.jing-liu.com/;https://bohanzhuang.github.io/;;http://www.guoyongcs.com/;;https://jianfei-cai.github.io/;", "dblp": "72/2590-48;145/1096;;;;83/6096;", "google_scholar": "-lHaZH4AAAAJ;https://scholar.google.com.au/citations?user=DFuDBBwAAAAJ;;https://scholar.google.com/citations?hl=en;;https://scholar.google.com.tw/citations?user=N6czCoUAAAAJ;", "orcid": "0000-0002-6745-3050;;;0000-0002-3444-4588;;;", "linkedin": "jing-liu-619688133/;bohan-zhuang/;;;;;", "or_profile": "~Jing_Liu8;~Bohan_Zhuang1;~Peng_Chen2;~Yong_Guo1;~Chunhua_Shen1;~Jianfei_Cai1;~Mingkui_Tan2", "aff": "Monash University;Monash University;The University of Adelaide;South China University of Technology;;Monash University;", "aff_domain": "monash.edu.au;monash.edu;adelaide.edu.au;scut.edu.cn;;monash.edu;", "position": "PhD student;Assistant Professor;Postdoc;PhD student;;Full Professor;", "bibtex": "@misc{\nliu2021abs,\ntitle={{\\{}ABS{\\}}: Automatic Bit Sharing for Model Compression},\nauthor={Jing Liu and Bohan Zhuang and Peng Chen and Yong Guo and Chunhua Shen and Jianfei Cai and Mingkui Tan},\nyear={2021},\nurl={https://openreview.net/forum?id=QjINdYOfq0b}\n}", "github": "", "project": "", "reviewers": "AnonReviewer2;AnonReviewer1;AnonReviewer3", "site": "https://openreview.net/forum?id=QjINdYOfq0b", "pdf_size": 0, "rating": "4;6;6", "confidence": "4;4;3", "wc_review": "392;434;296", "wc_reply_reviewers": "0;0;0", "wc_reply_authors": "900;814;634", "reply_reviewers": "0;0;0", "reply_authors": "2;2;1", "rating_avg": [ 5.333333333333333, 0.9428090415820634 ], "confidence_avg": [ 3.6666666666666665, 0.4714045207910317 ], "wc_review_avg": [ 374.0, 57.758116312774604 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 782.6666666666666, 110.83120298704493 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.6666666666666667, 0.4714045207910317 ], "replies_avg": [ 10, 0 ], "authors#_avg": [ 7, 0 ], "corr_rating_confidence": -0.49999999999999983, "gs_citation": 2, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=9010705340826246300&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 0, "aff_unique_index": "0;0;1;2;0", "aff_unique_norm": "Monash University;University of Adelaide;South China University of Technology", "aff_unique_dep": ";;", "aff_unique_url": 
"https://www.monash.edu;https://www.adelaide.edu.au;https://www.scut.edu.cn", "aff_unique_abbr": "Monash;Adelaide;SCUT", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0;0;1;0", "aff_country_unique": "Australia;China" }, { "title": "PAC Confidence Predictions for Deep Neural Network Classifiers", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/3264", "id": "Qk-Wq5AIjpq", "poster": "", "openreview": "https://openreview.net/forum?id=Qk-Wq5AIjpq", "slides": "https://iclr.cc/virtual/2021/poster/3264", "video": "https://iclr.cc/virtual/2021/poster/3264", "author_site": "Sangdon Park, Shuo Li, Insup Lee, Osbert Bastani", "tldr": "", "abstract": "A key challenge for deploying deep neural networks (DNNs) in safety critical settings is the need to provide rigorous ways to quantify their uncertainty. In this paper, we propose a novel algorithm for constructing predicted classification confidences for DNNs that comes with provable correctness guarantees. Our approach uses Clopper-Pearson confidence intervals for the Binomial distribution in conjunction with the histogram binning approach to calibrated prediction. In addition, we demonstrate how our predicted confidences can be used to enable downstream guarantees in two settings: (i) fast DNN inference, where we demonstrate how to compose a fast but inaccurate DNN with an accurate but slow DNN in a rigorous way to improve performance without sacrificing accuracy, and (ii) safe planning, where we guarantee safety when using a DNN to predict whether a given action is safe based on visual observations. In our experiments, we demonstrate that our approach can be used to provide guarantees for state-of-the-art DNNs.", "keywords": "classification;calibration;probably approximated correct guarantee;fast DNN inference;safe planning", "primary_area": "", "supplementary_material": "", "author": "Sangdon Park;Shuo Li;Insup Lee;Osbert Bastani", "authorids": "~Sangdon_Park1;lishuo1@seas.upenn.edu;~Insup_Lee1;~Osbert_Bastani1", "gender": "M;;;M", "homepage": "https://sangdon.github.io/;;https://www.cis.upenn.edu/~lee/;http://obastani.github.io", "dblp": "119/1530-1;;l/InsupLee.html;21/11275", "google_scholar": "Vi2E2F4AAAAJ;;qPlUgrgAAAAJ;cxYepGkAAAAJ", "orcid": ";;0000-0003-2672-1132;", "linkedin": ";;;", "or_profile": "~Sangdon_Park1;lishuo1@seas.upenn.edu;~Insup_Lee1;~Osbert_Bastani1", "aff": "University of Pennsylvania;;University of Pennsylvania;University of Pennsylvania", "aff_domain": "upenn.edu;;upenn.edu;upenn.edu", "position": "PhD student;;Full Professor;Assistant Professor", "bibtex": "@inproceedings{\npark2021pac,\ntitle={{\\{}PAC{\\}} Confidence Predictions for Deep Neural Network Classifiers},\nauthor={Sangdon Park and Shuo Li and Insup Lee and Osbert Bastani},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=Qk-Wq5AIjpq}\n}", "github": "", "project": "", "reviewers": "AnonReviewer2;AnonReviewer4;AnonReviewer1", "pdf_size": 0, "rating": "6;6;7", "confidence": "4;4;2", "wc_review": "997;417;674", "wc_reply_reviewers": "170;0;0", "wc_reply_authors": "1488;563;661", "reply_reviewers": "3;0;0", "reply_authors": "4;2;1", "rating_avg": [ 6.333333333333333, 0.4714045207910317 ], "confidence_avg": [ 3.3333333333333335, 0.9428090415820634 ], "wc_review_avg": [ 696.0, 237.29447247390038 ], "wc_reply_reviewers_avg": [ 56.666666666666664, 80.13876853447539 ], "wc_reply_authors_avg": [ 904.0, 414.883919508417 ], 
"reply_reviewers_avg": [ 1.0, 1.4142135623730951 ], "reply_authors_avg": [ 2.3333333333333335, 1.247219128924647 ], "replies_avg": [ 18, 0 ], "authors#_avg": [ 4, 0 ], "corr_rating_confidence": -1.0, "gs_citation": 36, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=9717117550760835486&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 9, "pdf": "https://openreview.net/pdf?id=Qk-Wq5AIjpq", "email": "upenn.edu;;upenn.edu;upenn.edu", "author_num": 4, "aff_unique_index": "0;0;0", "aff_unique_norm": "University of Pennsylvania", "aff_unique_dep": "", "aff_unique_url": "https://www.upenn.edu", "aff_unique_abbr": "UPenn", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0;0", "aff_country_unique": "United States" }, { "id": "QkQCcFsUtk", "title": "Unsupervised Word Translation Pairing using Refinement based Point Set Registration", "track": "main", "status": "Withdraw", "tldr": "", "abstract": "Cross-lingual alignment of word embeddings play an important role in knowledge transfer across languages, for improving machine translation and other multi-lingual applications. Current unsupervised approaches rely on similarities in geometric structure of word embedding spaces across languages, to learn structure-preserving linear transformations using adversarial networks and refinement strategies. However, such techniques, in practice, tend to suffer from instability and convergence issues, requiring tedious fine-tuning for precise parameter setting. This paper proposes BioSpere, a novel framework for unsupervised mapping of bi-lingual word embeddings onto a shared vector space, by combining adversarial initialization and refinement procedure with point set registration algorithm used in image processing. We show that our framework alleviates the shortcomings of existing methodologies, and is relatively invariant to variable adversarial learning performance, depicting robustness in terms of parameter choices and training losses. 
Experimental evaluation on parallel dictionary induction task demonstrates state-of-the-art results for our framework on diverse language pairs.", "keywords": "", "primary_area": "", "supplementary_material": "", "author": "Silviu Oprea;Sourav Dutta;Haytham Assem", "authorids": "silviu.oprea@ed.ac.uk;~Sourav_Dutta1;haytham.assem@huawei.com", "gender": ";M;", "homepage": ";;", "dblp": ";62/8171;", "google_scholar": ";9y1l5IoAAAAJ;", "orcid": ";0000-0002-8934-9166;", "linkedin": ";;", "or_profile": "silviu.oprea@ed.ac.uk;~Sourav_Dutta1;haytham.assem@huawei.com", "aff": ";Huawei Research Center;", "aff_domain": ";huawei.com;", "position": ";Principal Scientist;", "bibtex": "", "github": "", "project": "", "reviewers": "AnonReviewer2;AnonReviewer4;AnonReviewer3", "site": "https://openreview.net/forum?id=QkQCcFsUtk", "pdf_size": 0, "rating": "3;4;4", "confidence": "5;5;4", "wc_review": "536;397;219", "wc_reply_reviewers": "0;0;0", "wc_reply_authors": "0;0;0", "reply_reviewers": "0;0;0", "reply_authors": "0;0;0", "rating_avg": [ 3.6666666666666665, 0.4714045207910317 ], "confidence_avg": [ 4.666666666666667, 0.4714045207910317 ], "wc_review_avg": [ 384.0, 129.7407671731082 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 0, 0 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 0, 0 ], "replies_avg": [ 4, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": -0.4999999999999999, "gs_citation": 2, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=15394081616155963584&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 4, "aff_unique_index": "0", "aff_unique_norm": "Huawei", "aff_unique_dep": "Research Center", "aff_unique_url": "https://www.huawei.com/en/", "aff_unique_abbr": "Huawei", "aff_country_unique_index": "0", "aff_country_unique": "China" }, { "title": "AdaGCN: Adaboosting Graph Convolutional Networks into Deep Models", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/2536", "id": "QkRbdiiEjM", "poster": "", "openreview": "https://openreview.net/forum?id=QkRbdiiEjM", "slides": "https://iclr.cc/virtual/2021/poster/2536", "video": "https://iclr.cc/virtual/2021/poster/2536", "author_site": "Ke Sun, Zhanxing Zhu, Zhouchen Lin", "tldr": "", "abstract": "The design of deep graph models still remains to be investigated and the crucial part is how to explore and exploit the knowledge from different hops of neighbors in an efficient way. In this paper, we propose a novel RNN-like deep graph neural network architecture by incorporating AdaBoost into the computation of network; and the proposed graph convolutional network called AdaGCN~(Adaboosting Graph Convolutional Network) has the ability to efficiently extract knowledge from high-order neighbors of current nodes and then integrates knowledge from different hops of neighbors into the network in an Adaboost way. Different from other graph neural networks that directly stack many graph convolution layers, AdaGCN shares the same base neural network architecture among all ``layers'' and is recursively optimized, which is similar to an RNN. Besides, We also theoretically established the connection between AdaGCN and existing graph convolutional methods, presenting the benefits of our proposal. 
Finally, extensive experiments demonstrate the consistent state-of-the-art prediction performance on graphs across different label rates and the computational advantage of our approach AdaGCN~\\footnote{Code is available at \\url{https://github.com/datake/AdaGCN}.}.", "keywords": "Graph Neural Networks;AdaBoost", "primary_area": "", "supplementary_material": "", "author": "Ke Sun;Zhanxing Zhu;Zhouchen Lin", "authorids": "~Ke_Sun3;~Zhanxing_Zhu1;~Zhouchen_Lin1", "gender": "M;M;M", "homepage": "https://zhanxingzhu.github.io/;https://zhouchenlin.github.io;https://sites.google.com/view/kesun", "dblp": "87/7756.html;l/ZhouchenLin;69/476-13", "google_scholar": "a2sHceIAAAAJ;https://scholar.google.com.tw/citations?user=TanjFwoAAAAJ;lYdNhFQAAAAJ", "orcid": ";0000-0003-1493-7569;", "linkedin": ";;", "or_profile": "~Zhanxing_Zhu1;~Zhouchen_Lin1;~Ke_Sun6", "aff": "Peking University;Peking University;University of Alberta", "aff_domain": "pku.edu.cn;pku.edu.cn;ualberta.ca", "position": "Assistant Professor;Professor;PhD student", "bibtex": "@inproceedings{\nsun2021adagcn,\ntitle={Ada{\\{}GCN{\\}}: Adaboosting Graph Convolutional Networks into Deep Models},\nauthor={Ke Sun and Zhanxing Zhu and Zhouchen Lin},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=QkRbdiiEjM}\n}", "github": "[![github](/images/github_icon.svg) datake/AdaGCN](https://github.com/datake/AdaGCN)", "project": "", "reviewers": "AnonReviewer3;AnonReviewer1;AnonReviewer2;AnonReviewer4", "pdf_size": 0, "rating": "5;6;7;7", "confidence": "4;5;3;3", "wc_review": "411;203;178;199", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "294;228;101;77", "reply_reviewers": "0;0;0;0", "reply_authors": "1;1;1;1", "rating_avg": [ 6.25, 0.82915619758885 ], "confidence_avg": [ 3.75, 0.82915619758885 ], "wc_review_avg": [ 247.75, 94.72954924415085 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 175.0, 89.51256894984078 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 11, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": -0.6363636363636364, "gs_citation": 117, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=9537937835922263498&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 3, "pdf": "https://openreview.net/pdf?id=QkRbdiiEjM", "email": "pku.edu.cn;pku.edu.cn;ualberta.ca", "author_num": 3, "aff_unique_index": "0;0;1", "aff_unique_norm": "Peking University;University of Alberta", "aff_unique_dep": ";", "aff_unique_url": "http://www.pku.edu.cn;https://www.ualberta.ca", "aff_unique_abbr": "Peking U;UAlberta", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0;1", "aff_country_unique": "China;Canada" }, { "title": "Diverse Video Generation using a Gaussian Process Trigger", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/2565", "id": "Qm7R_SdqTpT", "poster": "", "openreview": "https://openreview.net/forum?id=Qm7R_SdqTpT", "slides": "https://iclr.cc/virtual/2021/poster/2565", "video": "https://iclr.cc/virtual/2021/poster/2565", "author_site": "Gaurav Shrivastava, Abhinav Shrivastava", "tldr": "", "abstract": "Generating future frames given a few context (or past) frames is a challenging task. It requires modeling the temporal coherence of videos as well as multi-modality in terms of diversity in the potential future states. Current variational approaches for video generation tend to marginalize over multi-modal future outcomes. 
Instead, we propose to explicitly model the multi-modality in the future outcomes and leverage it to sample diverse futures. Our approach, Diverse Video Generator, uses a GP to learn priors on future states given the past and maintains a probability distribution over possible futures given a particular sample. We leverage the changes in this distribution over time to control the sampling of diverse future states by estimating the end of on-going sequences. In particular, we use the variance of GP over the output function space to trigger a change in the action sequence. We achieve state-of-the-art results on diverse future frame generation in terms of reconstruction quality and diversity of the generated sequences.", "keywords": "video synthesis;future frame generation;video generation;gaussian process priors;diverse video generation", "primary_area": "", "supplementary_material": "/attachment/d15b5d1b4c89a26709aa65f918bed1002209fff5.zip", "author": "Gaurav Shrivastava;Abhinav Shrivastava", "authorids": "~Gaurav_Shrivastava1;~Abhinav_Shrivastava2", "gender": "M;M", "homepage": "http://www.cs.umd.edu/~gauravsh/;http://abhinavsh.info", "dblp": "225/6433;65/10572", "google_scholar": ";mIF9BowAAAAJ", "orcid": ";0000-0001-8928-8554", "linkedin": "gshrivastava1/;", "or_profile": "~Gaurav_Shrivastava1;~Abhinav_Shrivastava2", "aff": "Department of Computer Science, University of Maryland, College Park;Department of Computer Science, University of Maryland, College Park", "aff_domain": "cs.umd.edu;cs.umd.edu", "position": "MS student;Assistant Professor", "bibtex": "@inproceedings{\nshrivastava2021diverse,\ntitle={Diverse Video Generation using a Gaussian Process Trigger},\nauthor={Gaurav Shrivastava and Abhinav Shrivastava},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=Qm7R_SdqTpT}\n}", "github": "[![github](/images/github_icon.svg) shgaurav1/DVG](https://github.com/shgaurav1/DVG)", "project": "", "reviewers": "AnonReviewer1;AnonReviewer2;AnonReviewer3", "pdf_size": 0, "rating": "6;6;6", "confidence": "4;3;3", "wc_review": "187;377;295", "wc_reply_reviewers": "0;0;0", "wc_reply_authors": "430;332;418", "reply_reviewers": "0;0;0", "reply_authors": "1;1;1", "rating_avg": [ 6.0, 0.0 ], "confidence_avg": [ 3.3333333333333335, 0.4714045207910317 ], "wc_review_avg": [ 286.3333333333333, 77.80888266915431 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 393.3333333333333, 43.64503280888776 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 8, 0 ], "authors#_avg": [ 2, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 21, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=4423790628235777527&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 7, "pdf": "https://openreview.net/pdf?id=Qm7R_SdqTpT", "email": "cs.umd.edu;cs.umd.edu", "author_num": 2, "aff_unique_index": "0;0", "aff_unique_norm": "University of Maryland, College Park", "aff_unique_dep": "Department of Computer Science", "aff_unique_url": "https://www/umd.edu", "aff_unique_abbr": "UMD", "aff_campus_unique_index": "0;0", "aff_campus_unique": "College Park", "aff_country_unique_index": "0;0", "aff_country_unique": "United States" }, { "title": "What Can You Learn From Your Muscles? 
Learning Visual Representation from Human Interactions", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/2586", "id": "Qm8UNVCFdh", "poster": "", "openreview": "https://openreview.net/forum?id=Qm8UNVCFdh", "slides": "https://iclr.cc/virtual/2021/poster/2586", "video": "https://iclr.cc/virtual/2021/poster/2586", "author_site": "Kiana Ehsani, Daniel Gordon, Thomas H Nguyen, Roozbeh Mottaghi, Ali Farhadi", "tldr": "", "abstract": "Learning effective representations of visual data that generalize to a variety of downstream tasks has been a long quest for computer vision. Most representation learning approaches rely solely on visual data such as images or videos. In this paper, we explore a novel approach, where we use human interaction and attention cues to investigate whether we can learn better representations compared to visual-only representations. For this study, we collect a dataset of human interactions capturing body part movements and gaze in their daily lives. Our experiments show that our ``\"muscly-supervised\" representation that encodes interaction and attention cues outperforms a visual-only state-of-the-art method MoCo (He et al.,2020), on a variety of target tasks: scene classification (semantic), action recognition (temporal), depth estimation (geometric), dynamics prediction (physics) and walkable surface estimation (affordance). Our code and dataset are available at: https://github.com/ehsanik/muscleTorch.", "keywords": "representation learning;computer vision", "primary_area": "", "supplementary_material": "/attachment/b6d8ee701764c517d880dbf83799ae046525b62a.zip", "author": "Kiana Ehsani;Daniel Gordon;Thomas Hai Dang Nguyen;Roozbeh Mottaghi;Ali Farhadi", "authorids": "~Kiana_Ehsani1;~Daniel_Gordon1;~Thomas_Hai_Dang_Nguyen1;~Roozbeh_Mottaghi1;~Ali_Farhadi3", "gender": "F;M;M;;M", "homepage": "https://ehsanik.github.io/;https://homes.cs.washington.edu/~xkcd/;https://homes.cs.washington.edu/~tomn/;http://roozbehm.info;https://homes.cs.washington.edu/~ali/", "dblp": "198/0910;59/6084;;36/633;37/5826", "google_scholar": "RScZCLEAAAAJ;MlxWsaIAAAAJ;;CCV58dgAAAAJ;jeOFRDsAAAAJ", "orcid": ";0000-0001-8515-0523;;;", "linkedin": "kiana-ehsani-1b81b0162/;;;roozbeh-mottaghi-63397aa0;", "or_profile": "~Kiana_Ehsani1;~Daniel_Gordon1;~Thomas_Hai_Dang_Nguyen1;~Roozbeh_Mottaghi1;~Ali_Farhadi3", "aff": "Allen Institute for Artificial Intelligence;;;Allen Institute for AI;University of Washington", "aff_domain": "allenai.org;;;allenai.org;cs.uw.edu", "position": "Researcher;;;Research Manager;Full Professor", "bibtex": "@inproceedings{\nehsani2021what,\ntitle={What Can You Learn From Your Muscles? 
Learning Visual Representation from Human Interactions},\nauthor={Kiana Ehsani and Daniel Gordon and Thomas Hai Dang Nguyen and Roozbeh Mottaghi and Ali Farhadi},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=Qm8UNVCFdh}\n}", "github": "", "project": "", "reviewers": "AnonReviewer3;AnonReviewer1;AnonReviewer2;AnonReviewer4", "pdf_size": 0, "rating": "4;6;8;9", "confidence": "5;5;4;4", "wc_review": "141;537;777;418", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "134;207;331;118", "reply_reviewers": "0;0;0;0", "reply_authors": "1;1;1;1", "rating_avg": [ 6.75, 1.920286436967152 ], "confidence_avg": [ 4.5, 0.5 ], "wc_review_avg": [ 468.25, 228.94909368678444 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 197.5, 84.06098976338549 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 9, 0 ], "authors#_avg": [ 5, 0 ], "corr_rating_confidence": -0.911322376865767, "gs_citation": 2, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=550456704334967809&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 3, "pdf": "https://openreview.net/pdf?id=Qm8UNVCFdh", "email": "allenai.org;;;allenai.org;cs.uw.edu", "author_num": 5, "aff_unique_index": "0;1;2", "aff_unique_norm": "Allen Institute for Artificial Intelligence;Allen Institute for AI;University of Washington", "aff_unique_dep": ";;", "aff_unique_url": "https://allenai.org;https://allenai.org;https://www.washington.edu", "aff_unique_abbr": "AI2;AI2;UW", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0;0", "aff_country_unique": "United States" }, { "id": "QnzSSoqmAvB", "title": "Playing Nondeterministic Games through Planning with a Learned Model", "track": "main", "status": "Reject", "tldr": "", "abstract": "The MuZero algorithm is known for achieving high-level performance on traditional zero-sum two-player games of perfect information such as chess, Go, and shogi, as well as visual, non-zero sum, single-player environments such as the Atari suite. Despite lacking a perfect simulator and employing a learned model of environmental dynamics, MuZero produces game-playing agents comparable to its predecessor AlphaZero. However, the current implementation of MuZero is restricted only to deterministic environments. This paper presents Nondeterministic MuZero (NDMZ), an extension of MuZero for nondeterministic, two-player, zero-sum games of perfect information. Borrowing from Nondeterministic Monte Carlo Tree Search and the theory of extensive-form games, NDMZ formalizes chance as a player in the game and incorporates it into the MuZero network architecture and tree search. 
Experiments show that NDMZ is capable of learning effective strategies and an accurate model of the game.", "keywords": "reinforcement learning;alphazero;muzero;mcts;planning;search", "primary_area": "", "supplementary_material": "/attachment/e191bce40e7e3ad9a47deb7523752104aa204a21.zip", "author": "Thomas Willkens;Jordan Pollack", "authorids": "~Thomas_Willkens1;~Jordan_Pollack1", "gender": "M;M", "homepage": "https://www.brandeis.edu/computer-science/phd/students.html;http://www.cs.brandeis.edu/~pollack/", "dblp": ";", "google_scholar": ";", "orcid": ";", "linkedin": "tom-willkens-92114111/;", "or_profile": "~Thomas_Willkens1;~Jordan_Pollack1", "aff": "Brandeis University;Brandeis University", "aff_domain": "brandeis.edu;", "position": "PhD student;Full Professor", "bibtex": "@misc{\nwillkens2021playing,\ntitle={Playing Nondeterministic Games through Planning with a Learned Model},\nauthor={Thomas Willkens and Jordan Pollack},\nyear={2021},\nurl={https://openreview.net/forum?id=QnzSSoqmAvB}\n}", "github": "", "project": "", "reviewers": "AnonReviewer5;AnonReviewer1;AnonReviewer4;AnonReviewer2;AnonReviewer3", "site": "https://openreview.net/forum?id=QnzSSoqmAvB", "pdf_size": 0, "rating": "3;4;5;6;7", "confidence": "4;4;4;4;1", "wc_review": "338;310;463;308;156", "wc_reply_reviewers": "0;0;0;0;0", "wc_reply_authors": "187;135;257;95;50", "reply_reviewers": "0;0;0;0;0", "reply_authors": "1;1;1;1;1", "rating_avg": [ 5.0, 1.4142135623730951 ], "confidence_avg": [ 3.4, 1.2 ], "wc_review_avg": [ 315.0, 97.76297867802515 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 144.8, 72.01777558353216 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 11, 0 ], "authors#_avg": [ 2, 0 ], "corr_rating_confidence": -0.7071067811865475, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:M8yh-3w8AOMJ:scholar.google.com/&scioq=Playing+Nondeterministic+Games+through+Planning+with+a+Learned+Model&hl=en&as_sdt=0,5", "gs_version_total": 0, "aff_unique_index": "0;0", "aff_unique_norm": "Brandeis University", "aff_unique_dep": "", "aff_unique_url": "https://www.brandeis.edu", "aff_unique_abbr": "Brandeis", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0", "aff_country_unique": "United States" }, { "title": "Kanerva++: Extending the Kanerva Machine With Differentiable, Locally Block Allocated Latent Memory", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/2961", "id": "QoWatN-b8T", "poster": "", "openreview": "https://openreview.net/forum?id=QoWatN-b8T", "slides": "https://iclr.cc/virtual/2021/poster/2961", "video": "https://iclr.cc/virtual/2021/poster/2961", "author_site": "Jason Ramapuram, Yan Wu, Alexandros Kalousis", "tldr": "", "abstract": "Episodic and semantic memory are critical components of the human memory model. The theory of complementary learning systems (McClelland et al., 1995) suggests that the compressed representation produced by a serial event (episodic memory) is later restructured to build a more generalized form of reusable knowledge (semantic memory). In this work, we develop a new principled Bayesian memory allocation scheme that bridges the gap between episodic and semantic memory via a hierarchical latent variable model. We take inspiration from traditional heap allocation and extend the idea of locally contiguous memory to the Kanerva Machine, enabling a novel differentiable block allocated latent memory. 
In contrast to the Kanerva Machine, we simplify the process of memory writing by treating it as a fully feed forward deterministic process, relying on the stochasticity of the read key distribution to disperse information within the memory. We demonstrate that this allocation scheme improves performance in memory conditional image generation, resulting in new state-of-the-art conditional likelihood values on binarized MNIST (\u226441.58 nats/image) , binarized Omniglot (\u226466.24 nats/image), as well as presenting competitive performance on CIFAR10, DMLab Mazes, Celeb-A and ImageNet32\u00d732.", "keywords": "memory;generative model;latent variable;heap allocation", "primary_area": "", "supplementary_material": "/attachment/a9ff91111d5dd3c5f4b9d58849ce86d2f6c5cdee.zip", "author": "Jason Ramapuram;Yan Wu;Alexandros Kalousis", "authorids": "~Jason_Ramapuram1;~Yan_Wu1;~Alexandros_Kalousis1", "gender": "M;M;M", "homepage": "http://jramapuram.github.io;;http://dmml.ch/alexandros-kalousis/", "dblp": "200/8958;;68/6004", "google_scholar": "U-MT4IsAAAAJ;https://scholar.google.co.uk/citations?user=vYmSd0UAAAAJ;uVkn9UEAAAAJ", "orcid": ";;", "linkedin": "jramapuram/;;", "or_profile": "~Jason_Ramapuram1;~Yan_Wu1;~Alexandros_Kalousis1", "aff": "Google DeepMind;Google DeepMind;University of Applied Sciences Western Switzerland", "aff_domain": "deepmind.com;google.com;hesge.ch", "position": "Intern;Researcher;Full Professor", "bibtex": "@inproceedings{\nramapuram2021kanerva,\ntitle={Kanerva++: Extending the Kanerva Machine With Differentiable, Locally Block Allocated Latent Memory},\nauthor={Jason Ramapuram and Yan Wu and Alexandros Kalousis},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=QoWatN-b8T}\n}", "github": "", "project": "", "reviewers": "AnonReviewer3;AnonReviewer2;AnonReviewer1;AnonReviewer4", "pdf_size": 0, "rating": "6;6;6;7", "confidence": "4;4;4;3", "wc_review": "552;302;587;396", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "755;212;485;155", "reply_reviewers": "0;0;0;0", "reply_authors": "1;1;1;1", "rating_avg": [ 6.25, 0.4330127018922193 ], "confidence_avg": [ 3.75, 0.4330127018922193 ], "wc_review_avg": [ 459.25, 115.8131577153477 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 401.75, 239.06419117048875 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 10, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": -1.0, "gs_citation": 4, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=12298635307217182070&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 9, "pdf": "https://openreview.net/pdf?id=QoWatN-b8T", "email": "deepmind.com;google.com;hesge.ch", "author_num": 3, "aff_unique_index": "0;0;1", "aff_unique_norm": "Google;University of Applied Sciences Western Switzerland", "aff_unique_dep": "Google DeepMind;", "aff_unique_url": "https://deepmind.com;https://www.hes-so.ch/en", "aff_unique_abbr": "DeepMind;HES-SO", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0;1", "aff_country_unique": "United Kingdom;Switzerland" }, { "title": "Representation Balancing Offline Model-based Reinforcement Learning", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/2653", "id": "QpNz8r_Ri2Y", "poster": "", "openreview": "https://openreview.net/forum?id=QpNz8r_Ri2Y", "slides": "https://iclr.cc/virtual/2021/poster/2653", "video": "https://iclr.cc/virtual/2021/poster/2653", 
"author_site": "Byung-Jun Lee, Jongmin Lee, Kee-Eung Kim", "tldr": "", "abstract": "One of the main challenges in offline and off-policy reinforcement learning is to cope with the distribution shift that arises from the mismatch between the target policy and the data collection policy. In this paper, we focus on a model-based approach, particularly on learning the representation for a robust model of the environment under the distribution shift, which has been first studied by Representation Balancing MDP (RepBM). Although this prior work has shown promising results, there are a number of shortcomings that still hinder its applicability to practical tasks. In particular, we address the curse of horizon exhibited by RepBM, rejecting most of the pre-collected data in long-term tasks. We present a new objective for model learning motivated by recent advances in the estimation of stationary distribution corrections. This effectively overcomes the aforementioned limitation of RepBM, as well as naturally extending to continuous action spaces and stochastic policies. We also present an offline model-based policy optimization using this new objective, yielding the state-of-the-art performance in a representative set of benchmark offline RL tasks.", "keywords": "Reinforcement Learning;Model-based Reinforcement Learning;Offline Reinforcement Learning;Batch Reinforcement Learning;Off-policy policy evaluation", "primary_area": "", "supplementary_material": "", "author": "Byung-Jun Lee;Jongmin Lee;Kee-Eung Kim", "authorids": "~Byung-Jun_Lee1;~Jongmin_Lee1;~Kee-Eung_Kim4", "gender": "M;M;M", "homepage": "https://dmlab.korea.ac.kr/professor.html;https://www.jmlee.kr;http://ailab.kaist.ac.kr", "dblp": "130/1678-1;68/222-4.html;35/6703", "google_scholar": "FwoohI4AAAAJ;https://scholar.google.co.kr/citations?user=rFcK8EEAAAAJ;https://scholar.google.com/citations?hl=ko", "orcid": ";;", "linkedin": ";jmlee123/;", "or_profile": "~Byung-Jun_Lee1;~Jongmin_Lee1;~Kee-Eung_Kim2", "aff": "Korea Advanced Institute of Science & Technology;Korea Advanced Institute of Science & Technology;Korea Advanced Institute of Science & Technology", "aff_domain": "kaist.ac.kr;kaist.ac.kr;kaist.ac.kr", "position": "PhD student;PhD student;Full Professor", "bibtex": "@inproceedings{\nlee2021representation,\ntitle={Representation Balancing Offline Model-based Reinforcement Learning},\nauthor={Byung-Jun Lee and Jongmin Lee and Kee-Eung Kim},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=QpNz8r_Ri2Y}\n}", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer4;AnonReviewer2;AnonReviewer5", "pdf_size": 0, "rating": "6;7;7;7", "confidence": "4;4;4;4", "wc_review": "262;422;290;283", "wc_reply_reviewers": "16;16;0;0", "wc_reply_authors": "306;263;223;189", "reply_reviewers": "1;1;0;0", "reply_authors": "1;1;1;1", "rating_avg": [ 6.75, 0.4330127018922193 ], "confidence_avg": [ 4.0, 0.0 ], "wc_review_avg": [ 314.25, 63.057017848927806 ], "wc_reply_reviewers_avg": [ 8.0, 8.0 ], "wc_reply_authors_avg": [ 245.25, 43.77427897750002 ], "reply_reviewers_avg": [ 0.5, 0.5 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 15, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 61, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=18110133788778015799&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 4, "pdf": "https://openreview.net/pdf?id=QpNz8r_Ri2Y", "email": "kaist.ac.kr;kaist.ac.kr;kaist.ac.kr", "author_num": 3, 
"aff_unique_index": "0;0;0", "aff_unique_norm": "Korea Advanced Institute of Science and Technology", "aff_unique_dep": "", "aff_unique_url": "https://www.kaist.ac.kr", "aff_unique_abbr": "KAIST", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0;0", "aff_country_unique": "South Korea" }, { "id": "QpT9Q_NNfQL", "title": "NeurWIN: Neural Whittle Index Network for Restless Bandits via Deep RL", "track": "main", "status": "Reject", "tldr": "", "abstract": "Whittle index policy is a powerful tool to obtain asymptotically optimal solutions for the notoriously intractable problem of restless bandits. However, finding the Whittle indices remains a difficult problem for many practical restless bandits with convoluted transition kernels. This paper proposes NeurWIN, a neural Whittle index network that seeks to learn the Whittle indices for any restless bandits by leveraging mathematical properties of the Whittle indices. We show that a neural network that produces the Whittle index is also one that produces the optimal control for a set of Markov decision problems. This property motivates using deep reinforcement learning for the training of NeurWIN. We demonstrate the utility of NeurWIN by evaluating its performance for three recently studied restless bandit problems. Our experiment results show that the performance of NeurWIN is either better than, or as good as, state-of-the-art policies for all three problems.", "keywords": "deep reinforcement learning;restless bandits;Whittle index", "primary_area": "", "supplementary_material": "/attachment/dc4571e6e0d355d22e22ec2890080bce72a771c1.zip", "author": "Khaled Nakhleh;Santosh Ganji;Ping-Chun Hsieh;I-Hong Hou;Srinivas Shakkottai", "authorids": "~Khaled_Nakhleh1;~Santosh_Ganji1;~Ping-Chun_Hsieh1;~I-Hong_Hou1;~Srinivas_Shakkottai1", "gender": "M;M;M;;M", "homepage": "https://sites.google.com/tamu.edu/santosh/;https://pinghsieh.github.io/;https://cesg.tamu.edu/people-2/faculty/i-hong-hou/;https://cesg.tamu.edu/faculty/sshakkot/;https://khalednakhleh.com/", "dblp": "241/0417;163/7352;21/1392.html;03/353.html;303/4623", "google_scholar": "https://scholar.google.co.in/citations?user=MA3CvN0AAAAJ;ix38JgoAAAAJ;o3xoRqoAAAAJ;https://scholar.google.com/citations?hl=en;", "orcid": "0000-0002-9443-0719;;0000-0002-1166-8773;0000-0002-5882-6433;0000-0002-9769-3071", "linkedin": "santosh-ganji/;;;;khalednakhleh/", "or_profile": "~Santosh_Ganji1;~Ping-Chun_Hsieh1;~I-Hong_Hou1;~Srinivas_Shakkottai1;~Khaled_Jamal_Nakhleh1", "aff": "Texas A&M;National Yang Ming Chiao Tung University;Texas A&M;Texas A&M;Texas A&M University", "aff_domain": "tamu.edu;nycu.edu.tw;tamu.edu;tamu.edu;tamu.edu", "position": "PhD student;Assistant Professor;Associate Professor;Full Professor;PhD student", "bibtex": "@misc{\nnakhleh2021neurwin,\ntitle={Neur{\\{}WIN{\\}}: Neural Whittle Index Network for Restless Bandits via Deep {\\{}RL{\\}}},\nauthor={Khaled Nakhleh and Santosh Ganji and Ping-Chun Hsieh and I-Hong Hou and Srinivas Shakkottai},\nyear={2021},\nurl={https://openreview.net/forum?id=QpT9Q_NNfQL}\n}", "github": "", "project": "", "reviewers": "AnonReviewer4;AnonReviewer3;AnonReviewer1;AnonReviewer2", "site": "https://openreview.net/forum?id=QpT9Q_NNfQL", "pdf_size": 0, "rating": "4;4;7;7", "confidence": "4;4;4;3", "wc_review": "549;418;373;200", "wc_reply_reviewers": "425;204;0;0", "wc_reply_authors": "1290;1344;549;61", "reply_reviewers": "2;1;0;0", "reply_authors": "3;2;2;1", "rating_avg": [ 5.5, 1.5 ], "confidence_avg": [ 3.75, 
0.4330127018922193 ], "wc_review_avg": [ 385.0, 124.85391463626601 ], "wc_reply_reviewers_avg": [ 157.25, 175.5923902109656 ], "wc_reply_authors_avg": [ 811.0, 534.9471936555981 ], "reply_reviewers_avg": [ 0.75, 0.82915619758885 ], "reply_authors_avg": [ 2.0, 0.7071067811865476 ], "replies_avg": [ 17, 0 ], "authors#_avg": [ 5, 0 ], "corr_rating_confidence": -0.5773502691896258, "gs_citation": 51, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=15114473685997353079&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 13, "aff_unique_index": "0;1;0;0;0", "aff_unique_norm": "Texas A&M University;National Yang Ming Chiao Tung University", "aff_unique_dep": ";", "aff_unique_url": "https://www.tamu.edu;https://www.nycu.edu.tw", "aff_unique_abbr": "TAMU;NYCU", "aff_campus_unique_index": "1", "aff_campus_unique": ";Taiwan", "aff_country_unique_index": "0;1;0;0;0", "aff_country_unique": "United States;China" }, { "id": "QpU7n-6l0n", "title": "On the Consistency Loss for Leveraging Augmented Data to Learn Robust and Invariant Representations", "track": "main", "status": "Reject", "tldr": "", "abstract": "Data augmentation is one of the most popular techniques for improving the robustness of neural networks. In addition to directly training the model with original samples and augmented samples, a torrent of methods regularizing the distance between embeddings/representations of the original samples and their augmented counterparts have been introduced. In this paper, we explore these various regularization choices, seeking to provide a general understanding of how we should regularize the embeddings. Our analysis suggests how the ideal choices of regularization correspond to various assumptions. With an invariance test, we show that regularization is important if the model is to be used in a broader context than the in-lab setting because non-regularized approaches are limited in learning the concept of invariance, despite equally high accuracy. 
Finally, we also show that the generic approach we identified (squared $\\ell_2$ norm regularized augmentation) performs better than several recent methods, which are each specially designed for one task and significantly more complicated than ours, over three different tasks.", "keywords": "robustness;invariance;data augmentation;consistency loss", "primary_area": "", "supplementary_material": "/attachment/7312f2857d89ab81fca2fabe43e453c2bfe93abf.zip", "author": "Haohan Wang;Zeyi Huang;Xindi Wu;Eric Xing", "authorids": "~Haohan_Wang1;~Zeyi_Huang3;~Xindi_Wu1;~Eric_Xing1", "gender": "M;;F;M", "homepage": "http://cs.cmu.edu/~haohanw;;https://xindiwu.github.io/;http://www.cs.cmu.edu/~epxing/", "dblp": "132/4066;142/5094;235/0784;36/3855", "google_scholar": "nZxJGeUAAAAJ;rMvdp7oAAAAJ;hvnUnrUAAAAJ;https://scholar.google.com.tw/citations?user=5pKTRxEAAAAJ", "orcid": ";;;", "linkedin": "haohanwang/;;;", "or_profile": "~Haohan_Wang1;~Zeyi_Huang3;~Xindi_Wu1;~Eric_Xing1", "aff": "Carnegie Mellon University;University of Wisconsin - Madison;Carnegie Mellon University;School of Computer Science, Carnegie Mellon University", "aff_domain": "cmu.edu;wisc.edu;andrew.cmu.edu;cs.cmu.edu", "position": "PhD student;PhD student;MS student;Full Professor", "bibtex": "@misc{\nwang2021on,\ntitle={On the Consistency Loss for Leveraging Augmented Data to Learn Robust and Invariant Representations},\nauthor={Haohan Wang and Zeyi Huang and Xindi Wu and Eric Xing},\nyear={2021},\nurl={https://openreview.net/forum?id=QpU7n-6l0n}\n}", "github": "", "project": "", "reviewers": "AnonReviewer4;AnonReviewer2;AnonReviewer3", "site": "https://openreview.net/forum?id=QpU7n-6l0n", "pdf_size": 0, "rating": "4;6;6", "confidence": "3;3;4", "wc_review": "455;555;212", "wc_reply_reviewers": "274;0;0", "wc_reply_authors": "1564;652;461", "reply_reviewers": "1;0;0", "reply_authors": "2;1;1", "rating_avg": [ 5.333333333333333, 0.9428090415820634 ], "confidence_avg": [ 3.3333333333333335, 0.4714045207910317 ], "wc_review_avg": [ 407.3333333333333, 144.02854655318237 ], "wc_reply_reviewers_avg": [ 91.33333333333333, 129.1648386967427 ], "wc_reply_authors_avg": [ 892.3333333333334, 481.298475192081 ], "reply_reviewers_avg": [ 0.3333333333333333, 0.4714045207910317 ], "reply_authors_avg": [ 1.3333333333333333, 0.4714045207910317 ], "replies_avg": [ 9, 0 ], "authors#_avg": [ 4, 0 ], "corr_rating_confidence": 0.49999999999999983, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:COOPxJnModkJ:scholar.google.com/&scioq=On+the+Consistency+Loss+for+Leveraging+Augmented+Data+to+Learn+Robust+and+Invariant+Representations&hl=en&as_sdt=0,5", "gs_version_total": 0, "aff_unique_index": "0;1;0;0", "aff_unique_norm": "Carnegie Mellon University;University of Wisconsin-Madison", "aff_unique_dep": ";", "aff_unique_url": "https://www.cmu.edu;https://www.wisc.edu", "aff_unique_abbr": "CMU;UW-Madison", "aff_campus_unique_index": "1;2", "aff_campus_unique": ";Madison;Pittsburgh", "aff_country_unique_index": "0;0;0;0", "aff_country_unique": "United States" }, { "id": "Qpik5XBv_1-", "title": "Language Controls More Than Top-Down Attention: Modulating Bottom-Up Visual Processing with Referring Expressions", "track": "main", "status": "Reject", "tldr": "", "abstract": "How to best integrate linguistic and perceptual processing in multimodal tasks is an important open problem. In this work we argue that the common technique of using language to direct visual attention over high-level visual features may not be optimal. 
Using language throughout the bottom-up visual pathway, going from pixels to high-level features, may be necessary. Our experiments on several English referring expression datasets show significant improvements when language is used to control the filters for bottom-up visual processing in addition to top-down attention.", "keywords": "Referring Expression Understanding;Language-Vision Problems;Grounded Language Understanding", "primary_area": "", "supplementary_material": "", "author": "Ozan Arkan Can;Ilker Kesen;Deniz Yuret", "authorids": "~Ozan_Arkan_Can1;~Ilker_Kesen1;~Deniz_Yuret1", "gender": "M;M;M", "homepage": "https://ozanarkancan.github.io/;https://github.com/ilkerkesen;http://www.denizyuret.com/", "dblp": ";228/2036;84/4160", "google_scholar": "https://scholar.google.com/citations?hl=en;;https://scholar.google.com.tw/citations?user=EJurXJ4AAAAJ", "orcid": ";;", "linkedin": "ozan-arkan-can-69aba876;;", "or_profile": "~Ozan_Arkan_Can1;~Ilker_Kesen1;~Deniz_Yuret1", "aff": ";Koc University;Koc University", "aff_domain": ";ku.edu.tr;ku.edu.tr", "position": ";PhD student;Full Professor", "bibtex": "@misc{\ncan2021language,\ntitle={Language Controls More Than Top-Down Attention: Modulating Bottom-Up Visual Processing with Referring Expressions},\nauthor={Ozan Arkan Can and Ilker Kesen and Deniz Yuret},\nyear={2021},\nurl={https://openreview.net/forum?id=Qpik5XBv_1-}\n}", "github": "", "project": "", "reviewers": "AnonReviewer2;AnonReviewer4;AnonReviewer3;AnonReviewer1", "site": "https://openreview.net/forum?id=Qpik5XBv_1-", "pdf_size": 0, "rating": "2;4;5;10", "confidence": "4;3;4;4", "wc_review": "675;357;391;149", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "422;333;322;27", "reply_reviewers": "0;0;0;0", "reply_authors": "1;1;1;1", "rating_avg": [ 5.25, 2.947456530637899 ], "confidence_avg": [ 3.75, 0.4330127018922193 ], "wc_review_avg": [ 393.0, 187.32325002519042 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 276.0, 148.89761583047596 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 9, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": 0.24485105343719588, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:8puZH3BanhEJ:scholar.google.com/&scioq=Language+Controls+More+Than+Top-Down+Attention:+Modulating+Bottom-Up+Visual+Processing+with+Referring+Expressions&hl=en&as_sdt=0,33", "gs_version_total": 0, "aff_unique_index": "0;0", "aff_unique_norm": "Koc University", "aff_unique_dep": "", "aff_unique_url": "https://www.koc.edu.tr", "aff_unique_abbr": "Koc", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0", "aff_country_unique": "T\u00fcrkiye" }, { "title": "Simple Augmentation Goes a Long Way: ADRL for DNN Quantization", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/3162", "id": "Qr0aRliE_Hb", "poster": "", "openreview": "https://openreview.net/forum?id=Qr0aRliE_Hb", "slides": "https://iclr.cc/virtual/2021/poster/3162", "video": "https://iclr.cc/virtual/2021/poster/3162", "author_site": "Lin Ning, Guoyang Chen, Weifeng Zhang, Xipeng Shen", "tldr": "", "abstract": "Mixed precision quantization improves DNN performance by assigning different layers with different bit-width values. Searching for the optimal bit-width for each layer, however, remains a challenge. Deep Reinforcement Learning (DRL) shows some recent promise. 
It however suffers instability due to function approximation errors, causing large variances in the early training stages, slow convergence, and suboptimal policies in the mixed-precision quantization problem. This paper proposes augmented DRL (ADRL) as a way to alleviate these issues. This new strategy augments the neural networks in DRL with a complementary scheme to boost the performance of learning. The paper examines the effectiveness of ADRL both analytically and empirically, showing that it can produce more accurate quantized models than the state of the art DRL-based quantization while improving the learning speed by 4.5-64 times. ", "keywords": "Reinforcement Learning;Quantization;mixed precision;augmented deep reinforcement learning;DNN", "primary_area": "", "supplementary_material": "", "author": "Lin Ning;Guoyang Chen;Weifeng Zhang;Xipeng Shen", "authorids": "~Lin_Ning1;~Guoyang_Chen1;weifeng.z@alibaba-inc.com;~Xipeng_Shen1", "gender": "F;M;;M", "homepage": ";;;https://research.csc.ncsu.edu/picture/xshen5/index.htm", "dblp": "38/3526-1;;;36/4172.html", "google_scholar": "FCY4vUEAAAAJ;wqH_U3YAAAAJ;;0DC5oGQAAAAJ", "orcid": "0000-0001-9458-7946;;;0000-0003-3599-8010", "linkedin": ";;;", "or_profile": "~Lin_Ning1;~Guoyang_Chen1;weifeng.z@alibaba-inc.com;~Xipeng_Shen1", "aff": "Google;;;North Carolina State University", "aff_domain": "google.com;;;ncsu.edu", "position": "Software Engineer;;;Professor", "bibtex": "@inproceedings{\nning2021simple,\ntitle={Simple Augmentation Goes a Long Way: {\\{}ADRL{\\}} for {\\{}DNN{\\}} Quantization},\nauthor={Lin Ning and Guoyang Chen and Weifeng Zhang and Xipeng Shen},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=Qr0aRliE_Hb}\n}", "github": "", "project": "", "reviewers": "AnonReviewer3;AnonReviewer4;AnonReviewer1", "pdf_size": 0, "rating": "6;6;7", "confidence": "3;3;3", "wc_review": "218;259;557", "wc_reply_reviewers": "0;0;0", "wc_reply_authors": "268;579;978", "reply_reviewers": "0;0;0", "reply_authors": "1;1;2", "rating_avg": [ 6.333333333333333, 0.4714045207910317 ], "confidence_avg": [ 3.0, 0.0 ], "wc_review_avg": [ 344.6666666666667, 151.07246237779037 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 608.3333333333334, 290.5974688273951 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.3333333333333333, 0.4714045207910317 ], "replies_avg": [ 9, 0 ], "authors#_avg": [ 4, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 8, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=5992330335256867267&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 7, "pdf": "https://openreview.net/pdf?id=Qr0aRliE_Hb", "email": "google.com;;;ncsu.edu", "author_num": 4, "aff_unique_index": "0;1", "aff_unique_norm": "Google;North Carolina State University", "aff_unique_dep": "Google;", "aff_unique_url": "https://www.google.com;https://www.ncsu.edu", "aff_unique_abbr": "Google;NCSU", "aff_campus_unique_index": "0", "aff_campus_unique": "Mountain View;", "aff_country_unique_index": "0;0", "aff_country_unique": "United States" }, { "id": "QtLSvKvm5Po", "title": "Backdoor Attacks to Graph Neural Networks", "track": "main", "status": "Withdraw", "tldr": "", "abstract": "In this work, we propose the first backdoor attack to graph neural networks (GNN). Specifically, we propose a \\emph{subgraph based backdoor attack} to GNN for graph classification. 
In our backdoor attack, a GNN classifier predicts an attacker-chosen target label for a testing graph once a predefined subgraph is injected to the testing graph. Our empirical results on three real-world graph datasets show that our backdoor attacks are effective with a small impact on a GNN's prediction accuracy for clean testing graphs.", "keywords": "", "primary_area": "", "supplementary_material": "", "author": "Zaixi Zhang;Jinyuan Jia;Binghui Wang;Neil Zhenqiang Gong", "authorids": "zaixi.zhang@duke.edu;~Jinyuan_Jia2;~Binghui_Wang2;~Neil_Zhenqiang_Gong1", "gender": ";;M;", "homepage": ";https://jinyuan-jia.github.io/;https://wangbinghui.net;", "dblp": ";24/5124-1.html;123/7149;", "google_scholar": ";iyg4ytkAAAAJ;SoOztcEAAAAJ;", "orcid": ";0000-0002-9785-7769;0000-0001-5616-060X;", "linkedin": ";;;", "or_profile": "zaixi.zhang@duke.edu;~Jinyuan_Jia2;~Binghui_Wang2;~Neil_Zhenqiang_Gong1", "aff": ";Duke University;Duke University;", "aff_domain": ";duke.edu;duke.edu;", "position": ";PhD student;Postdoc;", "bibtex": "", "github": "", "project": "", "reviewers": "AnonReviewer2;AnonReviewer3;AnonReviewer1;AnonReviewer4", "site": "https://openreview.net/forum?id=QtLSvKvm5Po", "pdf_size": 0, "rating": "4;5;5;5", "confidence": "4;5;4;3", "wc_review": "552;877;338;585", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "0;0;0;0", "reply_reviewers": "0;0;0;0", "reply_authors": "0;0;0;0", "rating_avg": [ 4.75, 0.4330127018922193 ], "confidence_avg": [ 4.0, 0.7071067811865476 ], "wc_review_avg": [ 588.0, 191.9153459210597 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 0, 0 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 0, 0 ], "replies_avg": [ 5, 0 ], "authors#_avg": [ 4, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 269, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=3338175636076778177&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 6, "aff_unique_index": "0;0", "aff_unique_norm": "Duke University", "aff_unique_dep": "", "aff_unique_url": "https://www.duke.edu", "aff_unique_abbr": "Duke", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0", "aff_country_unique": "United States" }, { "title": "Random Feature Attention", "status": "Spotlight", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/3213", "id": "QtTKTdVrFBB", "poster": "", "openreview": "https://openreview.net/forum?id=QtTKTdVrFBB", "slides": "https://iclr.cc/virtual/2021/poster/3213", "video": "https://iclr.cc/virtual/2021/poster/3213", "author_site": "Hao Peng, Nikolaos Pappas, Dani Yogatama, Roy Schwartz, Noah Smith, Lingpeng Kong", "tldr": "", "abstract": "Transformers are state-of-the-art models for a variety of sequence modeling tasks. At their core is an attention function which models pairwise interactions between the inputs at every timestep. While attention is powerful, it does not scale efficiently to long sequences due to its quadratic time and space complexity in the sequence length. We propose RFA, a linear time and space attention that uses random feature methods to approximate the softmax function, and explore its application in transformers. RFA can be used as a drop-in replacement for conventional softmax attention and offers a straightforward way of learning with recency bias through an optional gating mechanism. Experiments on language modeling and machine translation demonstrate that RFA achieves similar or better performance compared to strong transformer baselines. 
In the machine translation experiment, RFA decodes twice as fast as a vanilla transformer. Compared to existing efficient transformer variants, RFA is competitive in terms of both accuracy and efficiency on three long text classification datasets. Our analysis shows that RFA\u2019s efficiency gains are especially notable on long sequences, suggesting that RFA will be particularly useful in tasks that require working with large inputs, fast decoding speed, or low memory footprints.", "keywords": "Attention;transformers;machine translation;language modeling", "primary_area": "", "supplementary_material": "/attachment/0e42a49f44b1fc4fcf74de3913b30659cbddbe12.zip", "author": "Hao Peng;Nikolaos Pappas;Dani Yogatama;Roy Schwartz;Noah Smith;Lingpeng Kong", "authorids": "~Hao_Peng4;~Nikolaos_Pappas1;~Dani_Yogatama2;~Roy_Schwartz1;~Noah_Smith1;~Lingpeng_Kong1", "gender": "M;M;M;M;;M", "homepage": "http://nik0spapp.github.io/;https://schwartz-lab-huji.github.io/;https://homes.cs.washington.edu/~nasmith/;https://ikekonglp.github.io/;;https://haopeng-nlp.github.io/", "dblp": "36/8968-2.html;19/376-1;90/5204.html;144/7656;08/8178;", "google_scholar": "https://scholar.google.ch/citations?user=daiFj_cAAAAJ;wvfWo9IAAAAJ;https://scholar.google.com/citations?hl=en;f1hBi5wAAAAJ;;6Y37nm0AAAAJ", "orcid": "0000-0002-2004-8111;;0000-0002-2310-6380;;;", "linkedin": "nik0spapp/;;;;;", "or_profile": "~Nikolaos_Pappas1;~Roy_Schwartz1;~Noah_Smith1;~Lingpeng_Kong1;~Dani_Yogatama1;~Hao_Peng1", "aff": "University of Washington;Hebrew University, Hebrew University of Jerusalem;Allen Institute for Artificial Intelligence;Department of Computer Science, The University of Hong Kong;Google DeepMind;Department of Computer Science, University of Washington", "aff_domain": "cs.washington.edu;cs.huji.ac.il;allenai.org;cs.hku.hk;google.com;cs.washington.edu", "position": "Postdoc;Assistant Professor;Senior Director of NLP Research;Assistant Professor;Research Scientist;PhD student", "bibtex": "@inproceedings{\npeng2021random,\ntitle={Random Feature Attention},\nauthor={Hao Peng and Nikolaos Pappas and Dani Yogatama and Roy Schwartz and Noah Smith and Lingpeng Kong},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=QtTKTdVrFBB}\n}", "github": "", "project": "", "reviewers": "AnonReviewer4;AnonReviewer2;AnonReviewer3;AnonReviewer1", "pdf_size": 0, "rating": "4;8;8;8", "confidence": "5;4;3;3", "wc_review": "706;584;251;489", "wc_reply_reviewers": "410;127;0;0", "wc_reply_authors": "1703;621;118;734", "reply_reviewers": "1;1;0;0", "reply_authors": "3;2;1;1", "rating_avg": [ 7.0, 1.7320508075688772 ], "confidence_avg": [ 3.75, 0.82915619758885 ], "wc_review_avg": [ 507.5, 166.87495318351404 ], "wc_reply_reviewers_avg": [ 134.25, 167.43412883877647 ], "wc_reply_authors_avg": [ 794.0, 573.7564814448722 ], "reply_reviewers_avg": [ 0.5, 0.5 ], "reply_authors_avg": [ 1.75, 0.82915619758885 ], "replies_avg": [ 15, 0 ], "authors#_avg": [ 6, 0 ], "corr_rating_confidence": -0.8703882797784891, "gs_citation": 401, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=11277878896311252185&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 6, "pdf": "https://openreview.net/pdf?id=QtTKTdVrFBB", "email": "cs.washington.edu;cs.huji.ac.il;allenai.org;cs.hku.hk;google.com;cs.washington.edu", "author_num": 6, "aff_unique_index": "0;1;2;3;4;0", "aff_unique_norm": "University of Washington;Hebrew University of Jerusalem;Allen Institute for Artificial Intelligence;University of 
Hong Kong;Google", "aff_unique_dep": ";;;Department of Computer Science;Google DeepMind", "aff_unique_url": "https://www.washington.edu;https://www.huji.ac.il;https://allenai.org;https://www.hku.hk;https://deepmind.com", "aff_unique_abbr": "UW;HUJI;AI2;HKU;DeepMind", "aff_campus_unique_index": "1;2", "aff_campus_unique": ";Hong Kong SAR;Seattle", "aff_country_unique_index": "0;1;0;2;3;0", "aff_country_unique": "United States;Israel;China;United Kingdom" }, { "title": "Domain-Robust Visual Imitation Learning with Mutual Information Constraints", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/2896", "id": "QubpWYfdNry", "poster": "", "openreview": "https://openreview.net/forum?id=QubpWYfdNry", "slides": "https://iclr.cc/virtual/2021/poster/2896", "video": "https://iclr.cc/virtual/2021/poster/2896", "author_site": "Edoardo Cetin, Oya Celiktutan", "tldr": "", "abstract": "Human beings are able to understand objectives and learn by simply observing others perform a task. Imitation learning methods aim to replicate such capabilities, however, they generally depend on access to a full set of optimal states and actions taken with the agent's actuators and from the agent's point of view. In this paper, we introduce a new algorithm - called Disentangling Generative Adversarial Imitation Learning (DisentanGAIL) - with the purpose of bypassing such constraints. Our algorithm enables autonomous agents to learn directly from high dimensional observations of an expert performing a task, by making use of adversarial learning with a latent representation inside the discriminator network. Such latent representation is regularized through mutual information constraints to incentivize learning only features that encode information about the completion levels of the task being demonstrated. This allows to obtain a shared feature space to successfully perform imitation while disregarding the differences between the expert's and the agent's domains. 
Empirically, our algorithm is able to efficiently imitate in a diverse range of control problems including balancing, manipulation and locomotive tasks, while being robust to various domain differences in terms of both environment appearance and agent embodiment.", "keywords": "Imitation Learning;Reinforcement Learning;Observational Imitation;Third-Person Imitation;Mutual Information;Domain Adaption;Machine Learning", "primary_area": "", "supplementary_material": "/attachment/a2f45d7daa76f701540a30f84a144a328ff20c3a.zip", "author": "Edoardo Cetin;Oya Celiktutan", "authorids": "~Edoardo_Cetin1;~Oya_Celiktutan2", "gender": ";F", "homepage": "https://aladoro.github.io/;https://nms.kcl.ac.uk/oya.celiktutan/", "dblp": "287/4615;05/4947", "google_scholar": "https://scholar.google.it/citations?hl=en;https://scholar.google.co.uk/citations?user=CCCoMqcAAAAJ", "orcid": ";0000-0002-7213-6359", "linkedin": "edoardo-cetin-916b68195/;oya-celiktutan-5249104/?originalSubdomain=uk", "or_profile": "~Edoardo_Cetin1;~Oya_Celiktutan2", "aff": "King's College London;King's College London", "aff_domain": "kcl.ac.uk;kcl.ac.uk", "position": "PhD student;Assistant Professor", "bibtex": "@inproceedings{\ncetin2021domainrobust,\ntitle={Domain-Robust Visual Imitation Learning with Mutual Information Constraints},\nauthor={Edoardo Cetin and Oya Celiktutan},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=QubpWYfdNry}\n}", "github": "[![github](/images/github_icon.svg) Aladoro/domain-robust-visual-il](https://github.com/Aladoro/domain-robust-visual-il)", "project": "", "reviewers": "AnonReviewer2;AnonReviewer3;AnonReviewer1;AnonReviewer4", "pdf_size": 0, "rating": "6;7;7;7", "confidence": "4;4;3;3", "wc_review": "675;1179;589;432", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "1073;2086;524;809", "reply_reviewers": "0;0;0;0", "reply_authors": "2;3;1;2", "rating_avg": [ 6.75, 0.4330127018922193 ], "confidence_avg": [ 3.5, 0.5 ], "wc_review_avg": [ 718.75, 279.6447523197959 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 1123.0, 588.9112836412629 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 2.0, 0.7071067811865476 ], "replies_avg": [ 14, 0 ], "authors#_avg": [ 2, 0 ], "corr_rating_confidence": -0.5773502691896257, "gs_citation": 19, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=18023915525010137640&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 3, "pdf": "https://openreview.net/pdf?id=QubpWYfdNry", "email": "kcl.ac.uk;kcl.ac.uk", "author_num": 2, "aff_unique_index": "0;0", "aff_unique_norm": "King's College London", "aff_unique_dep": "", "aff_unique_url": "https://www.kcl.ac.uk", "aff_unique_abbr": "KCL", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0", "aff_country_unique": "United Kingdom" }, { "title": "Transient Non-stationarity and Generalisation in Deep Reinforcement Learning", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/3156", "id": "Qun8fv4qSby", "poster": "", "openreview": "https://openreview.net/forum?id=Qun8fv4qSby", "slides": "https://iclr.cc/virtual/2021/poster/3156", "video": "https://iclr.cc/virtual/2021/poster/3156", "author_site": "Maximilian Igl, Gregory Farquhar, Jelena Luketina, Wendelin Boehmer, Shimon Whiteson", "tldr": "", "abstract": "Non-stationarity can arise in Reinforcement Learning (RL) even in stationary environments. 
For example, most RL algorithms collect new data throughout training, using a non-stationary behaviour policy. Due to the transience of this non-stationarity, it is often not explicitly addressed in deep RL and a single neural network is continually updated. However, we find evidence that neural networks exhibit a memory effect, where these transient non-stationarities can permanently impact the latent representation and adversely affect generalisation performance. Consequently, to improve generalisation of deep RL agents, we propose Iterated Relearning (ITER). ITER augments standard RL training by repeated knowledge transfer of the current policy into a freshly initialised network, which thereby experiences less non-stationarity during training. Experimentally, we show that ITER improves performance on the challenging generalisation benchmarks ProcGen and Multiroom.", "keywords": "Reinforcement Learning;Generalization", "primary_area": "", "supplementary_material": "/attachment/bad50287646932670b54b4f988fa9fba96c4e603.zip", "author": "Maximilian Igl;Gregory Farquhar;Jelena Luketina;Wendelin Boehmer;Shimon Whiteson", "authorids": "~Maximilian_Igl1;~Gregory_Farquhar1;~Jelena_Luketina1;~Wendelin_Boehmer1;~Shimon_Whiteson1", "gender": "M;M;F;M;", "homepage": "https://maximilianigl.com;https://greg-farquhar.github.io/;https://whirl.cs.ox.ac.uk/member/jelena-luketina/;https://reinforceAI.net;", "dblp": "207/8245.html;195/5653;172/1124;08/9988;https://dblp.uni-trier.de/pers/w/Whiteson:Shimon.html", "google_scholar": "https://scholar.google.com/citations?hl=en;6Z-RC-QAAAAJ;zpil5xkAAAAJ;https://scholar.google.de/citations?user=wI5MV8IAAAAJ;", "orcid": ";;;0000-0002-4398-6792;", "linkedin": "maximilian-igl-21116992/;;;wendelin-boehmer;", "or_profile": "~Maximilian_Igl1;~Gregory_Farquhar1;~Jelena_Luketina1;~Wendelin_Boehmer1;~Shimon_Whiteson1", "aff": "University of Oxford;Google DeepMind;Google DeepMind;Delft University of Technology;University of Oxford", "aff_domain": "oxford.ac.uk;google.com;deepmind.com;tudelft.nl;ox.ac.uk", "position": "PhD student;Research Scientist;Intern;Assistant Professor;Professor", "bibtex": "@inproceedings{\nigl2021transient,\ntitle={Transient Non-stationarity and Generalisation in Deep Reinforcement Learning},\nauthor={Maximilian Igl and Gregory Farquhar and Jelena Luketina and Wendelin Boehmer and Shimon Whiteson},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=Qun8fv4qSby}\n}", "github": "", "project": "", "reviewers": "AnonReviewer3;AnonReviewer4;AnonReviewer1;AnonReviewer2", "pdf_size": 0, "rating": "5;5;7;8", "confidence": "3;5;4;4", "wc_review": "176;236;339;194", "wc_reply_reviewers": "0;183;105;0", "wc_reply_authors": "326;864;189;61", "reply_reviewers": "0;1;1;0", "reply_authors": "1;2;1;1", "rating_avg": [ 6.25, 1.299038105676658 ], "confidence_avg": [ 4.0, 0.7071067811865476 ], "wc_review_avg": [ 236.25, 63.1916727108881 ], "wc_reply_reviewers_avg": [ 72.0, 77.10058365537839 ], "wc_reply_authors_avg": [ 360.0, 305.7016519418892 ], "reply_reviewers_avg": [ 0.5, 0.5 ], "reply_authors_avg": [ 1.25, 0.4330127018922193 ], "replies_avg": [ 13, 0 ], "authors#_avg": [ 5, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 104, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=7487709415799366671&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 9, "pdf": "https://openreview.net/pdf?id=Qun8fv4qSby", "email": "oxford.ac.uk;google.com;deepmind.com;tudelft.nl;ox.ac.uk", "author_num": 
5, "aff_unique_index": "0;1;1;2;0", "aff_unique_norm": "University of Oxford;Google;Delft University of Technology", "aff_unique_dep": ";Google DeepMind;", "aff_unique_url": "https://www.ox.ac.uk;https://deepmind.com;https://www.tudelft.nl", "aff_unique_abbr": "Oxford;DeepMind;TU Delft", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0;0;1;0", "aff_country_unique": "United Kingdom;Netherlands" }, { "id": "QxQkG-gIKJM", "title": "Optimistic Exploration with Backward Bootstrapped Bonus for Deep Reinforcement Learning", "track": "main", "status": "Reject", "tldr": "", "abstract": "Optimism in the face of uncertainty is a principled approach for provably efficient exploration for reinforcement learning in tabular and linear settings. However, such an approach is challenging in developing practical exploration algorithms for Deep Reinforcement Learning (DRL). To address this problem, we propose an Optimistic Exploration algorithm with Backward Bootstrapped Bonus (OEB3) for DRL by following these two principles. OEB3 is built on bootstrapped deep $Q$-learning, a non-parametric posterior sampling method for temporally-extended exploration. Based on such a temporally-extended exploration, we construct an UCB-bonus indicating the uncertainty of $Q$-functions. The UCB-bonus is further utilized to estimate an optimistic $Q$-value, which encourages the agent to explore the scarcely visited states and actions to reduce uncertainty. In the estimation of $Q$-function, we adopt an episodic backward update strategy to propagate the future uncertainty to the estimated $Q$-function consistently. Extensive evaluations show that OEB3 outperforms several state-of-the-art exploration approaches in Mnist maze and 49 Atari games.", "keywords": "optimistic exploration;backward bootstrapped bonus;posterior sampling;reinforcement learning", "primary_area": "", "supplementary_material": "/attachment/a31cce40ee4a847fbff7763023090a64d57aa87b.zip", "author": "Chenjia Bai;Lingxiao Wang;Peng Liu;Zhaoran Wang;Jianye HAO;Yingnan Zhao", "authorids": "~Chenjia_Bai2;~Lingxiao_Wang6;~Peng_Liu5;~Zhaoran_Wang1;~Jianye_HAO1;~Yingnan_Zhao1", "gender": "M;M;M;Not Specified;M;M", "homepage": "https://baichenjia.github.io/;;https://homepage.hit.edu.cn/liupeng;https://zhaoranwang.github.io/;http://www.icdai.org/jianye.html;", "dblp": "247/1943;140/1229;21/6121-8;117/2756;21/7664.html;", "google_scholar": "Rm_1y2kAAAAJ;;;https://scholar.google.com.tw/citations?user=HSx0BgQAAAAJ;;NMgYY5cAAAAJ", "orcid": ";;;;0000-0002-0422-8235;", "linkedin": ";;;;;", "or_profile": "~Chenjia_Bai2;~Lingxiao_Wang6;~Peng_Liu5;~Zhaoran_Wang1;~Jianye_HAO1;~Yingnan_Zhao1", "aff": "Harbin institute of technology;Northwestern University;Harbin Institute of Technology;;Tianjin University;", "aff_domain": "hit.edu.cn;northwestern.edu;hit.edu.cn;;tju.edu.cn;", "position": "PhD student;PhD student;Professor;;Associate Professor;", "bibtex": "@misc{\nbai2021optimistic,\ntitle={Optimistic Exploration with Backward Bootstrapped Bonus for Deep Reinforcement Learning},\nauthor={Chenjia Bai and Lingxiao Wang and Peng Liu and Zhaoran Wang and Jianye HAO and Yingnan Zhao},\nyear={2021},\nurl={https://openreview.net/forum?id=QxQkG-gIKJM}\n}", "github": "", "project": "", "reviewers": "AnonReviewer5;AnonReviewer4;AnonReviewer1;AnonReviewer2;AnonReviewer3", "site": "https://openreview.net/forum?id=QxQkG-gIKJM", "pdf_size": 0, "rating": "4;6;6;6;7", "confidence": "4;3;4;4;4", "wc_review": "1321;693;250;377;418", "wc_reply_reviewers": 
"198;0;0;0;0", "wc_reply_authors": "3477;278;960;933;1107", "reply_reviewers": "1;0;0;0;0", "reply_authors": "6;1;2;2;2", "rating_avg": [ 5.8, 0.9797958971132712 ], "confidence_avg": [ 3.8, 0.39999999999999997 ], "wc_review_avg": [ 611.8, 382.92213307668703 ], "wc_reply_reviewers_avg": [ 39.6, 79.2 ], "wc_reply_authors_avg": [ 1351.0, 1100.758465786205 ], "reply_reviewers_avg": [ 0.2, 0.4000000000000001 ], "reply_authors_avg": [ 2.6, 1.7435595774162693 ], "replies_avg": [ 24, 0 ], "authors#_avg": [ 6, 0 ], "corr_rating_confidence": -0.10206207261596574, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:cXM67jEGjUMJ:scholar.google.com/&scioq=Optimistic+Exploration+with+Backward+Bootstrapped+Bonus+for+Deep+Reinforcement+Learning&hl=en&as_sdt=0,33", "gs_version_total": 2, "aff_unique_index": "0;1;0;2", "aff_unique_norm": "Harbin Institute of Technology;Northwestern University;Tianjin University", "aff_unique_dep": ";;", "aff_unique_url": "http://www.hit.edu.cn/;https://www.northwestern.edu;http://www.tju.edu.cn", "aff_unique_abbr": "HIT;NU;TJU", "aff_campus_unique_index": "0;0", "aff_campus_unique": "Harbin;", "aff_country_unique_index": "0;1;0;0", "aff_country_unique": "China;United States" }, { "id": "QzKDLiosEd", "title": "Can one hear the shape of a neural network?: Snooping the GPU via Magnetic Side Channel", "track": "main", "status": "Reject", "tldr": "", "abstract": "We examine the magnetic flux emanating from a graphics processing unit\u2019s (GPU\u2019s) power cable, as acquired by a cheap $3 induction sensor, and find that this signal betrays the detailed topology and hyperparameters of a black-box neural network model. The attack acquires the magnetic signal for one query with unknown input values, but known input dimension and batch size. The reconstruction is possible due to the modular layer sequence in which deep neural networks are evaluated. We find that each layer component\u2019s evaluation produces an identifiable magnetic signal signature, from which layer topology, width, function type, and sequence order can be inferred using a suitably trained classifier and an optimization based on integer programming. We study the extent to which network specifications can be recovered, and consider metrics for comparing network similarity. We demonstrate the potential accuracy of this side channel attack in recovering the details for a broad range of network architectures including also random designs. We consider applications that may exploit this novel side channel exposure, such as adversarial transfer attacks. 
In response, we discuss countermeasures to protect against our method and other similar snooping techniques.", "keywords": "side channel;model extraction;GPU;magnetic induction;sensors", "primary_area": "", "supplementary_material": "/attachment/ba60e747781a114f7d76497c4feedfba29754cc9.zip", "author": "Henrique Teles Maia;Chang Xiao;Dingzeyu Li;Eitan Grinspun;Changxi Zheng", "authorids": "~Henrique_Teles_Maia1;~Chang_Xiao1;~Dingzeyu_Li2;~Eitan_Grinspun3;~Changxi_Zheng1", "gender": "M;M;M;;M", "homepage": "http://henrique.is/here;http://chang.engineer;http://dingzeyu.li/;http://www.dgp.toronto.edu/~eitan;http://www.cs.columbia.edu/~cxz", "dblp": ";66/10110;https://dblp.uni-trier.de/pers/hd/l/Li:Dingzeyu;;92/5285", "google_scholar": "9oRqw5YAAAAJ;QghjQNYAAAAJ;BmaJwicAAAAJ;-HyEryoAAAAJ;-0rEuLgAAAAJ", "orcid": ";;;;", "linkedin": "henrique-t-maia;;;;", "or_profile": "~Henrique_Teles_Maia1;~Chang_Xiao1;~Dingzeyu_Li2;~Eitan_Grinspun3;~Changxi_Zheng1", "aff": "Columbia University;Columbia University;Adobe Research;University of Toronto;Columbia University", "aff_domain": "columbia.edu;columbia.edu;adobe.com;toronto.edu;cs.columbia.edu", "position": "PhD student;PhD student;Research Scientist;Full Professor;Associate Professor", "bibtex": "@misc{\nmaia2021can,\ntitle={Can one hear the shape of a neural network?: Snooping the {\\{}GPU{\\}} via Magnetic Side Channel},\nauthor={Henrique Teles Maia and Chang Xiao and Dingzeyu Li and Eitan Grinspun and Changxi Zheng},\nyear={2021},\nurl={https://openreview.net/forum?id=QzKDLiosEd}\n}", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer3;AnonReviewer2", "site": "https://openreview.net/forum?id=QzKDLiosEd", "pdf_size": 0, "rating": "4;5;7", "confidence": "4;4;4", "wc_review": "162;344;860", "wc_reply_reviewers": "0;0;157", "wc_reply_authors": "272;1256;1537", "reply_reviewers": "0;0;1", "reply_authors": "2;3;3", "rating_avg": [ 5.333333333333333, 1.247219128924647 ], "confidence_avg": [ 4.0, 0.0 ], "wc_review_avg": [ 455.3333333333333, 295.63190325508214 ], "wc_reply_reviewers_avg": [ 52.333333333333336, 74.01050976419197 ], "wc_reply_authors_avg": [ 1021.6666666666666, 542.3653954874169 ], "reply_reviewers_avg": [ 0.3333333333333333, 0.4714045207910317 ], "reply_authors_avg": [ 2.6666666666666665, 0.4714045207910317 ], "replies_avg": [ 14, 0 ], "authors#_avg": [ 5, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 33, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=12832339623990344031&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 6, "aff_unique_index": "0;0;1;2;0", "aff_unique_norm": "Columbia University;Adobe;University of Toronto", "aff_unique_dep": ";Adobe Research;", "aff_unique_url": "https://www.columbia.edu;https://research.adobe.com;https://www.utoronto.ca", "aff_unique_abbr": "Columbia;Adobe;U of T", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0;0;1;0", "aff_country_unique": "United States;Canada" }, { "title": "Adaptive Extra-Gradient Methods for Min-Max Optimization and Games", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/2773", "id": "R0a0kFI3dJx", "poster": "", "openreview": "https://openreview.net/forum?id=R0a0kFI3dJx", "slides": "https://iclr.cc/virtual/2021/poster/2773", "video": "https://iclr.cc/virtual/2021/poster/2773", "author_site": "Kimon ANTONAKOPOULOS, E. 
Belmega, Panayotis Mertikopoulos", "tldr": "", "abstract": "We present a new family of min-max optimization algorithms that automatically exploit the geometry of the gradient data observed at earlier iterations to perform more informative extra-gradient steps in later ones.\nThanks to this adaptation mechanism, the proposed method automatically detects whether the problem is smooth or not, without requiring any prior tuning by the optimizer.\nAs a result, the algorithm simultaneously achieves order-optimal convergence rates, i.e., it converges to an $\varepsilon$-optimal solution within $\mathcal{O}(1/\varepsilon)$ iterations in smooth problems, and within $\mathcal{O}(1/\varepsilon^2)$ iterations in non-smooth ones. Importantly, these guarantees do not require any of the standard boundedness or Lipschitz continuity conditions that are typically assumed in the literature; in particular, they apply even to problems with singularities (such as resource allocation problems and the like). This adaptation is achieved through the use of a geometric apparatus based on Finsler metrics and a suitably chosen mirror-prox template that allows us to derive sharp convergence rates for the methods at hand.", "keywords": "min-max optimization;games;mirror-prox;adaptive methods;regime agnostic methods", "primary_area": "", "supplementary_material": "/attachment/bc077ec0a7b6179cd4888ef11d808bdf2580db7a.zip", "author": "Kimon Antonakopoulos;Veronica Belmega;Panayotis Mertikopoulos", "authorids": "~Kimon_Antonakopoulos1;~Veronica_Belmega1;~Panayotis_Mertikopoulos1", "gender": "M;M;", "homepage": ";http://polaris.imag.fr/panayotis.mertikopoulos/;https://sites.google.com/site/evbelmega/", "dblp": "https://dblp.org/pers/hd/a/Antonakopoulos:Kimon;49/6721;https://dblp.uni-trier.de/pers/hd/b/Belmega:Elena_Veronica", "google_scholar": ";xsusqPYAAAAJ;https://scholar.google.fr/citations?user=ODy3eccAAAAJ", "orcid": ";0000-0003-2026-9616;", "linkedin": ";;elena-veronica-belmega-0844262a/en", "or_profile": "~Kimon_Antonakopoulos1;~Panayotis_Mertikopoulos1;~E._Veronica_Belmega1", "aff": ";French National Center for Scientific Research;ETIS", "aff_domain": ";imag.fr;ensea.fr", "position": ";Principal Researcher;Associate Professor", "bibtex": "@inproceedings{\nantonakopoulos2021adaptive,\ntitle={Adaptive Extra-Gradient Methods for Min-Max Optimization and Games},\nauthor={Kimon Antonakopoulos and Veronica Belmega and Panayotis Mertikopoulos},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=R0a0kFI3dJx}\n}", "github": "", "project": "", "reviewers": "AnonReviewer5;AnonReviewer3;AnonReviewer1;AnonReviewer4", "pdf_size": 0, "rating": "5;6;7;7", "confidence": "4;2;4;4", "wc_review": "887;339;495;1942", "wc_reply_reviewers": "964;0;128;0", "wc_reply_authors": "2143;390;541;1356", "reply_reviewers": "3;0;1;0", "reply_authors": "4;1;2;2", "rating_avg": [ 6.25, 0.82915619758885 ], "confidence_avg": [ 3.5, 0.8660254037844386 ], "wc_review_avg": [ 915.75, 625.2373049490889 ], "wc_reply_reviewers_avg": [ 273.0, 402.3568068269754 ], "wc_reply_authors_avg": [ 1107.5, 701.7373083996604 ], "reply_reviewers_avg": [ 1.0, 1.224744871391589 ], "reply_authors_avg": [ 2.25, 1.0897247358851685 ], "replies_avg": [ 18, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": 0.17407765595569782, "gs_citation": 58, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=5985799898732689606&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 15, "pdf": 
"https://openreview.net/pdf?id=R0a0kFI3dJx", "email": ";imag.fr;ensea.fr", "author_num": 3, "aff_unique_index": "0;1", "aff_unique_norm": "French National Center for Scientific Research;ETIS", "aff_unique_dep": ";", "aff_unique_url": "https://www.cnrs.fr;", "aff_unique_abbr": "CNRS;", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0", "aff_country_unique": "France;" }, { "title": "DICE: Diversity in Deep Ensembles via Conditional Redundancy Adversarial Estimation", "status": "Poster", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/2650", "id": "R2ZlTVPx0Gk", "poster": "", "openreview": "https://openreview.net/forum?id=R2ZlTVPx0Gk", "slides": "https://iclr.cc/virtual/2021/poster/2650", "video": "https://iclr.cc/virtual/2021/poster/2650", "author_site": "Alexandre Rame, MATTHIEU CORD", "tldr": "", "abstract": "Deep ensembles perform better than a single network thanks to the diversity among their members. Recent approaches regularize predictions to increase diversity; however, they also drastically decrease individual members\u2019 performances. In this paper, we argue that learning strategies for deep ensembles need to tackle the trade-off between ensemble diversity and individual accuracies. Motivated by arguments from information theory and leveraging recent advances in neural estimation of conditional mutual information, we introduce a novel training criterion called DICE: it increases diversity by reducing spurious correlations among features. The main idea is that features extracted from pairs of members should only share information useful for target class prediction without being conditionally redundant. Therefore, besides the classification loss with information bottleneck, we adversarially prevent features from being conditionally predictable from each other. We manage to reduce simultaneous errors while protecting class information. We obtain state-of-the-art accuracy results on CIFAR-10/100: for example, an ensemble of 5 networks trained with DICE matches an ensemble of 7 networks trained independently. 
We further analyze the consequences on calibration, uncertainty estimation, out-of-distribution detection and online co-distillation.", "keywords": "Deep Learning;Deep Ensembles;Information Theory;Information Bottleneck;Adversarial Learning", "primary_area": "", "supplementary_material": "", "author": "Alexandre Rame;Matthieu Cord", "authorids": "~Alexandre_Rame1;~Matthieu_Cord1", "gender": "M;M", "homepage": "https://alexrame.github.io/;https://cord.isir.upmc.fr/", "dblp": ";68/3117", "google_scholar": "7znwivwAAAAJ;SpAotDcAAAAJ", "orcid": ";", "linkedin": "alexandre-ram%C3%A9-05259587;", "or_profile": "~Alexandre_Rame1;~Matthieu_Cord1", "aff": "Universit\u00e9 Pierre et Marie Curie - Paris 6, Sorbonne Universit\u00e9 - Facult\u00e9 des Sciences (Paris VI);Sorbonne Universit\u00e9", "aff_domain": "isir.upmc.fr;isir.upmc.fr", "position": "PhD student;Full Professor", "bibtex": "@inproceedings{\nrame2021dice,\ntitle={{\\{}DICE{\\}}: Diversity in Deep Ensembles via Conditional Redundancy Adversarial Estimation},\nauthor={Alexandre Rame and Matthieu Cord},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=R2ZlTVPx0Gk}\n}", "github": "", "project": "", "reviewers": "AnonReviewer1;AnonReviewer4;AnonReviewer2;AnonReviewer3", "pdf_size": 0, "rating": "6;6;7;8", "confidence": "3;3;4;4", "wc_review": "516;146;507;539", "wc_reply_reviewers": "0;0;13;64", "wc_reply_authors": "772;335;806;773", "reply_reviewers": "0;0;1;1", "reply_authors": "1;1;2;2", "rating_avg": [ 6.75, 0.82915619758885 ], "confidence_avg": [ 3.5, 0.5 ], "wc_review_avg": [ 427.0, 162.65454189785171 ], "wc_reply_reviewers_avg": [ 19.25, 26.37588861062315 ], "wc_reply_authors_avg": [ 671.5, 194.7594670356232 ], "reply_reviewers_avg": [ 0.5, 0.5 ], "reply_authors_avg": [ 1.5, 0.5 ], "replies_avg": [ 14, 0 ], "authors#_avg": [ 2, 0 ], "corr_rating_confidence": 0.9045340337332909, "gs_citation": 71, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=8505371482318657761&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 5, "pdf": "https://openreview.net/pdf?id=R2ZlTVPx0Gk", "email": "isir.upmc.fr;isir.upmc.fr", "author_num": 2, "aff_unique_index": "0;1", "aff_unique_norm": "Universit\u00e9 Pierre et Marie Curie - Paris 6;Sorbonne Universit\u00e9", "aff_unique_dep": "Facult\u00e9 des Sciences;", "aff_unique_url": "https://www.upmc.fr;https://www.sorbonne-universite.fr", "aff_unique_abbr": "UPMC;Sorbonne U", "aff_campus_unique_index": "0", "aff_campus_unique": "Paris;", "aff_country_unique_index": "0;0", "aff_country_unique": "France" }, { "id": "R3a2G2tSf3c", "title": "Graph-Graph Similarity Network", "track": "main", "status": "Withdraw", "tldr": "", "abstract": "Graph classification aims to predict the class label for an entire graph. Recently, Graph Neural Networks (GNNs)-based approaches become an essential strand to learn low-dimensional continuous embeddings of the entire graphs for graph label prediction. While GNNs explicitly aggregate the neighborhood information and implicitly capture the topological structure for graph representation, they ignore the relationships among graphs. In this paper, we propose a Graph-Graph Similarity Network to tackle the graph classification problem by constructing a SuperGraph through learning the relationships among graphs. Each node in the SuperGraph represents an input graph, and the weights of edges denote the similarity between graphs. 
By this means, the graph classification is then transformed into a classical node classification problem. Specifically, we employ an Adversarial Autoencoder to align embeddings of all the graphs to a same distribution. After the alignment, we design the Graph-Graph Similarity Network to learn the similarity between graphs, which function as the adjacency matrix of the SuperGraph. By running node classification algorithms on the SuperGraph, we can predict the labels of graphs. Experiments on five widely used benchmarks under a fair setting demonstrate the effectiveness of our method.", "keywords": "Machine Learning;Graph Classification", "primary_area": "", "supplementary_material": "", "author": "Han Yue;Pengyu Hong;Hongfu Liu", "authorids": "~Han_Yue2;~Pengyu_Hong1;~Hongfu_Liu2", "gender": "M;M;M", "homepage": ";http://www.cs.brandeis.edu/~hong/;http://hongfuliu.com/", "dblp": ";89/4734;32/9075-1", "google_scholar": "BkUhc7gAAAAJ;https://scholar.google.com.tw/citations?user=pvDa8pcAAAAJ;https://scholar.google.com/citations?hl=en", "orcid": "0000-0003-4146-0436;0000-0002-3177-2754;", "linkedin": ";;", "or_profile": "~Han_Yue2;~Pengyu_Hong1;~Hongfu_Liu2", "aff": "Brandeis University;Brandeis University;Brandeis University", "aff_domain": "brandeis.edu;brandeis.edu;brandeis.edu", "position": "PhD student;Full Professor;Assistant Professor", "bibtex": "", "github": "", "project": "", "reviewers": "AnonReviewer3;AnonReviewer4;AnonReviewer1;AnonReviewer2", "site": "https://openreview.net/forum?id=R3a2G2tSf3c", "pdf_size": 0, "rating": "2;4;5;5", "confidence": "5;4;5;4", "wc_review": "160;472;429;295", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "0;0;0;0", "reply_reviewers": "0;0;0;0", "reply_authors": "0;0;0;0", "rating_avg": [ 4.0, 1.224744871391589 ], "confidence_avg": [ 4.5, 0.5 ], "wc_review_avg": [ 339.0, 122.23542857944254 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 0, 0 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 0, 0 ], "replies_avg": [ 6, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": -0.40824829046386296, "gs_citation": 3, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=12666150387285524532&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 8, "aff_unique_index": "0;0;0", "aff_unique_norm": "Brandeis University", "aff_unique_dep": "", "aff_unique_url": "https://www.brandeis.edu", "aff_unique_abbr": "Brandeis", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0;0", "aff_country_unique": "United States" }, { "id": "R43miizWtUN", "title": "Analysing the Update step in Graph Neural Networks via Sparsification", "track": "main", "status": "Reject", "tldr": "", "abstract": "In recent years, Message-Passing Neural Networks (MPNNs), the most prominent Graph Neural Network (GNN) framework, have celebrated much success in the analysis of graph-structured data. In MPNNs the computations are split into three steps, Aggregation, Update and Readout. In this paper a series of models to successively sparsify the linear transform in the Update step is proposed. Specifically, the ExpanderGNN model with a tuneable sparsification rate and the Activation-Only GNN, which has no linear transform in the Update step, are proposed. In agreement with a growing trend in the relevant literature the sparsification paradigm is changed by initialising sparse neural network architectures rather than expensively sparsifying already trained architectures. 
These novel benchmark models enable a better understanding of the influence of the Update step on model performance and outperform existing simplified benchmark models such as the Simple Graph Convolution (SGC). The ExpanderGNNs, and in some cases the Activation-Only models, achieve performance on par with their vanilla counterparts on several down-stream graph prediction tasks, while containing exponentially fewer trainable parameters. In experiments with matching parameter numbers our benchmark models outperform the state-of-the-art GNNs models. These observations enable us to conclude that in practice the Update step often makes no positive contribution to the model performance.", "keywords": "graph neural network architectures;message-passing neural networks;neural network sparsification;deep learning", "primary_area": "", "supplementary_material": "/attachment/b21c4960e360c3de76efb4cdf2017dd34f94064b.zip", "author": "changmin wu;Johannes F. Lutzeyer;Michalis Vazirgiannis", "authorids": "~changmin_wu1;~Johannes_F._Lutzeyer1;~Michalis_Vazirgiannis2", "gender": ";M;", "homepage": ";https://johanneslutzeyer.com/;", "dblp": ";253/8868;", "google_scholar": ";OfT4ns8AAAAJ;", "orcid": ";;", "linkedin": ";johannes-lutzeyer-213b7480/;", "or_profile": "~changmin_wu1;~Johannes_F._Lutzeyer1;~Michalis_Vazirgiannis2", "aff": ";Ecole Polytechnique;", "aff_domain": ";polytechnique.edu;", "position": ";Postdoc;", "bibtex": "@misc{\nwu2021analysing,\ntitle={Analysing the Update step in Graph Neural Networks via Sparsification},\nauthor={changmin wu and Johannes F. Lutzeyer and Michalis Vazirgiannis},\nyear={2021},\nurl={https://openreview.net/forum?id=R43miizWtUN}\n}", "github": "", "project": "", "reviewers": "AnonReviewer3;AnonReviewer1;AnonReviewer4;AnonReviewer2", "site": "https://openreview.net/forum?id=R43miizWtUN", "pdf_size": 0, "rating": "4;4;5;6", "confidence": "4;3;3;4", "wc_review": "666;255;500;760", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "1157;837;968;852", "reply_reviewers": "0;0;0;0", "reply_authors": "2;2;2;2", "rating_avg": [ 4.75, 0.82915619758885 ], "confidence_avg": [ 3.5, 0.5 ], "wc_review_avg": [ 545.25, 191.69686356328316 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 953.5, 127.96190839464688 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 2.0, 0.0 ], "replies_avg": [ 13, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": 0.30151134457776363, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:CyGX4bjwYG8J:scholar.google.com/&scioq=Analysing+the+Update+step+in+Graph+Neural+Networks+via+Sparsification&hl=en&as_sdt=0,5", "gs_version_total": 0, "aff_unique_index": "0", "aff_unique_norm": "Ecole Polytechnique", "aff_unique_dep": "", "aff_unique_url": "https://www.polytechnique.edu", "aff_unique_abbr": "X", "aff_country_unique_index": "0", "aff_country_unique": "France" }, { "title": "Iterative Empirical Game Solving via Single Policy Best Response", "status": "Spotlight", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/2662", "id": "R4aWTjmrEKM", "poster": "", "openreview": "https://openreview.net/forum?id=R4aWTjmrEKM", "slides": "https://iclr.cc/virtual/2021/poster/2662", "video": "https://iclr.cc/virtual/2021/poster/2662", "author_site": "Max Smith, Thomas Anthony, Michael Wellman", "tldr": "", "abstract": "Policy-Space Response Oracles (PSRO) is a general algorithmic framework for learning policies in multiagent systems by interleaving empirical game analysis with deep reinforcement 
learning (DRL).\nAt each iteration, DRL is invoked to train a best response to a mixture of opponent policies.\nThe repeated application of DRL poses an expensive computational burden as we look to apply this algorithm to more complex domains.\nWe introduce two variations of PSRO designed to reduce the amount of simulation required during DRL training.\nBoth algorithms modify how PSRO adds new policies to the empirical game, based on learned responses to a single opponent policy.\nThe first, Mixed-Oracles, transfers knowledge from previous iterations of DRL, requiring training only against the opponent's newest policy.\nThe second, Mixed-Opponents, constructs a pure-strategy opponent by mixing existing strategy's action-value estimates, instead of their policies.\nLearning against a single policy mitigates conflicting experiences on behalf of a learner facing an unobserved distribution of opponents.\nWe empirically demonstrate that these algorithms substantially reduce the amount of simulation during training required by PSRO, while producing equivalent or better solutions to the game.", "keywords": "Empirical Game Theory;Reinforcement Learning;Multiagent Learning", "primary_area": "", "supplementary_material": "/attachment/75a3f6e06a8bfbaddae1098cef55e57139dca562.zip", "author": "Max Smith;Thomas Anthony;Michael Wellman", "authorids": "~Max_Smith1;~Thomas_Anthony1;~Michael_Wellman1", "gender": "M;;M", "homepage": "https://www.maxosmith.com;;https://strategicreasoning.org/michael-p-wellman", "dblp": "275/3418;169/3283;w/MichaelPWellman", "google_scholar": "gc1jnZ4AAAAJ;;https://scholar.google.com.tw/citations?user=UruIct4AAAAJ", "orcid": ";;0000-0002-1691-6844", "linkedin": ";;https://linkedin.com/in/michael-wellman-23ab1", "or_profile": "~Max_Smith1;~Thomas_Anthony1;~Michael_Wellman1", "aff": "University of Michigan;Google DeepMind;University of Michigan", "aff_domain": "umich.edu;deepmind.com;umich.edu", "position": "PhD student;Research Scientist;Full Professor", "bibtex": "@inproceedings{\nsmith2021iterative,\ntitle={Iterative Empirical Game Solving via Single Policy Best Response},\nauthor={Max Smith and Thomas Anthony and Michael Wellman},\nbooktitle={International Conference on Learning Representations},\nyear={2021},\nurl={https://openreview.net/forum?id=R4aWTjmrEKM}\n}", "github": "", "project": "", "reviewers": "AnonReviewer2;AnonReviewer1;AnonReviewer4;AnonReviewer3", "pdf_size": 0, "rating": "7;7;7;7", "confidence": "2;4;2;4", "wc_review": "146;309;204;350", "wc_reply_reviewers": "0;42;25;254", "wc_reply_authors": "232;407;366;771", "reply_reviewers": "0;1;1;2", "reply_authors": "1;2;1;2", "rating_avg": [ 7.0, 0.0 ], "confidence_avg": [ 3.0, 1.0 ], "wc_review_avg": [ 252.25, 81.2292281140231 ], "wc_reply_reviewers_avg": [ 80.25, 101.42084351847997 ], "wc_reply_authors_avg": [ 444.0, 199.57830543423302 ], "reply_reviewers_avg": [ 1.0, 0.7071067811865476 ], "reply_authors_avg": [ 1.5, 0.5 ], "replies_avg": [ 15, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 24, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=132394109201382150&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 4, "pdf": "https://openreview.net/pdf?id=R4aWTjmrEKM", "email": "umich.edu;deepmind.com;umich.edu", "author_num": 3, "aff_unique_index": "0;1;0", "aff_unique_norm": "University of Michigan;Google", "aff_unique_dep": ";Google DeepMind", "aff_unique_url": "https://www.umich.edu;https://deepmind.com", "aff_unique_abbr": "UM;DeepMind", "aff_campus_unique_index": "", 
"aff_campus_unique": "", "aff_country_unique_index": "0;1;0", "aff_country_unique": "United States;United Kingdom" }, { "id": "R5M7Mxl1xZ", "title": "Minimal Geometry-Distortion Constraint for Unsupervised Image-to-Image Translation", "track": "main", "status": "Reject", "tldr": "", "abstract": "Unsupervised image-to-image (I2I) translation, which aims to learn a domain mapping function without paired data, is very challenging because the function is highly under-constrained. Despite the significant progress in constraining the mapping function, current methods suffer from the \\textit{geometry distortion} problem: the geometry structure of the translated image is inconsistent with the input source image, which may cause the undesired distortions in the translated images. To remedy this issue, we propose a novel I2I translation constraint, called \\textit{Minimal Geometry-Distortion Constraint} (MGC), which promotes the consistency of geometry structures and reduce the unwanted distortions in translation by reducing the randomness of color transformation in the translation process. To facilitate estimation and maximization of MGC, we propose an approximate representation of mutual information called relative Squared-loss Mutual Information (rSMI) that can be efficiently estimated analytically. We demonstrate the effectiveness of our MGC by providing quantitative and qualitative comparisons with the state-of-the-art methods on several benchmark datasets.\n", "keywords": "Unsupervised image translation;Geometry distortion", "primary_area": "", "supplementary_material": "/attachment/6811c3d4c0d1e0ce40a2ba0f1996b226beb98cdd.zip", "author": "Jiaxian Guo;Jiachen Li;Mingming Gong;Huan Fu;Kun Zhang;Dacheng Tao", "authorids": "~Jiaxian_Guo2;~Jiachen_Li4;~Mingming_Gong1;~Huan_Fu1;~Kun_Zhang1;~Dacheng_Tao1", "gender": "M;;M;M;M;", "homepage": ";;https://mingming-gong.github.io/;https://huan-fu.github.io/;http://www.andrew.cmu.edu/user/kunz1/;", "dblp": "206/6264;;98/8479;139/8082;96/3115-1;", "google_scholar": "wQgPocEAAAAJ;;https://scholar.google.com.au/citations?user=6BmiCJIAAAAJ;https://scholar.google.com/citations?hl=en;RGoypN4AAAAJ;", "orcid": ";;0000-0001-7147-5589;;;", "linkedin": ";;;;;", "or_profile": "~Jiaxian_Guo2;~Jiachen_Li4;~Mingming_Gong1;~Huan_Fu1;~Kun_Zhang1;~Dacheng_Tao1", "aff": "University of Sydney;;University of Melbourne;Alibaba Group;Carnegie Mellon University;", "aff_domain": "sydney.edu.au;;unimelb.edu.au;alibaba-inc.com;cmu.edu;", "position": "PhD student;;Assistant Professor;Researcher;Associate Professor;", "bibtex": "@misc{\nguo2021minimal,\ntitle={Minimal Geometry-Distortion Constraint for Unsupervised Image-to-Image Translation},\nauthor={Jiaxian Guo and Jiachen Li and Mingming Gong and Huan Fu and Kun Zhang and Dacheng Tao},\nyear={2021},\nurl={https://openreview.net/forum?id=R5M7Mxl1xZ}\n}", "github": "", "project": "", "reviewers": "AnonReviewer2;AnonReviewer1;AnonReviewer3;AnonReviewer5", "site": "https://openreview.net/forum?id=R5M7Mxl1xZ", "pdf_size": 0, "rating": "4;4;7;7", "confidence": "5;4;4;3", "wc_review": "403;404;289;256", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "1457;1047;837;405", "reply_reviewers": "0;0;0;0", "reply_authors": "3;2;2;1", "rating_avg": [ 5.5, 1.5 ], "confidence_avg": [ 4.0, 0.7071067811865476 ], "wc_review_avg": [ 338.0, 66.53194721335007 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 936.5, 379.3161610055654 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 2.0, 0.7071067811865476 ], 
"replies_avg": [ 14, 0 ], "authors#_avg": [ 6, 0 ], "corr_rating_confidence": -0.7071067811865476, "gs_citation": 1, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=17023048567190622443&as_sdt=2005&sciodt=0,5&hl=en", "gs_version_total": 0, "aff_unique_index": "0;1;2;3", "aff_unique_norm": "University of Sydney;University of Melbourne;Alibaba Group;Carnegie Mellon University", "aff_unique_dep": ";;;", "aff_unique_url": "https://www.sydney.edu.au;https://www.unimelb.edu.au;https://www.alibaba.com;https://www.cmu.edu", "aff_unique_abbr": "USYD;UniMelb;Alibaba;CMU", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0;1;2", "aff_country_unique": "Australia;China;United States" }, { "id": "R6tNszN_QfA", "title": "Adversarial Problems for Generative Networks", "track": "main", "status": "Reject", "tldr": "", "abstract": "We are interested in the design of generative networks. The training of these mathematical structures is mostly performed with the help of adversarial (min-max) optimization problems. We propose a simple methodology for constructing such problems assuring, at the same time, consistency of the corresponding solution. We give characteristic examples developed by our method, some of which can be recognized from other applications and some are introduced here for the first time. We compare various possibilities by applying them to well known datasets using neural networks of different configurations and sizes.", "keywords": "generative networks;adversarial generative networks", "primary_area": "", "supplementary_material": "", "author": "Kalliopi Basioti;George V. Moustakides", "authorids": "~Kalliopi_Basioti1;moustaki@upatras.gr", "gender": ";", "homepage": ";", "dblp": ";", "google_scholar": ";", "orcid": ";", "linkedin": ";", "or_profile": ";", "aff": ";", "aff_domain": ";", "position": ";", "bibtex": "@misc{\nbasioti2021adversarial,\ntitle={Adversarial Problems for Generative Networks},\nauthor={Kalliopi Basioti and George V. Moustakides},\nyear={2021},\nurl={https://openreview.net/forum?id=R6tNszN_QfA}\n}", "github": "", "project": "", "reviewers": "AnonReviewer4;AnonReviewer2;AnonReviewer3;AnonReviewer1", "site": "https://openreview.net/forum?id=R6tNszN_QfA", "pdf_size": 0, "rating": "4;4;6;7", "confidence": "3;4;4;3", "wc_review": "648;406;199;483", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "280;268;276;101", "reply_reviewers": "0;0;0;0", "reply_authors": "1;1;1;1", "rating_avg": [ 5.25, 1.299038105676658 ], "confidence_avg": [ 3.5, 0.5 ], "wc_review_avg": [ 434.0, 161.40477068537967 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 231.25, 75.32388399438786 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 9, 0 ], "authors#_avg": [ 2, 0 ], "corr_rating_confidence": -0.19245008972987526, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:O-JaAaNXSlIJ:scholar.google.com/&scioq=Adversarial+Problems+for+Generative+Networks&hl=en&as_sdt=0,5", "gs_version_total": 0 }, { "id": "R7aFOrR0b2", "title": "Dataset Curation Beyond Accuracy", "track": "main", "status": "Reject", "tldr": "", "abstract": "Neural networks are known to be data-hungry, and collecting large labeled datasets is often a crucial step in deep learning deployment. Researchers have studied dataset aspects such as distributional shift and labeling cost, primarily using downstream prediction accuracy for evaluation. 
In sensitive real-world applications such as medicine and self-driving cars, not only is the accuracy important, but also the calibration -- the extent to which model uncertainty reflects the actual correctness likelihood. It has recently been shown that modern neural networks are ill-calibrated. In this work, we take a complementary approach -- studying how dataset properties, rather than architecture, affect calibration. For the common issue of dataset imbalance, we show that calibration varies significantly among classes, even when common strategies to mitigate class imbalance are employed. We also study the effects of label quality, showing how label noise dramatically increases calibration error. Furthermore, poor calibration can come from small dataset sizes, which we motivate via results on network expressivity. Our experiments demonstrate that dataset properties can significantly affect calibration and suggest that calibration should be measured during dataset curation.", "keywords": "crowd-sourcing;calibration;dataset;uncertainty", "primary_area": "", "supplementary_material": "", "author": "Johan Bjorck;Carla P Gomes", "authorids": "~Johan_Bjorck2;~Carla_P_Gomes1", "gender": "M;", "homepage": "https://nilsjohanbjorck.github.io/;", "dblp": "188/6399;", "google_scholar": "https://scholar.google.com/citations?hl=en;", "orcid": ";", "linkedin": ";", "or_profile": "~Johan_Bjorck2;~Carla_P_Gomes1", "aff": "Cornell University;", "aff_domain": "cornell.edu;", "position": "PhD student;", "bibtex": "@misc{\nbjorck2021dataset,\ntitle={Dataset Curation Beyond Accuracy},\nauthor={Johan Bjorck and Carla P Gomes},\nyear={2021},\nurl={https://openreview.net/forum?id=R7aFOrR0b2}\n}", "github": "", "project": "", "reviewers": "AnonReviewer3;AnonReviewer2;AnonReviewer1;AnonReviewer4", "site": "https://openreview.net/forum?id=R7aFOrR0b2", "pdf_size": 0, "rating": "4;4;4;6", "confidence": "4;3;4;3", "wc_review": "352;300;711;137", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "0;0;0;0", "reply_reviewers": "0;0;0;0", "reply_authors": "0;0;0;0", "rating_avg": [ 4.5, 0.8660254037844386 ], "confidence_avg": [ 3.5, 0.5 ], "wc_review_avg": [ 375.0, 209.57934058489639 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 0, 0 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 0, 0 ], "replies_avg": [ 6, 0 ], "authors#_avg": [ 2, 0 ], "corr_rating_confidence": -0.5773502691896257, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:6H9UlICBBcMJ:scholar.google.com/&scioq=Dataset+Curation+Beyond+Accuracy&hl=en&as_sdt=0,5", "gs_version_total": 0, "aff_unique_index": "0", "aff_unique_norm": "Cornell University", "aff_unique_dep": "", "aff_unique_url": "https://www.cornell.edu", "aff_unique_abbr": "Cornell", "aff_country_unique_index": "0", "aff_country_unique": "United States" }, { "id": "RB0iNPXIj60", "title": "BBRefinement: an universal scheme to improve precision of box object detectors", "track": "main", "status": "Reject", "tldr": "", "abstract": "We present a conceptually simple yet powerful and flexible scheme for refining predictions of bounding boxes. Our approach is trained standalone on GT boxes and can then be combined with an object detector to improve its predictions. The method, called BBRefinement, uses mixture data of image information and the object's class and center. 
Due to the transformation of the problem into a domain where BBRefinement does not care about multiscale detection, recognition of the object's class, computing confidence, or multiple detections, the training is much more effective. It results in the ability to refine even COCO's ground truth labels into a more precise form. BBRefinement improves the performance of SOTA architectures up to 2mAP points on the COCO dataset in the benchmark. The refinement process is fast; it adds 50-80ms overhead to a standard detector using RTX2080, so it can run in real-time on standard hardware. The code is available at https://gitlab.com/irafm-ai/bb-refinement.", "keywords": "object detection;deep neural networks;refinement", "primary_area": "", "supplementary_material": "", "author": "Petr Hurtik;Marek Vajgl", "authorids": "~Petr_Hurtik1;~Marek_Vajgl1", "gender": "M;M", "homepage": "https://gitlab.com/irafm-ai;http://www.osu.cz", "dblp": ";", "google_scholar": "https://scholar.google.com/citations?hl=cs;", "orcid": "0000-0003-4349-9705;0000-0002-4751-7117", "linkedin": ";", "or_profile": "~Petr_Hurtik1;~Marek_Vajgl1", "aff": "University of Ostrava;University of Ostrava", "aff_domain": "osu.cz;osu.cz", "position": "Postdoc;Postdoc", "bibtex": "@misc{\nhurtik2021bbrefinement,\ntitle={{\\{}BBR{\\}}efinement: an universal scheme to improve precision of box object detectors},\nauthor={Petr Hurtik and Marek Vajgl},\nyear={2021},\nurl={https://openreview.net/forum?id=RB0iNPXIj60}\n}", "github": "", "project": "", "reviewers": "AnonReviewer4;AnonReviewer2;AnonReviewer3;AnonReviewer5", "site": "https://openreview.net/forum?id=RB0iNPXIj60", "pdf_size": 0, "rating": "2;2;4;4", "confidence": "5;5;5;4", "wc_review": "529;101;360;419", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "918;122;442;593", "reply_reviewers": "0;0;0;0", "reply_authors": "2;1;1;1", "rating_avg": [ 3.0, 1.0 ], "confidence_avg": [ 4.75, 0.4330127018922193 ], "wc_review_avg": [ 352.25, 157.22813838495958 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 518.75, 286.4501483679141 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.25, 0.4330127018922193 ], "replies_avg": [ 12, 0 ], "authors#_avg": [ 2, 0 ], "corr_rating_confidence": -0.5773502691896257, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:nPr0PlN78J4J:scholar.google.com/&scioq=BBRefinement:+an+universal+scheme+to+improve+precision+of+box+object+detectors&hl=en&as_sdt=0,5", "gs_version_total": 5, "aff_unique_index": "0;0", "aff_unique_norm": "University of Ostrava", "aff_unique_dep": "", "aff_unique_url": "https://www.osu.cz", "aff_unique_abbr": "", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0", "aff_country_unique": "Czech Republic" }, { "id": "RCGBA1i5MF", "title": "The Unreasonable Effectiveness of the Class-reversed Sampling in Tail Sample Memorization", "track": "main", "status": "Reject", "tldr": "", "abstract": "Long-tailed visual recognition poses significant challenges to traditional machine learning and emerging deep networks due to its inherent class imbalance. A common belief is that tail classes with few samples cannot exhibit enough regularity for pattern extraction. What makes things worse, the limited cardinality may lead to low exposure of tail classes in the training stage. 
Re-sampling methods, especially those that naively enlarge the exposure frequency, eventually fail with head classes under-represented and tail classes overfitted.\nArguing that long-tailed learning involves a trade-off between head class pattern extraction and tail class memorizing, we first empirically identify the regularity of classes under long-tailed distributions and find that regularity of the same training samples will be sharply decreased with the reduction of class cardinality. Motivated by the recent success of a series of works on the memorization-generalization mechanism, we propose a simple yet effective training strategy by switching from instance-balanced sampling to class-reversed sampling to memorize tail classes without seriously damaging the representation of head classes. Closely afterwards, we give the theoretical generalization error upper bound to prove that class-reversed sampling is better than instance-balanced sampling during the last training stage. In our experiments, the proposed method can reach state-of-the-art performance more efficiently than current methods on several datasets. Further experiments also validate the superior performance of the proposed sampling strategy, implying that the long-tailed learning trade-off could be effectively tackled only in the memorization stage with a small learning rate and over-exposure of tail samples.", "keywords": "long-tailed learning;re-sampling;sample memorization", "primary_area": "", "supplementary_material": "", "author": "Benyi Hu;Chi Zhang;Yuehu Liu;Le Wang;Li Liu", "authorids": "~Benyi_Hu1;colorzc@stu.xjtu.edu.cn;~Yuehu_Liu1;~Le_Wang1;~Li_Liu12", "gender": ";;M;;M", "homepage": "https://github.com/hby96;;https://gr.xjtu.edu.cn/en/web/liuyh;;", "dblp": ";;https://dblp.uni-trier.de/pid/50/6184.html;;", "google_scholar": ";;;;https://scholar.google.com/citations?hl=en", "orcid": ";;;;", "linkedin": ";;;;", "or_profile": "~Benyi_Hu1;colorzc@stu.xjtu.edu.cn;~Yuehu_Liu1;~Le_Wang1;~Li_Liu12", "aff": "Xi'an Jiaotong University;;Xi'an Jiaotong University;;", "aff_domain": "xjtu.edu.cn;;xjtu.edu.cn;;", "position": "MS student;;Full Professor;;", "bibtex": "", "github": "", "project": "", "reviewers": "AnonReviewer4;AnonReviewer1;AnonReviewer3;AnonReviewer2", "site": "https://openreview.net/forum?id=RCGBA1i5MF", "pdf_size": 0, "rating": "2;5;5;6", "confidence": "5;5;4;3", "wc_review": "243;474;588;294", "wc_reply_reviewers": "104;0;0;0", "wc_reply_authors": "476;669;683;321", "reply_reviewers": "1;0;0;0", "reply_authors": "3;2;2;2", "rating_avg": [ 4.5, 1.5 ], "confidence_avg": [ 4.25, 0.82915619758885 ], "wc_review_avg": [ 399.75, 138.4781119888627 ], "wc_reply_reviewers_avg": [ 26.0, 45.033320996790806 ], "wc_reply_authors_avg": [ 537.25, 149.26214355957777 ], "reply_reviewers_avg": [ 0.25, 0.4330127018922193 ], "reply_authors_avg": [ 2.25, 0.4330127018922193 ], "replies_avg": [ 19, 0 ], "authors#_avg": [ 5, 0 ], "corr_rating_confidence": -0.7035264706814485, "gs_citation": 0, "gs_cited_by_link": "https://scholar.google.com/scholar?q=related:wh4mqKkzGbMJ:scholar.google.com/&scioq=The+Unreasonable+Effectiveness+of+the+Class-reversed+Sampling+in+Tail+Sample+Memorization&hl=en&as_sdt=0,5", "gs_version_total": 0, "aff_unique_index": "0;0", "aff_unique_norm": "Xi'an Jiao Tong University", "aff_unique_dep": "", "aff_unique_url": "https://www.xjtu.edu.cn", "aff_unique_abbr": "XJTU", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0", "aff_country_unique": "China" }, { "id": 
"RDiiCiIH3_B", "title": "A framework for learned CountSketch", "track": "main", "status": "Reject", "tldr": "", "abstract": "Sketching is a compression technique that can be applied to many problems to solve them quickly and approximately. The matrices used to project data to smaller dimensions are called \"sketches\". In this work, we consider the problem of optimizing sketches to obtain low approximation error over a data distribution. \n\nWe introduce a general framework for \"learning\" and applying CountSketch, a type of sparse sketch. The sketch optimization procedure has two stages: one for optimizing the placements of the sketch's non-zero entries and another for optimizing their values. Next, we provide a way to apply learned sketches that has worst-case guarantees for approximation error. \n\nWe instantiate this framework with three sketching applications: least-squares regression, low-rank approximation (LRA), and k-means clustering. Our experiments demonstrate that our approach substantially decreases approximation error compared to classical and naively learned sketches. \n\nFinally, we investigate the theoretical aspects of our approach. For regression and LRA, we show that our method obtains state-of-the art accuracy for fixed time complexity. For LRA, we prove that it is strictly better to include the first optimization stage for two standard input distributions. For k-means, we derive a more straightforward means of retaining approximation guarantees.", "keywords": "Compression;sketching", "primary_area": "", "supplementary_material": "", "author": "Simin Liu;Tianrui Liu;Ali Vakilian;Yulin Wan;David Woodruff", "authorids": "~Simin_Liu1;~Tianrui_Liu3;~Ali_Vakilian1;~Yulin_Wan2;~David_Woodruff1", "gender": ";F;;M;F", "homepage": "https://www.ri.cmu.edu/ri-people/simin-liu/;https://arxiv.org/search/cs?searchtype=author&query=Liu%2C+T;http://www.mit.edu/~vakilian/;http://www.cs.cmu.edu/~dwoodruf/;https://github.com/Entropy999", "dblp": ";;116/4679;w/DPWoodruff;", "google_scholar": ";;uXZaVaAAAAAJ;https://scholar.google.com.tw/citations?user=0G2t-6sAAAAJ;", "orcid": ";;0000-0001-5049-7594;;", "linkedin": ";;;;", "or_profile": "~Simin_Liu1;~Tianrui_Liu3;~Ali_Vakilian1;~David_Woodruff1;~Yulin_Wan1", "aff": "Carnegie Mellon University;Nankai University,;Toyota Technological Institute at Chicago;Carnegie Mellon University;Anhui University", "aff_domain": "cmu.edu;nankai.edu.cn;ttic.edu;cmu.edu;ahu.edu.cn", "position": "PhD student;Undergrad student;Postdoc;Associate Professor;Undergrad student", "bibtex": "@misc{\nliu2021a,\ntitle={A framework for learned CountSketch},\nauthor={Simin Liu and Tianrui Liu and Ali Vakilian and Yulin Wan and David Woodruff},\nyear={2021},\nurl={https://openreview.net/forum?id=RDiiCiIH3_B}\n}", "github": "", "project": "", "reviewers": "AnonReviewer5;AnonReviewer1;AnonReviewer2", "site": "https://openreview.net/forum?id=RDiiCiIH3_B", "pdf_size": 0, "rating": "5;6;7", "confidence": "3;2;4", "wc_review": "311;110;765", "wc_reply_reviewers": "0;0;0", "wc_reply_authors": "544;199;296", "reply_reviewers": "0;0;0", "reply_authors": "1;1;1", "rating_avg": [ 6.0, 0.816496580927726 ], "confidence_avg": [ 3.0, 0.816496580927726 ], "wc_review_avg": [ 395.3333333333333, 273.9712069218629 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 346.3333333333333, 145.27292322460585 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 1.0, 0.0 ], "replies_avg": [ 8, 0 ], "authors#_avg": [ 5, 0 ], "corr_rating_confidence": 0.5, "gs_citation": 1, 
"gs_cited_by_link": "https://scholar.google.com/scholar?cites=15195287297188141040&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 0, "aff_unique_index": "0;1;2;0;3", "aff_unique_norm": "Carnegie Mellon University;Nankai University;Toyota Technological Institute at Chicago;Anhui University", "aff_unique_dep": ";;;", "aff_unique_url": "https://www.cmu.edu;http://www.nankai.edu.cn;https://www.tti-chicago.org;http://www.ahu.edu.cn/", "aff_unique_abbr": "CMU;NKU;TTI Chicago;AHU", "aff_campus_unique_index": "1", "aff_campus_unique": ";Chicago", "aff_country_unique_index": "0;1;0;0;1", "aff_country_unique": "United States;China" }, { "id": "RDpTZpubOh7", "title": "Safety Aware Reinforcement Learning (SARL)", "track": "main", "status": "Reject", "tldr": "", "abstract": "As reinforcement learning agents become increasingly integrated into complex, real-world environments, designing for safety becomes a critical consideration. We specifically focus on researching scenarios where agents can cause undesired side effects while executing a policy on a primary task. Since one can define multiple tasks for a given environment dynamics, there are two important challenges. First, we need to abstract the concept of safety that applies broadly to that environment independent of the specific task being executed. Second, we need a mechanism for the abstracted notion of safety to modulate the actions of agents executing different policies to minimize their side-effects. In this work, we propose Safety Aware Reinforcement Learning (SARL) - a framework where a virtual safe agent modulates the actions of a main reward-based agent to minimize side effects. The safe agent learns a task-independent notion of safety for a given environment. The main agent is then trained with a regularization loss given by the distance between the native action probabilities of the two agents. Since the safe agent effectively abstracts a task-independent notion of safety via its action probabilities, it can be ported to modulate multiple policies solving different tasks within the given environment without further training. We contrast this with solutions that rely on task-specific regularization metrics and test our framework on the SafeLife Suite, based on Conway's Game of Life, comprising a number of complex tasks in dynamic environments. We show that our solution is able to match the performance of solutions that rely on task-specific side-effect penalties on both the primary and safety objectives while additionally providing the benefit of generalizability and portability. 
", "keywords": "Reinforcement Learning;Safe RL;Probabilistic Distance Metrics", "primary_area": "", "supplementary_material": "/attachment/9ce988bf4d08327beaead4b02aa3d0e8f3dde750.zip", "author": "Santiago Miret;Somdeb Majumdar;Carroll Wainwright", "authorids": "~Santiago_Miret1;~Somdeb_Majumdar1;carroll@partnershiponai.org", "gender": "M;M;", "homepage": "https://www.intel.ai/bio/santiago-miret/;https://www.intel.ai/bio/somdeb-majumdar/;", "dblp": "241/5030;63/8320;", "google_scholar": "HLQ_te4AAAAJ;;", "orcid": "0000-0002-5121-3853;;", "linkedin": "santiago-miret/;somdebmajumdar/;", "or_profile": "~Santiago_Miret1;~Somdeb_Majumdar1;carroll@partnershiponai.org", "aff": "Intel;Intel;", "aff_domain": "intel.com;intel.com;", "position": "Researcher;AI/ML Researcher;", "bibtex": "@misc{\nmiret2021safety,\ntitle={Safety Aware Reinforcement Learning ({\\{}SARL{\\}})},\nauthor={Santiago Miret and Somdeb Majumdar and Carroll Wainwright},\nyear={2021},\nurl={https://openreview.net/forum?id=RDpTZpubOh7}\n}", "github": "", "project": "", "reviewers": "AnonReviewer2;AnonReviewer1;AnonReviewer4;AnonReviewer3", "site": "https://openreview.net/forum?id=RDpTZpubOh7", "pdf_size": 0, "rating": "3;4;6;6", "confidence": "4;4;4;4", "wc_review": "456;484;383;1631", "wc_reply_reviewers": "181;59;0;418", "wc_reply_authors": "788;690;197;1237", "reply_reviewers": "1;1;0;2", "reply_authors": "2;2;1;2", "rating_avg": [ 4.75, 1.299038105676658 ], "confidence_avg": [ 4.0, 0.0 ], "wc_review_avg": [ 738.5, 516.6026035551893 ], "wc_reply_reviewers_avg": [ 164.5, 160.25370510537346 ], "wc_reply_authors_avg": [ 728.0, 369.48815948552397 ], "reply_reviewers_avg": [ 1.0, 0.7071067811865476 ], "reply_authors_avg": [ 1.75, 0.4330127018922193 ], "replies_avg": [ 17, 0 ], "authors#_avg": [ 3, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 2, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=10825303210989928419&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 4, "aff_unique_index": "0;0", "aff_unique_norm": "Intel", "aff_unique_dep": "Intel Corporation", "aff_unique_url": "https://www.intel.com", "aff_unique_abbr": "Intel", "aff_campus_unique_index": "", "aff_campus_unique": "", "aff_country_unique_index": "0;0", "aff_country_unique": "United States" }, { "id": "REKvFYIgwz9", "title": "Deep Reinforcement Learning for Optimal Stopping with Application in Financial Engineering", "track": "main", "status": "Withdraw", "tldr": "", "abstract": "Optimal stopping is the problem of deciding the right time at which to take a particular action in a stochastic system, in order to maximize an expected reward. It has many applications in areas such as finance, healthcare, and statistics. In this paper, we employ deep Reinforcement Learning (RL) to learn optimal stopping policies in two financial engineering applications: namely option pricing, and optimal option exercise. We present for the first time a comprehensive empirical evaluation of the quality of optimal stopping policies identified by three state of the art deep RL algorithms: double deep Q-learning (DDQN), categorical distributional RL (C51), and Implicit Quantile Networks (IQN). In the case of option pricing, our findings indicate that in a theoretical Black-Schole environment, IQN successfully identifies nearly optimal prices. On the other hand, it is slightly outperformed by C51 when confronted to real stock data movements in a put option exercise problem that involves assets from the S&P500 index. 
More importantly, the C51 algorithm is able to identify an optimal stopping policy that achieves 8% more out-of-sample returns than the best of four natural benchmark policies. We conclude with a discussion of our findings which should pave the way for relevant future research.", "keywords": "Reinforcement learning;deep learning;financial engineering;optimal stopping.", "primary_area": "", "supplementary_material": "", "author": "Abderrahim Fathan;Erick Delage", "authorids": "abderrahim.fathan@gmail.com;~Erick_Delage2", "gender": ";M", "homepage": ";http://web.hec.ca/pages/erick.delage/", "dblp": ";26/1546", "google_scholar": ";https://scholar.google.ca/citations?user=ciH2ROgAAAAJ", "orcid": ";0000-0002-6740-3600", "linkedin": ";erick-delage-2105361/?originalSubdomain=ca", "or_profile": "abderrahim.fathan@gmail.com;~Erick_Delage2", "aff": ";Computer Science Department", "aff_domain": ";cs.stanford.edu", "position": ";Researcher", "bibtex": "", "github": "", "project": "", "reviewers": "AnonReviewer3;AnonReviewer4;AnonReviewer1;AnonReviewer2", "site": "https://openreview.net/forum?id=REKvFYIgwz9", "pdf_size": 0, "rating": "2;4;4;5", "confidence": "4;4;4;4", "wc_review": "249;302;257;206", "wc_reply_reviewers": "0;0;0;0", "wc_reply_authors": "0;0;0;0", "reply_reviewers": "0;0;0;0", "reply_authors": "0;0;0;0", "rating_avg": [ 3.75, 1.0897247358851685 ], "confidence_avg": [ 4.0, 0.0 ], "wc_review_avg": [ 253.5, 34.0624426605022 ], "wc_reply_reviewers_avg": [ 0, 0 ], "wc_reply_authors_avg": [ 0, 0 ], "reply_reviewers_avg": [ 0, 0 ], "reply_authors_avg": [ 0, 0 ], "replies_avg": [ 5, 0 ], "authors#_avg": [ 2, 0 ], "corr_rating_confidence": 0.0, "gs_citation": 24, "gs_cited_by_link": "https://scholar.google.com/scholar?cites=10228035567037895853&as_sdt=5,33&sciodt=0,33&hl=en", "gs_version_total": 11, "aff_unique_index": "0", "aff_unique_norm": "Computer Science Department", "aff_unique_dep": "Computer Science", "aff_unique_url": "", "aff_unique_abbr": "" }, { "title": "On the mapping between Hopfield networks and Restricted Boltzmann Machines", "status": "Oral", "track": "main", "site": "https://iclr.cc/virtual/2021/poster/3045", "id": "RGJbergVIoO", "poster": "", "openreview": "https://openreview.net/forum?id=RGJbergVIoO", "slides": "https://iclr.cc/virtual/2021/poster/3045", "video": "https://iclr.cc/virtual/2021/poster/3045", "author_site": "Matthew Smart, Anton Zilman", "tldr": "", "abstract": "Hopfield networks (HNs) and Restricted Boltzmann Machines (RBMs) are two important models at the interface of statistical physics, machine learning, and neuroscience. Recently, there has been interest in the relationship between HNs and RBMs, due to their similarity under the statistical mechanics formalism. An exact mapping between HNs and RBMs has been previously noted for the special case of orthogonal (\u201cuncorrelated\u201d) encoded patterns. We present here an exact mapping in the case of correlated pattern HNs, which are more broadly applicable to existing datasets. Specifically, we show that any HN with $N$ binary variables and $p